This tutorial describes how to aggregate multiple count matrices by concatenating them into a single AnnData object with batch labels for different samples.
This is similar to the Cell Ranger aggr function, except that no normalization is performed. cellranger aggr is described at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/aggregate
Note: See this notebook for a tutorial on how to build custom transcriptome or RNA velocity indices.
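To make the idea of aggregation concrete, here is a minimal pure-Python sketch of concatenating per-sample count tables while attaching a batch label to every cell. The function name `concatenate_counts`, the nested-dictionary layout, and the toy barcodes and genes are all hypothetical and chosen only for illustration; real workflows operate on AnnData objects rather than dictionaries.

```python
def concatenate_counts(samples):
    """Stack per-sample count tables into one table with a batch label per cell.

    `samples` maps a batch label to {barcode: {gene: count}} (hypothetical layout).
    """
    # Collect the union of gene names in first-seen order.
    genes = []
    for cells in samples.values():
        for counts in cells.values():
            for gene in counts:
                if gene not in genes:
                    genes.append(gene)

    obs, rows = [], []
    for batch, cells in samples.items():
        for barcode, counts in cells.items():
            # Suffix the barcode with the batch so identical barcodes stay unique.
            obs.append({"barcode": f"{barcode}-{batch}", "batch": batch})
            # Raw counts are copied unchanged: no normalization is applied.
            rows.append([counts.get(gene, 0) for gene in genes])
    return genes, obs, rows


# Hypothetical toy data: the same barcode occurs in both samples.
samples = {
    "sample1": {"AAAC": {"GeneA": 3, "GeneB": 1}, "AAAG": {"GeneA": 2}},
    "sample2": {"AAAC": {"GeneB": 5}},
}
genes, obs, rows = concatenate_counts(samples)
for cell, row in zip(obs, rows):
    print(cell["barcode"], cell["batch"], row)
```

Suffixing each barcode with its batch label matters because the same barcode sequence can appear in more than one sample. With the anndata library, recent releases offer `anndata.concat` with its `label` and `keys` arguments for the analogous operation on real AnnData objects.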
%%time
!kb ref -d human -i index.idx -g t2g.txt
[2020-01-14 22:17:40,464] INFO Downloading files for human from https://caltech.box.com/shared/static/v1nm7lpnqz5syh8dyzdk2zs8bglncfib.gz to tmp/v1nm7lpnqz5syh8dyzdk2zs8bglncfib.gz
[2020-01-14 22:19:31,668] INFO Extracting files from tmp/v1nm7lpnqz5syh8dyzdk2zs8bglncfib.gz
CPU times: user 578 ms, sys: 77.4 ms, total: 655 ms
Wall time: 2min 32s
The following command generates an RNA count matrix of cells (rows) by genes (columns) in H5AD format, a binary format used to store AnnData objects. Note that the index and transcript-to-gene mapping downloaded in the previous step are passed to the -i and -g arguments, respectively. These reads were generated with the 10x Genomics Chromium Single Cell v2 Chemistry, hence the -x 10xv2 argument. To view other supported technologies, run kb --list.
The --filter flag is used to filter out barcodes with low UMI counts. This will generate two matrices, one in the counts_unfiltered directory and another in the counts_filtered directory.
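As a rough illustration of what filtering out low-count barcodes means, the sketch below keeps only barcodes whose total UMI count reaches a cutoff. This is a simplified stand-in, not the procedure kb uses: the actual cutoff is determined by bustools, whereas the fixed `min_umis` value, the function name `filter_barcodes`, and the toy counts here are hypothetical.

```python
def filter_barcodes(umi_counts, min_umis):
    """Keep barcodes whose total UMI count is at least `min_umis`."""
    return {barcode: n for barcode, n in umi_counts.items() if n >= min_umis}


# Hypothetical per-barcode UMI totals from an unfiltered matrix.
umi_counts = {"AAAC": 1520, "AAAG": 3, "AACT": 870, "AAGT": 1}

# The fixed threshold is an assumption made for this example only.
kept = filter_barcodes(umi_counts, min_umis=100)
print(sorted(kept))
```

The unfiltered matrix corresponds to `umi_counts` as a whole, and the filtered matrix to the subset in `kept`.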
Note: If you would like a Loom file instead, replace the --h5ad flag with --loom. If you want to use the raw matrix output by kb instead of the converted H5AD or Loom file, omit these flags.
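Because the same kb count invocation is run once per sample, it can help to assemble the argument list programmatically. The sketch below builds such a command from the flags described above (-i, -g, -x, -o, --h5ad, --filter). The FASTQ file names are hypothetical placeholders, the helper `kb_count_command` is invented for this example, and the explicit `bustools` value after `--filter` is an assumption based on the "Filtering with bustools" log line.

```python
import shlex


def kb_count_command(sample, fastqs, index="index.idx", t2g="t2g.txt",
                     technology="10xv2"):
    """Return the argument list for one `kb count` run (hypothetical helper)."""
    return [
        "kb", "count",
        "-i", index,             # index downloaded by `kb ref`
        "-g", t2g,               # transcript-to-gene mapping
        "-x", technology,        # sequencing technology
        "-o", sample,            # output directory, one per sample
        "--h5ad",                # convert the matrix to H5AD
        "--filter", "bustools",  # assumed filter value, see lead-in
        *fastqs,
    ]


# Hypothetical FASTQ file names; substitute the real paths for each sample.
fastq_files = {
    "sample1": ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"],
    "sample2": ["sample2_R1.fastq.gz", "sample2_R2.fastq.gz"],
}

commands = {name: kb_count_command(name, files)
            for name, files in fastq_files.items()}
for cmd in commands.values():
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it, or the printed line can be pasted into a notebook cell after a leading `!`.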
[2020-01-14 22:55:56,693] INFO Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2020-01-14 22:55:56,693] INFO Sorting BUS file sample1/output.bus to tmp/output.s.bus
[2020-01-14 22:57:31,354] INFO Whitelist not provided
[2020-01-14 22:57:31,354] INFO Copying pre-packaged 10XV2 whitelist to sample1
[2020-01-14 22:57:35,347] INFO Inspecting BUS file tmp/output.s.bus
[2020-01-14 22:57:47,155] INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist sample1/10xv2_whitelist.txt
[2020-01-14 22:58:13,462] INFO Sorting BUS file tmp/output.s.c.bus to sample1/output.unfiltered.bus
[2020-01-14 22:58:48,480] INFO Generating count matrix sample1/counts_unfiltered/cells_x_genes from BUS file sample1/output.unfiltered.bus
[2020-01-14 22:59:01,129] INFO Converting matrix sample1/counts_unfiltered/cells_x_genes.mtx to h5ad sample1/counts_unfiltered/adata.h5ad
[2020-01-14 22:59:11,951] INFO Filtering with bustools
[2020-01-14 22:59:11,952] INFO Generating whitelist sample1/filter_barcodes.txt from BUS file sample1/output.unfiltered.bus
[2020-01-14 22:59:12,274] INFO Capturing records from BUS file sample1/output.unfiltered.bus to tmp/output.filtered.bus with capture list sample1/filter_barcodes.txt
[2020-01-14 22:59:15,831] INFO Sorting BUS file tmp/output.filtered.bus to sample1/output.filtered.bus
[2020-01-14 22:59:52,828] INFO Generating count matrix sample1/counts_filtered/cells_x_genes from BUS file sample1/output.filtered.bus
[2020-01-14 23:00:03,942] INFO Converting matrix sample1/counts_filtered/cells_x_genes.mtx to h5ad sample1/counts_filtered/adata.h5ad
CPU times: user 1.24 s, sys: 161 ms, total: 1.4 s
Wall time: 4min 16s
[2020-01-14 23:00:13,871] INFO Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2020-01-14 23:00:13,871] INFO Sorting BUS file sample2/output.bus to tmp/output.s.bus
[2020-01-14 23:01:14,475] INFO Whitelist not provided
[2020-01-14 23:01:14,475] INFO Copying pre-packaged 10XV2 whitelist to sample2
[2020-01-14 23:01:14,681] INFO Inspecting BUS file tmp/output.s.bus
[2020-01-14 23:01:21,144] INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist sample2/10xv2_whitelist.txt
[2020-01-14 23:01:45,237] INFO Sorting BUS file tmp/output.s.c.bus to sample2/output.unfiltered.bus
[2020-01-14 23:01:54,119] INFO Generating count matrix sample2/counts_unfiltered/cells_x_genes from BUS file sample2/output.unfiltered.bus
[2020-01-14 23:01:59,981] INFO Converting matrix sample2/counts_unfiltered/cells_x_genes.mtx to h5ad sample2/counts_unfiltered/adata.h5ad
[2020-01-14 23:02:03,635] INFO Filtering with bustools
[2020-01-14 23:02:03,635] INFO Generating whitelist sample2/filter_barcodes.txt from BUS file sample2/output.unfiltered.bus
[2020-01-14 23:02:03,803] INFO Capturing records from BUS file sample2/output.unfiltered.bus to tmp/output.filtered.bus with capture list sample2/filter_barcodes.txt
[2020-01-14 23:02:05,366] INFO Sorting BUS file tmp/output.filtered.bus to sample2/output.filtered.bus
[2020-01-14 23:02:12,500] INFO Generating count matrix sample2/counts_filtered/cells_x_genes from BUS file sample2/output.filtered.bus
[2020-01-14 23:02:17,853] INFO Converting matrix sample2/counts_filtered/cells_x_genes.mtx to h5ad sample2/counts_filtered/adata.h5ad
CPU times: user 626 ms, sys: 82.7 ms, total: 709 ms
Wall time: 2min 7s