This page describes how to pre-process the mouse retinal cell dataset SRR8599150 from Koren et al., 2019 using the kallisto | bustools workflow.

Note: commands to run on the command line are preceded by $. For example, if you see $ cd my_folder, type cd my_folder.

0. Install kb

Install kb from PyPI with pip:

$ pip install kb-python

1. Download the materials

Prepare a folder:

$ mkdir kallisto_bustools_getting_started/; cd kallisto_bustools_getting_started/

Download the following files:

  • Read 1 fastq file SRR8599150_S1_L001_R1_001.fastq.gz
  • Read 2 fastq file SRR8599150_S1_L001_R2_001.fastq.gz
$ wget https://github.com/bustools/getting_started/releases/download/getting_started/SRR8599150_S1_L001_R1_001.fastq.gz
$ wget https://github.com/bustools/getting_started/releases/download/getting_started/SRR8599150_S1_L001_R2_001.fastq.gz
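Before moving on, it can help to confirm that both downloads completed. The following is a minimal sketch, not part of kb: check_fastq_pair is a hypothetical helper that tests each gzip archive for corruption and checks that the two files hold the same number of reads, assuming the usual four lines per FASTQ record.

```shell
# Sketch: check that two gzipped FASTQ files are intact and hold the same number of reads.
# check_fastq_pair is a hypothetical helper name, not a kb or bustools command.
check_fastq_pair() {
    r1="$1"
    r2="$2"
    for f in "$r1" "$r2"; do
        # gzip -t exits non-zero if the archive is missing, truncated, or corrupt
        gzip -t "$f" || { echo "corrupt or missing: $f" >&2; return 1; }
    done
    # Assumes four lines per FASTQ record, so reads = lines / 4
    n1=$(( $(gzip -cd "$r1" | wc -l) / 4 ))
    n2=$(( $(gzip -cd "$r2" | wc -l) / 4 ))
    echo "R1 reads: $n1, R2 reads: $n2"
    # Succeed only if both files contain the same number of reads
    [ "$n1" -eq "$n2" ]
}
```

After defining the function in your shell, it could be used as follows:

$ check_fastq_pair SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz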

2. Download the index

Download the pre-built mouse index and its transcript-to-gene mapping using kb.

$ kb ref -d mouse -i index.idx -g transcripts_to_genes.txt
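As a quick sanity check on the downloaded mapping, the sketch below counts distinct transcripts and genes. It assumes transcripts_to_genes.txt is tab-separated with the transcript ID in the first column and the gene ID in the second; that layout is an assumption and may differ between kb versions. t2g_summary is a hypothetical helper, not a kb command.

```shell
# Sketch: summarise a transcript-to-gene mapping file.
# t2g_summary is a hypothetical helper. It assumes tab-separated columns with
# the transcript ID in column 1 and the gene ID in column 2.
t2g_summary() {
    t2g="$1"
    # $(( )) strips the whitespace padding that some wc implementations add
    n_tx=$(( $(cut -f1 "$t2g" | sort -u | wc -l) ))
    n_gene=$(( $(cut -f2 "$t2g" | sort -u | wc -l) ))
    echo "transcripts: $n_tx, genes: $n_gene"
}
```

For example:

$ t2g_summary transcripts_to_genes.txt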

3. Generate count matrices

The following command will:

  1. Pseudoalign the reads into a BUS file.
  2. Correct, sort, and count the BUS file into a gene count matrix.
$ kb count -i index.idx -g transcripts_to_genes.txt -x 10xv2 -o output SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz
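Once the command finishes, the size of the count matrix can be read directly from its header. The sketch below assumes the matrix is written in Matrix Market (.mtx) format, where lines beginning with % are comments and the first remaining line lists rows, columns, and non-zero entries. mtx_dims is a hypothetical helper, and the path in the usage line is an assumption about where kb places the matrix inside the output folder; check your own output directory.

```shell
# Sketch: print the dimensions of a Matrix Market (.mtx) file.
# mtx_dims is a hypothetical helper name, not a kb or bustools command.
mtx_dims() {
    # Skip '%' comment lines; the first remaining line is "rows cols nonzeros"
    awk '!/^%/ { print "rows=" $1, "cols=" $2, "nonzero=" $3; exit }' "$1"
}
```

For example, assuming the matrix was written to output/counts_unfiltered/cells_x_genes.mtx:

$ mtx_dims output/counts_unfiltered/cells_x_genes.mtx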

Note: kb can also convert the final count matrix to a loom file with the --loom flag, or to an h5ad file with the --h5ad flag.

4. Load the count matrices into a notebook

See this Python notebook for how to load the count matrices into Scanpy for analysis.