This tutorial explains how to generate a transcriptome index for use with kallisto | bustools using kb.

Note: in these instructions, commands to be typed at the command line are preceded by $. For example, if you see $ cd my_folder, then type cd my_folder.

0. Download materials

Prepare a folder

$ mkdir transcriptome_index/; cd transcriptome_index/

Download the genomic (DNA) FASTA file and the GTF annotation file for your desired organism from the database of your choice. This tutorial uses the mouse reference files (GRCm38) from Ensembl release 98.

$ wget ftp://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
$ wget ftp://ftp.ensembl.org/pub/release-98/gtf/mus_musculus/Mus_musculus.GRCm38.98.gtf.gz
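Large downloads are occasionally truncated, so it can be worth testing the archives before extracting them. The following is a minimal sketch that relies only on gzip's built-in integrity test (gzip -t); the helper name check_gz is hypothetical and not part of kb or kallisto.

```shell
# Hypothetical helper: report whether each given .gz file passes
# gzip's built-in integrity test. Returns non-zero if any file fails.
check_gz() {
    rc=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (run it in the folder holding the downloads):
# check_gz Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
```

If a file is reported as CORRUPT, downloading it again is usually the simplest fix.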

Extract the files

$ gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
$ gunzip Mus_musculus.GRCm38.98.gtf.gz
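The GTF refers to sequences in the FASTA by name, so if the two files come from different sources (for example, one uses 1 and the other chr1), the annotation will not line up with the genome. The sketch below lists every sequence name that appears in the first column of the GTF but not in any FASTA header. The function name missing_seqnames is hypothetical, and the sketch assumes that the sequence name in a FASTA header is the text between > and the first whitespace.

```shell
# Hypothetical helper: print GTF sequence names that are absent from the FASTA.
# Usage: missing_seqnames genome.fa annotation.gtf
missing_seqnames() {
    awk -F'\t' '
        # First file (FASTA): remember the name on each header line.
        NR == FNR {
            if ($0 ~ /^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)
                in_fasta[name] = 1
            }
            next
        }
        # Second file (GTF): skip comment lines, report each unseen name once.
        /^#/ { next }
        !($1 in in_fasta) && !($1 in reported) {
            print $1
            reported[$1] = 1
        }
    ' "$1" "$2"
}

# Example call:
# missing_seqnames Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.98.gtf
```

Empty output means every sequence named in the GTF is present in the FASTA.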

1. Build the index

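kb ref uses the GTF annotation to extract the transcript (cDNA) sequences from the genome FASTA, writes them to a cDNA FASTA file, and builds a kallisto index from that file. In the command below, -i names the output index, -g names the output transcript-to-gene mapping file, and -f1 names the output cDNA FASTA file; the genome FASTA and the GTF are the final two arguments.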

$ kb ref -i transcriptome.idx -g transcripts_to_genes.txt -f1 cdna.fa Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.98.gtf
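Once kb ref finishes, a quick look at its outputs can catch an empty or mismatched result early. The sketch below counts the records in the cDNA FASTA and the distinct transcripts and genes in the mapping file. The function names fasta_count and t2g_summary are hypothetical, and the sketch assumes that transcripts_to_genes.txt is tab-separated with the transcript ID in the first column and the gene ID in the second.

```shell
# Hypothetical helper: count the records (header lines) in a FASTA file.
fasta_count() {
    grep -c '^>' "$1"
}

# Hypothetical helper: count distinct transcript IDs (column 1) and
# gene IDs (column 2) in a tab-separated transcript-to-gene file.
t2g_summary() {
    awk -F'\t' '
        { tx[$1] = 1; gene[$2] = 1 }
        END {
            nt = 0; for (k in tx) nt++
            ng = 0; for (k in gene) ng++
            printf "transcripts=%d genes=%d\n", nt, ng
        }
    ' "$1"
}

# Example calls:
# fasta_count cdna.fa
# t2g_summary transcripts_to_genes.txt
```

If the assumption about the columns holds, the number of records in cdna.fa would be expected to equal the number of transcripts reported for transcripts_to_genes.txt.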

2. Align your reads and generate a count matrix

See this tutorial for how to proceed.