The analysis of single-cell RNA-seq data involves a series of pre-processing steps: (1) associating reads with their cells of origin, (2) collapsing reads according to unique molecular identifiers (UMIs), and (3) counting the collapsed reads per feature to produce a feature-cell matrix.
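As a rough illustration of these three steps, the following minimal Python sketch works on a toy list of reads. The input format (barcode, UMI, gene triples), the function name `count_matrix`, and the exact-match UMI collapsing are assumptions made for this example only; they are not how kallisto or bustools process data.

```python
from collections import defaultdict

# Hypothetical toy input: one (cell barcode, UMI, gene) triple per read.
reads = [
    ("AAAC", "GGT", "geneA"),
    ("AAAC", "GGT", "geneA"),  # duplicate of the read above (same barcode, UMI, gene)
    ("AAAC", "TTC", "geneA"),
    ("AAAC", "GGT", "geneB"),
    ("CCGT", "ACA", "geneB"),
]


def count_matrix(reads):
    """Build a nested dict {gene: {barcode: count}} from toy reads."""
    # Step 1: associate each read with its cell of origin via the barcode.
    by_cell = defaultdict(set)
    for barcode, umi, gene in reads:
        # Step 2: storing (UMI, gene) pairs in a set collapses exact duplicates.
        by_cell[barcode].add((umi, gene))

    # Step 3: count distinct UMIs per gene in each cell.
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, pairs in by_cell.items():
        for _umi, gene in pairs:
            counts[gene][barcode] += 1
    return {gene: dict(cells) for gene, cells in counts.items()}


matrix = count_matrix(reads)
print(matrix)
```

Real workflows also correct sequencing errors in barcodes and handle reads that are compatible with more than one feature, both of which this sketch ignores.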
We recently introduced the BUS file format for single-cell RNA-seq data to facilitate the development of modular workflows for data pre-processing. A BUS file consists of a binary representation of barcode and UMI sequences from scRNA-seq reads, along with sets of equivalence classes obtained by pseudoalignment of reads to a reference transcriptome (hence the acronym Barcode, UMI, Set). We have implemented a command in kallisto called bus that efficiently generates BUS files from any single-cell RNA-seq technology. Tools for manipulating BUS files are provided as part of the bustools package.
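To make the idea of a barcode, UMI, set record more concrete, here is a minimal Python sketch of one such record held in memory. The 2-bit nucleotide code, the `BusRecord` field names, and the `encode`/`decode` helpers are assumptions chosen for illustration; the actual on-disk layout is defined by the BUS format specification and is not reproduced here.

```python
from typing import NamedTuple

# Assumed 2-bit code per nucleotide, used here only for illustration.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> int:
    """Pack a nucleotide string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack an integer produced by encode() back into a string."""
    return "".join(BASES[(value >> (2 * i)) & 3] for i in reversed(range(length)))


class BusRecord(NamedTuple):
    """Hypothetical in-memory view of one barcode, UMI, set record."""
    barcode: int  # packed cell barcode
    umi: int      # packed UMI
    ec: int       # identifier of the equivalence class (the "set")
    count: int    # number of reads sharing this barcode, UMI and set


record = BusRecord(encode("ACGT"), encode("TTG"), ec=7, count=1)
print(record, decode(record.barcode, 4), decode(record.umi, 3))
```

Packing sequences into integers keeps each record a fixed size, which is one reason a binary representation is convenient for sorting and streaming records between tools.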
This website provides tutorials and workflows to learn how to use the kallisto and bustools programs together to perform single-cell RNA-seq pre-processing. We suggest beginning with the Getting Started page. See the kallisto and applications Google group for answers to frequently asked questions. The kallisto | bustools workflow is described in detail in:
Páll Melsted, A. Sina Booeshaghi, Fan Gao, Eduardo Beltrame, Lambda Lu, Kristján Eldjárn Hjorleifsson, Jase Gehring and Lior Pachter, Modular and efficient pre-processing of single-cell RNA-seq, bioRxiv, 2019.