Cost of short read RNA sequencing
Tags: Experiment Costs, RNAseq, short-read, Data assets
Published:
Why do you do this experiment?
Sequencing RNA enables the identification and quantification of RNA expressed in a cell or a sample (the transcriptome).
Input: 100k-1M live cells, FFPE, frozen cells, or 25-250 ng RNA
Output: FASTQ files (20-100M PE reads) -> gene expression
Strategic Value
By comparing multiple samples, we can measure the effect of perturbations (drug, disease, knock-out, etc.) on the transcriptome of the cell. This can be used to understand gene regulation, how a drug works, or which processes a disease affects.
RNAseq provides the sequence of all expressed genes, meaning variants (e.g. SNPs, gene fusions) can be called, but coverage will be biased towards highly expressed genes. In the context of cancer, and with deep enough RNAseq, sub-clonal exonic mutations can be detected for most genes.
Cost & Scale
- Variable per run: \$150/sample. Range: \$118 - \$226
- Cost breakdown:
  - RNA extraction: \$56
  - Short read library preparation: \$50
  - Sequencing (20-100M reads, 4-30 Gb): \$12-\$120
- Capex: Thermocycler (\$10-20k), TapeStation (\$6-30k), NGS Sequencer (\$50k-1M)
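As a rough budgeting aid, the per-sample figure can be rebuilt by summing the three line items listed above. This is only a sketch: the function names are made up for illustration, and the \$44 sequencing cost in the usage line is an assumption chosen so that the total lands on the \$150 typical figure.

```python
# Per-sample consumable costs in USD, copied from the breakdown above.
RNA_EXTRACTION = 56
LIBRARY_PREP = 50
SEQUENCING_RANGE = (12, 120)  # cheapest (20M reads) to most expensive (100M reads)


def per_sample_cost(sequencing_cost):
    """Variable cost of one sample for a given sequencing cost."""
    return RNA_EXTRACTION + LIBRARY_PREP + sequencing_cost


def project_cost(n_samples, sequencing_cost):
    """Variable cost of a whole project (capex and labor excluded)."""
    return n_samples * per_sample_cost(sequencing_cost)


low, high = (per_sample_cost(c) for c in SEQUENCING_RANGE)
print(f"Per sample: ${low} - ${high}")

# Hypothetical project: 24 samples at an assumed $44 of sequencing each.
print(f"24 samples: ${project_cost(24, 44)}")
```

Capex is left out on purpose, since it depends on how many projects share the instruments.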
Experimental Modules
- RNA extraction (2h30, 40 min hands-on)
- Sequencing library preparation (6h, 2h hands-on)
- Sequencing run (4-24h depending on the sequencer)
Ops & Throughput
Turnaround: 3+ days (day 1 extraction, day 2 library prep, day 3 or later sequencing)
Hands-on time: 4h
Parallelizability: High. All steps can be done in parallel for as many samples as needed.
Bottlenecks: availability of the TapeStation (16 lanes) and thermocycler (96 wells).
Batching: 1 to 16 samples per technician.
Automation readiness: Full, with commercial solutions available.
Outsourceability: Yes.
Data scale: 20-100M reads/sample, 4-30 Gb/sample
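The gigabase figures follow from the read count once a read length is fixed. The sketch below assumes that "20-100M PE reads" counts read pairs, and that read lengths are 2x100bp at the low end and 2x150bp at the high end; both are assumptions, not something stated by a specific kit or sequencer.

```python
def gigabases(read_pairs, read_length, paired=True):
    """Total sequenced bases in Gb (1 Gb = 1e9 bases)."""
    reads_per_fragment = 2 if paired else 1
    return read_pairs * reads_per_fragment * read_length / 1e9


# Low end: 20M read pairs at 2x100bp (assumed read length).
print(gigabases(20e6, 100))   # 4.0

# High end: 100M read pairs at 2x150bp (assumed read length).
print(gigabases(100e6, 150))  # 30.0
```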
Data API
Raw format: FASTQ
Processed format: count matrix
Resolution: gene-level expression, single nucleotide variant
Analysis Ecosystem
- QC and cleaning
- Alignment:
  - STAR aligner
  - bowtie2
  - kallisto: Transcript quantification via pseudo-alignment
  - Salmon: Transcript quantification via quasi-mapping
- Gene expression quantification:
  - htseq-count: Gene-read overlap counts
- Differential expression
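To make the last two steps concrete, here is a toy sketch of what happens between a count matrix and a fold change: counts are scaled by library size (counts per million, CPM) and then compared as a log2 ratio. The gene names and counts are made up, and the pseudocount of 1 is an arbitrary choice. A real differential expression analysis would rely on a dedicated statistical package that models replicates and count dispersion, which this sketch does not attempt.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene by the sample's library size."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2 ratio of CPM values; the pseudocount avoids log(0)."""
    cpm_t, cpm_c = cpm(treated), cpm(control)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in cpm_c
    }


# Toy counts (made up for illustration, not real data).
# The treated sample was simply sequenced twice as deep as the control.
control = {"GENE_A": 500, "GENE_B": 300, "GENE_C": 200}
treated = {"GENE_A": 1000, "GENE_B": 600, "GENE_C": 400}

for gene, lfc in log2_fold_change(treated, control).items():
    print(gene, round(lfc, 3))
```

Because the treated sample differs only in sequencing depth, every fold change comes out at zero after CPM scaling, which is the point of normalizing before comparing samples.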
Public datasets
- The Cancer Genome Atlas (TCGA): RNAseq (2x50bp) and WES for more than 20k tumor and matched normal samples
- Genotype-Tissue Expression (GTEx): RNAseq from all major organs from >700 individuals
- Gene Expression Omnibus (GEO): Repository of sequencing data from publications
- European Nucleotide Archive (ENA): Repository of sequencing data from publications
- recount3: data from TCGA and GTEx reprocessed with a uniform pipeline
See also this list
Pitfalls & Failure Modes
- Don’t skip the ribo-depletion or polyA enrichment step: it accounts for most of the extraction cost but is there for a reason. More than 90%[^1] of the RNA in a cell is rRNA or tRNA. Short-read sequencing of total RNA without size selection would yield around 70% rRNA reads and 15% tRNA reads, which are rarely the populations of interest (unless you are studying base modifications, which short-read sequencing does not capture). With sequencing as cheap as it is nowadays, you should systematically choose ribo-depletion over polyA enrichment. Batch correction can integrate your ribo-depleted data with polyA cohorts without problems.
- Most protocols for RNAseq are optimized for the extraction of RNA longer than 200 nt and will size select the sequencing library to 300-500bp. This will exclude small RNA populations (tRNA, miRNA, snoRNA, etc.). If you are interested in those populations, use a dedicated kit (e.g. Qiagen miRNeasy) and remove the size selection steps.
[^1]: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2015.00002/full
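The cost of skipping depletion (first pitfall above) is easy to put in numbers. The sketch below takes the 70% rRNA and 15% tRNA read fractions quoted above at face value and asks how many raw reads are needed to end up with a given number of informative reads. The function name and the 20M target are illustrative assumptions.

```python
def reads_required(target_informative_reads, rrna_fraction=0.70, trna_fraction=0.15):
    """Raw reads needed so that the non-rRNA, non-tRNA reads hit the target."""
    informative_fraction = 1.0 - rrna_fraction - trna_fraction
    return target_informative_reads / informative_fraction


# Without depletion, 20M informative reads need roughly 133M raw reads.
needed = reads_required(20e6)
print(f"{needed / 1e6:.0f}M raw reads")
```

In other words, only about 15% of the reads from an undepleted library would carry the signal most experiments are after.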
Related publications
- NEB RNA protocol (section 4)
Order list
Plenty of suppliers exist for this kind of protocol, and you can mostly mix and match suppliers to your liking for each step. I used NEB as a convenient example, as their documentation is quite clear.
Protocol variations
- RNA extraction should yield 10-30pg of RNA/cell
- Ultra-low-input protocols based on direct reverse transcription enable RNAseq from as few as 10 cells (e.g. from Thermo Fisher).
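The 10-30pg/cell yield gives a quick sanity check on how many cells a prep needs. This is a back-of-the-envelope sketch that assumes every cell is recovered and lysed with no loss, so real numbers will be lower; the function names are made up for illustration.

```python
import math


def rna_yield_ng(n_cells, pg_per_cell):
    """Expected total RNA in ng (1 ng = 1000 pg), assuming no loss."""
    return n_cells * pg_per_cell / 1000


def cells_needed(target_ng, pg_per_cell):
    """Minimum number of cells to reach a target RNA mass, assuming no loss."""
    return math.ceil(target_ng * 1000 / pg_per_cell)


# 100k cells at the pessimistic 10 pg/cell yield.
print(rna_yield_ng(100_000, 10))  # 1000.0 ng

# Cells needed for the 25 ng and 250 ng library inputs at 10 pg/cell.
print(cells_needed(25, 10))   # 2500
print(cells_needed(250, 10))  # 25000
```

Even at the pessimistic end of the yield range, a 100k-cell input leaves a comfortable margin over the 25-250ng needed for library preparation.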
This post is part of a series on the cost of experiments. All costs are order-of-magnitude estimates and may have changed between the post and your order date. All costs assume you perform the whole pipeline in house and do not include labor costs. For outsourcing, a decent first estimate is to double the indicated costs. Cheap consumables are not always included if they affect less than 1% of the cost. Always check the protocols that come with the kits for the complete list of consumables to order.