I'd like to thank all co-authors for their valuable input to this project: James Lancaster, @plbaldoni.bsky.social , @qgouil.bsky.social , @lewseylab.bsky.social , @mritchieau.bsky.social , @nadia-davidson.bsky.social .
We've also made the transcriptome assemblies generated from this work publicly available (zenodo.org/records/1753...), and the QC pipeline (github.com/alexyfyf/den...). Feel free to use them for your own research.
Practical takeaway: among the de novo pipelines we tested, RNA-Bloom2 + Corset (transcript clustering) performed best overall for quantification accuracy, computational efficiency, and differential gene and transcript analysis.
Key result: long reads can yield longer assembled transcripts for reference-free analysis, but clear limitations remain versus reference-guided approaches (computational burden, error profile, and redundancy are still challenges).
We evaluated realistic scenarios: sequencing depths of 6–60M reads; platforms including ONT cDNA (with sequin spike-ins), ONT direct RNA, and PacBio 10× single-cell; and multiple species (human and Pisum sativum).
What we did: we benchmarked long-read de novo assemblers (RATTLE, RNA-Bloom2, isONform) and compared them with a short-read assembler (Trinity) and hybrid approaches (rnaSPAdes, RNA-Bloom2-hybrid) across multiple datasets and metrics.
Why this matters: long-read de novo transcriptome assembly is developing quickly, but benchmarking and best-practice guidance have lagged behind.
I'm so glad to share that my paper from @nadia-davidson.bsky.social 's lab on the evaluation of long-read de novo transcriptome assembly is finally online. If you’re doing reference-free long-read RNA-seq, this one’s for you 👇 link.springer.com/article/10.1...