Open-access publication here: www.nature.com/articles/s41... @nanoporetech.com @pacbio.bsky.social
20.07.2025 23:35

Kudos to Sonia Zumalave in my lab for working out how to flag and remove such fold-back-like artifacts, with key contributions from @hbelrick.bsky.social, @carolinmsa.bsky.social and @jevalleinclan.bsky.social. Thread on our algorithm: bsky.app/profile/isid... and ..
20.07.2025 23:35

Very glad to see this preprint by the @lh3lh3.bsky.social and Meyerson labs (www.biorxiv.org/content/10.1...) confirming our finding of artifactual fold-back inversions in long reads (Fig. S1 in our @natmethods.nature.com paper presenting SAVANA, which filters such artifacts to improve SV calling).
20.07.2025 23:35

Work by researchers in the group of @isidrolauscher.bsky.social at EMBL-EBI, the R&D lab of @genomicsengland.bsky.social, in collaboration with clinical partners at @ucl.ac.uk, Royal National Orthopaedic Hospital, Instituto de Medicina Molecular João Lobo Antunes, and Boston Children's Hospital.
29.05.2025 08:57

Very well done indeed, @hbelrick.bsky.social!
30.05.2025 12:41

SAVANA is out in the wild! #SAVANA detects haplotype-resolved somatic structural variants (SVs) and copy number aberrations, and estimates tumour purity and ploidy, using long-read data. It comes with a robust, data-driven benchmarking effort! Below is a thread with all the advantages.
29.05.2025 19:08

Link to full text (open access): www.nature.com/articles/s41...
28.05.2025 20:47

And huge thanks to our funders @curesarcoma.bsky.social, CTOS, @embl.org and others!
28.05.2025 17:36

HT @jevalleinclan.bsky.social and other lab members at @ebi.embl.org, our great collaborators at @bostonchildrens.bsky.social, Melanie Tanguy and Greg Elgar at @genomicsengland.bsky.social, IMM Lisbon...
28.05.2025 17:36

SAVANA was developed by two superstars in the lab, @hbelrick.bsky.social and Carolin Sauer, in close collaboration once again with Prof. Flanagan and team at @ucl.ac.uk, with key contributions from...
28.05.2025 17:36

In sum, we establish best practices for benchmarking SV detection methods for somatic (e.g. cancer) genome analysis, and show that SAVANA enables the application of long-read sequencing to reliably detect SVs and somatic copy number aberrations (SCNAs) in clinical samples.
28.05.2025 17:36

Finally, SAVANA's predictive modelling framework incorporates conformal prediction, a mathematically sound method to control the error rate of predictions (hence the "reliable" in the title). Conformal prediction is a robust method we've used in other contexts as well, e.g. pubs.acs.org/doi/10.1021/...
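The thread doesn't spell out SAVANA's exact conformal setup, so here is a minimal sketch of the general idea: split (inductive) conformal prediction for a classifier that labels a candidate SV as "somatic" or "noise". The labels, the nonconformity score (1 minus the predicted probability of the true label), the toy calibration numbers and the helper names are all illustrative assumptions, not SAVANA's code.

```python
import math
from typing import Dict, List, Set

# Hypothetical class labels: "somatic" (true SV) vs "noise" (artefact).
Probs = Dict[str, float]


def conformal_threshold(cal_probs: List[Probs], cal_labels: List[str], alpha: float) -> float:
    """Split-conformal threshold computed on a held-out calibration set.

    Nonconformity score = 1 - predicted probability of the true label.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: keep every label
    return scores[k - 1]


def prediction_set(probs: Probs, threshold: float) -> Set[str]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Toy calibration set (illustrative numbers only).
cal_probs = [
    {"somatic": 0.95, "noise": 0.05},
    {"somatic": 0.10, "noise": 0.90},
    {"somatic": 0.90, "noise": 0.10},
    {"somatic": 0.15, "noise": 0.85},
    {"somatic": 0.80, "noise": 0.20},
    {"somatic": 0.20, "noise": 0.80},
    {"somatic": 0.70, "noise": 0.30},
    {"somatic": 0.40, "noise": 0.60},
    {"somatic": 0.30, "noise": 0.70},
]
cal_labels = ["somatic", "noise", "somatic", "noise", "somatic",
              "noise", "somatic", "noise", "somatic"]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.15)
print(sorted(prediction_set({"somatic": 0.9, "noise": 0.1}, threshold)))  # ['somatic']
print(sorted(prediction_set({"somatic": 0.5, "noise": 0.5}, threshold)))  # ['noise', 'somatic']
```

Under exchangeability, the prediction set contains the true label with probability at least 1 - alpha (here 85%). An ambiguous candidate gets both labels instead of a forced guess, which is how the error rate stays controlled.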
28.05.2025 17:36

Importantly, #SAVANA harnesses read-phasing information during model training and provides haplotype-resolved SV calls, facilitating the assembly of complex SVs at single-haplotype resolution, e.g. our work on LTA chromothripsis in osteosarcoma in @cellpress.bsky.social:
www.cell.com/cell/fulltex...
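Phasing tools such as WhatsHap (`whatshap haplotag`) mark phased reads with an HP tag of 1 or 2. The thread doesn't describe how SAVANA turns phased reads into haplotype-resolved calls, so the following is only a minimal sketch, assuming a simple majority rule over an SV's supporting reads. The function name and both thresholds (`min_phased`, `min_fraction`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def assign_haplotype(
    read_haplotypes: Iterable[Optional[int]],
    min_phased: int = 3,
    min_fraction: float = 0.8,
) -> Optional[int]:
    """Assign an SV to haplotype 1 or 2 from the HP values of its supporting reads.

    None marks an unphased read. Returns None (unassigned) when too few
    supporting reads are phased or when the phased reads disagree.
    """
    counts = Counter(hp for hp in read_haplotypes if hp in (1, 2))
    phased = counts[1] + counts[2]
    if phased == 0 or phased < min_phased:
        return None
    haplotype, n = counts.most_common(1)[0]
    return haplotype if n / phased >= min_fraction else None


print(assign_haplotype([1, 1, 1, 1, None, 2]))  # 1
print(assign_haplotype([1, 2, 1, 2]))  # None
```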

In practice, this means that we can now reliably study complex genomic rearrangements (e.g. #chromothripsis) and clinically relevant events causing tumour suppressor gene loss using long reads (left), with accuracy comparable to Illumina (right):
28.05.2025 17:36

Moreover, using #SAVANA, we can estimate tumour purity and ploidy with accuracy comparable to Illumina data (analysed with the fantastic pipeline developed by the Hartwig Medical Foundation, @ecuppen.bsky.social @danielisskeptical.bsky.social, for clinical reports), even WITHOUT a germline control!
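The thread doesn't detail SAVANA's purity/ploidy model. Copy-number callers commonly relate purity p, tumour ploidy ψ and read depth through R = (p·n + 2(1 - p)) / (p·ψ + 2(1 - p)). Here R is a segment's bias-corrected depth relative to the genome-wide mean depth, n is the tumour copy number of the segment, and contaminating normal cells are assumed to be diploid. Because R is relative to the sample's own mean depth, the relationship does not require a germline control. Below is a minimal sketch of this standard relationship and its inverse. The function names are hypothetical.

```python
def expected_ratio(copy_number: float, purity: float, ploidy: float) -> float:
    """Expected depth of a segment relative to the genome-wide mean depth.

    Tumour cells (fraction `purity`) carry `copy_number` copies; contaminating
    normal cells are assumed diploid. `ploidy` is the tumour's mean copy number.
    """
    normal = 2.0 * (1.0 - purity)
    return (purity * copy_number + normal) / (purity * ploidy + normal)


def absolute_copy_number(ratio: float, purity: float, ploidy: float) -> float:
    """Invert expected_ratio: tumour copy number implied by an observed ratio (purity > 0)."""
    normal = 2.0 * (1.0 - purity)
    return (ratio * (purity * ploidy + normal) - normal) / purity


# A one-copy loss in a 60%-pure, diploid tumour.
print(round(expected_ratio(1, purity=0.6, ploidy=2.0), 3))  # 0.7
print(round(absolute_copy_number(0.7, purity=0.6, ploidy=2.0), 3))  # 1.0
```

In a 60%-pure diploid tumour, a single-copy loss shows up at 0.7x the mean depth instead of 0.5x. Estimating purity and ploidy amounts to finding the (p, ψ) pair under which segment ratios map closest to integer copy numbers. This sketch shows only the forward and inverse relationship and does not attempt that fit.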
28.05.2025 17:36

Using SAVANA, we recover most of the SVs detected in short-read data (note the more than two-fold difference in coverage between long and short reads here!!), and most of the SVs detected using long reads are also detected in Illumina data (note that we are not using ultra-long reads).
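Statements like "we recover most of the SVs detected in short-read data" depend on matching calls between two call sets. Here is a minimal sketch of one common approach: two calls match when both breakpoints agree within a fixed window. The 500 bp window, the `SV` tuple and the helper names are assumptions for illustration. The paper's matching criteria may differ, for example by also requiring the same SV type or orientation.

```python
from typing import NamedTuple, Sequence


class SV(NamedTuple):
    """A breakpoint pair; SV type and orientation are ignored in this sketch."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def same_sv(a: SV, b: SV, window: int) -> bool:
    """True if both breakpoints agree within `window` bp, in either breakend order."""
    def close(c1: str, p1: int, c2: str, p2: int) -> bool:
        return c1 == c2 and abs(p1 - p2) <= window

    forward = close(a.chrom1, a.pos1, b.chrom1, b.pos1) and close(a.chrom2, a.pos2, b.chrom2, b.pos2)
    swapped = close(a.chrom1, a.pos1, b.chrom2, b.pos2) and close(a.chrom2, a.pos2, b.chrom1, b.pos1)
    return forward or swapped


def recovered_fraction(query: Sequence[SV], reference: Sequence[SV], window: int = 500) -> float:
    """Fraction of SVs in `reference` matched by at least one SV in `query` (O(n*m))."""
    if not reference:
        return 0.0
    hits = sum(any(same_sv(r, q, window) for q in query) for r in reference)
    return hits / len(reference)


short_read_calls = [SV("chr1", 1000, "chr1", 5000), SV("chr2", 200, "chr5", 900)]
long_read_calls = [SV("chr1", 1100, "chr1", 4950), SV("chr3", 10, "chr3", 20000)]
print(recovered_fraction(long_read_calls, short_read_calls))  # 0.5
```

Running the function in both directions gives the two numbers the post reports: the share of short-read SVs recovered by long reads, and the share of long-read SVs also seen in short reads.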
28.05.2025 17:36

Now that we have a robustly validated algorithm, we can address the question you have all been waiting for (and which many colleagues have asked us many times): what is the relative performance of long and short reads for analysing human cancer genomes?
28.05.2025 17:36

What underpins the higher performance of SAVANA? A key innovation of SAVANA is the use of machine learning to distinguish true somatic signal from artefacts. The key challenge here was to curate a large training set (see details in the paper).
28.05.2025 17:36

In sum, these data indicate that SAVANA delivers SV results consistent with tumour biology, and that the differences in SV rates across algorithms are caused by variable algorithmic performance rather than true biological signal (see the paper for other analyses supporting this conclusion).
28.05.2025 17:36

For example, existing methods detect hundreds to thousands of SVs per sample mapping to microsatellite regions (#SAVANA doesn't). The tumour types we analysed (sarcomas and glioblastomas) rarely show such levels of repeat instability, which we confirmed for our samples using Illumina data.
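Checking whether a tool's calls pile up in microsatellites is an interval-overlap question. Here is a minimal sketch, assuming a repeat annotation (for example a microsatellite track in 0-based, half-open, BED-like coordinates) loaded as non-overlapping intervals, and SV calls given as (chrom1, pos1, chrom2, pos2) tuples. All function names are hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def build_index(intervals: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Group (chrom, start, end) repeat intervals by chromosome and sort them.

    Coordinates are 0-based, half-open; intervals are assumed not to overlap.
    """
    index: Dict[str, List[Interval]] = defaultdict(list)
    for chrom, start, end in intervals:
        index[chrom].append((start, end))
    for ivs in index.values():
        ivs.sort()
    return dict(index)


def in_repeat(index: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """Binary-search the last interval starting at or before `pos` and test containment."""
    ivs = index.get(chrom, [])
    i = bisect.bisect_right(ivs, (pos, math.inf)) - 1
    return i >= 0 and ivs[i][0] <= pos < ivs[i][1]


def count_repeat_svs(svs: Iterable[Tuple[str, int, str, int]], index: Dict[str, List[Interval]]) -> int:
    """Number of SVs with at least one breakpoint inside an annotated repeat."""
    return sum(in_repeat(index, c1, p1) or in_repeat(index, c2, p2) for c1, p1, c2, p2 in svs)


repeats = build_index([("chr1", 100, 150), ("chr1", 400, 460), ("chr2", 50, 80)])
calls = [("chr1", 120, "chr1", 900), ("chr1", 300, "chr2", 60), ("chr1", 200, "chr1", 300)]
print(count_repeat_svs(calls, repeats))  # 2
```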
28.05.2025 17:36

We found the same when using simulated sequencing replicates of the blood samples we use as germline controls. So, what are the false-positive SVs called by some algorithms and not by others? What drives such strong differences in performance?
28.05.2025 17:36

Using sequencing replicates of the normal cell line COLO829BL, we found that SAVANA shows 13- and 82-fold higher specificity than the second- and third-best-performing algorithms, respectively (and 391-fold higher than the worst-performing one). In practice, this means tens to thousands fewer false positives.
28.05.2025 17:36

Still not convinced? We also reasoned as follows: if you use the same sample as both the tumour and the matched germline sample to look for somatic SVs, how many would you detect? The answer: 0, as you are comparing the same sample against itself. In other words: 1 - 1 = 0.
28.05.2025 17:36

Well, nullius in verba: let the data speak. SAVANA shows uniform and much higher replication rates across SV clonality levels, sizes, types, samples and genomic regions. Thus, SAVANA's performance is driven by higher sensitivity and specificity!
28.05.2025 17:36

The replication rate of #SAVANA was much higher than that of existing methods. One could argue that low-AF SVs might be missed in one replicate if the reads supporting them are all assigned to the other replicate, and that the many SVs detected by some tools are due to higher sensitivity.
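The low-AF argument can be made quantitative. Suppose an SV has k supporting reads and reads are split at random between two replicates. The number that lands in one replicate is then binomially distributed. Here is a minimal sketch, assuming a caller that needs at least `min_support` reads per replicate. That is a hypothetical parameter: real callers' support thresholds vary.

```python
from math import comb


def prob_missed_in_a_replicate(k: int, min_support: int, p: float = 0.5) -> float:
    """Probability that at least one replicate receives fewer than `min_support`
    of an SV's `k` supporting reads, when each read independently goes to
    replicate A with probability p (and to replicate B otherwise)."""
    total = 0.0
    for x in range(k + 1):  # x = supporting reads landing in replicate A
        if x < min_support or k - x < min_support:
            total += comb(k, x) * p**x * (1 - p) ** (k - x)
    return total


print(prob_missed_in_a_replicate(4, 2))   # 0.625
print(prob_missed_in_a_replicate(20, 2))  # 42 / 2**20, about 4.0e-05
```

With `min_support = 2`, an SV supported by 4 reads is missed in at least one replicate 62.5% of the time. With 20 supporting reads, the probability drops to about 4 in 100,000. Even a perfect caller therefore replicates low-AF SVs less often, which is why it makes sense to compare replication rates within clonality strata.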
28.05.2025 17:36

First, we simulated replicates in silico by randomly splitting the sequencing reads from each tumour so that each replicate has even coverage (note that flow cell yield is variable for ONT). The key idea: true SVs are detected in both replicates, false positives in just one.
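Here is a minimal sketch of the replicate-splitting idea. It assumes reads are available as (read_id, length) pairs and that the two replicates' SV calls have already been reduced to comparable keys (in practice via breakpoint matching with a tolerance). The greedy base-balancing step is one simple way to reach even coverage, chosen here because long-read lengths vary widely. The replication-rate definition (calls found in both replicates over calls found in either) is also an assumption, not necessarily the paper's exact metric.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple


def split_reads(reads: Sequence[Tuple[str, int]], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split (read_id, read_length) pairs into two in-silico replicates.

    Reads are shuffled, then each read goes to the replicate with fewer bases
    so far, which keeps total bases (and hence coverage) nearly even.
    """
    rng = random.Random(seed)
    order = list(reads)
    rng.shuffle(order)
    replicates: Tuple[List[str], List[str]] = ([], [])
    bases = [0, 0]
    for read_id, length in order:
        i = 0 if bases[0] <= bases[1] else 1
        replicates[i].append(read_id)
        bases[i] += length
    return replicates


def replication_rate(calls_a: Set[Hashable], calls_b: Set[Hashable]) -> float:
    """Calls found in both replicates as a fraction of calls found in either."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 0.0


reads = [(f"read{i}", 1000 + 37 * i) for i in range(100)]
rep_a, rep_b = split_reads(reads, seed=1)
print(replication_rate({"sv1", "sv2", "sv3"}, {"sv2", "sv3", "sv4"}))  # 0.5
```

The greedy step guarantees that the two replicates differ in total bases by at most one read length. Under the key idea above, calls seen in only one replicate would be counted as likely false positives.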
28.05.2025 17:36

We propose to harness a cornerstone of experimental design, which is... replication! Specifically, we propose the following genome-wide, data-driven framework for benchmarking algorithms:
28.05.2025 17:36

SV detection methods have been benchmarked using small SV truth sets from cell lines. This is limited in many ways (see preprint), so the critical question arises: how can we benchmark SV detection algorithms across the entire genome in an unbiased, data-driven manner?
28.05.2025 17:36

Next, we applied existing SV detection methods designed for germline SV discovery (Sniffles2, cuteSV, SVIM) and for cancer genome analysis (Severus, NanomonSV, SVision-Pro). SV rates varied across two orders of magnitude, so how can we benchmark these tools?
28.05.2025 17:36

So, first, we generated matched Illumina (~120x) and @nanoporetech.com (~50x) sequencing data for 99 tumours with complex rearrangement profiles: 57 diverse soft tissue #sarcomas, 28 #osteosarcomas and 14 #glioblastomas. All long-read data are accessible via the EGA:
ega-archive.org/studies/EGAS...