
@isidrolauscher.bsky.social

226 Followers | 508 Following | 35 Posts | Joined: 05.12.2024

Latest posts by isidrolauscher.bsky.social on Bluesky

SAVANA: reliable analysis of somatic structural variants and copy number aberrations using long-read sequencing - Nature Methods SAVANA is a tool to detect somatic structural variants and copy number aberrations using long-read sequencing data, offering high sensitivity, specificity and compatibility with or without germline co...

Open-access publication here: www.nature.com/articles/s41... @nanoporetech.com @pacbio.bsky.social

20.07.2025 23:35 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Kudos to Sonia Zumalave in my lab for working out how to flag and remove such fold-back-like artifacts, with key contributions from @hbelrick.bsky.social @carolinmsa.bsky.social and @jevalleinclan.bsky.social. Thread on our algorithm bsky.app/profile/isid... and ..

20.07.2025 23:35 | 👍 1 | 🔁 0 | 💬 1 | 📌 0
Post image

Very glad to see this preprint from the @lh3lh3.bsky.social and Meyerson labs www.biorxiv.org/content/10.1... confirming our finding of artifactual fold-back inversions in long reads (Fig S1 in our @natmethods.nature.com paper presenting SAVANA, which filters such artifacts to improve SV calling) 👇

20.07.2025 23:35 | 👍 3 | 🔁 1 | 💬 1 | 📌 0

Work by researchers in the group of @isidrolauscher.bsky.social at EMBL-EBI, the R&D lab of @genomicsengland.bsky.social, in collaboration with clinical partners at @ucl.ac.uk, Royal National Orthopaedic Hospital, Instituto de Medicina Molecular João Lobo Antunes, and Boston Children's Hospital.

29.05.2025 08:57 | 👍 3 | 🔁 1 | 💬 1 | 📌 0

Very well done indeed @hbelrick.bsky.social! 😀

30.05.2025 12:41 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

SAVANA is out in the wild 🦁! #SAVANA detects haplotype-resolved somatic structural variants (SVs) and copy number aberrations, and calculates tumour purity and ploidy using long-read data. Together with it, a robust, data-driven benchmarking effort! Below is a thread with all the advantages 👇

29.05.2025 19:08 | 👍 5 | 🔁 3 | 💬 0 | 📌 0
SAVANA: reliable analysis of somatic structural variants and copy number aberrations using long-read sequencing - Nature Methods SAVANA is a tool to detect somatic structural variants and copy number aberrations using long-read sequencing data, offering high sensitivity, specificity and compatibility with or without germline co...

Link to full text (open access) www.nature.com/articles/s41...

28.05.2025 20:47 | 👍 4 | 🔁 1 | 💬 0 | 📌 0

and huge thanks to our funders 🙏 @curesarcoma.bsky.social, CTOS, @embl.org and others!

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

HT @jevalleinclan.bsky.social, and other lab members at @ebi.embl.org, our great collaborators at @bostonchildrens.bsky.social, Melanie Tanguy and Greg Elgar @genomicsengland.bsky.social, IMM Lisbon...

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

SAVANA was developed by two superstars in the lab, @hbelrick.bsky.social & Carolin Sauer, in close collaboration once again with Prof. Flanagan and team at @ucl.ac.uk, with key contributions from...

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

In sum, we establish best practices for benchmarking SV detection methods for somatic (e.g. cancer) genome analysis, and show that SAVANA enables the application of long-read sequencing to detect SVs and SCNAs reliably in clinical samples.

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks Deep learning architectures have proved versatile in a number of drug discovery applications, including the modeling of in vitro compound activity. While controlling for prediction confidence is essen...

Finally, SAVANA's predictive modelling framework incorporates Conformal Prediction, a mathematically sound method to control the error rate of predictions (hence the 'reliable' in the title). Conformal prediction is a robust method we've used in other contexts as well, e.g. pubs.acs.org/doi/10.1021/...

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
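To make the idea of error-rate control concrete, here is a minimal sketch of label-conditional inductive conformal prediction for a two-class problem (somatic SV vs artefact). It is a generic textbook version, not SAVANA's implementation: the function names, the nonconformity score (assumed here to be 1 minus the model's predicted probability for a label) and the toy calibration scores are all assumptions made for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, with the usual +1 correction for the test example itself."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def prediction_set(calib_by_label, test_scores_by_label, epsilon=0.1):
    """Return every label whose p-value exceeds the significance level.

    calib_by_label: label -> nonconformity scores of held-out calibration
        examples that truly carry that label (hypothetical inputs).
    test_scores_by_label: label -> nonconformity score of the new call
        under the assumption that it carries that label.
    """
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calib_by_label[label], score) > epsilon
    }


# Toy calibration scores (illustrative only, not real data).
calib = [0.05, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = prediction_set(
    {"somatic": calib, "artefact": calib},
    {"somatic": 0.15, "artefact": 0.99},
    epsilon=0.1,
)
print(labels)
```

A call is reported with a single label only when the other label is rejected at the chosen significance level; an empty or two-label set flags a call the model cannot classify confidently.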
Ongoing chromothripsis underpins osteosarcoma genome complexity and clonal evolution Ongoing chromothripsis shapes genomic evolution in a diverse range of sarcomas and carcinomas. High-grade osteosarcoma is predominantly driven by the mechanism of loss-translocation-amplification chro...

Importantly, #SAVANA harnesses read-phasing information during model training & provides haplotype-resolved SV calls, facilitating the assembly of complex SVs at single-haplotype resolution, e.g. our work on LTA chromothripsis in osteosarcoma @cellpress.bsky.social
www.cell.com/cell/fulltex...

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

In practice, this means that we can now study, reliably, complex genomic rearrangements (e.g. #chromothripsis) and clinically relevant events causing tumour suppressor gene loss using long reads (left) with comparable accuracy to Illumina (right):

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

Moreover, using #SAVANA, we can estimate tumour purity and ploidy with comparable accuracy to Illumina data (using the fantastic pipeline developed by the Hartwig Medical Foundation @ecuppen.bsky.social @danielisskeptical.bsky.social for clinical reports) even WITHOUT a germline control!

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

Using SAVANA, we recover most of the SVs detected in short-read data (note the more than two-fold difference in coverage between long and short reads here!!), and most of the SVs detected using long reads are also detected in Illumina data (note that we are not using ultra-long reads).

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 2 | 📌 0

Now that we have a robustly validated algorithm, we can address the question you are all waiting for (and which many colleagues have asked us many times): what is the relative performance of long & short reads for analysing human cancer genomes?

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

What underpins the higher performance of SAVANA? A key innovation of SAVANA is the use of machine learning to distinguish true somatic signal from artefacts. The key challenge here was to curate a large training set (see details in the paper).

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

In sum, these data indicate that SAVANA delivers SV results consistent with tumour biology, and that the differences in SV rates across algorithms are caused by variable algorithmic performance rather than true biological signal (see further analyses in support of this conclusion in the paper).

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

For example, existing methods detect 100s to 1000s of SVs in each sample mapping to microsatellite regions (#SAVANA doesn't). The tumour types we analysed (sarcomas and glioblastomas) rarely show such levels of repeat instability, which we confirmed for our samples using Illumina data.

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
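The microsatellite check described above boils down to counting how many SV breakpoints land inside annotated repeat intervals. Below is a minimal sketch of that count for a single chromosome. The function name, the half-open interval convention and the toy coordinates are assumptions for illustration, not the analysis code used in the paper.

```python
from bisect import bisect_right


def count_in_regions(breakpoints, regions):
    """Count breakpoints that fall inside any region.

    regions: sorted, non-overlapping half-open (start, end) intervals on one
        chromosome, e.g. microsatellite annotations (hypothetical input).
    breakpoints: iterable of breakpoint positions on the same chromosome.
    """
    starts = [start for start, _ in regions]
    hits = 0
    for pos in breakpoints:
        # Index of the last region starting at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < regions[i][1]:
            hits += 1
    return hits


# Toy coordinates (illustrative only).
microsatellites = [(100, 150), (300, 320)]
n_hits = count_in_regions([120, 200, 310, 320], microsatellites)
print(n_hits)
```

Running this per tool and per sample gives the number of calls in repeat regions, which can then be compared against what is expected for the tumour type.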
Post image

We found the same when using simulated sequencing replicates of the blood samples we use as germline controls. So, what are the false positive SVs called by some algorithms and not by others? What drives such strong differences in performance?

28.05.2025 17:36 | 👍 1 | 🔁 0 | 💬 1 | 📌 0
Post image

Using sequencing replicates of the normal cell line COLO829BL, we found that SAVANA shows 13- and 82-fold higher specificity than the second- and third-best performing algorithms (391-fold higher than the worst-performing one). In practice, this means tens to thousands fewer false positives.

28.05.2025 17:36 | 👍 1 | 🔁 1 | 💬 1 | 📌 0

Still not convinced? We also reasoned the following. If you use the same sample as both the tumour and matched germline sample to look for somatic SVs, how many would you detect? The answer is: 0, as you are comparing the same sample against itself. In other words: 1-1=0

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

Well, nullius in verba, let the data speak. SAVANA shows uniform and much higher replication rates across SV clonality levels, sizes, types, samples and genomic regions. Thus, SAVANA's performance is driven by higher sensitivity and specificity!

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

The replication rate of #SAVANA was much higher than that of existing methods 👇 One could argue that low-AF SVs might be missed in one replicate if the reads supporting them are all assigned to the other replicate, and that the many SVs detected by some tools are due to higher sensitivity.

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

First, we simulated replicates in silico by randomly splitting the sequencing reads from each tumour, reaching even coverage per replicate (note that flowcell yield is variable for ONT). The key idea is: true SVs are detected in both replicates, false positives in just one

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
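The replicate idea above can be sketched in a few lines: split the reads at random into two halves, call SVs on each half, and measure how many calls from one half reappear in the other. This is a minimal illustration only. The function names, the dictionary layout of an SV call and the 100 bp matching tolerance are all assumptions, and the SV calling step itself is left out.

```python
import random


def split_reads(read_ids, seed=0):
    """Randomly assign reads to two in silico replicates of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def same_sv(a, b, tol=100):
    """Hypothetical matching rule: same chromosome and SV type, with both
    breakpoints within tol base pairs of each other."""
    return (
        a["chrom"] == b["chrom"]
        and a["type"] == b["type"]
        and abs(a["start"] - b["start"]) <= tol
        and abs(a["end"] - b["end"]) <= tol
    )


def replication_rate(calls_a, calls_b, tol=100):
    """Fraction of calls in replicate A that are also found in replicate B."""
    if not calls_a:
        return 0.0
    replicated = sum(any(same_sv(a, b, tol) for b in calls_b) for a in calls_a)
    return replicated / len(calls_a)


# Toy calls (illustrative only): one SV replicates, one does not.
rep_a = [
    {"chrom": "chr1", "type": "DEL", "start": 1000, "end": 5000},
    {"chrom": "chr2", "type": "INV", "start": 200, "end": 900},
]
rep_b = [{"chrom": "chr1", "type": "DEL", "start": 1010, "end": 4990}]
rate = replication_rate(rep_a, rep_b)
print(rate)
```

Under the thread's reasoning, a tool with many calls but a low replication rate is producing mostly false positives rather than extra true SVs.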

We propose to harness a cornerstone of experimental design, which is... replication! Specifically, we propose the following genome-wide & data-driven framework for benchmarking algorithms 👇

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

SV detection methods have been benchmarked using small SV truth sets from cell lines. This is limited in many ways (see preprint), so the critical question now arises: how can we benchmark SV detection algorithms across the entire genome in an unbiased and data-driven manner?

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Post image

Next, we applied existing SV detection methods designed for germline SV discovery (Sniffles2, cuteSV, SVIM) and for cancer genome analysis (Severus, NanomonSV, SVision-Pro). SV rates varied across 2 orders of magnitude, so how can we benchmark these tools?

28.05.2025 17:36 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
Nanopore whole-genome sequencing of human sarcomas - EGA European Genome-Phenome Archive We performed single and multi-region nanopore whole-genome sequencing on human sarcoma samples from adult and paediatric patients.

So, first, we generated matched Illumina (~120x) and @nanoporetech.com (~50x) data for 99 tumours with complex rearrangement profiles: 57 diverse soft tissue #sarcomas, 28 #osteosarcomas and 14 #glioblastomas. All long-read data are accessible via the EGA:
ega-archive.org/studies/EGAS...

28.05.2025 17:36 | 👍 1 | 🔁 0 | 💬 1 | 📌 0
