Botond Sipos

@sbotond.bsky.social

EMBL::EBI::EnsEMBL::Compara
GitHub: https://github.com/botond-sipos
Scholar: https://bit.ly/botond-sipos-scholar
Substack: https://substack.com/@sbotond

49 Followers  |  112 Following  |  51 Posts  |  Joined: 01.02.2025  |  2.065

Latest posts by sbotond.bsky.social on Bluesky

Systematic benchmarking of basecalling models for RNA modification detection with highly multiplexed nanopore sequencing Nanopore direct RNA sequencing (DRS) holds promise for advancing our understanding of the epitranscriptome by detecting RNA modifications in native RNA molecules. Recently, Oxford Nanopore Technologie...

🚨 New preprint alert 🚨
We systematically benchmarked @nanoporetech.com's modification-aware basecalling models released for RNA on sets of in vitro and in vivo sequences and made some curious observations 🧬🔍.
bit.ly/4lXqNul
Follow along for a little recap (1/12)

14.07.2025 15:59 | 👍 42  🔁 22  💬 1  📌 1

Claus Wilke on Alphafold and the problem of protein folding in 2025

13.07.2025 20:41 | 👍 18  🔁 7  💬 0  📌 0

Go 1.25 interactive tour Fake clock, new GC, flight recorder and more.

Go 1.25 interactive tour

Go 1.25 is scheduled for release in August, so it's a good time to explore what's new.
#golang

antonz.org/go-1-25/

28.06.2025 03:15 | 👍 5  🔁 2  💬 0  📌 0

Excited to launch our AlphaGenome API goo.gle/3ZPUeFX along with the preprint goo.gle/45AkUyc describing and evaluating our latest DNA sequence model powering the API. Looking forward to seeing how scientists use it! @googledeepmind

25.06.2025 14:29 | 👍 217  🔁 82  💬 5  📌 10

A general substitution matrix for structural phylogenetics. Abstract. Sequence-based maximum likelihood (ML) phylogenetics is a widely used method for inferring evolutionary relationships, which has illuminated the

New paper from the lab from Sriram Garg in my group. We introduce a general substitution matrix for structural phylogenetics. I think this is a big deal, so read on below if you think deep history is important. academic.oup.com/mbe/advance-...

11.06.2025 14:01 | 👍 93  🔁 52  💬 3  📌 2

Bayesian Phylodynamic Inference of Multitype Population Trajectories Using Genomic Data Abstract. Phylodynamic methods provide a coherent framework for the inference of population parameters directly from genetic data. They are an important to

Vaughan & @tanjastadler.bsky.social develop a method to infer multitype population trajectories and apply it to MERS-CoV, revealing transmission patterns between camels and humans.

🔗 doi.org/10.1093/molbev/msaf130

#evobio #molbio #virus

17.06.2025 14:29 | 👍 9  🔁 5  💬 0  📌 0


FastGA: Fast Genome Alignment www.biorxiv.org/content/10.1... 🧬🖥️🧪 www.github.com/thegenemyers...

20.06.2025 09:39 | 👍 25  🔁 8  💬 1  📌 1

Powerful stuff from @juliosaezrod.bsky.social who found himself on the other end of the process - as a patient not a computational biology researcher - giving him insight into both research and patient perspectives. Huge credit to Julio for talking about his experiences here

20.06.2025 08:06 | 👍 30  🔁 10  💬 0  📌 0


Michael Ashburner FRS was an influential figure in the fields of Drosophila genomics and early sequencing database initiatives such as @ebi.embl.org.

Read about their contributions across genetics and bioinformatics in the new biographical memoir: buff.ly/f01zNat

@geneticscam.bsky.social

17.06.2025 10:44 | 👍 29  🔁 19  💬 3  📌 0


Preprint on "Improving spliced alignment by modeling splice sites with deep learning". It describes minisplice for modeling splice signals. Minimap2 and miniprot now optionally use the predicted scores to improve spliced alignment.
arxiv.org/abs/2506.12986

17.06.2025 01:48 | 👍 108  🔁 54  💬 0  📌 1


Probabilistic Data Structures in Go: Building and Benchmarking a Bloom Filter
#golang

dev.to/umangsinha1...

14.06.2025 04:29 | 👍 2  🔁 1  💬 1  📌 0
Memories of the Human Genome Project at the Sanger Centre | Annual Reviews 2025 marks the twenty-fifth anniversary of the completion of a working draft of the 3-Gb human genome sequence and its availability in public databases to promote research into human health and diseas...

Memories of the Human #Genome Project at the #Sanger Centre www.annualreviews.org/content/jour...

06.06.2025 05:54 | 👍 0  🔁 0  💬 1  📌 0
Figure 3. Overview of data sources for RNA modification detection model development

"Investigating RNA Dynamics from Single Molecule Transcriptomes"
by Sarath Chandra Janga & colleagues

"In this review, we examine . . . isoform detection, poly(A) tail length quantification, and mapping of RNA modifications."

FREE till July 24th with this link:
authors.elsevier.com/a/1lC%7EfcQb...

05.06.2025 13:37 | 👍 3  🔁 1  💬 0  📌 0

Bit-Reproducible Phylogenetic Tree Inference under Varying Core-Counts via Reproducible Parallel Reduction Operators Motivation: Phylogenetic trees describe the evolutionary history among biological species based on their genomic data. Maximum Likelihood (ML) based phylogenetic inference tools search for the tree and evolutionary model that best explain the observed genomic data. Given the independence of likelihood score calculations between different genomic sites, parallel computation is commonly deployed. This is followed by a parallel summation over the per-site scores to obtain the overall likelihood score of the tree. However, basic arithmetic operations on IEEE 754 floating-point numbers, such as addition and multiplication, inherently introduce rounding errors. Consequently, the order by which floating-point operations are executed affects the exact resulting likelihood value since these operations are not associative. Moreover, parallel reduction algorithms in numerical codes re-associate operations as a function of the core count and cluster network topology, inducing different round-off errors. These low-level deviations can cause heuristic searches to diverge and induce high-level result discrepancies (e.g., yield topologically distinct phylogenies). This effect has also been observed in multiple scientific fields, beyond phylogenetics. Results: We observe that varying the degree of parallelism results in diverging phylogenetic tree searches (high level results) for over 31 % out of 10 130 empirical datasets. More importantly, 8 % of these diverging datasets yield trees that are statistically significantly worse than the best known ML tree for the dataset (AU-test, p < 0.05). To alleviate this, we develop a variant of the widely used phylogenetic inference tool RAxML-NG, which does yield bit-reproducible results under varying core-counts, with a slowdown of only 0 to 12.7 % (median 0.8 %) on up to 768 cores. 
We further introduce the ReproRed reduction algorithm, which yields bit-identical results under varying core-counts, by maintaining a fixed operation order that is independent of the communication pattern. ReproRed is thus applicable to all associative reduction operations – in contrast to competitors, which are confined to summation. Our ReproRed reduction algorithm only exchanges the theoretical minimum number of messages, overlaps communication with computation, and utilizes fast base-cases for local reductions. ReproRed is able to all-reduce (via a subsequent broadcast) 4.1 · 10^6 operands across 48 to 768 cores in 19.7 to 48.61 µs, thereby exhibiting a slowdown of 13 to 93 % over a non-reproducible all-reduce algorithm. ReproRed outperforms the state-of-the-art reproducible all-reduction algorithm ReproBLAS (offers summation only) beyond 10 000 elements per core. In summary, we re-assess non-reproducibility in parallel phylogenetic inference, present the first bit-reproducible parallel phylogenetic inference tool, as well as introduce a general algorithm and open-source code for conducting reproducible associative parallel reduction operations. Competing Interest Statement: The authors have declared no competing interest. European Research Council, https://ror.org/0472cxd90, 882500; European Union, https://ror.org/019w4f821, 101087081

Check out our new preprint on reproducible parallel phylogenetic inference under varying core counts - it also includes a generic method for reproducible parallel associative reduction operations www.biorxiv.org/content/10.1...

05.06.2025 15:14 | 👍 9  🔁 4  💬 0  📌 0
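
The preprint abstract above notes that floating-point addition is not associative, so a parallel sum can change with the number of cores. The toy Python sketch below illustrates only that effect; it is not ReproRed or RAxML-NG code, and the function names (`left_fold_sum`, `chunked_sum`, `fixed_tree_sum`) and the input values are made up for illustration. The fixed-tree variant assumes that tying the reduction order to the input length alone, rather than to a worker count, is enough to make the result repeatable.

```python
def left_fold_sum(values):
    """Add values strictly left to right, starting from 0.0."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_workers):
    """Mimic a naive parallel reduction: each hypothetical worker sums one
    contiguous chunk, then the partial sums are added left to right."""
    if not values:
        return 0.0
    size = -(-len(values) // n_workers)  # ceiling division
    partials = [left_fold_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return left_fold_sum(partials)


def fixed_tree_sum(values):
    """Pairwise reduction over a binary tree whose shape depends only on
    len(values), so the order of additions never changes between runs."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return fixed_tree_sum(values[:mid]) + fixed_tree_sum(values[mid:])


values = [1e16, 1.0, -1e16, 1.0]

# The grouping of additions changes with the worker count, and so does the result.
print(chunked_sum(values, 1), chunked_sum(values, 2))

# The tree shape is a function of the input length only.
print(fixed_tree_sum(values))
```

Neither result is "the" correct sum here; the point of the sketch is only that the chunked version gives different answers for different worker counts, while the fixed-order version always performs the same additions in the same order.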

Short reads often miss complex isoform dynamics. A new approach published in @natbiotech.nature.com, miniQuant, improves quantification by leveraging complementary strengths of long reads and short reads. #LongReads #LongReadTranscriptomics

05.06.2025 13:03 | 👍 3  🔁 2  💬 0  📌 0

SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads Long-read RNA sequencing (lrRNA-seq) is a powerful technology to link transcript structures to genetic variants but such analysis is not often performed due to the lack of end-user tools. Here, we int...

#SNP calling, haplotype phasing and allele-specific analysis with long #RNA-seq reads www.biorxiv.org/content/10.1...

30.05.2025 04:53 | 👍 0  🔁 0  💬 0  📌 0

Industry friends, now is the time for MUCH more speaking out on behalf of academic colleagues under duress. Here are core open source methods that many of your products doubtlessly depend on either directly or indirectly (see en.wikipedia.org/wiki/HMMER) being abruptly defunded. Make noise.

29.05.2025 14:39 | 👍 76  🔁 50  💬 1  📌 0

Genuinely... quite excited about this. I think I might have to install it just for old skool x new school kicks.

(plus - great for controlled cloud environments!)

27.05.2025 22:00 | 👍 40  🔁 15  💬 0  📌 0

Conservation of regulatory elements with highly diverged sequences across large evolutionary distances Nature Genetics - Combining functional genomic data from mouse and chicken with a synteny-based strategy identifies positionally conserved cis-regulatory elements in the absence of direct sequence...

How to find Evolutionary Conserved Enhancers in 2025? 🐣-🐭
Check out our paper - fresh off the press!!!
We find widespread functional conservation of enhancers in absence of sequence homology
Including: a bioinformatic tool to map sequence-diverged enhancers!
rdcu.be/enVDN
github.com/tobiaszehnde...

27.05.2025 12:19 | 👍 241  🔁 109  💬 7  📌 9

Oxford Nanopore Tech Update LC2025 highlights My highlights from the LC2025 announcements

Oxford Nanopore Tech Update LC2025. My full analysis of what this means for NGS and Multi-Omics, including the new Proteomics PoC. open.substack.com/pub/albertvi...

27.05.2025 06:45 | 👍 13  🔁 6  💬 0  📌 0

My Lab Unlocked: Professor Benjamin Schuster-Böckler of the Ludwig Institute for Cancer Research Professor Benjamin Schuster-Böckler FRSB on his work to understand the role of changing genomic patterns in the processes that drive cancer initiation My group aims to understand the processes that dr...

Thank you to @rsb.org.uk for this fantastic feature in The Biologist, where I get to talk a bit about my team and how I got to where I am right now:

22.05.2025 08:23 | 👍 4  🔁 2  💬 0  📌 0
Prior as data, prior as belief, prior as soft constraint, prior as unconditional distribution in a generative model | Statistical Modeling, Causal Inference, and Social Science

Prior as data, prior as belief, prior as soft constraint, prior as unconditional distribution in a generative model
statmodeling.stat.columbia.edu/2025/05/21/p...

21.05.2025 16:21 | 👍 14  🔁 3  💬 0  📌 3

Estimation of substitution and indel rates via k-mer statistics https://www.biorxiv.org/content/10.1101/2025.05.14.653858v1

18.05.2025 04:48 | 👍 10  🔁 7  💬 0  📌 1


End-to-end simulation of nanopore sequencing signals with feed-forward transformers https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btae744/7930676 🧬🖥️🧪 https://github.com/ZKI-PH-ImageAnalysis/seq2squiggle

24.12.2024 08:21 | 👍 9  🔁 5  💬 0  📌 1

Efficient evidence-based genome annotation with EviAnn For many years, machine learning-based ab initio gene finding approaches have been the central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these app...

Bioinformatics folks: check out our @biorxivpreprint on a new, very efficient and accurate system for automated genome annotation, EviAnn, led by my colleague Aleksey Zimin: www.biorxiv.org/content/10.1...

13.05.2025 17:52 | 👍 54  🔁 22  💬 1  📌 0

Predicting gene expression from DNA sequence using deep learning models - Nature Reviews Genetics Barbadilla-Martínez et al. review recent progress in deep-learning-based sequence-to-expression models, which predict gene expression levels solely from DNA sequence. These models are providing new in...

Predicting gene expression from DNA sequence using deep learning models www.nature.com/articles/s41...

13.05.2025 15:21 | 👍 1  🔁 0  💬 0  📌 0

De novo clustering of large long-read transcriptome datasets with isONclust3 Abstract. Motivation: Long-read sequencing techniques can sequence transcripts from end to end, greatly improving our ability to study the transcription proc

@alexanderjpetri.bsky.social's isONclust3 algorithm is now published doi.org/10.1093/bioi.... isONclust3 performs de novo clustering of long-read cDNA sequencing data. A key step in reference-free transcriptome analysis.

08.05.2025 13:04 | 👍 11  🔁 6  💬 1  📌 0

Building better genome annotations across the tree of life An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms

Building better #genome #annotations across the tree of life genome.cshlp.org/content/35/5...

02.05.2025 18:49 | 👍 0  🔁 0  💬 0  📌 0

A kinetic ruler controls mRNA poly(A) tail length Poly(A) tails of newly synthesized mRNAs have uniform lengths, arising through cooperation between the cleavage and polyadenylation complex (CPAC) and poly(A) binding proteins (PABPs). In the budding ...

New work from Matti Turtola showing that mRNA poly(A) tail length is controlled by a kinetic ruler. A real pleasure to work with him!
www.biorxiv.org/content/10.1...

02.05.2025 17:10 | 👍 52  🔁 25  💬 0  📌 1


This time-lapse captures 17 hours of axonal growth from a chicken dorsal root ganglion explant, visualized through the actin cytoskeleton using live confocal imaging.
I just submitted this video to the Nikon Small World in Motion competition. Today is the last day to upload yours! 😉
🧪

30.04.2025 07:16 | 👍 1432  🔁 225  💬 68  📌 26

@sbotond is following 20 prominent accounts