Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.
The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv www.biorxiv.org/content/10.6... and github github.com/pachterlab/k...
Figure 1 shows the key result
I had the opportunity to present some nice results on detecting biological events in de Bruijn graphs at #DSB2026, as part of my PhD work on #Vizitig!
Thanks to the organizers and colleagues for this amazing and super-inspiring event (and @camillemrcht.bsky.social for the picture).
Beautiful caveat section !
For the importance of pesticides in cancer incidence, see this instead. Occupational exposures (asbestos, benzene) are in the blue bar on the right, and pesticides appear nowhere for lack of sufficient data.
www.nature.com/articles/s41...
👇 😨
PREPRINT ALERT
I heard you were craving more combinatorics, so here is some more for y'all!
For the importance of cancer risk factors, see this instead. The small light-blue area covers all occupational causes: asbestos, arsenic, etc. Pesticides appear nowhere for lack of sufficient data.
Source: Fink et al. Nature Medicine, 2026
More minimizer papers! 😆
Stay tuned: We are now running Metapuccino on SRA’s 1 million human transcriptomes.
This ms. covers the full methodology and discusses the limits of NLP and LLMs for NGS metadata completion.
Usability was a top priority: Metapuccino runs on regular computers with open-source LLMs, but can also scale up on GPUs for large datasets. All it needs is a list of SRA IDs — no pre-processed tables required.
Fiona Hak developed a clever LLM training strategy using the hardest SRA cases — the fine-tuned model is available on Hugging Face.
Metapuccino fills and standardizes 19 key SRA metadata fields in human transcriptomics, using rule-based NLP and a large language model (LLM).
Even simple tasks, like selecting tumor vs. normal samples for a cancer type, require expert curation across multiple tables, protocols, and abstracts.
NCBI’s SRA is a fantastic resource for studying the human transcriptome. But its metadata is messy — over 70% of fields are empty, and information is often inconsistent.
www.biorxiv.org/cgi/content/...
What’s behind Metapuccino? ☕️, by PhD student Fiona Hak, @camillemrcht.bsky.social and Melina Gallopin. A thread 👇
My algorithmic friends (@camillemrcht.bsky.social) doing LLM stuff : www.biorxiv.org/content/10.1...! And also, screaming last names in the author list ;P. Given my level of trust in Camille, though, perhaps it's time for me to engage more seriously with these models in research...
Interested in #lncRNA and #ArtificialIntelligence?
As part of our recently founded French-Korean bilateral project DHARP, we are recruiting a post-doc in bioinformatics and artificial intelligence in our team at
@ips2parissaclay.bsky.social
Application deadline: 01/12/2025
PubMed is running on autopilot during shutdown, but key independent committee has been abolished www.bmj.com/content/391/... 🧪
New tool "bwt-svg" for making illustrations of the BWT and the many auxiliary arrays and other structures related to it. Pyodide-based no-installation-necessary interface here: benlangmead.github.io/bwt-svg/. (H/t to @robert.bio for pointing me to pyodide!) Full repo: github.com/benlangmead/....
The MSc. Bioinformatics students of U. Paris-Saclay are organizing the Junior Conference on Computational Biology (JC2B) 2025: AI and predictive models in bioinformatics
November 13, 2025 - I2BC, CNRS, Gif-sur-Yvette, France
Register for free : bioi2.i2bc.paris-saclay.fr/jc2b/#regist...
🦠🧍♀️From bacterial to human immunity.
We report in @science.org the discovery of a human homolog of SIR2 antiphage proteins that participates in the TLR pathway of animal innate immunity.
Co-led with @enzopoirier.bsky.social by D. Bonhomme and @hugovaysset.bsky.social
www.science.org/doi/10.1126/...
Congratulations to Rayan Chiki (Institut Pasteur), head of the “Sequence Bioinformatics” unit, for securing the ERC Proof of Concept 2025 for his project ENZYMINER! 👏
@rayan.chiki.bsky.social
#Bioinformatics
Paper Alert!
Our preprint on the K2R index, which efficiently associates k-mers with the reads containing them, is finally out!
A thread!
academic.oup.com/bioinformati...
New ENCODE4 long-read RNA-seq transcripts track for hg38 and mm10. Triplets (e.g. [1,1,3]) indicate start site, exon combination, and stop site for each transcript. Enrichment scores show how these change across tissue and cell line samples.
Read more: genome.ucsc.edu/gold...
#JOBIM2025 Mathilde Girard ends the session with a simple but effective idea: reorder the reads before using an off-the-shelf compressor to improve the compression ratio
#JOBIM2025 @bdegardins.bsky.social presents his PhD work on Vizitig, a multi-sample graph exploration tool, with a focus on RNA - this afternoon we'll do a demo on pangenomes with the same tool
Paper alert!
We present Oreo, a tool that reorders long-read datasets so that they compress efficiently with ANY universal compressor such as gz, zstd, xz ...
TLDR: You can get state of the art compression WITHOUT a dedicated compressor/decompressor!
academic.oup.com/bioinformati...
A thread!
Preprint alert from the group 🚨 super fast grep-like sequence selection