
Pia Rautenstrauch

@prauten.bsky.social

Computer Science PhD Student at @humboldtuni.bsky.social and @mdc-berlin.bsky.social | Data Science | Machine learning | AI | Bioinformatics | Genomics | Single-Cell Biology

113 Followers  |  192 Following  |  11 Posts  |  Joined: 20.01.2025  |  1.84

Latest posts by prauten.bsky.social on Bluesky


🧠 The Lipid #Brain Atlas is out now! If you think #lipids are boring and membranes are all the same, prepare to be surprised. Led by @lucafusarbassini.bsky.social with Giovanni D'Angelo's lab, we mapped membrane lipids in the mouse brain at high resolution.
www.biorxiv.org/cgi/content/...

16.10.2025 06:23 - 👍 275    🔁 107    💬 7    📌 11

We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)

22.09.2025 05:29 - 👍 174    🔁 90    💬 4    📌 5

I did not know Taylor Swift was moonlighting in soliciting contributions for fake journals!

16.09.2025 19:29 - 👍 9    🔁 1    💬 0    📌 0

Check out my talented colleagues' study, profiling hundreds of CRISPRa-responsive regulatory elements surrounding PHOX2B, a key player in neuroblastoma, using a targeted scRNA-seq screen in a neuroblastoma cell line.

13.09.2025 16:21 - 👍 1    🔁 0    💬 0    📌 0
Meeting agenda

Sep 10, 2025
Attendees:
Links:
Agenda (feel free to add your items):
• Blog almost ready for R blogger linkage thanks to @Izabela Mamede, @Mengyuan Shen and @Maria Doyle
• New posts from many including @Juan Henao and myself
• Ideas for other posts?
• There is tidybulk v2 ready to be submitted. Some feedback would be nice there.
• Stefano's new speedy code in tidySE
• https://github.com/tidyomics/genomics-todos/issues/19#issuecomment-3239791713
• https://github.com/tidyomics/tidySummarizedExperiment/issues/106
• Report back from tidyomics workshop at useR! (Justin and Mike)
• Other projects in the works?
• Ideas for engaging new users? New developers?


These are the corresponding times for your meeting (location: local time):
Durham (USA - North Carolina): Wednesday, September 10, 2025 at 6:00:00 am
Adelaide (Australia - South Australia): Wednesday, September 10, 2025 at 7:30:00 pm
Paris (France - Paris): Wednesday, September 10, 2025 at 12:00:00 noon
Corresponding UTC (GMT): Wednesday, September 10, 2025 at 10:00:00
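The local times above follow from the UTC start time plus each city's UTC offset on that date. A minimal Python sketch of that arithmetic, assuming the daylight-saving offsets in effect on 10 September 2025 (EDT, UTC-4; CEST, UTC+2; ACST, UTC+9:30), which are hard-coded here rather than looked up in a time-zone database:

```python
from datetime import datetime, timedelta, timezone

# Meeting start in UTC, as given in the agenda.
MEETING_UTC = datetime(2025, 9, 10, 10, 0, tzinfo=timezone.utc)

# Assumed UTC offsets on 10 September 2025 (hard-coded, not from a tz database).
OFFSETS = {
    "Durham (EDT)": timedelta(hours=-4),
    "Paris (CEST)": timedelta(hours=2),
    "Adelaide (ACST)": timedelta(hours=9, minutes=30),
}


def local_times(utc_time, offsets):
    """Convert one UTC instant into HH:MM wall-clock times for each fixed offset."""
    return {
        place: utc_time.astimezone(timezone(offset)).strftime("%H:%M")
        for place, offset in offsets.items()
    }


for place, hhmm in local_times(MEETING_UTC, OFFSETS).items():
    print(f"{place}: {hhmm}")
```

Fixed offsets keep the sketch self-contained; using `zoneinfo.ZoneInfo("America/New_York")` and similar names instead would pick up daylight-saving transitions automatically, at the cost of depending on the system's time-zone data.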


Our first Fall #tidyomics meeting will be this Wed 10 September, early in the US / noon in Europe / late in Australia. Feel free to join if you're interested in what we are doing to make omics data more amenable to tidy data analysis.

Organized with Stefano @stemang.bsky.social

08.09.2025 19:45 - 👍 14    🔁 4    💬 1    📌 1

The Matilda effect is not a fiction.
It is written into the history of science.
It eclipsed women such as Marthe Gautier, born a hundred years ago, a forgotten pioneer of trisomy 21 research.
➡️ https://l.franceculture.fr/1LI

10.09.2025 04:00 - 👍 496    🔁 303    💬 9    📌 13
Cross-biobank generalizability and accuracy of electronic health record-based predictors compared to polygenic scores - Nature Genetics Comparison of electronic health record-based phenotype risk scores (PheRS) and polygenic scores (PGS) across 13 common diseases and three biobank-based studies indicates that PheRS and PGS may provide...

Are electronic health records (EHR) more predictive of disease onset than polygenic scores? Can we transfer EHR-based prediction models between countries? Our study on these questions, using 3 biobank-based studies with N > 845K, is out in @natgenet.nature.com today:

www.nature.com/articles/s41...

27.08.2025 14:15 - 👍 30    🔁 12    💬 3    📌 2
The participants of Dagstuhl Seminar 24122 standing on steps outside (from https://www.dagstuhl.de/24122)


Multiple types of embeddings (UMAP, t-SNE, Laplacian Eigenmaps, PHATE, PCA, MDS) of Wikipedia text data labelled by text summaries generated by an LLM. Methods like UMAP and t-SNE show cluster structure that reflects shared subject matter in the text, while other methods show more continuous structure.


Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of primate brain organoids at different time periods. Different methods highlight different aspects of development, such as clusters of similar cell types or time courses of cell development.


Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of 1000 Genomes Project genotypes. Different methods reflect different aspects of demographic history of populations.


Last year I met a bunch of great researchers who work with high-dimensional data at a Dagstuhl seminar. This week we put out a preprint about the history and philosophy of low-dimensional embedding methods, their applications, their challenges, and their possible future: arxiv.org/abs/2508.15929

27.08.2025 13:25 - 👍 14    🔁 7    💬 1    📌 1

We spent a year writing this review of low-dim embeddings and arguing about things like epistemic roles and best practices :-) 20+ authors are all participants of the Dagstuhl seminar we held last year: www.dagstuhl.de/24122. Led by @alexandr.bsky.social and Cyril de Bodt.

arxiv.org/abs/2508.15929

27.08.2025 15:14 - 👍 27    🔁 9    💬 1    📌 0

We're committed to helping as many attendees as possible join us at #scverse2025. Feel free to reach out if you have questions!

25.08.2025 17:17 - 👍 4    🔁 3    💬 0    📌 0
https://authors.elsevier.com/a/1lbX08YyDfuZWX

Antibodies are highly diverse, but most possible sequences are unstable or polyreactive. In this work, just published in Cell Syst., we propose a new source of data for modeling constraints from these properties. Our models show clear improvements in predicting Ab dysfunction. (1/n)
t.co/qCZERPUMPF

15.08.2025 13:17 - 👍 16    🔁 6    💬 1    📌 0

Thanks, @paubadiam.bsky.social! That makes sense. Excited for the results 🔎.

12.08.2025 18:10 - 👍 0    🔁 0    💬 0    📌 0

Very well set up benchmark and informative comparisons! I might have missed it, but did you also compare the performance of the same methods using either truly paired vs synthetically paired multimodal data as input in terms of your performance evaluation metrics, in addition to network consistency?

12.08.2025 15:11 - 👍 0    🔁 0    💬 1    📌 0

By now, I’ve heard from many people who’ve noticed inconsistencies when using silhouette-based metrics for horizontal data integration evaluation. I hope we’ve helped shed light on why these metrics fall short and that our recommendations prove useful to you!

12.08.2025 07:51 - 👍 4    🔁 0    💬 0    📌 0
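For readers unfamiliar with the metric under discussion: the per-point silhouette compares a point's mean distance to its own group (a) with its mean distance to the nearest other group (b), giving s = (b - a) / max(a, b). The sketch below is only the textbook definition in plain Python with Euclidean distance; it is not the benchmarking code from the paper, and the toy points and labels are hypothetical. It assumes at least two distinct labels and not all points at one location.

```python
import math


def silhouette_samples(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), Euclidean distance.

    a: mean distance to the other points with the same label.
    b: smallest mean distance to the points of any other label.
    Points whose label occurs only once get s = 0 by convention.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, lab) in enumerate(zip(points, labels))
                if j != i and lab == own]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical toy data: two tight, well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labs = ["x", "x", "y", "y"]
scores = silhouette_samples(pts, labs)
print(sum(scores) / len(scores))  # about 0.90 for these well-separated groups
```

Swapping the labels so that each group spans both clusters drives the scores negative, which is the kind of single-number summary the linked paper argues is unsuitable for evaluating single-cell data integration.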

Excited to share our latest paper @natmethods.nature.com
We present a high-throughput framework to map cellular interactions at ultra-high scale – broadly applicable from whole-organism immune response mapping to personalized therapy response prediction (1/4).
www.nature.com/articles/s41...

07.08.2025 11:24 - 👍 34    🔁 14    💬 3    📌 0
Protein language models reveal evolutionary constraints on synonymous codon choice Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...

This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...

07.08.2025 08:29 - 👍 215    🔁 83    💬 6    📌 4

Lucky to have inspiring and supportive mentors by my side! @mikelove.bsky.social

06.08.2025 08:39 - 👍 1    🔁 0    💬 0    📌 0

Evaluating something like batch correction requires looking at the data, and picking metrics that capture what you care about. Great work @prauten.bsky.social and @uweohler.bsky.social

05.08.2025 12:48 - 👍 25    🔁 8    💬 1    📌 0
Shortcomings of silhouette in single-cell integration benchmarking - Nature Biotechnology Silhouette score is unsuitable as a metric for single-cell data integration.

Shortcomings of silhouette in single-cell integration benchmarking - @uweohler.bsky.social @prauten.bsky.social @mdc-bimsb.bsky.social @mdc-berlin.bsky.social @humboldtuni.bsky.social go.nature.com/4fcQzZr

30.07.2025 15:26 - 👍 32    🔁 13    💬 1    📌 2

Truly grateful for the exceptional opportunity to participate in #LPSHG2025 last week, featuring a stellar ✨ lineup of leading researchers who doubled as tutors, alongside inspiring fellow PhD students. Excited to apply my learnings and see where this collaborative spirit takes genomics next!

01.08.2025 11:49 - 👍 10    🔁 1    💬 0    📌 0

*Easter egg alert* NOT in the published paper. We also benchmarked Evo 2, and while it did better than other gLMs (consistent with the idea that scale can improve gLMs), it still falls short of a basic CNN trained on one-hot-encoded sequences and far short of the supervised SOTA.

16.07.2025 12:16 - 👍 26    🔁 5    💬 1    📌 0
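The "basic CNN trained on one-hot sequences" baseline mentioned above takes DNA encoded as a length x 4 indicator matrix. A minimal sketch of that encoding, assuming an A, C, G, T column order and all-zero rows for ambiguous bases such as N (both are common choices, not details taken from the post):

```python
def one_hot_dna(seq, alphabet="ACGT"):
    """Encode a DNA string as a list of 4-element indicator rows.

    Each row has a 1 in the column of its base (column order given by
    `alphabet`); bases outside the alphabet, e.g. N, become all-zero rows.
    """
    return [[1 if base == letter else 0 for letter in alphabet]
            for base in seq.upper()]


for row in one_hot_dna("ACGTN"):
    print(row)
```

In practice the rows are usually stacked into an array of shape (length, 4) before being fed to a convolutional layer; plain lists keep the sketch dependency-free.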
The duplication crisis: the other replication crisis How bad publishing incentives hinder long-term thinking in computational biology research

The duplication crisis: the other replication crisis - www.worksinprogress.news/p/the-duplic...

02.06.2025 19:41 - 👍 3    🔁 2    💬 0    📌 0

The deadline for the VIB.AI group leader positions is approaching - send in your CV and short research plan before 14th June to start your BioML research lab in Leuven or Ghent

04.06.2025 07:16 - 👍 10    🔁 10    💬 0    📌 0

Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧡 (1/)

29.05.2025 23:57 - 👍 60    🔁 21    💬 1    📌 1

We finally concluded the meeting. Thanks to all attendees for their scientific contributions and for traveling (near or far) to the meeting! Thanks to the local organizers for the infrastructure and catering, and thanks to the co-organizers @yaronorenstein.bsky.social @camillemrcht.bsky.social!

25.04.2025 08:18 - 👍 26    🔁 11    💬 1    📌 0

When investors learn that the trait for green eyes is also ~20 SNPs

14.04.2025 00:45 - 👍 23    🔁 4    💬 0    📌 0

1. LLM-generated code tries to run code from online software packages. Which is normal but
2. The packages don’t exist. Which would normally cause an error but
3. Nefarious people have made malware under the package names that LLMs make up most often. So
4. Now the LLM code points to malware.

12.04.2025 23:43 - 👍 7433    🔁 3381    💬 116    📌 425

40 days until #RECOMB2025 in Seoul! 523 attendees confirmed. Thank you! 🥳

For those who haven't registered yet, we have great news!
📢 Early bird deadline is extended to Friday, March 21st 📢
Register now at recomb2025.com 🎟️

15.03.2025 14:08 - 👍 5    🔁 2    💬 0    📌 0

Last day to apply ⏰😱!

If you haven’t already done so, now is the time! Apply to be one of the speakers at the #SoapboxScience summer event and present your research in a fun and relaxed atmosphere ✨

#WomenInSTEM #WomenInScience #Scicomm

21.02.2025 10:25 - 👍 5    🔁 4    💬 0    📌 0
Excerpt from the scikit-learn docs on Manifold learning quoting Baloo's song from the Jungle Book.


Look for the bare necessities with manifold learning... 🐻🎢
Data science meets Disney in the most unexpected places!

Credits to the scikit-learn docs: scikit-learn.org/stable/modul... #DataScience #ManifoldLearning #TechHumor

04.02.2025 13:43 - 👍 2    🔁 0    💬 0    📌 0

@prauten is following 20 prominent accounts