‘Am I redundant?’: how AI changed my career in bioinformatics
A run-in with some artefact-laden AI-generated analyses convinced Lei Zhu that machine learning wasn’t making his role irrelevant, but more important than ever.
"(...) the rise of generative AI in bioinformatics has not diminished my role, but redefined it. It has challenged me to become a better scientist. For good or ill, AI seems to be here to stay. I urge you to embrace the technology — not to replace your expertise, but to amplify it."
#bioinfo
28.10.2025 13:07 — 👍 0 🔁 0 💬 0 📌 0
Can't recommend this lab enough. If you'd like to work in a curiosity-driven and nurturing environment with a strong focus on robust data analysis, don't even think twice!
#bioinfo #datascience
07.10.2025 11:04 — 👍 2 🔁 0 💬 0 📌 0
Last week, I was fortunate enough to watch a talk by @tkorem.bsky.social covering topics from inter-study variability in microbiome projects to novel approaches for metagenomic alignment and processing. Interesting and very relevant!
#bioinfo
30.09.2025 12:32 — 👍 3 🔁 0 💬 0 📌 0
Glucostats: an efficient Python library for glucose time series feature extraction and visual analysis - BMC Bioinformatics
Background: The advancement of technology and continuous glucose monitoring (CGM) systems has introduced several computational and technical challenges for clinicians and researchers. The growing volume of CGM data necessitates the development of efficient computational tools capable of handling and processing this information effectively. This paper introduces GlucoStats, an open-source and multi-processing Python library designed for efficient computation and visualization of a comprehensive set of glucose metrics derived from CGM. It simplifies the traditionally time-consuming and error-prone process of manual CGM metrics calculation, making it a valuable tool for both clinical and research applications.
Results: Its modular design ensures easy integration into predefined workflows, while its user-friendly interface and extensive documentation make it accessible to a broad audience, including clinicians and researchers. GlucoStats offers several key features: (i) window-based time series analysis, enabling time series division into smaller ‘windows’ for detailed temporal analysis, particularly beneficial for CGM data; (ii) advanced visualization tools, providing intuitive, high-quality visualizations that facilitate pattern recognition, trend analysis, and anomaly detection in CGM data; (iii) parallelization, leveraging parallel computing to efficiently handle large CGM datasets by distributing computations across multiple processors; and (iv) scikit-learn compatibility, adhering to the standardized interface of scikit-learn to allow an easy integration into machine learning pipelines for end-to-end analysis.
Conclusions: GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs. By offering precise CGM data analysis and user-friendly visualization tools, it serves both technical researchers and non-technical users, such as physicians and patients, with practical and research-driven applications.
"GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs"
bmcbioinformatics.biomedcentral.com/articles/10....
#datascience #biostats
29.09.2025 08:10 — 👍 1 🔁 0 💬 0 📌 0
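To make two of the listed features concrete (window-based analysis and scikit-learn compatibility), here is a minimal sketch. It is NOT the GlucoStats API: the class WindowedCGMFeatures, the window size, and the 70 to 180 mg/dL target range are hypothetical choices for illustration. It follows the scikit-learn transformer convention (fit returns self, transform returns features) without importing scikit-learn.

```python
from statistics import mean


class WindowedCGMFeatures:
    """Hypothetical scikit-learn-style transformer (not the GlucoStats API).

    Splits each glucose series (mg/dL) into consecutive, non-overlapping windows
    of `window_size` readings and returns, per window, the mean glucose and the
    percentage of readings inside [low, high] (time in range).
    """

    def __init__(self, window_size=4, low=70.0, high=180.0):
        self.window_size = window_size
        self.low = low
        self.high = high

    def fit(self, X, y=None):
        # Stateless: nothing is learned, but fit() returns self as scikit-learn expects.
        return self

    def transform(self, X):
        rows = []
        for series in X:
            features = []
            # Non-overlapping windows; a trailing partial window is dropped.
            for start in range(0, len(series) - self.window_size + 1, self.window_size):
                window = series[start:start + self.window_size]
                in_range = sum(self.low <= g <= self.high for g in window)
                features.append(mean(window))
                features.append(100.0 * in_range / len(window))
            rows.append(features)
        return rows

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

For example, WindowedCGMFeatures(window_size=4).fit_transform([[100, 150, 200, 250, 90, 110, 60, 120]]) returns [[175, 50.0, 95, 75.0]]: a mean and a time-in-range percentage for each of the two windows. Exposing fit/transform is what lets a feature extractor like this slot into a machine learning pipeline.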
Amazing repository with many references and resources for scRNA-seq analysis
github.com/crazyhottomm...
#bioinfo #singlecell
11.09.2025 13:54 — 👍 0 🔁 0 💬 0 📌 0
Push-button science - Nature Methods
Technological advances change not only what we can learn as scientists, but also how science is conducted. Here we explore how automation and outsourcing are affecting the act of doing science.
"Even if work is done by or with the help of experts (...), it is crucial that researchers understand how a method works, that they can assess data quality, and that they fundamentally understand what types of conclusions can and cannot be drawn from their data"
www.nature.com/articles/s41...
10.09.2025 23:48 — 👍 0 🔁 0 💬 0 📌 0
Six questions to ask before jumping into a spreadsheet
Spreadsheet software can be frustrating, but adopting some helpful habits can improve its effectiveness.
Spreadsheets are an everyday tool for most wet-lab scientists. So why not use them to their full potential: efficiently, and in a way that is ready for open science?
This article offers some recommendations for using spreadsheets:
www.nature.com/articles/d41...
#bioinfo #stats
21.08.2025 08:29 — 👍 2 🔁 0 💬 0 📌 0
Harvard University lays off fly database team
The layoffs jeopardize this resource, which has served more than 4,000 labs for about three decades.
FlyBase, a Drosophila database, will lose a third of its team in early October because the Harvard grant that covered the employees’ salaries was canceled. Scientists warn that losing FlyBase could devastate fly research.
By @claudia-lopez.bsky.social
www.thetransmitter.org/community/ha...
13.08.2025 19:32 — 👍 123 🔁 129 💬 3 📌 12
4/n
From equitable precision medicine to large-scale population studies and evolutionary research, pangenome graphs and graph mapping excel at capturing diversity, especially in highly variable or repetitive genome regions where mapping to a single linear reference often fails.
12.08.2025 09:04 — 👍 0 🔁 0 💬 1 📌 0
3/n
Linear references favour the “reference” allele, a phenomenon known as reference bias. By integrating multiple haplotypes, pangenome graphs and graph mapping reduce this bias, improve alignment quality, and make analyses fairer across diverse populations.
12.08.2025 09:04 — 👍 1 🔁 0 💬 1 📌 0
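Reference bias can be illustrated with a toy example. The sketch below assumes simple ungapped placement scored by mismatch count (a deliberate simplification; real aligners use gapped alignment and richer scoring). The sequences, the SNP, and the function name min_mismatches are hypothetical. A read carrying the alternative allele is penalised against the linear reference but matches perfectly once the alternative haplotype is part of the reference set.

```python
def min_mismatches(read, target):
    """Smallest number of mismatches over all ungapped placements of `read` in `target`."""
    best = len(read)
    for start in range(len(target) - len(read) + 1):
        window = target[start:start + len(read)]
        best = min(best, sum(a != b for a, b in zip(read, window)))
    return best


# Hypothetical locus with a G/C SNP at position 3.
LINEAR_REF = "CATGTTAGGCAC"                  # carries only the reference allele G
HAPLOTYPES = [LINEAR_REF, "CATCTTAGGCAC"]    # graph paths: reference G plus alternative C

read_ref = "ATGTTAG"   # sampled from an individual carrying G
read_alt = "ATCTTAG"   # sampled from an individual carrying C

linear_scores = [min_mismatches(r, LINEAR_REF) for r in (read_ref, read_alt)]
graph_scores = [min(min_mismatches(r, h) for h in HAPLOTYPES) for r in (read_ref, read_alt)]
```

Here linear_scores is [0, 1], so the alternative-allele read starts with a penalty, while graph_scores is [0, 0]. Summed over many reads and many sites, that systematic penalty is what skews allele counts towards the reference.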
2/n
Pangenome graphs represent variants as alternative paths, so reads can follow the path matching their true sequence.
These graphs can represent SNPs, indels, structural variants, and complex rearrangements in a single model, improving variant-calling accuracy for both common and rare alleles.
12.08.2025 09:04 — 👍 0 🔁 0 💬 1 📌 0
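A minimal sketch of "variants as alternative paths", assuming a tiny hand-built directed acyclic graph. The node IDs, sequences, and helper names are hypothetical. Nodes 2 and 3 form a SNP bubble (G or C), and the edge 4 -> 6 skips node 5, encoding a deletion of GGC. Enumerating source-to-sink paths spells out every haplotype the graph allows, and a read is placed on the path(s) whose sequence contains it. Exhaustive enumeration is only viable for toy graphs; real graph aligners avoid it.

```python
# Hypothetical toy variation graph: node -> sequence, node -> successor nodes.
NODES = {1: "CAT", 2: "G", 3: "C", 4: "TTA", 5: "GGC", 6: "AC"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [6], 6: []}


def enumerate_paths(node, edges):
    """Yield every node path from `node` to a sink (a node with no out-edges)."""
    if not edges[node]:
        yield [node]
        return
    for nxt in edges[node]:
        for rest in enumerate_paths(nxt, edges):
            yield [node] + rest


def spell(path, nodes):
    """Concatenate node sequences along a path."""
    return "".join(nodes[n] for n in path)


def paths_containing(read, nodes, edges, source=1):
    """Return the paths whose spelled sequence contains the read exactly."""
    return [p for p in enumerate_paths(source, edges) if read in spell(p, nodes)]


haplotypes = [spell(p, NODES) for p in enumerate_paths(1, EDGES)]
```

The graph encodes four haplotypes (CATGTTAGGCAC, CATGTTAAC, CATCTTAGGCAC, CATCTTAAC), and a read such as ATCTTAA, which carries both the C allele and the deletion, is found only on path [1, 3, 4, 6].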
1/n
Because in bioinformatics sharing is caring, let me share something I have recently started exploring: graph mapping and pangenome graphs.
A pangenome graph encodes a reference built from many genomes in a single structure, aiming to capture known genetic variation.
12.08.2025 09:04 — 👍 1 🔁 0 💬 1 📌 0
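Pangenome graphs are commonly exchanged in GFA (Graphical Fragment Assembly) format, where S lines hold segment sequences, L lines link oriented segments, and P lines list the walk of each input genome through the graph. The sketch below is a minimal parser for a hand-written toy graph; it assumes tab-separated GFA 1.0 records with all overlaps equal to 0M, ignores optional tags, and the names parse_gfa, path_sequence, and TOY_GFA are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_gfa(text):
    """Parse S (segment), L (link) and P (path) records from a minimal GFA 1.0 string.

    Simplifying assumptions: tab-separated fields, optional tags ignored,
    and every overlap is 0M, so path sequences are plain concatenations.
    """
    segments, links, paths = {}, {}, {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.setdefault((fields[1], fields[2]), []).append((fields[3], fields[4]))
        elif fields[0] == "P":
            # P <name> <comma-separated oriented segment IDs, e.g. 1+,2+,4+> <overlaps>
            paths[fields[1]] = [(step[:-1], step[-1]) for step in fields[2].split(",")]
    return segments, links, paths


def path_sequence(segments, steps):
    """Spell a path, reverse-complementing segments visited in '-' orientation."""
    parts = []
    for seg_id, orient in steps:
        seq = segments[seg_id]
        parts.append(seq if orient == "+" else seq.translate(COMPLEMENT)[::-1])
    return "".join(parts)


# Hypothetical two-genome graph with an A/G SNP bubble between segments 1 and 4.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT", "S\t2\tA", "S\t3\tG", "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M", "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M", "L\t3\t+\t4\t+\t0M",
    "P\thap1\t1+,2+,4+\t*",
    "P\thap2\t1+,3+,4+\t*",
])

segments, links, paths = parse_gfa(TOY_GFA)
```

Both input genomes share segments 1 and 4 and differ only at the bubble, so path_sequence recovers ACGTATTCA for hap1 and ACGTGTTCA for hap2 from a single structure.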
12/12
"It is an ethical imperative that research be designed and analysed to avoid wasting investment of animals, research dollars and effort".
#biostatistics #stats
26.06.2025 08:11 — 👍 1 🔁 0 💬 0 📌 0
11/n
Ultimately, understanding C&N concepts, and designing experiments and analytical approaches that account for them, is fundamental to rigorous and reproducible preclinical research.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
10/n
In a collaborative setting, another key element is clear communication between experimentalists and data analysts. Only then can we have sound, robust experiments with efficient analyses that truly address the questions being asked.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
9/n
The analysis must reflect the data hierarchy (e.g., measurements nested within animals, animals within cages). Reporting statistical model details, including random effects and degrees of freedom adjustments, is also highly recommended.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
8/n
For the analysis, researchers can either aggregate the data to a single summary per cluster (e.g., the mean per cage), which discards some information (within-group variation can still be relevant), or use mixed or hierarchical models.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
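A minimal sketch of the aggregation route, assuming a treatment applied at the cage level, hypothetical measurements (3 cages per treatment, 2 mice per cage), and a standard pooled-variance Student's t-test. Collapsing each cage to its mean makes the cage the unit of analysis, so the degrees of freedom come from the number of cages (here 6 - 2 = 4), not the number of mice. The mixed-model route would need a dedicated library and is not shown.

```python
from math import sqrt
from statistics import mean, variance


def cage_means(records):
    """Collapse (cage, treatment, value) records to one mean per cage, grouped by treatment."""
    per_cage = {}
    for cage, treatment, value in records:
        per_cage.setdefault((cage, treatment), []).append(value)
    by_treatment = {}
    for (cage, treatment), values in per_cage.items():
        by_treatment.setdefault(treatment, []).append(mean(values))
    return by_treatment


def student_t(a, b):
    """Two-sample Student's t with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2


# Hypothetical data: treatment applied per cage, two mice measured in each cage.
records = [
    ("A1", "control", 10), ("A1", "control", 12),
    ("A2", "control", 9), ("A2", "control", 11),
    ("A3", "control", 12), ("A3", "control", 14),
    ("B1", "treated", 14), ("B1", "treated", 16),
    ("B2", "treated", 13), ("B2", "treated", 15),
    ("B3", "treated", 16), ("B3", "treated", 18),
]

means = cage_means(records)
t, df = student_t(means["control"], means["treated"])
```

With these numbers the cage means are [11, 10, 13] and [15, 14, 17], giving t ≈ -3.21 on 4 degrees of freedom. Treating the 12 mice as independent would claim 10 degrees of freedom that the design does not provide.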
7/n
Additionally, increasing the number of independent groups is generally more beneficial than having more animals per group.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
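One standard way to quantify this is the design effect for equal-sized clusters, DEFF = 1 + (m - 1) × ICC, where m is the number of animals per cage and ICC is the intraclass correlation; the effective sample size is the total number of animals divided by DEFF. The sketch below applies that textbook approximation; the ICC of 0.3 and both cage layouts are hypothetical.

```python
def design_effect(mice_per_cage, icc):
    """Design effect for equal-sized clusters: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (mice_per_cage - 1) * icc


def effective_sample_size(n_cages, mice_per_cage, icc):
    """Number of independent observations the clustered sample is roughly 'worth'."""
    return n_cages * mice_per_cage / design_effect(mice_per_cage, icc)


# Same 20 mice, two hypothetical layouts, assumed ICC of 0.3.
few_big_cages = effective_sample_size(4, 5, 0.3)       # 20 / 2.2, about 9.1
many_small_cages = effective_sample_size(10, 2, 0.3)   # 20 / 1.3, about 15.4
```

Spreading the same 20 animals over more, smaller cages raises the effective sample size from about 9 to about 15, which is the arithmetic behind favouring more independent groups.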
6/n
For experimental design, when a condition or treatment is applied, blocking (e.g. ensuring that each cage contains animals from both treatments) and randomisation (e.g. randomly assigning which animals within each cage receive each treatment) are essential.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
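Blocked randomisation as described above can be sketched in a few lines, assuming each cage is a block and two treatments. The cage and animal IDs and the function name randomise_within_cages are hypothetical. Animals are shuffled within each cage and treatments are dealt in turn, so every cage gets a balanced split while the assignment of individual animals stays random. A fixed seed keeps the allocation reproducible for reporting.

```python
import random


def randomise_within_cages(cages, treatments=("control", "treated"), seed=None):
    """Blocked randomisation: shuffle animals within each cage (block), then deal
    treatments in turn so each cage receives every treatment as evenly as possible."""
    rng = random.Random(seed)
    allocation = {}
    for cage, animals in cages.items():
        shuffled = list(animals)
        rng.shuffle(shuffled)
        for i, animal in enumerate(shuffled):
            allocation[animal] = (cage, treatments[i % len(treatments)])
    return allocation


# Hypothetical layout: two cages of four mice each.
cages = {"cage1": ["m1", "m2", "m3", "m4"], "cage2": ["m5", "m6", "m7", "m8"]}
allocation = randomise_within_cages(cages, seed=1)
```

Whatever the seed, each cage ends up with two control and two treated mice, so cage effects are balanced across treatments rather than confounded with them.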
5/n
Researchers must account for these effects by treating the group (e.g. cage, litter) as the experimental unit in power calculations. Experimental design and downstream analyses should take this group effect into account as well.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
4/n
Repeated-measures data (multiple measurements from the same individual) also consist of nested observations. This is probably the scenario most easily recognised and accounted for in research.
26.06.2025 08:11 — 👍 2 🔁 0 💬 1 📌 0
3/n
Examples of C&N include "cage effects" (mice housed in the same cage), "litter effects" (animals from the same litter, which share genetics or maternal environment), and "in vitro plate effects" (cells from the same plate).
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
2/n
Interdependence among data points violates the independence assumption of standard statistical tests such as t-tests or ANOVA. Ignoring C&N can therefore inflate Type I error rates, leading to false positives and undermining the validity of any conclusions drawn.
26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0
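The Type I error inflation can be checked with a small simulation. The sketch assumes a hypothetical null scenario: no treatment effect, 4 cages of 5 mice per group, and cage and mouse standard deviations of 1. It compares a t-test that treats every mouse as independent with one run on cage means. The two-sided 5% critical values are hard-coded for this specific layout (df = 38 and df = 6), so they must be changed if the layout changes.

```python
import random
from math import sqrt
from statistics import mean, variance

N_CAGES, MICE_PER_CAGE = 4, 5  # per group; the critical values below assume this layout
T_CRIT_MICE = 2.024   # two-sided 5%, df = 2*4*5 - 2 = 38 (every mouse treated as independent)
T_CRIT_CAGES = 2.447  # two-sided 5%, df = 2*4 - 2 = 6 (one mean per cage)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def simulate_group(rng, cage_sd=1.0, mouse_sd=1.0):
    """Null data (no treatment effect): mice in the same cage share a random cage effect."""
    group = []
    for _ in range(N_CAGES):
        cage_effect = rng.gauss(0, cage_sd)
        group.append([cage_effect + rng.gauss(0, mouse_sd) for _ in range(MICE_PER_CAGE)])
    return group


def false_positive_rates(n_sims=1000, seed=42):
    """Fraction of null simulations declared 'significant' by each analysis."""
    rng = random.Random(seed)
    naive = cage_level = 0
    for _ in range(n_sims):
        g1, g2 = simulate_group(rng), simulate_group(rng)
        mice1 = [x for cage in g1 for x in cage]
        mice2 = [x for cage in g2 for x in cage]
        naive += abs(student_t(mice1, mice2)) > T_CRIT_MICE
        cage_level += abs(student_t([mean(c) for c in g1], [mean(c) for c in g2])) > T_CRIT_CAGES
    return naive / n_sims, cage_level / n_sims
```

Under these assumptions the cage-level test should reject close to the nominal 5% of the time, while the mouse-level test rejects several times more often even though no effect exists.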
Instead of painstakingly dissecting a set of primary data to find novel patterns, it can be more effective to fit an unsupervised cluster-factor-latent-spaces model and then painstakingly dissect the model parameters to find patterns imposed by the model inference.
24.06.2025 12:23 — 👍 28 🔁 1 💬 1 📌 0
Very interesting paper from Soares lab @gimmfoundation.bsky.social
17.06.2025 21:03 — 👍 1 🔁 0 💬 0 📌 0
Reposting this once doesn't feel like enough to show how much I agree with it
28.05.2025 13:45 — 👍 1 🔁 0 💬 0 📌 0
@symbiosisalumni.bsky.social
27.05.2025 21:58 — 👍 1 🔁 0 💬 0 📌 0
Microbiome, metagenomics, ML, and reproductive health. All views are mine. So are all your base
Neuro and Developmental biologist. PostDoc at day. Supervillain at night. He/him #BlackLivesMatter #TransRightsAreHumanRights
@mads100tist@mastodon.social
You don’t need a PhD in statistics or years of coding experience to learn R, the most powerful tool for data analysis and visualization.
https://rfortherestofus.com/
Building personalized Bluesky feeds for academics! Pin Paper Skygest, which serves posts about papers from accounts you're following: https://bsky.app/profile/paper-feed.bsky.social/feed/preprintdigest. By @sjgreenwood.bsky.social and @nkgarg.bsky.social
The central mission of the R Consortium is to support the R Foundation and key organizations developing, maintaining, and distributing R software through the identification, development and implementation of infrastructure projects.
The official account of the UVA School of Data Science: Committed to teaching and practicing responsible data science for the common good.
datascience.virginia.edu
physician-scientist, author, editor
https://www.scripps.edu/faculty/topol/
Ground Truths https://erictopol.substack.com
SUPER AGERS https://www.simonandschuster.com/books/Super-Agers/Eric-Topol/9781668067666
Our mission is to enable the promise of genomics to better human health by creating the world’s most advanced sequencing technologies.
Postdoc at the University of Cambridge. Interested in transposons, evolution, epigenetics, worms and African cichlid fishes.
Cell Press partners with scientists across all disciplines to publish and share work that will inspire future directions in research. #ScienceThatInspires
Proteomics Head @gimmfoundation.bsky.social Former @EMBO.org Postdoctoral Fellow @biozentrum.unibas.ch
Into communicating & visualizing science. Passionate about politics, education & equality.
Lab @ Católica Biomed Research Centre (Lisbon - PT)
using cool tech to discover how parasites interact with blood vessels
silvapereiralab.com
The JCI is a premier venue for discoveries in basic and clinical biomedical science that will advance the practice of medicine. Est. 1924
Ph.D, stats lover/writer✍🏼, #statistics #scicomm #datascience #statstiktok 👩🏻‍💻 she/her
Unofficial CRAN updates bot maintained by @chriskenny.bsky.social using R package bskyr https://christophertkenny.com/bskyr/
Biotechnologist, bioinformatician, and PhD. Nextflow ambassador. Napoli & Barcelona. https://lucacozzuto.github.io/
Open-source scientific and technical publishing system brought to you by posit.co.
github.com/quarto-dev/quarto-cli
A @natureportfolio.nature.com journal on mathematical models and computational methods/tools that help advance science in multiple disciplines. https://www.nature.com/natcomputsci