André Boler Barros @asbarros

‘Am I redundant?’: how AI changed my career in bioinformatics A run-in with some artefact-laden AI-generated analyses convinced Lei Zhu that machine learning wasn’t making his role irrelevant, but more important than ever.

"(...) the rise of generative AI in bioinformatics has not diminished my role, but redefined it. It has challenged me to become a better scientist. For good or ill, AI seems to be here to stay. I urge you to embrace the technology — not to replace your expertise, but to amplify it."

#bioinfo

28.10.2025 13:07 — 👍 0 🔁 0 💬 0 📌 0

Can't recommend this lab enough. If you'd like to work in a curiosity-driven and nurturing environment with a strong focus on robust data analysis, don't think twice!

#bioinfo #datascience

07.10.2025 11:04 — 👍 2 🔁 0 💬 0 📌 0

Last week, I was fortunate enough to watch a talk by @tkorem.bsky.social, who covered topics ranging from addressing inter-study variability in microbiome projects to novel approaches for metagenomic alignment and processing. Interesting and very relevant!

#bioinfo

30.09.2025 12:32 — 👍 3 🔁 0 💬 0 📌 0

Glucostats: an efficient Python library for glucose time series feature extraction and visual analysis - BMC Bioinformatics

Background: The advancement of technology and continuous glucose monitoring (CGM) systems has introduced several computational and technical challenges for clinicians and researchers. The growing volume of CGM data necessitates the development of efficient computational tools capable of handling and processing this information effectively. This paper introduces GlucoStats, an open-source and multi-processing Python library designed for efficient computation and visualization of a comprehensive set of glucose metrics derived from CGM. It simplifies the traditionally time-consuming and error-prone process of manual CGM metrics calculation, making it a valuable tool for both clinical and research applications.

Results: Its modular design ensures easy integration into predefined workflows, while its user-friendly interface and extensive documentation make it accessible to a broad audience, including clinicians and researchers. GlucoStats offers several key features: (i) window-based time series analysis, enabling time series division into smaller ‘windows’ for detailed temporal analysis, particularly beneficial for CGM data; (ii) advanced visualization tools, providing intuitive, high-quality visualizations that facilitate pattern recognition, trend analysis, and anomaly detection in CGM data; (iii) parallelization, leveraging parallel computing to efficiently handle large CGM datasets by distributing computations across multiple processors; and (iv) scikit-learn compatibility, adhering to the standardized interface of scikit-learn to allow an easy integration into machine learning pipelines for end-to-end analysis.

Conclusions: GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs. By offering precise CGM data analysis and user-friendly visualization tools, it serves both technical researchers and non-technical users, such as physicians and patients, with practical and research-driven applications.

"GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs"

bmcbioinformatics.biomedcentral.com/articles/10....

#datascience #biostats

29.09.2025 08:10 — 👍 1 🔁 0 💬 0 📌 0

Learning the natural history of human disease with generative transformers - Nature Delphi-2M forecasts a person’s future health, covering more than 1,000 diseases, provides insights into co-morbidity dynamics and generates synthetic data for the training of AI models that have never...

"Delphi-2M predicts the rates of more than 1,000 diseases (...), with accuracy comparable to that of existing single-disease models. Delphi-2M (...) also enables sampling of synthetic future health trajectories"

www.nature.com/articles/s41...

22.09.2025 16:23 — 👍 1 🔁 0 💬 0 📌 0

Amazing repository with several references and resources for scRNASeq analysis

github.com/crazyhottomm...

#bioinfo #singlecell

11.09.2025 13:54 — 👍 0 🔁 0 💬 0 📌 0

Push-button science - Nature Methods Technological advances change not only what we can learn as scientists, but also how science is conducted. Here we explore how automation and outsourcing are affecting the act of doing science.

"Even if work is done by or with the help of experts (...), it is crucial that researchers understand how a method works, that they can assess data quality, and that they fundamentally understand what types of conclusions can and cannot be drawn from their data"

www.nature.com/articles/s41...

10.09.2025 23:48 — 👍 0 🔁 0 💬 0 📌 0

Six questions to ask before jumping into a spreadsheet Spreadsheet software can be frustrating, but adopting some helpful habits can improve its effectiveness.

Spreadsheets are an everyday tool for most wet-lab scientists. So why not use them to their full potential, efficiently and in a way that is ready for open science?

This paper provides some recommendations for the use of spreadsheets:

www.nature.com/articles/d41...

#bioinfo #stats

21.08.2025 08:29 — 👍 2 🔁 0 💬 0 📌 0

Harvard University lays off fly database team The layoffs jeopardize this resource, which has served more than 4,000 labs for about three decades.

FlyBase, a Drosophila database, will lose a third of its team in early October because the Harvard grant that covered the employees’ salaries was canceled. Scientists warn that losing FlyBase could devastate fly research.

By @claudia-lopez.bsky.social

www.thetransmitter.org/community/ha...

13.08.2025 19:32 — 👍 123 🔁 129 💬 3 📌 12

A survey of sequence-to-graph mapping algorithms in the pangenome era - Genome Biology A pangenome can reveal the genetic diversity across different individuals simultaneously. It offers a more comprehensive reference for genome analysis compared to a single linear genome that may intro...

5/n

The following paper provides an excellent overview of sequence-to-graph mapping algorithms, covering their principles, trade-offs, and performance benchmarks. It’s a great starting point if you’re interested in the topic.

genomebiology.biomedcentral.com/articles/10....

#bioinformatics #bioinfo

12.08.2025 09:04 — 👍 1 🔁 0 💬 0 📌 0

4/n

From equitable precision medicine to large-scale population studies and evolutionary research, pangenome graphs and graph mapping excel at capturing diversity, especially in highly variable or repetitive genome regions where linear mapping fails.

12.08.2025 09:04 — 👍 0 🔁 0 💬 1 📌 0

3/n

Linear references favour the “reference” allele (known as reference bias). Integrating multiple haplotypes into a pangenome graph, and mapping reads against that graph, reduces this bias, improves alignment quality, and makes analyses fairer across diverse populations.

12.08.2025 09:04 — 👍 1 🔁 0 💬 1 📌 0

2/n
Pangenome graphs represent variants as alternative paths, so reads can follow the path matching their true sequence.
These graphs can represent SNPs, indels, structural variants, and complex rearrangements, all in one model—boosting variant calling accuracy for common and rare alleles.
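
To make the "alternative paths" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the node IDs, the sequences, and the exhaustive path enumeration, which only works on a toy graph. Real graph mappers rely on indexing and seeding rather than spelling out every path.

```python
from typing import Dict, List

# Toy sequence graph (node IDs and sequences are made up for illustration).
# Nodes 2 and 3 form a "bubble": the two alleles of a single SNP.
nodes: Dict[int, str] = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4], 4: []}


def enumerate_paths(start: int) -> List[List[int]]:
    """Return every path from `start` to a node with no outgoing edges."""
    if not edges[start]:
        return [[start]]
    return [[start] + rest for nxt in edges[start] for rest in enumerate_paths(nxt)]


def spell(path: List[int]) -> str:
    """Concatenate node sequences along a path into one haplotype string."""
    return "".join(nodes[n] for n in path)


def matching_paths(read: str) -> List[List[int]]:
    """Paths whose spelled sequence contains the read exactly (no mismatches)."""
    return [p for p in enumerate_paths(1) if read in spell(p)]


# A read carrying the G allele follows the path through node 3,
# while a read carrying the A allele follows the path through node 2.
print(matching_paths("GTGTT"))
print(matching_paths("GTATT"))
```

A read that does not span the SNP (for example `"ACGT"`) matches both paths, which is the graph equivalent of a read that is uninformative about the variant.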

12.08.2025 09:04 — 👍 0 🔁 0 💬 1 📌 0

1/n

Because, in bioinformatics, sharing is caring, let me share something I have recently started exploring - graph mapping and pangenome graphs.

A pangenome graph encodes a reference built from many genomes in a single structure, aiming to capture the known genetic variability.

12.08.2025 09:04 — 👍 1 🔁 0 💬 1 📌 0

12/12

"It is an ethical imperative that research be designed and analysed to avoid wasting investment of animals, research dollars and effort".

#biostatistics #stats

26.06.2025 08:11 — 👍 1 🔁 0 💬 0 📌 0

11/n

Ultimately, understanding C&N, and designing experiments and analyses that account for it, is fundamental to rigorous and reproducible preclinical research.

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

10/n

In a collaborative setting, another key element is good, clear communication between the experimentalists and the data analysts. Only then can we have sound and robust experiments, with efficient analysis, that really provide insight into the questions being asked.

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

9/n

The analysis must reflect the data hierarchy (e.g., measurements nested within animals, animals within cages). Reporting statistical model details, including random effects and degrees of freedom adjustments, is also highly recommended.

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

8/n

For the analysis, researchers can either aggregate the data to a single summary per cluster (e.g., mean per cage), which loses some information (the variation within a group can still be relevant), or use mixed or hierarchical models.

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

7/n

Additionally, increasing the number of independent groups is generally more beneficial than having more animals per group.
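
One way to see why is the classical design effect, 1 + (m - 1) × ICC, where m is the number of animals per group and ICC is the intraclass correlation. The sketch below is not taken from the paper: the formula assumes equal group sizes and a treatment applied to whole groups, and the ICC value is a hypothetical number chosen for illustration.

```python
def effective_sample_size(n_groups: int, animals_per_group: int, icc: float) -> float:
    """Approximate number of independent observations in a clustered design.

    Uses the design effect 1 + (m - 1) * ICC, with m the (equal) group size.
    """
    total_animals = n_groups * animals_per_group
    design_effect = 1 + (animals_per_group - 1) * icc
    return total_animals / design_effect


# Same 24 animals, arranged two ways; ICC = 0.3 is an assumed value.
few_large_groups = effective_sample_size(n_groups=4, animals_per_group=6, icc=0.3)
many_small_groups = effective_sample_size(n_groups=8, animals_per_group=3, icc=0.3)

print(f"4 groups of 6: {few_large_groups:.1f} effective animals")
print(f"8 groups of 3: {many_small_groups:.1f} effective animals")
```

Under these assumptions the eight smaller groups behave like roughly 15 independent animals, while the four larger groups behave like fewer than 10, even though both layouts use 24 animals.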

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

6/n

For experimental design, when a condition or treatment is applied, the concepts of blocking (e.g. ensuring that each cage has both treatments) and randomisation (e.g. within each cage, which animals receive each treatment is decided at random) are essential.
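
As a rough sketch of how blocking and randomisation combine, the Python snippet below treats each cage as a block and shuffles a balanced list of treatment labels inside it. The function name, cage names, and animal IDs are all hypothetical, and the sketch assumes each cage size is a multiple of the number of treatments.

```python
import random
from typing import Dict, List


def randomise_within_cages(
    cages: Dict[str, List[str]], treatments: List[str], seed: int = 0
) -> Dict[str, str]:
    """Assign treatments so every cage (block) gets each treatment equally often,
    with the allocation to individual animals shuffled at random."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    assignment: Dict[str, str] = {}
    for cage, animals in cages.items():
        if len(animals) % len(treatments) != 0:
            raise ValueError(f"{cage}: size must be a multiple of the number of treatments")
        # Balanced labels for this cage, then a random permutation of them.
        labels = treatments * (len(animals) // len(treatments))
        rng.shuffle(labels)
        assignment.update(zip(animals, labels))
    return assignment


# Hypothetical cages and animal IDs.
cages = {"cage_A": ["m1", "m2", "m3", "m4"], "cage_B": ["m5", "m6", "m7", "m8"]}
plan = randomise_within_cages(cages, ["control", "treated"], seed=42)
for animal, treatment in plan.items():
    print(animal, treatment)
```

Recording the seed alongside the allocation table keeps the randomisation auditable.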

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

5/n

Researchers must account for these effects by treating the group (e.g. cage, litter, etc.) as the experimental unit for power calculations. Experimental design and downstream analyses should also take this group effect into consideration.

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

4/n

Repeated-sampling data (repeated measures from the same individual) also consist of nested observations. This is probably the scenario most easily recognised and accounted for in research.

26.06.2025 08:11 — 👍 2 🔁 0 💬 1 📌 0

3/n

Examples of C&N include "cage effects" (e.g. mice housed in the same cage), "litter effects" (animals from the same litter, thus sharing genetics or maternal environment), and "in vitro plate effects" (cells from the same plate).

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

2/n

The interdependency of the obtained data points violates the independence assumption of standard statistical tests like t-tests or ANOVA. Therefore, ignoring C&N can inflate Type I error rates, leading to false positives, thus undermining the validity of any conclusion obtained.
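
A small simulation can illustrate the inflation. The sketch below is an assumption-laden toy, not the paper's analysis: treatment is applied to whole cages, there is no true treatment effect, cage and mouse standard deviations are both set to 1, and the hard-coded critical values only match the 4 cages × 5 mice layout used here. It compares a t-test that treats every mouse as independent with a t-test on one mean per cage.

```python
import random
import statistics
from typing import List, Tuple

N_CAGES, MICE_PER_CAGE = 4, 5  # per treatment group (assumed layout)
# Two-sided 5% critical values of Student's t for this layout:
CRIT_NAIVE = 2.024  # df = 2 * 20 - 2 = 38 (every mouse counted)
CRIT_CAGE = 2.447   # df = 2 * 4 - 2 = 6 (one mean per cage)


def t_statistic(a: List[float], b: List[float]) -> float:
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def simulate_group(rng: random.Random, cage_sd: float, mouse_sd: float) -> List[List[float]]:
    """One treatment group as a list of cages; all true means are 0 (null effect)."""
    group = []
    for _ in range(N_CAGES):
        cage_shift = rng.gauss(0, cage_sd)  # shared by every mouse in the cage
        group.append([cage_shift + rng.gauss(0, mouse_sd) for _ in range(MICE_PER_CAGE)])
    return group


def false_positive_rates(n_sim: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Fraction of null experiments declared significant by each analysis."""
    rng = random.Random(seed)
    naive_hits = cage_hits = 0
    for _ in range(n_sim):
        g1, g2 = simulate_group(rng, 1.0, 1.0), simulate_group(rng, 1.0, 1.0)
        # Naive analysis: ignore cages and pool all mice.
        flat1 = [x for cage in g1 for x in cage]
        flat2 = [x for cage in g2 for x in cage]
        naive_hits += abs(t_statistic(flat1, flat2)) > CRIT_NAIVE
        # Aggregated analysis: the cage is the experimental unit.
        means1 = [statistics.mean(cage) for cage in g1]
        means2 = [statistics.mean(cage) for cage in g2]
        cage_hits += abs(t_statistic(means1, means2)) > CRIT_CAGE
    return naive_hits / n_sim, cage_hits / n_sim


naive_rate, cage_rate = false_positive_rates()
print(f"ignoring cages: {naive_rate:.3f}  using cage means: {cage_rate:.3f}")
```

With these settings the naive analysis should reject the null far more often than the nominal 5%, while the cage-mean analysis should stay close to it.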

26.06.2025 08:11 — 👍 1 🔁 0 💬 1 📌 0

A brief guide to statistical analysis of grouped data in preclinical research - Nature Metabolism Clustering and nesting (C&N) arise in many preclinical studies, such as when animals are group-housed or share litters, or in cell culture. Ignoring C&N undermines the validity of analyses. He...

1/n
Brief guide to statistical analysis of grouped data in preclinical research
www.nature.com/articles/s42...

In preclinical studies, clustering and nesting (C&N) scenarios, such as group-housed animals or cells on a single plate, are common. This has important statistical implications.

26.06.2025 08:11 — 👍 2 🔁 0 💬 1 📌 0

Instead of painstakingly dissecting a set of primary data to find novel patterns, it can be more effective to fit an unsupervised cluster-factor-latent-spaces model and then painstakingly dissect the model parameters to find patterns imposed by the model inference.

24.06.2025 12:23 — 👍 28 🔁 1 💬 1 📌 0

Very interesting paper from Soares lab @gimmfoundation.bsky.social

17.06.2025 21:03 — 👍 1 🔁 0 💬 0 📌 0

Reposting this a single time feels too short to show how much I agree with this

28.05.2025 13:45 — 👍 1 🔁 0 💬 0 📌 0

@symbiosisalumni.bsky.social

27.05.2025 21:58 — 👍 1 🔁 0 💬 0 📌 0
