David Selby

@davidselby.bsky.social

Data science researcher working on applications of machine learning in health at DFKI, getting the most out of small data. Reproducible #Rstats evangelist and unofficial British cultural ambassador to Rhineland-Palatinate 🇩🇪 https://selbydavid.com

68 Followers  |  192 Following  |  14 Posts  |  Joined: 15.12.2024

Latest posts by davidselby.bsky.social on Bluesky

Announcing a new guest-edited special collection in Digital Health! Everything on patient-reported outcomes in mHealth, including algorithmic treatment and measurement allocation, N-of-1 trials, preference learning & handling subjective measurements. 📲

Take a look!

journals.sagepub.com/topic/collec...

10.10.2025 14:06 — 👍 1    🔁 1    💬 0    📌 0
How many patients could we save with LLM priors? Imagine a world where clinical trials need far fewer patients to achieve the same statistical power, thanks to the knowledge encoded in large language models (LLMs). We present a novel framework for h...

New preprint: how many patients could we save with LLM priors? Exploring the effect of eliciting informative priors for Bayesian clinical trials. arxiv.org/abs/2509.04250

05.09.2025 07:47 — 👍 1    🔁 0    💬 0    📌 0
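The preprint's framework isn't reproduced here; as a back-of-the-envelope illustration in R of why informative priors can "save" patients, consider a conjugate Beta-Binomial model of a single arm's response rate, with entirely invented numbers:

```r
# Toy illustration, not the method from the preprint: an informative Beta(a, b)
# prior acts like a + b "pseudo-patients". All numbers below are invented.

weak        <- c(a = 1,  b = 1)    # flat reference prior
informative <- c(a = 20, b = 30)   # e.g. an elicited prior centred on a 40% response rate

responses <- 18; n <- 40           # hypothetical observed trial data

posterior_sd <- function(prior, x, n) {
  a <- prior[["a"]] + x
  b <- prior[["b"]] + n - x
  sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # sd of the Beta(a, b) posterior
}

posterior_sd(weak, responses, n)          # ~0.076
posterior_sd(informative, responses, n)   # ~0.052: same data, tighter estimate
```

In this toy setup the informative prior contributes a + b = 50 pseudo-observations, so a flat-prior analysis would need roughly 50 more real patients to reach a similar posterior precision.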
Automated Visualization Makeovers with LLMs Making a good graphic that accurately and efficiently conveys the desired message to the audience is both an art and a science, typically not taught in the data science curriculum. Visualisation makeo...

📊📉📈 Better data visualizations with AI: can LLMs provide constructive critiques of existing charts? We explore how generative AI can automate #MakeoverMonday-type exercises, suggesting improvements to existing charts.

📄 New preprint + benchmark dataset 💽

arxiv.org/abs/2508.05637

19.08.2025 06:17 — 👍 0    🔁 0    💬 0    📌 0
BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often...

🧬 BioDisco, an open-source biomedical hypothesis generator, uses agentic LLMs, knowledge graphs and literature search, with an iterative self-evaluation loop to discover novel relations, significantly outperforming other architectures.

Preprint: arxiv.org/abs/2508.01285

05.08.2025 09:05 — 👍 0    🔁 0    💬 0    📌 0
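None of BioDisco's actual components are reproduced here; as a purely hypothetical R skeleton, with invented stand-in functions in place of the real LLM agents, knowledge-graph queries and literature search, the generate-evaluate-refine loop described above has roughly this shape:

```r
# Hypothetical skeleton only: stand-in functions, not BioDisco's code.
set.seed(42)

generate_hypothesis <- function(feedback = NULL) {
  paste("Gene X regulates pathway Y", if (!is.null(feedback)) "(revised)" else "")
}
score_hypothesis <- function(h) runif(1)   # stand-in for evidence-based evaluation
make_feedback    <- function(h, s) sprintf("score %.2f: strengthen the evidence", s)

hypothesis <- generate_hypothesis()
for (round in 1:5) {
  s <- score_hypothesis(hypothesis)
  if (s > 0.8) break                       # stop once self-evaluation is satisfied
  hypothesis <- generate_hypothesis(make_feedback(hypothesis, s))
}
hypothesis
```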
GitHub - Selbosh/aaai2026-quarto: Unofficial Quarto template for the AAAI-2026 Conference

New: unofficial @quarto.org template for the upcoming @realaaai.bsky.social 2026 conference. Write your submission in Markdown with reproducible, inline computations!

github.com/Selbosh/aaai...

29.07.2025 14:54 — 👍 0    🔁 0    💬 0    📌 0
Frontiers | Visible neural networks for multi-omics integration: a critical review BackgroundBiomarker discovery and drug response prediction are central to personalized medicine, driving demand for predictive models that also offer biologi...

What is a "Visible Neural Network"? It's a new kind of deep learning model for multi-omics, where prior knowledge and interpretability are baked into the architecture.

📄 We reviewed dozens of models, datasets & applications, and call for better tools/benchmarks:

www.frontiersin.org/journals/art...

21.07.2025 12:36 — 👍 4    🔁 1    💬 0    📌 0
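The review itself isn't summarised in code here, but the architectural idea is easy to sketch: a hidden layer whose units are named pathways, with gene-to-pathway weights switched off wherever prior knowledge says there is no membership. A toy R example with invented genes, pathways and values:

```r
# Toy sketch, not from the review: a "visible" layer in which a binary
# gene-to-pathway membership mask (prior knowledge) zeroes out weights
# that have no biological justification.
set.seed(1)

genes    <- paste0("gene", 1:6)
pathways <- c("pathwayA", "pathwayB")

mask <- matrix(c(1, 1, 1, 0, 0, 0,    # pathwayA members (invented)
                 0, 0, 1, 1, 1, 1),   # pathwayB members (invented)
               nrow = length(genes),
               dimnames = list(genes, pathways))

weights <- matrix(rnorm(length(mask)), nrow = nrow(mask),
                  dimnames = dimnames(mask))

x <- rnorm(length(genes))   # one sample's scaled expression values

# Forward pass of the constrained layer: masked weights, then a ReLU.
pathway_activation <- pmax(t(weights * mask) %*% x, 0)
pathway_activation
```

Because each hidden unit corresponds to a named pathway rather than an anonymous feature, the fitted weights can be read as pathway-level contributions, which is where the interpretability comes from.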
Health Research From Home Hackathon 2025, 7-9 May 2025

This hackathon is being held by the Health Research From Home Partnership, led by @OfficialUoM. Register your interest now: health-research-from-home.github.io/DataAnalysis...

06.03.2025 21:47 — 👍 0    🔁 1    💬 0    📌 0
Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less w...

Just published: 'Had enough of experts? Quantitative knowledge retrieval from large language models'

Can LLMs, having read the scientific literature, offer us useful numerical info to help fill in missing data and fit statistical models, like a real human expert? We investigate:

doi.org/10.1002/sta4...

17.03.2025 08:50 — 👍 1    🔁 0    💬 0    📌 0
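As a generic illustration of the kind of quantitative knowledge involved (not the paper's own elicitation protocol, and with invented values): a stated median and 90th percentile, whether quoted by a human expert or an LLM, can be turned into a Normal prior in a couple of lines of R.

```r
# Illustrative only: convert an elicited median and 90th percentile into a
# Normal prior. The numbers are invented; the paper's protocol is not shown.

elicited <- list(median = 2.5, q90 = 4.0)   # hypothetical elicited summaries

mu    <- elicited$median
sigma <- (elicited$q90 - elicited$median) / qnorm(0.9)

c(mean = mu, sd = round(sigma, 2))   # roughly Normal(2.5, 1.17)
```

The resulting distribution could serve as a prior on a model parameter, or be sampled from to impute a missing covariate value in a tabular dataset.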
Learning to Denglisch At the railway station, a lost-looking US soldier asked me if I spoke English. Do I? At times it feels like it, but the Germans keep me guessing. Since moving to Germany, I have been continually teste...

New blog post: on all the English I have had to learn since moving to Germany 🇬🇧 🇩🇪

selbydavid.com/2025/03/13/d...

13.03.2025 17:04 — 👍 1    🔁 0    💬 0    📌 0

New blog post: Alternatives to @overleaf.com for #rstats, reproducible writing and collaboration

selbydavid.com/2025/03/04/o...

06.03.2025 18:55 — 👍 4    🔁 1    💬 0    📌 0
Beyond the black box with biologically informed neural networks Nature Reviews Genetics, Published online: 04 March 2025; doi:10.1038/s41576-025-00826-1. Biologically informed neural networks promise to lead to more explainable, data-driven discoveries in genomics, drug development and precision medicine. Selby et al.…

New online! Beyond the black box with biologically informed neural networks

04.03.2025 13:16 — 👍 8    🔁 1    💬 0    📌 1
Beyond the black box with biologically informed neural networks - Nature Reviews Genetics Biologically informed neural networks promise to lead to more explainable, data-driven discoveries in genomics, drug development and precision medicine. Selby et al. highlight emerging opportunities, ...

Thrilled to share our latest publication in @natrevgenet.bsky.social. We explore how deep learning models infused with prior knowledge, known as biologically informed neural networks or BINNs, offer better predictive accuracy and interpretability in multi-omics data analysis. www.nature.com/articles/s41...

04.03.2025 15:12 — 👍 12    🔁 5    💬 0    📌 0
Had enough of experts? Quantitative knowledge retrieval from large language models Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences, however their utility for quantitative information retrieval is less w...

Paper just accepted in Stat!

Can LLMs replace experts as sources of numerical information, such as Bayesian prior distributions for statistical models or missing values in tabular datasets for ML tasks?

We evaluate on applications across different fields.

arxiv.org/abs/2402.07770

20.02.2025 07:29 — 👍 0    🔁 0    💬 0    📌 0
Plot showing stacked bar plots with error bars to visualize normalized ROC AUC of different machine learning models, before and after fine-tuning for four hours. The main insight is that TabPFN, a tabular foundation model, outperforms tree-based methods such as random forests and XGBoost.

How might one redesign this data visualization to avoid using much-maligned 'plunger plots'?

#visualisation

From www.nature.com/articles/s41...

10.01.2025 06:42 — 👍 1    🔁 0    💬 0    📌 0
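One possible answer to the question above, sketched in R with ggplot2 and made-up numbers (the values are not taken from the paper): replace the stacked bars plus error bars with dots and intervals, so the estimates and their uncertainty are read directly rather than from bar tops.

```r
# Invented numbers for illustration; only the chart form is the point here.
library(ggplot2)

results <- data.frame(
  model = rep(c("TabPFN", "XGBoost", "Random forest"), each = 2),
  stage = rep(c("default", "tuned (4 h)"), times = 3),
  auc   = c(0.89, 0.91, 0.84, 0.87, 0.82, 0.85),  # normalised ROC AUC (made up)
  se    = 0.01
)

ggplot(results, aes(x = model, y = auc, colour = stage)) +
  geom_pointrange(aes(ymin = auc - 2 * se, ymax = auc + 2 * se),
                  position = position_dodge(width = 0.4)) +
  coord_flip() +
  labs(x = NULL, y = "Normalised ROC AUC", colour = NULL)
```

Dodging the default and tuned points within each model keeps the before/after comparison adjacent without stacking.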
Sebastian Vollmer, David Selby and Yuichiro Iwashita present their poster, "Had enough of experts? Bayesian prior elicitation from Large Language Models" at the NeurIPS Bayesian Decisionmaking and Uncertainty Workshop 2024 in Vancouver, Canada.

Pleased to present our poster at the #NeurIPS2024 workshop on Bayesian Decisionmaking and Uncertainty! 🎉 Our work explores using large language models for eliciting expert-informed Bayesian priors. Elicited lots of discussion with the ML community too! Check it out: neurips.cc/virtual/2024...

20.12.2024 12:43 — 👍 0    🔁 0    💬 0    📌 0
Visible neural networks for multi-omics integration: a critical review Biomarker discovery and drug response prediction is central to personalized medicine, driving demand for predictive models that also offer biological insights. Biologically informed neural networks (B...

Excited to share our new preprint: Visible neural networks for multi-omics integration: a critical review! 🌟 We systematically analyse 86 studies on biologically informed neural networks (BINNs/VNNs), highlighting trends, challenges, interesting ideas & opportunities. www.biorxiv.org/content/10.1...

20.12.2024 12:20 — 👍 4    🔁 0    💬 0    📌 0
