Maurício M. @phydev - Bluesky Profile

Apache Spark is one of the most frustrating piece of software I’ve ever used. Why something tailored for big data fails so easily when we try to escalate to … big data?!

06.02.2026 19:56 — 👍 0 🔁 0 💬 0 📌 0

A typical university strategist:

"Our unique selling point is that we are ranked 345 in the world. Nobody else can say that!"

02.02.2026 09:22 — 👍 17 🔁 1 💬 4 📌 2

How AI assistance impacts the formation of coding skills Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

«The AI group averaged 50% on the quiz, compared to 67% in the hand-coding group. The largest gap in scores was on debugging, suggesting that the ability to understand when code is incorrect and why it fails may be a particular area of concern.»

www.anthropic.com/research/AI-...

01.02.2026 10:00 — 👍 0 🔁 0 💬 0 📌 0

I see, that makes sense. I’ll play a little and see what I can get from it. Thanks for the recommendations. 😊

30.01.2026 19:47 — 👍 1 🔁 0 💬 1 📌 0

How do you know that the recommendations are reasonable? I’m planing to run a half marathon, but I’m not an experienced runner so I don’t know if I can trust a training plan from ChatGPT.

29.01.2026 22:38 — 👍 0 🔁 0 💬 1 📌 0

Interesting. Can you give an example?

29.01.2026 20:46 — 👍 0 🔁 0 💬 1 📌 0

A Primer on Large Language Models and Transformers. Part I: A High-Flying Bird's Eye-View Transformer-based LLMs are the most significant technology of the past decade. This is first in a series of posts exploring Transformers at various levels of abstraction, digging deeper with each post

Transformer-based LLMs are the most significant technology of the past decade. This is the first in a series of posts for the WHERE MACHINES THINK Substack, exploring Transformers/LLMs at various levels of abstraction, digging deeper with each post. wheremachinesthink.substack.com/p/a-primer-o...

25.01.2026 15:15 — 👍 8 🔁 2 💬 0 📌 0

Take a look on: HTTPS://ocbe-uio.github.io/trajpy

25.01.2026 09:57 — 👍 0 🔁 0 💬 0 📌 0

Just released a new version of trajpy with a new user interface. The previous GUI was built with tkinter, which has quite old looking design. This version got a full revamp with a web based application built with NiceUi.

To improve code maintainability I moved the frontend to its own repository.

25.01.2026 09:57 — 👍 0 🔁 0 💬 1 📌 0

This is a great package. Is the AI guide a thing now in R packages?

24.01.2026 11:12 — 👍 0 🔁 0 💬 1 📌 0

This semester I taught Spatial Data Science with #rstats Students analyzed areal, geostatistical & point pattern data, creating fantastic projects on disease mapping 🗺️ air pollution 🏭 crime 🚨 & species modeling 🐾

Book freely available:

👉 paulamoraga.com/book-spatial/

09.12.2025 13:31 — 👍 62 🔁 17 💬 2 📌 1

2025 highlights: AI research and code AI is everywhere. Can you see it here? Note Some highlights about my work in 2025: progress on tabular-learning stands out, a publication on unpacking trade-off and consequences of scale in...

My 2025 highlights for AI research and code:
▪ Unpacking the AI scale narrative
▪ Tabular-learning research
- TabICL: table foundation model
- Retrieve merge predict: data lakes
▪ Better software
- Skrub: machine learning with tables
- Fundamentals in scikit-learn
gael-varoquaux.info/science/2025...

02.01.2026 13:45 — 👍 22 🔁 5 💬 0 📌 0

Nice picture! The best view of Paris, because it keeps the ugly Montparnasse building out.. 😅

30.12.2025 22:21 — 👍 1 🔁 0 💬 0 📌 0

Experimental evidence that students are more likely to contest grades when they are delivered by an evaluator with a female-sounding name.

"These findings suggest that women in evaluative positions face disproportionate resistance when delivering negative assessments."

24.12.2025 23:21 — 👍 884 🔁 313 💬 23 📌 49

Evaluation of performance measures in predictive artificial intelligence models to support medical decisions: overview and guidance Numerous measures have been proposed to illustrate the performance of predictive artificial intelligence (AI) models. Selecting appropriate performance measures is essential for predictive AI models i...

Our guidance regarding performance measures for medical AI models is finally out!

- Stop bashing AUROC, although it does not settle things
- Calibration and clinical utility are key
- Show risk distributions
- Classification statistics (e.g. F1) are improper

www.thelancet.com/journals/lan...

13.12.2025 14:03 — 👍 48 🔁 25 💬 2 📌 1

Our didactic review on machine learning for causal inference, now open access:
• identifiability (theory of when the data can answer a causal question)
• machine-learning estimators
• study design (asking well-framed questions + loopholes, eg with timewise data)
www.annualreviews.org/content/jour...

20.08.2025 19:11 — 👍 43 🔁 10 💬 2 📌 0

🖊️AI for health: the impossible necessity of unbiased data

Is unbiased data important to build health AI? Yes!

Can there be unbiased data? No!
Building health on biased data discriminates

The notion of bias depends on the intended use:
gael-varoquaux.info/science/ai-f...

14.02.2025 08:30 — 👍 17 🔁 6 💬 1 📌 0

Adversarial random forests for density estimation and generative modeling We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which...

Side note: I attended a seminar this week about a new method called Adversarial Random Forest, which made me excited. The group that develops the method cares about statistical consistency and they have a paper under review on applying this generative method to imputation.
arxiv.org/abs/2205.09435

09.02.2025 11:19 — 👍 0 🔁 0 💬 0 📌 0

Maurício Moreira-Soares

What could go wrong when we use random forest based imputation methods for classical inference?

With a simple simulation study we show how random forest imputation can have catastrophic effects on classical inference with respect to bias and spurious correlations.

phydev.github.io/posts/ranger...

09.02.2025 10:59 — 👍 0 🔁 0 💬 1 📌 0

Based on this #MICCAI2024 paper, we are currently preparing a new submission with a Bayesian approach to investigate the probability of false claims in medical imaging AI papers. The results are shocking… stay tuned⏰

Great collaboration with @gaelvaroquaux.bsky.social and O. Colliot

30.01.2025 08:01 — 👍 16 🔁 5 💬 0 📌 0

Maurício Moreira-Soares

I wrote a short tutorial on how to run deepseek and other models locally with ollama and open-webui: phydev.github.io/posts/deepse...

30.01.2025 10:44 — 👍 2 🔁 0 💬 1 📌 0

Wrangling string columns for machine learning, the new StringEncoder in @skrub-data.bsky.social gives such a good compute/prediction performance tradeoff.

It's mostly just a bunch of simple tricks, but with well-chosen defaults. This is what we aim for in skrub

skrub-data.org/stable/refer...

28.01.2025 17:47 — 👍 21 🔁 4 💬 0 📌 1

Multiple Imputation for Longitudinal Data: A Tutorial Longitudinal studies are frequently used in medical research and involve collecting repeated measures on individuals over time. Observations from the same individual are invariably correlated and thu....

Hot off the press! 📣📣In this tutorial we illustrate available multiple imputation approaches for handling longitudinal data including when they are clustered within higher level clusters. A reproducible example with R and Stata code provided! #OpenAccess

onlinelibrary.wiley.com/doi/10.1002/...

27.01.2025 04:14 — 👍 151 🔁 64 💬 5 📌 3

Happy to share the first paper of my PhD is published☺️!

In case you like to use class imbalance corrections, maybe it is interesting. Let me know what you think!

onlinelibrary.wiley.com/doi/10.1002/...

Many thanks to @maartenvsmeden.bsky.social, @benvancalster.bsky.social, Anne, Kim and Carl !!

27.01.2025 15:27 — 👍 19 🔁 8 💬 0 📌 0

I’ve been living in Norway for 4.5 years and still in love with this place. Yesterday snowed all day long and this morning the sky is crystal clear with a beautiful yellow moon 🌙 , in contrast with the white snow that paints everything. I wished I had my camera with me - a recurrent thought here.

24.01.2025 06:49 — 👍 2 🔁 0 💬 0 📌 0

Great compilation! I’m so greatful that I found your and Riley’s research years ago, I wish this becomes common knowledge among data scientist outside academia/biostats also. God jul og godt nyttår! 😊

23.12.2024 10:55 — 👍 2 🔁 0 💬 1 📌 0

a countdown clock with the number 10 in the center ALT: a countdown clock with the number 10 in the center

Let us start 2025 in a positive mood: here are 10 methods things researchers can worry *less* about in 2025

23.12.2024 10:36 — 👍 261 🔁 119 💬 15 📌 18

Key question to consider before submitting your paper on the development and validation of your new clinical prediction model is:

WHERE IS THE MODEL????

11.12.2024 14:30 — 👍 80 🔁 13 💬 3 📌 3

Thanks for the clarification!

14.11.2023 16:05 — 👍 1 🔁 0 💬 0 📌 0

Hi Richard. I also recall reading somewhere that your method was intended to be used with up to 30 predictor candidates. Maybe it was mentioned in an early version of pmsampsize? Recently I searched again to be sure and couldn't find this restriction anymore.

14.11.2023 10:38 — 👍 0 🔁 0 💬 1 📌 0

Maurício M.

Latest posts by phydev.bsky.social on Bluesky

@phydev is following 20 prominent accounts