Data Elixir @dataelixir.com

Part 3 is Finished, Part 4 Started – Applied Predictive Modeling Blog

We've released 4 new chapters of Applied Machine Learning for Tabular Data.

Includes: Bayesian optimization, feature selection, model comparisons, classification metrics, calibration, #rstats computing sections, and more

blog.aml4td.org/posts/2025-0...

25.07.2025 16:53 — 👍 49 🔁 9 💬 2 📌 0

Webinar on Responding to NSF Grant Terminations Join the Federation of Associations in Behavioral and Brain Sciences (FABBS) on Friday, May 9th, at 2:00 PM ET for an informative webinar on responding to NSF grant terminations. We will discuss the rules, procedures and available options for grantees who have had their grant terminated.

🧪 If you had your NSF grant terminated, I'd *highly* recommend going to this Friday's webinar.

Friday May 9 at 2-3 pm ET

Will cover both the appeals process and allowable closeout costs. Share widely.

Register here: us02web.zoom.us/webinar/regi...

05.05.2025 20:29 — 👍 220 🔁 247 💬 2 📌 6

The Beginner’s RL Playground: A Simple Interactive Website for Grokking Reinforcement Learning Almost a decade ago, I spent a year writing a series of articles teaching the basics of Reinforcement Learning (RL). What was exciting to…

A huge barrier to learning RL has always been the technical setup. This new browser-based RL Playground removes all the friction. You can experiment with Q-Learning, SARSA, and more with just a few clicks. Open source too if you want to tinker.

awjuliani.medium.com/the-beginner...

02.05.2025 18:07 — 👍 0 🔁 0 💬 0 📌 0
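The RL Playground post names Q-Learning and SARSA. The only difference between them is the bootstrap target: Q-Learning uses the best next action (off-policy), while SARSA uses the next action the policy actually takes (on-policy). Below is a minimal tabular sketch of both update rules. The corridor environment, the reward, and the hyperparameters (alpha, gamma, epsilon) are hypothetical choices made for illustration. None of them is taken from the Playground itself.

```python
import random
from collections import defaultdict

# Hypothetical toy task: a 5-state corridor; reaching state 4 pays 1.0 and ends the episode.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Deterministic transition; moving left from state 0 leaves you at 0."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy TD target: bootstrap from the best action available in s_next."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy TD target: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon; otherwise act greedily, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r, done = step(s, a)
            q_learning_update(Q, s, a, r, s_next, done, alpha, gamma)
            s = s_next
    return Q

Q = train_q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: every non-goal state steps right
```

To get SARSA instead, pick `a_next` with `epsilon_greedy` before updating, call `sarsa_update`, and carry `a_next` into the next step. Because SARSA learns the value of the exploring policy, its estimates near risky actions tend to be more conservative than Q-Learning's.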

AI Index 2025: State of AI in 10 Charts | Stanford HAI Small models get better, regulation moves to the states, and more.

The 2025 AI Index shows a field leveling up fast: small models now rival giants, costs are plummeting, and China is closing the performance gap. AI use in business & healthcare is exploding, but so are harms. Here's a data-packed snapshot of the current state of AI:

hai.stanford.edu/news/ai-inde...

15.04.2025 22:10 — 👍 1 🔁 0 💬 0 📌 0

Ladies, gentlemen, this outstanding piece of #climate information is out.

Welcome to the European State of the Climate 2024 #ESOTC2024

Explore it, bookmark it, come back to it and then come back again, because there’s more in it than you can imagine.

climate.copernicus.eu/esotc/2024

15.04.2025 05:02 — 👍 91 🔁 43 💬 6 📌 2

This was actually my longest podcast ever at over 70 minutes. Not sure I could have made it any shorter because nerding out on databases with Andy Pavlo was too fun.

19.03.2025 05:06 — 👍 3 🔁 1 💬 0 📌 0

🆕 I'm excited to share our recently completed work on "Overcoming data challenges to measure whole-person health in electronic health records," for which Joe Rigdon and I received one of the first intercampus collaborative grants from Wake Forest University and Wake Forest School of Medicine.

14.03.2025 19:57 — 👍 7 🔁 2 💬 1 📌 0

*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...

10.03.2025 18:14 — 👍 506 🔁 290 💬 23 📌 76

Polls

538-style poll collection has now been recreated by a small group of fans and former staff. all fully transparent and public now. return of live public aggregation is imminent

docs.google.com/spreadsheets...

07.03.2025 18:52 — 👍 1591 🔁 372 💬 49 📌 40

Just a reminder that all the data FiveThirtyEight collected—polls, election results, and much more—is available for download (for now) on our GitHub page. github.com/fivethirtyei...

05.03.2025 19:38 — 👍 314 🔁 82 💬 7 📌 4

FiveThirtyEight is shutting down as part of broader cuts at ABC and Disney Though Nate Silver left in 2023, FiveThirtyEight still offered election forecasts, a presidential approval tracker, and other tools.

FiveThirtyEight is shutting down. www.niemanlab.org/2025/03/five...

06.03.2025 14:24 — 👍 0 🔁 0 💬 0 📌 0

The State of Machine Learning Competitions | ML Contests We summarise the state of the ML competitions landscape and analyse the hundreds of competitions that took place in 2024. Plus an overview of winning solutions and commentary on techniques used.

🏆 ML Competitions are booming!

400+ competitions last year
$22M total prize money
Innovative winning solutions

Check out this breakdown of the ML competition landscape: mlcontests.com/state-of-mac...

#MachineLearning

05.03.2025 18:15 — 👍 3 🔁 0 💬 0 📌 1

Two-panel figure. Left panel: Hypothetical scenario: same latent gender gap. It's a density curve of a latent life satisfaction variable. Thresholds determine how the underlying latent variable maps onto responses on an ordinal scale. Theoretically, the same latent difference can result in very different observed differences. This is illustrated here with a gender gap that stays the same size on the latent variable. In 2010, it translates into both genders reporting a 4. In 2023, everybody is less satisfied, and the same gender gap translates into a large observed difference (female reports a 2, male reports a 4). Right panel: Empirical result: Different latent gender gap. Same structure as on the left, but now the actual empirical results are shown. According to the model fit, the latent difference in mean satisfaction between the genders was only 0.28 in 2010 but 0.61 in 2023.

Having some innocent fun with ordinal models today 🧮

13.02.2025 12:45 — 👍 92 🔁 10 💬 8 📌 3
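The threshold mechanism in the left panel can be sketched in a few lines. This is a minimal illustration that assumes a cumulative-logit (ordered logit) model. The thresholds and latent means are hypothetical values chosen to mimic the left panel. The post does not say which link function or threshold values its model used. The same latent gap of 0.7 is placed in two regions of the scale, one with widely spaced thresholds and one with tightly packed thresholds.

```python
import bisect
import math

# Hypothetical cut points for a 5-point scale (tau_1..tau_4); NOT the fitted values from the post.
THRESHOLDS = [-3.0, -2.6, -2.2, 1.5]

def observed_category(latent, thresholds=THRESHOLDS):
    """Noise-free mapping: category k (1-based) such that tau_{k-1} < latent <= tau_k."""
    return bisect.bisect_left(thresholds, latent) + 1

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def category_probs(mu, thresholds=THRESHOLDS):
    """Cumulative-logit model: P(Y <= k) = F(tau_k - mu); P(Y = k) is a difference of CDFs."""
    cumulative = [logistic_cdf(t - mu) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

def expected_score(mu, thresholds=THRESHOLDS):
    """Mean observed response on the 1..K scale for latent mean mu."""
    return sum(k * p for k, p in enumerate(category_probs(mu, thresholds), start=1))

GAP = 0.7  # identical latent gender gap in both (hypothetical) years
latent_means = {"2010": (0.2, 0.2 + GAP), "2023": (-2.8, -2.8 + GAP)}

results = {}
for year, (female, male) in latent_means.items():
    cats = (observed_category(female), observed_category(male))
    gap = expected_score(male) - expected_score(female)
    results[year] = (cats, gap)
    print(year, "categories (female, male):", cats, "expected-score gap:", round(gap, 2))
# 2010 categories (female, male): (4, 4) expected-score gap: 0.23
# 2023 categories (female, male): (2, 4) expected-score gap: 0.52
```

The latent gap is identical in both years, yet the observed gap more than doubles. This happens because the 2023 latent means fall where the thresholds are packed closely together. This is the reason to compare groups on the latent scale rather than on raw ordinal averages.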

A Tale of Six States: Flexible data extraction with scraping and browser automation | Emily Riederer Exploring how Playwright’s headless browser automation (and its friends) can help unite the states’ data

For anyone backing up public data of interest, here are a few short examples of different tools like scraping, fetching, browser automation, and OCR

(Some may not run as they reference 2022 websites)

www.emilyriederer.com/post/states-...

07.02.2025 16:13 — 👍 34 🔁 14 💬 0 📌 1
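For static pages, the scraping end of that spectrum can be as simple as parsing an HTML table. Below is a minimal sketch that uses only Python's standard-library `html.parser`. The table and its contents are hypothetical. Pages that build their data with JavaScript need browser-automation tools like the Playwright approach the post explores.

```python
from html.parser import HTMLParser

class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace and text outside cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page fragment for illustration only.
html = """<table>
<tr><th>State</th><th>Count</th></tr>
<tr><td>Alaska</td><td>12</td></tr>
<tr><td>Texas</td><td>7</td></tr>
</table>"""

parser = TableRowParser()
parser.feed(html)
print(parser.rows)  # [['State', 'Count'], ['Alaska', '12'], ['Texas', '7']]
```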

Harvard's Library Innovation Lab Team announces the Data.gov Archive

They released an archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov.

09.02.2025 21:34 — 👍 45 🔁 15 💬 1 📌 1

I tried something! Let me know if this works for you.

This is a handbuilt list of journalists, academics, industry figures, activists et al on Bluesky who are focused on tech — especially online platforms, social media and AI. Click "Pin to Home" if you find it worthwhile: bsky.app/profile/will...

07.02.2025 15:41 — 👍 152 🔁 32 💬 13 📌 3

Matching, missing data, a quasi-experiment, and causal inference--Oh my! | A. Solomon Kurz I'm finally dipping my toes into causal inference for quasi-experiments, and my first use case has missing data. In this post we practice propensity score matching with multiply-imputed data sets, and...

Diving into causal analysis with non-randomized groups? Learn how matching and imputation in R can tackle missing data challenges. Perfect for those looking to refine their statistical toolkit!

solomonkurz.netlify.app/blog/2025-02...

#CausalEffect #rstats #DataScience

05.02.2025 23:22 — 👍 4 🔁 1 💬 0 📌 0
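A common way to do the matching step is greedy 1:1 nearest-neighbor matching on estimated propensity scores. The sketch below assumes the scores have already been estimated, for example by a logistic regression of treatment on covariates. The unit IDs and scores are hypothetical, and this is not the R workflow from the linked post. With multiply-imputed data, one option is to repeat a step like this within each imputed data set and then pool the resulting effect estimates.

```python
def greedy_nearest_neighbor_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement.

    treated, controls: dicts mapping a unit id to its estimated propensity score.
    caliper: optional maximum allowed score distance; farther pairs are left unmatched.
    Returns a dict mapping each matched treated id to its control id.
    """
    available = dict(controls)  # controls not yet used
    pairs = {}
    # Match high-score treated units first, since they usually have the fewest close controls.
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]
    return pairs

# Hypothetical propensity scores for illustration only.
treated = {"t1": 0.80, "t2": 0.30}
controls = {"c1": 0.75, "c2": 0.35, "c3": 0.10}
print(greedy_nearest_neighbor_match(treated, controls))                # {'t1': 'c1', 't2': 'c2'}
print(greedy_nearest_neighbor_match(treated, controls, caliper=0.01))  # {}
```

A caliper trades sample size for match quality. Treated units without a close enough control are dropped, which changes the population the estimated effect describes.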

USAspending.gov

If you’re in the US and you’d like to know what projects and vital services federal grants currently fund in your state, you can search here: www.usaspending.gov

And you can find the contact information for your elected representatives here: www.usa.gov/elected-offi...

They need to hear from you.

28.01.2025 04:59 — 👍 5736 🔁 3444 💬 152 📌 133


🌍 Excited to share the 1st edition of Geocomputation with Python is complete! 📖
Learn geospatial data analysis with Python.

Read it online: https://buff.ly/3NK2uBq
Get the book: https://buff.ly/42t4dD7
Blog post: https://buff.ly/42s5fiN

Open-source + community-driven!

#geocompx #geopython #GIS #SpatialData

27.01.2025 15:00 — 👍 17 🔁 5 💬 0 📌 1

As someone who has reported on AI for 7 years and covered China tech as well, I think the biggest lesson to be drawn from DeepSeek is the huge cracks it illustrates with the current dominant paradigm of AI development. A long thread. 1/

27.01.2025 14:12 — 👍 6235 🔁 2397 💬 223 📌 736

Where does the U.S. Government Keep its Data? | Sam Tyner-Monroe, Ph.D. Non-exhaustive list of R Package Sources for getting government data: federalregister ropengov censusapi censusr tidycensus elevater Principal Statistical Agency Programs Inspiration and primary so...

🏛️ TIL: The U.S. Gov't statistical system is HUGE! 13 main agencies + 115 programs tracking everything from population to agriculture. Most offer public APIs too! 📊

sctyner.me/post/2018-11...

#OpenData #DataScience

28.01.2025 04:21 — 👍 6 🔁 3 💬 0 📌 0

GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1 Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

huggingface is doing a fully open source replication of R1 github.com/huggingface/...

25.01.2025 14:31 — 👍 124 🔁 29 💬 4 📌 4

The Rproj File – Positron

We've added an article about RStudio's Rproj files and how to adapt related workflows, if you're starting to kick the tires on Positron. If this interests you, check it out 👀

positron.posit.co/rstudio-rpro...

#rstats #rstudio #positron

22.01.2025 17:43 — 👍 136 🔁 32 💬 4 📌 5

Rachel Thomas, PhD - The Missing Medical Data Holding Back AI an AI researcher going back to school for immunology

Trump's early actions show the messy relationship between data and AI. Freezing health data sharing while boosting AI investment? 🤔 Read how Missing Data Sets and AlphaFold teach us about data's role in AI. rachel.fast.ai/posts/2025-0...

27.01.2025 21:51 — 👍 2 🔁 0 💬 0 📌 0

Cell and text formatting is everywhere: let’s work with it in R

Ever thought about how spreadsheets use formatting? Luis D. Verde Arregoitia's study shows 62% of them do, with blue as the top color. Find out why this matters for R and AI. Dive in!

luisdva.github.io/rstats/fun-w...

#DataScience

27.01.2025 20:12 — 👍 1 🔁 0 💬 0 📌 0

{tinytable} 0.7.0 for #RStats is out! 🚀

This 📦 converts data frames to html, tex, docx, typ, or md tables. Super simple, ultra flexible, 0-dep, and the website hosts a billion tutorials.

vincentarelbundock.github.io/tinytable/

0.7.0 fixes bugs and adds some cool features. Please update!

26.01.2025 13:30 — 👍 194 🔁 46 💬 4 📌 1

10 Ways to Work with Large Files in Python: Effortlessly Handle Gigabytes of Data! Handling large text files in Python can feel overwhelming. When files grow into gigabytes, attempting to load them into memory all at once…

Got gigabytes of data? 🐍

Aleksei Aleinikov shows how Python can handle it effortlessly with smart techniques. Check it out!

blog.devgenius.io/10-ways-to-w...

#Python #DataHandling

22.01.2025 19:00 — 👍 2 🔁 0 💬 1 📌 0
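Two standard techniques for this are streaming a text file line by line and reading a binary file in fixed-size chunks. Either way, only a small buffer sits in memory at any time. The sketch below shows those two ideas and does not reproduce the article's ten techniques. The file name, contents, and chunk sizes are hypothetical.

```python
import os
import tempfile

def iter_lines(path, encoding="utf-8"):
    """Yield lines one at a time; the file object buffers reads, so memory use stays small."""
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")

def iter_chunks(path, chunk_size=1 << 20):
    """Yield fixed-size byte chunks (1 MiB by default) for data without useful line breaks."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def count_matching_lines(path, needle):
    """Count lines containing `needle` without ever holding the whole file in memory."""
    return sum(1 for line in iter_lines(path) if needle in line)

# Hypothetical demo file: 1,000 log lines, every 10th one an error.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.log")
    with open(path, "w", encoding="utf-8") as f:
        for i in range(1000):
            f.write(f"row {i}: {'ERROR' if i % 10 == 0 else 'ok'}\n")
    n_errors = count_matching_lines(path, "ERROR")
    n_bytes = sum(len(chunk) for chunk in iter_chunks(path, chunk_size=4096))
    print(n_errors, n_bytes == os.path.getsize(path))  # 100 True
```

Because both helpers are generators, they compose with `sum`, `filter`, or a `for` loop without materializing the file. That property is what keeps multi-gigabyte inputs manageable.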

Chat with Large Language Models Chat with large language models from a range of providers including Claude <https://claude.ai>, OpenAI <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and struct...

ellmer (formerly known as elmer) is now on CRAN! ellmer makes it easy to chat with LLMs from a variety of providers and includes support for streaming responses, tool calling and structured data extraction: ellmer.tidyverse.org #rstats

09.01.2025 18:13 — 👍 140 🔁 49 💬 6 📌 1

PDF table made with LaTeX and tabularray with {tinytable}

R code for making the table

#rstats and #QuartoPub PSA: @vincentab.bsky.social's {tinytable} is the absolute best table making package out there for LaTeX output (it natively supports tabularray!), and it's phenomenal for HTML. It has fully replaced {gt} and {kableExtra} for me vincentarelbundock.github.io/tinytable/

11.01.2025 20:38 — 👍 178 🔁 31 💬 9 📌 4

Map showing the number of times a location has burned in SoCal and which kind of fire it was: one burning under Santa Ana winds, or one that was a summer, fuel-driven fire without Santa Ana winds. Malibu area had burned at least 8 times from 1900-2017, and has burned twice now since. Image from Kolden and Abatzoglou (2018): https://www.mdpi.com/2571-6255/1/2/19

Here's the reality about the #LAFires this week: this isn't the first time ANY of these places have burned. Not even close. In 2018, we mapped CA fire history to look at fire frequency across SoCal. Santa Monica Mtns area burns more than anywhere else -- up to once per decade in a given spot. 🧵

09.01.2025 17:33 — 👍 1664 🔁 695 💬 60 📌 164


Latest posts by dataelixir.com on Bluesky
