Data Elixir

@dataelixir.com.bsky.social

Data Elixir is a weekly newsletter with curated data science picks from around the web. Subscribe at dataelixir.com and follow us here for selections between issues. Covering machine learning, data visualization, analytics, and strategy.

610 Followers  |  960 Following  |  51 Posts  |  Joined: 15.11.2023  |  1.7889

Latest posts by dataelixir.com on Bluesky

Part 3 is Finished, Part 4 Started – Applied Predictive Modeling Blog

We've released 4 new chapters of Applied Machine Learning for Tabular Data.

Includes: Bayesian optimization, feature selection, model comparisons, classification metrics, calibration, #rstats computing sections, and more

blog.aml4td.org/posts/2025-0...

25.07.2025 16:53 | 👍 49    🔁 9    💬 2    📌 0
Webinar on Responding to NSF Grant Terminations 

Join the Federation of Associations in Behavioral and Brain Sciences (FABBS) on Friday, May 9th, at 2:00 PM ET for an informative webinar on responding to NSF grant terminations. We will discuss the rules, procedures and available options for grantees who have had their grant terminated.

🧪 If you had your NSF grant terminated, I'd *highly* recommend going to this Friday's webinar.

Friday May 9 at 2-3 pm ET

Will cover both the appeals process and allowable closeout costs. Share widely.

Register here: us02web.zoom.us/webinar/regi...

05.05.2025 20:29 | 👍 220    🔁 247    💬 2    📌 6
The Beginner's RL Playground: A Simple Interactive Website for Grokking Reinforcement Learning Almost a decade ago, I spent a year writing a series of articles teaching the basics of Reinforcement Learning (RL). What was exciting to…

A huge barrier to learning RL has always been the technical setup. This new browser-based RL Playground removes all the friction. You can experiment with Q-Learning, SARSA, and more with just a few clicks. Open source too if you want to tinker.

awjuliani.medium.com/the-beginner...

02.05.2025 18:07 | 👍 0    🔁 0    💬 0    📌 0
AI Index 2025: State of AI in 10 Charts | Stanford HAI Small models get better, regulation moves to the states, and more.

The 2025 AI Index shows a field leveling up fast: small models now rival giants, costs are plummeting, and China is closing the performance gap. AI use in business & healthcare is exploding, but so are harms. Here's a data-packed snapshot of the current state of AI:

hai.stanford.edu/news/ai-inde...

15.04.2025 22:10 | 👍 1    🔁 0    💬 0    📌 0

Ladies, gentlemen, this outstanding piece of #climate information is out.

Welcome to the European State of the Climate 2024 #ESOTC2024

Explore it, bookmark it, come back to it and then come back again, because there's more in it than you can imagine.

climate.copernicus.eu/esotc/2024

15.04.2025 05:02 | 👍 91    🔁 43    💬 6    📌 2

This was actually my longest podcast ever at over 70 minutes. Not sure I could have made it any shorter because nerding out on databases with Andy Pavlo was too fun.

19.03.2025 05:06 | 👍 3    🔁 1    💬 0    📌 0

🆕 I'm excited to share our recently completed work on "Overcoming data challenges to measure whole-person health in electronic health records," for which Joe Rigdon and I received one of the first intercampus collaborative grants from Wake Forest University and Wake Forest School of Medicine.

14.03.2025 19:57 | 👍 7    🔁 2    💬 1    📌 0

*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...

10.03.2025 18:14 | 👍 506    🔁 290    💬 23    📌 76
Polls

538-style poll collection has now been recreated by a small group of fans and former staff. All fully transparent and public now. Return of live public aggregation is imminent.

docs.google.com/spreadsheets...

07.03.2025 18:52 | 👍 1591    🔁 372    💬 49    📌 40

Just a reminder that all the data FiveThirtyEight collected – polls, election results, and much more – is available for download (for now) on our GitHub page. github.com/fivethirtyei...

05.03.2025 19:38 | 👍 314    🔁 82    💬 7    📌 4
FiveThirtyEight is shutting down as part of broader cuts at ABC and Disney Though Nate Silver left in 2023, FiveThirtyEight still offered election forecasts, a presidential approval tracker, and other tools.

FiveThirtyEight is shutting down. www.niemanlab.org/2025/03/five...

06.03.2025 14:24 | 👍 0    🔁 0    💬 0    📌 0
The State of Machine Learning Competitions | ML Contests We summarise the state of the ML competitions landscape and analyse the hundreds of competitions that took place in 2024. Plus an overview of winning solutions and commentary on techniques used.

🏆 ML Competitions are booming!

400+ competitions last year
$22M total prize money
Innovative winning solutions

Check out this breakdown of the ML competition landscape: mlcontests.com/state-of-mac...

#MachineLearning

05.03.2025 18:15 | 👍 3    🔁 0    💬 0    📌 1
Two panel figure.
Left panel: Hypothetical scenario: same latent gender gap.
It's a density curve of a latent life satisfaction variable. There are thresholds that determine how the latent underlying variable affects responses on an ordinal scale. Theoretically, it's possible that the same latent difference results in very different observed differences. Here illustrated with a gender gap that remains of the same size on the latent variable. In 2010, it translates into both genders reporting a 4. In 2023, everybody is less satisfied, and the same gender gap translates into a large observed difference (female reports a 2, male reports a 4)

Right panel: Empirical result: Different latent gender gap. Same structure as on the left, but now actual empirical results are presented. It turns out that according to the model fit, the latent difference in mean satisfaction between the genders was only 0.28 in 2010 but 0.61 in 2023.

Having some innocent fun with ordinal models today 🧮

13.02.2025 12:45 | 👍 92    🔁 10    💬 8    📌 3
A Tale of Six States: Flexible data extraction with scraping and browser automation | Emily Riederer Exploring how Playwright's headless browser automation (and its friends) can help unite the states' data

For anyone backing up public data of interest, here are a few short examples of different tools like scraping, fetching, browser automation, and OCR

(Some may not run as they reference 2022 websites)

www.emilyriederer.com/post/states-...

07.02.2025 16:13 | 👍 34    🔁 14    💬 0    📌 1

Harvard's Library Innovation Lab Team announces the Data.gov Archive

They released an archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov.

09.02.2025 21:34 | 👍 45    🔁 15    💬 1    📌 1

I tried something! Let me know if this works for you.

This is a handbuilt list of journalists, academics, industry figures, activists et al on Bluesky who are focused on tech – especially online platforms, social media and AI. Click "Pin to Home" if you find it worthwhile: bsky.app/profile/will...

07.02.2025 15:41 | 👍 152    🔁 32    💬 13    📌 3
Matching, missing data, a quasi-experiment, and causal inference--Oh my! | A. Solomon Kurz I'm finally dipping my toes into causal inference for quasi-experiments, and my first use case has missing data. In this post we practice propensity score matching with multiply-imputed data sets, and...

Diving into causal analysis with non-randomized groups? Learn how matching and imputation in R can tackle missing data challenges. Perfect for those looking to refine their statistical toolkit!

solomonkurz.netlify.app/blog/2025-02...

#CausalEffect #rstats #DataScience

05.02.2025 23:22 | 👍 4    🔁 1    💬 0    📌 0
USAspending.gov

If you're in the US and you'd like to know what projects and vital services federal grants currently fund in your state, you can search here: www.usaspending.gov

And you can find the contact information for your elected representatives here: www.usa.gov/elected-offi...

They need to hear from you.

28.01.2025 04:59 | 👍 5736    🔁 3444    💬 152    📌 133
A screenshot of the blog post

Excited to share the 1st edition of Geocomputation with Python is complete! 📖
Learn geospatial with Python.

Read it online: https://buff.ly/3NK2uBq
Get the book: https://buff.ly/42t4dD7
Blog post: https://buff.ly/42s5fiN

Open-source + community-driven!

#geocompx #geopython #GIS #SpatialData

27.01.2025 15:00 | 👍 17    🔁 5    💬 0    📌 1

As someone who has reported on AI for 7 years and covered China tech as well, I think the biggest lesson to be drawn from DeepSeek is the huge cracks it illustrates with the current dominant paradigm of AI development. A long thread. 1/

27.01.2025 14:12 | 👍 6235    🔁 2397    💬 223    📌 736
Where does the U.S. Government Keep its Data? | Sam Tyner-Monroe, Ph.D. Non-exhaustive list of R Package Sources for getting government data: federalregister ropengov censusapi censusr tidycensus elevater Principal Statistical Agency Programs Inspiration and primary so...

🏛️ TIL: The U.S. Gov't statistical system is HUGE! 13 main agencies + 115 programs tracking everything from population to agriculture. Most offer public APIs too! 📊

sctyner.me/post/2018-11...

#OpenData #DataScience

28.01.2025 04:21 | 👍 6    🔁 3    💬 0    📌 0
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1 Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

Hugging Face is doing a fully open source replication of R1 github.com/huggingface/...

25.01.2025 14:31 | 👍 124    🔁 29    💬 4    📌 4
The Rproj File – Positron

We've added an article about RStudio's Rproj files and how to adapt related workflows, if you're starting to kick the tires on Positron. If this interests you, check it out 👀

positron.posit.co/rstudio-rpro...

#rstats #rstudio #positron

22.01.2025 17:43 | 👍 136    🔁 32    💬 4    📌 5
Rachel Thomas, PhD - The Missing Medical Data Holding Back AI an AI researcher going back to school for immunology

Trump's early actions show the messy relationship between data and AI. Freezing health data sharing while boosting AI investment? 🤔 Read how Missing Data Sets and AlphaFold teach us about data's role in AI. rachel.fast.ai/posts/2025-0...

27.01.2025 21:51 | 👍 2    🔁 0    💬 0    📌 0
Cell and text formatting is everywhere let's work with it in R

Ever thought about how spreadsheets use formatting? Luis D. Verde Arregoitia's study shows 62% of them do, with blue as the top color. Find out why this matters for R and AI. Dive in!

luisdva.github.io/rstats/fun-w...

#DataScience

27.01.2025 20:12 | 👍 1    🔁 0    💬 0    📌 0

{tinytable} 0.7.0 for #RStats is out! 🚀

This 📦 converts data frames to html, tex, docx, typ, or md tables. Super simple, ultra flexible, 0-dep, and the website hosts a billion tutorials.

vincentarelbundock.github.io/tinytable/

0.7.0 fixes bugs and adds some cool features. Please update!

26.01.2025 13:30 | 👍 194    🔁 46    💬 4    📌 1
10 Ways to Work with Large Files in Python: Effortlessly Handle Gigabytes of Data! Handling large text files in Python can feel overwhelming. When files grow into gigabytes, attempting to load them into memory all at once…

Got gigabytes of data? 🐍

Aleksei Aleinikov shows how Python can handle it effortlessly with smart techniques. Check it out!

blog.devgenius.io/10-ways-to-w...

#Python #DataHandling

22.01.2025 19:00 | 👍 2    🔁 0    💬 1    📌 0
Chat with Large Language Models Chat with large language models from a range of providers including Claude <https://claude.ai>, OpenAI <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and struct...

ellmer (formerly known as elmer) is now on CRAN! ellmer makes it easy to chat with LLM models from a variety of providers and includes support for streaming responses, tool calling and structured data extraction: ellmer.tidyverse.org #rstats

09.01.2025 18:13 | 👍 140    🔁 49    💬 6    📌 1
PDF table made with LaTeX and tabularray with {tinytable}

R code for making the table

#rstats and #QuartoPub PSA: @vincentab.bsky.social's {tinytable} is the absolute best table making package out there for LaTeX output (it natively supports tabularray!), and it's phenomenal for HTML. It has fully replaced {gt} and {kableExtra} for me vincentarelbundock.github.io/tinytable/

11.01.2025 20:38 | 👍 178    🔁 31    💬 9    📌 4
Map showing the number of times a location has burned in SoCal and which kind of fire it was: one burning under Santa Ana winds, or one that was a summer, fuel-driven fire without Santa Ana winds. Malibu area had burned at least 8 times from 1900-2017, and has burned twice now since. Image from Kolden and Abatzoglou (2018): https://www.mdpi.com/2571-6255/1/2/19

Here's the reality about the #LAFires this week: this isn't the first time ANY of these places have burned. Not even close. In 2018, we mapped CA fire history to look at fire frequency across SoCal. Santa Monica Mtns area burns more than anywhere else -- up to once per decade in a given spot. 🧵

09.01.2025 17:33 | 👍 1664    🔁 695    💬 60    📌 164

@dataelixir.com is following 20 prominent accounts