@datascienceweekly.bsky.social

26 Followers  |  210 Following  |  4 Posts  |  Joined: 13.11.2024  |  2.1942

Latest posts by datascienceweekly.bsky.social on Bluesky

AWS deleted my 10-year account and all data without warning
After 10 years as an AWS customer and open-source contributor, they deleted my account and all data with zero warning. Here's how AWS's 'verification' process became a digital execution, and why you s...

AWS deleted all data despite redundancy, backups, and a dead man’s switch. This is why you need to keep all your data offline. Never trust a hosting company for your backups. www.seuros.com/blog/aws-del...

06.08.2025 02:48 | 👍 115    🔁 48    💬 4    📌 9

Building notes: Linear regression viz in 3D
Listening to Chappell Roan’s new single “Subway” on an hour-long loop; can’t tell if I’m crying because of the song or the XQuartz installation error.

A fool's errand? Making a 3D ggplot? I had Chappell Roan's newest single on loop and tortured myself into getting something complicated to work when I found an easier solution #dataviz #ggplot2 #rstats

05.08.2025 01:23 | 👍 9    🔁 4    💬 2    📌 1

Check out "Scaling the r-spatial ecosystem" by Dewey Dunnington 🌍📦
An exploration of how R’s spatial tools can be used for big(ger) data.

Video: youtu.be/tjNEoIYr_ag?...
Slides: dewey.dunnington.ca/slides/rspat...

#RStats #rspatial #GIS #SpatialData

06.08.2025 14:03 | 👍 15    🔁 11    💬 1    📌 1

happy august! I did a little data visualization project this week trying to see what genres of books I've read and liked the most so far in 2025. (if you, like me, are a big fan of scifi + fantasy + dark academia, please give me book recs!)

🔗📚: github.com/jkaminags/bo...

#ggplot2 #dataviz #Rstats

02.08.2025 19:17 | 👍 25    🔁 6    💬 6    📌 1

Switching my website from Hugo to Quarto – Nicola Rennie
Having recently converted my personal website from Hugo to Quarto, this blog post explains the motivation behind it, the things that were tricky to switch, and some tips if you’re thinking about doing...

I've written a short blog post about the process of switching my personal website from Hugo to Quarto, some of the tricky things, and advice for those thinking about doing the same!

Link: nrennie.rbind.io/blog/hugo-qu...

#RStats #QuartoPub

29.07.2025 15:12 | 👍 38    🔁 10    💬 1    📌 1

Introducing the Data Explorer in Positron!

Quickly view raw data files (CSV, Parquet, etc.) or dataframes from your #Python / #RStats sessions with a data grid, summary panel, and filter bar.

Learn more: positron.posit.co/data-explore... #Positron

30.07.2025 18:39 | 👍 90    🔁 15    💬 3    📌 2

Air, an extremely fast R formatter
We are thrilled to announce Air, a new R formatter.

Okay, I saw @libbyheeren.bsky.social mention Air recently but did not realize what it was until today's Data Science Hangout. Looks very cool for formatting code!

www.tidyverse.org/blog/2025/02...

31.07.2025 16:16 | 👍 18    🔁 5    💬 1    📌 0

Animated Maps with {ggplot2} and {gganimate}
In this blog we are creating an animated map of the gapminder data using {ggplot2} and {gganimate}. In the process we will cover some of the common pitfalls when working with spatial data and how to get round them!

In our latest blog post, our Data Scientist Osheen MacOscar shares a great overview of how to use {ggplot2} and {gganimate} when dealing with spatial data.

#rstats #datavis #spatialdata #maps

31.07.2025 10:41 | 👍 16    🔁 6    💬 2    📌 0

Wrote a new, modern stats curriculum.

Teach about probability and sampling via computational examples / simulations with real data. It's unbelievably helpful for intuition. Everything else follows.

Online and open-source: jrudoler-teaching.github.io/understandin...

31.07.2025 20:20 | 👍 5    🔁 2    💬 0    📌 0

And if you can’t join, raise a toast and share her classic How To Name Files (github.com/jennybc/how-...) with all your teams so she doesn’t have to SET YOUR COMPUTER(S) ON FIRE 🔥.

31.07.2025 00:53 | 👍 21    🔁 6    💬 0    📌 0

GitHub - cwida/FastLanes: Next-Gen Big Data File Format
Next-Gen Big Data File Format. Contribute to cwida/FastLanes development by creating an account on GitHub.

New data oriented file format just dropped.

FastLanes, "like Parquet, but with 40% better compression and 40× faster decoding". 👀

Seems it can exploit correlations between columns and has fully SIMD-friendly encodings to help with vectorization.

github.com/cwida/FastLa...

24.07.2025 15:12 | 👍 73    🔁 16    💬 3    📌 2

Revisiting Moneyball
Data, sports, payrolls, and memes

I have finally published my post about Moneyball, as promised (a long time ago). If you're interested in baseball, numbers, or movies, please take a look. I'd love to know what you think.

cc @matsonj.com @alexnoonan.bsky.social

djpardis.medium.com/revisiting-m...

24.07.2025 16:58 | 👍 9    🔁 3    💬 1    📌 2

Screenshot of the text of the linked blogpost 1/4

Screenshot of the text of the linked blogpost 2/4

Screenshot of the text of the linked blogpost 3/4

Screenshot of the text of the linked blogpost 4/4

~~ making sense of academic statistics ~~

i wrote about the confusing relationship between statistics and data analysis, and also about how statistics relates to science

#statistics #rstats #datascience

www.alexpghayes.com/post/making-...

15.07.2025 20:15 | 👍 109    🔁 19    💬 14    📌 8

Importing Data with Python
Importing data is a key step in the data science workflow. Here we compare data import for two key Python data-frame libraries - Polars and Pandas.

Importing data is a key step in the data science workflow. In our latest Python blog post, we compare this process for two key libraries - Polars and Pandas - emphasising how to convert to the correct data-type and why you should validate the structure and content of the imported data.

#rstats

17.07.2025 10:24 | 👍 5    🔁 3    💬 0    📌 0

even if it's fake it's real - sippey.com
Michael Sippey's blog, published semi-regularly since 1995.

sippey.com/2010/11/even... this post was so influential in my understanding of how to think of internet culture, and it's held up so well in everything except the optimistic positivity

10.07.2025 13:03 | 👍 8    🔁 4    💬 2    📌 1

Designing the Tools that Shape Data Science – with Dr Hadley Wickham – The Random Sample

To listen, just search for The Random Sample wherever you get your podcasts, or head to our website: www.therandomsample.com.au/podcast/hadl...
#rstats #rstudio #datascience #statistics #opensource #programming #rstudio @robjhyndman.com @posit.co

02.06.2025 02:15 | 👍 5    🔁 2    💬 0    📌 0

Discovering cognitive strategies with tiny recurrent neural networks - Nature
Modelling biological decision-making with tiny recurrent neural networks enables more accurate predictions of animal choices than classical cognitive models and offers insights into the underlying cog...

This looks awesome!
www.nature.com/articles/s41...

03.07.2025 01:03 | 👍 15    🔁 2    💬 0    📌 0

Writing a basic Linux device driver when you know nothing about Linux drivers or USB

i've always been curious about how to write a Linux USB device driver and this blog post looks like a great intro: crescentro.se/posts/writin...

26.06.2025 19:08 | 👍 184    🔁 14    💬 5    📌 1

I wrote a blog post to celebrate 10 years of the loo package 🎉 (an R package implementing fast Pareto smoothed importance sampling cross-validation and many other useful methods for cross-validation)

26.06.2025 11:01 | 👍 68    🔁 15    💬 0    📌 1

This week's #ChemSciPicks comes from @graemeday.bsky.social (University of Southampton), @ffmmgg.bsky.social, Chengxi Zhao, Xenophon Evangelopoulos, and @aicooper.bsky.social (University of Liverpool).

Read the full paper here: doi.org/10.1039/D5SC...

#ChemSky

18.06.2025 09:00 | 👍 7    🔁 4    💬 1    📌 1

Pitfalls of premature closure with LLM assisted coding
When LLMs generate clean, professional-looking code, it's tempting to stop exploring alternatives. But therein lies the risk that comes with premature closure. So what is premature closure?

New to me is the term "premature closure", where you too quickly latch on to the first solution you see. Always a danger in coding, but particularly so today when LLMs can give you a plausible fix so so quickly.

www.shayon.dev/post/2025/16...

18.06.2025 14:17 | 👍 99    🔁 15    💬 7    📌 4

“71-75: Extremely gross”

That tracks

19.06.2025 02:30 | 👍 15    🔁 1    💬 1    📌 0

Really nice interactives from @rospearce.bsky.social in this deep dive into Glassdoor company-review data

www.economist.com/interactive/...

18.06.2025 21:08 | 👍 31    🔁 6    💬 1    📌 0

Writing Manually (In Times of AI-generated Content)
In times of AI Writing, writing manually is more important than ever. As with many AI Generated texts, I’d rather see the prompts, it would have more soul and character than the…

Is writing manually like driving a car manually, instead of automatically?

In my experience with AI-writing, every time I use it for a bigger task (restructuring or telling me the missing chapters), it does things I don't like, most importantly, distracting me from my actual task: writing.

19.06.2025 07:59 | 👍 5    🔁 1    💬 1    📌 0

R Package Quality: Validation and beyond!
Not all R packages are clearly “good” or “risky”; most fall somewhere in between. This post introduces a scoring framework to help users assess package quality, based on documentation, code, maintenance, and popularity. We also share key principles to ensure the scores are useful, fair, and adaptable to different contexts.

At Jumping Rivers, we've developed a scoring framework to help users assess R package quality.

Our latest blog post, "R Package Quality: Validation and Beyond!", walks through this new framework and shares guiding principles that ensure the scores are fair, flexible, and context-aware.

#rstats #R

19.06.2025 13:15 | 👍 8    🔁 4    💬 0    📌 0

Scalable Lakehouse Architecture with Iceberg & Polaris: A Battle-tested Playbook
YouTube video by Apache Iceberg

Scalable Lakehouse Architectures with Iceberg and Polaris!

Simon from Tactile shared insights on tackling bottlenecks in data loading using Apache Iceberg and our open-source dlt library.

youtu.be/gb5fwIO4pX0?...

#databs #iceberg

12.06.2025 07:15 | 👍 3    🔁 1    💬 0    📌 0

I wrote a short blog post about our experimental work on the logic of guesses:
xphi.net/2025/06/03/e...

11.06.2025 15:25 | 👍 13    🔁 3    💬 1    📌 0

GitHub - ryancdotorg/freq: Like `sort | uniq -c | sort -rn` but better
Like `sort | uniq -c | sort -rn` but better. Contribute to ryancdotorg/freq development by creating an account on GitHub.

There are now binary builds of my data analysis tool `freq` available for Linux, macOS and Windows, and I've added a few more features.

It's great for ad-hoc log file analysis.

github.com/ryancdotorg/...

04.06.2025 19:02 | 👍 21    🔁 6    💬 0    📌 0
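
For readers unfamiliar with the `sort | uniq -c | sort -rn` idiom mentioned in the post above, here is a minimal Python sketch of the same idea: count identical lines and list the most frequent first. This is only an illustration of the shell pipeline, not how the `freq` tool is implemented; the function name `freq` and the sample log lines are assumptions made up for this example, and ties are broken alphabetically here, which may differ from what `sort -rn` prints.

```python
from collections import Counter


def freq(lines):
    """Count identical lines and return (count, line) pairs, most frequent first.

    Hypothetical helper for illustration; not the API of the real `freq` tool.
    """
    # Strip trailing newlines so "a\n" and "a" count as the same line.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort by descending count; break ties alphabetically for stable output.
    return sorted(
        ((n, text) for text, n in counts.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )


if __name__ == "__main__":
    # Made-up sample data standing in for a log file.
    sample = ["GET /index", "GET /about", "GET /index", "POST /login", "GET /index"]
    for n, text in freq(sample):
        print(f"{n:7d} {text}")
```

To run it over a real file, one could pass an open file object, for example `freq(open("access.log"))`, since iterating a file yields its lines.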

I think the thing I'm most excited to see over the next ~10 years of #dataviz is web-based content that interweaves long-form text and modular interactives.

Not as heavy as scrollytelling and not as aimless as a dashboard, but something in between.

This is what I was going for with the QR project!

04.06.2025 14:46 | 👍 37    🔁 7    💬 3    📌 1

Deep Dive into Apache Iceberg with Flink CDC - tech.kakao.com
This post is the English version of (title missing in the source). See other translations: 🇰🇷 Korean: h...

Some excellent deep-dive blogs from Kakao on their use of Flink CDC and Iceberg:

tech.kakao.com/posts...
tech.kakao.com/posts...
tech.kakao.com/posts...

28.05.2025 17:32 | 👍 11    🔁 4    💬 0    📌 0

@datascienceweekly is following 20 prominent accounts