Lucas Stich (@lucasstich) — Bluesky Profile

8 months ago

🚨Free data alert!! 🚨 Please share.

Large new dataset of Amazon product reviews, including full text and photos and product characteristics, with individual *reviews labeled as fake reviews*.

I believe this is the first publicly available data of this kind.

github.com/bretthollenb...


9 months ago

Designing better out-of-the-box histograms
Given how common histograms are in BI tools, you might think they’re easy to design. Think again. We share challenges we encountered, and how we handled them, while designing better out-of-the-box…

Histograms are incredibly useful, interpretable, and common in BI. But building histograms that work well out of the box — no matter the data — is trickier than it sounds. We share some of the challenges faced, and decisions made, when designing histograms for Observable Canvases:


10 months ago

What cyborg work looks like as an academic, by Robert Ghrist, a mathematician and associate dean of undergraduate education at the University of Pennsylvania.


11 months ago

Model to Meaning: How to interpret statistical models with marginaleffects for R and Python

📚😅🎉

Yay!! I just submitted the complete manuscript of my upcoming book to the publisher!

Learn to easily and clearly interpret (almost) any stats model w/ R or Python. Simple ideas, consistent workflow, powerful tools, detailed case studies.

Read it for free @ marginaleffects.com

#RStats #PyData


1 year ago

We streamlined six new DID-like estimators and created this tutorial for implementation in R.
yiqingxu.org/packages/fec...

Hope you no longer need to spend months figuring out what these estimators are and how to use them.


1 year ago

People Pay for the Right to Bid — and Then Overbid - UCLA Anderson Review Bidders sacrifice a better price to avoid ending up with nothing

People Pay for the Right to Bid — and Then Overbid anderson-review.ucla.edu/people-pay-f...


1 year ago

Designing monochrome data visualisations | Nicola Rennie
In data visualisations, colours are often used to show values or categories of data. However, sometimes you might not be able to or want to use colour. This blog post discusses some tips for designing...

🚨 New blog post! 🚨

If you want to learn about:

🎨 Monochrome colour palettes
📊 Designing better black & white visualisations
🛠️ Rethinking single-colour chart design

Read this ➡️ nrennie.rbind.io/blog/monochr...

#RStats #DataViz #ggplot2 #RLadies


1 year ago

The Rproj File – Positron

We've added an article about RStudio's Rproj files and how to adapt related workflows, if you're starting to kick the tires on Positron. If this interests you, check it out 👀

positron.posit.co/rstudio-rpro...

#rstats #rstudio #positron


1 year ago

📊 vs. 🥧

I made a tiny teaching tool to help me interactively demo + share differences between 📊 and 🥧

Play: I find that tinkering with data + visuals in class reinforces understanding far more than slides or readings

Save + share: Copy the url to link the current data

👉 barvpie.netlify.app


1 year ago

My PhD syllabus for Introduction to Quantitative Marketing @rotmanschool. Updated for 2025. Comments welcome.

Feel free to suggest additional papers. Self-promotion encouraged! All University of Toronto PhD students are welcome to audit. Please get in touch.


1 year ago

The enmity paradox - Scientific Reports

In 24,678 people in 176 rural Honduras villages, we found that villagers have an average of 6.89 (SD 3.79) friends, and these friends have 8.40 (SD 2.52) friends.

Villagers have an average of 1.26 (SD 1.70) enemies, and these enemies have 3.40 (SD 2.11) enemies.

www.nature.com/articles/s41... 7/
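
The pattern in those numbers (friends have more friends than the average person does) can be reproduced on a tiny network. This is a minimal sketch on a made-up four-person graph, not the study's data; the names and the choice to average over every (person, friend) pair are assumptions for illustration, and the paper may compute its averages differently.

```python
# Toy undirected friendship network (hypothetical data, not the study's).
friends = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Number of friends each person has.
degree = {person: len(nbrs) for person, nbrs in friends.items()}

# Average number of friends per person.
mean_degree = sum(degree.values()) / len(degree)

# Average number of friends that a friend has, taken over every
# (person, friend) pair, so a popular person is counted once per friendship.
friend_degrees = [degree[f] for nbrs in friends.values() for f in nbrs]
mean_friend_degree = sum(friend_degrees) / len(friend_degrees)

print(mean_degree, mean_friend_degree)  # 2.0 2.25
```

Popular people appear in many friend lists, so they are over-represented when averaging over friends. The same over-counting logic applies to enemies if the graph holds enmity ties instead.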


1 year ago

Mirrored histogram showing “weird” parts of the population: treated people who were unlikely to be treated, and untreated people who were likely to be treated

Mirrored histogram showing pseudo-populations of treated and untreated people that have been reweighted to be more comparable and unconfounded

Table showing potential and realized outcomes for 9 simulated people

Before we calculate these different treatment effects with the realized outcomes instead of the hypothetical potential outcomes, let's look really quick at the practical difference between the true ATE, ATT, and ATU. All three estimands are useful for policymaking!
The ATE is -15, implying that mosquito nets cause a 15 point reduction in malaria risk for every person in the country. This includes people who live at high elevations where mosquitoes don't live, people who live near mosquito-infested swamps, people who are rich enough to buy Bill Gates's mosquito laser, and people who can't afford a net but would really like to use one. If we worked in the Ministry of Health and wanted to know if we should make a new national program that gave everyone a free bed net, the overall reduction in risk is -15, which is probably pretty good!
The ATT is -16.29, which is larger in magnitude than the ATE. The effect of net usage is bigger for people who are already using the nets. This is because of underlying systematic reasons, or selection bias. Those using nets want to use them because they need them more or can access them more easily: they might live in areas more prone to mosquitoes, or they can afford to buy their own nets, or something else. They know themselves and understand some notion of their personal individual causal effect and seek out nets. If we removed access to their nets, it would have a strong effect.
The ATU is -13.63, which is smaller in magnitude than the ATE. The effect of net usage is smaller for people who aren't using the nets. Again, this is because of selection bias. Those not using nets are likely not using them for systematic reasons: they live far away from mosquitoes, they've received a future malaria vaccine, they have some other form of mosquito abatement, or something else. Because they can read their own minds, they know that mosquito net use won't do much for them personally, so they don't seek out nets. If we expanded access to nets to them, they wouldn't benefit

From the archives: Have you (like me!) wondered what the ATT means and how it's different from average treatment effects? I use #rstats to explore why we care about (and how to calculate) the ATE, ATT, and ATU #polisky #episky #econsky www.andrewheiss.com/blog/2024/03...
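
For readers who want the three estimands side by side, here is a minimal sketch of how the ATE, ATT, and ATU fall out of a table of potential outcomes. The six people and their risk scores are invented for illustration (they are not the blog post's simulated data, and the post itself uses R), and the variable names are hypothetical.

```python
# Each tuple: (used_net, risk_without_net, risk_with_net).
# Hypothetical potential outcomes; in real data only one of the two is observed.
people = [
    (True, 40, 20),
    (True, 36, 20),
    (True, 48, 30),
    (False, 30, 20),
    (False, 27, 15),
    (False, 19, 5),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Individual causal effect: outcome with the net minus outcome without it.
effects = [(treated, y1 - y0) for treated, y0, y1 in people]

ate = mean([e for _, e in effects])               # everyone
att = mean([e for treated, e in effects if treated])      # net users only
atu = mean([e for treated, e in effects if not treated])  # non-users only

# The ATE is the treated-share-weighted average of the ATT and the ATU.
share_treated = mean([1 if treated else 0 for treated, _ in effects])
weighted = share_treated * att + (1 - share_treated) * atu

print(ate, att, atu)  # -15.0 -18.0 -12.0
```

With equal numbers of users and non-users, the ATE sits exactly halfway between the ATT and the ATU; with unequal groups it is pulled toward the larger group's effect.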


1 year ago

Foursquare Open Source Places: A new foundational dataset for the geospatial community
I did not expect this! > [...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer …

Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...


1 year ago

A few things I've been working on lately:

elmer, elmer.tidyverse.org, is a new package to make it easier to work with LLMs (hosted and local) from #rstats. It includes helpers for structured data extraction and tool calling, and an easy way to upload a plot. Joint work with Joe Cheng.
