André Cruz andcrz - Bluesky Statics

Meet me at the Benchmarking workshop (sites.google.com/view/benchma...) at EurIPS on Saturday! We'll present two papers on errors in LLM-as-a-Judge and their impact on benchmarking and test-time scaling.

05.12.2025 08:57

We (w/ Moritz Hardt, Olawale Salaudeen and @joavanschoren.bsky.social) are organizing the Workshop on the Science of Benchmarking & Evaluating AI @euripsconf.bsky.social 2025 in Copenhagen!

📢 Call for Posters: rb.gy/kyid4f
📅 Deadline: Oct 10, 2025 (AoE)
🔗 More info: rebrand.ly/bg931sf

22.09.2025 13:45

Tufts says federal authorities detained graduate student: The university received reports that an international graduate student was taken into custody from an off-campus apartment Tuesday night, President Sunil Kumar wrote in a letter to the school communit...

ICE kidnapped another student.

This time it's a grad student at Tufts, Rumeysa Ozturk, a Turkish national in the United States on a student visa. Her lawyer does not know where she is being held.

Rumeysa wrote op-eds criticizing the university's response to student demands on Gaza.

26.03.2025 17:05

Welcome to the Bluesky account for Stand Up for Science 2025!

Keep an eye on this space for updates, event information, and ways to get involved. We can't wait to see everyone #standupforscience2025 on March 7th, both in DC and locations nationwide!

#scienceforall #sciencenotsilence

12.02.2025 17:04

GitHub - socialfoundations/folktexts: Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!

The paper is accompanied by a new benchmark package: *Folktexts*. It builds socio-demographic backstories from Census data to evaluate LLM calibration, fairness, and uncertainty estimation.

Package: github.com/socialfounda...
Paper: arxiv.org/pdf/2407.14614

06.02.2025 23:10

Spring EconCS 2025 Seminars | EconCS Group

Tomorrow at 1:30pm ET at the Harvard EconCS seminar, I'm presenting our paper on LLMs as risk scorers. We build benchmarks from US Census data and show how poorly calibrated LLMs are on real-world tabular data distributions.

📍 Harvard SEC LL2.221, open to the public
econcs.seas.harvard.edu/event/spring...

06.02.2025 23:04

Posts by André Cruz (@andcrz.bsky.social)