André Cruz

@andcrz.bsky.social

🎓 PhD student at the Max Planck Institute for Intelligent Systems 🔬 Safe and robust AI, algorithms and society 🔗 https://andrefcruz.github.io 📍 researcher in 🇩🇪, from 🇵🇹

135 Followers  |  451 Following  |  2 Posts  |  Joined: 06.12.2024
Posts by André Cruz (@andcrz.bsky.social)

Meet me at the Benchmarking workshop (sites.google.com/view/benchma...) at EurIPS on Saturday: We'll present two works on errors in LLM-as-Judge and their impacts on benchmarking and test-time scaling:

05.12.2025 08:57 | 👍 7 | 🔁 3 | 💬 1 | 📌 0

We (w/ Moritz Hardt, Olawale Salaudeen and @joavanschoren.bsky.social) are organizing the Workshop on the Science of Benchmarking & Evaluating AI @euripsconf.bsky.social 2025 in Copenhagen!

📒 Call for Posters: rb.gy/kyid4f
📅 Deadline: Oct 10, 2025 (AoE)
🔗 More info: rebrand.ly/bg931sf

22.09.2025 13:45 | 👍 21 | 🔁 7 | 💬 1 | 📌 0
Tufts says federal authorities detained graduate student
The university received reports that an international graduate student was taken into custody from an off-campus apartment Tuesday night, President Sunil Kumar wrote in a letter to the school communit...

ICE kidnapped another student.

This time a grad student at Tufts, Rumeysa Ozturk. Turkish national, in the United States on a student visa. Her lawyer does not know where she is being held.

Rumeysa wrote op-eds criticizing the university's response to student demands on Gaza.

26.03.2025 17:05 | 👍 465 | 🔁 235 | 💬 14 | 📌 19

Welcome to the Bluesky account for Stand Up for Science 2025!

Keep an eye on this space for updates, event information, and ways to get involved. We can't wait to see everyone #standupforscience2025 on March 7th, both in DC and locations nationwide!

#scienceforall #sciencenotsilence

12.02.2025 17:04 | 👍 11497 | 🔁 5433 | 💬 291 | 📌 670
GitHub - socialfoundations/folktexts: Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!

The paper is accompanied by a new benchmark package: *Folktexts*. It builds socio-demographic backstories from Census data to evaluate LLM calibration, fairness, and uncertainty estimation.

Package: github.com/socialfounda...
Paper: arxiv.org/pdf/2407.14614

06.02.2025 23:10 | 👍 1 | 🔁 0 | 💬 0 | 📌 0
Spring EconCS 2025 Seminars | EconCS Group

Tomorrow at 1:30pm ET at the Harvard EconCS seminar, I'm presenting our paper on LLMs as risk scorers: We build benchmarks using US Census data & show how miscalibrated LLMs are on real-world tabular data distributions.

πŸ“Harvard SEC LL2.221-open to the public
econcs.seas.harvard.edu/event/spring...

06.02.2025 23:04 | 👍 2 | 🔁 1 | 💬 1 | 📌 0