Closed out the inaugural year of the Berkeley AI & Society social with a look at a new statewide police use-of-force + misconduct database built by Berkeley EECS, journalism, law, data science, and others, including at Stanford. ML + other techniques + sweat pulled off the enormous data cleaning (1/3)
Beautiful nostalgic piece by @adityagp.bsky.social! Don't miss the end for some nice reflections. Aditya's just getting started! :)
Adam!!!!!! So nice to hear from you, and as always, you're too kind :)
Blog Post: Looking back on the first decade as faculty (2014-2024).
I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection.
data-people-group.github.io/blogs/2025/0...
🧵 A new public database, the Police Records Access Project, now makes 1.5 million pages of public records on police use of force and misconduct in California searchable for the first time.
The project was led by @berkeleyjournalism.bsky.social, @ucbids.bsky.social, and Stanford's Big Local News.
“Inside California’s new police records database” - Spend a few minutes watching an interview with Cheryl Phillips (@cephillips.bsky.social) on KTVU Fox 2 (@ktvufox2.bsky.social) to hear more about the new database.
www.ktvu.com/video/1686149
#AI #UCBerkeley #Stanford #localnews
Today, we celebrate the incredible work on the Police Records Access Project – a first-of-its-kind database in the nation. It took years of work by a multidisciplinary team of journalists, data scientists, lawyers and civil rights advocates.
Learn more: bit.ly/4m5uDlu
New database just published by @latimes.com, @sfchronicle.com, @kqednews.kqed.org + CalMatters and built by Berkeley Journalism's IRP, @berkeleyengineer.bsky.social BIDS + @biglocalnews.bsky.social makes public 1.5 million pages of once-secret police records. journalism.berkeley.edu/police-recor...
We're hiring! Working with @mikeolson.mastodon.social.ap.brid.gy and the rest of our @aimatx.bsky.social team has been wonderful, but we have a lot more to accomplish.
If you're interested in AI, materials science (theory + exp) and user-facing tools, join us!
www.linkedin.com/posts/fperez...
My periodic reminder to my CMU friends to get someone with any inkling of UI design to fix their letter upload portal. I don't know what to pick between "Submit" and "Upload Recommendation", and the dropdown forces me to make a choice even if I don't want to. cc @andypavlo.bsky.social @domoritz.de
If you're at The Curve Conference this weekend, come dive into what the future of applied evals looks like! I'll be demo-ing EvalForge, an implementation of the "who validates the validators" paper with @weightsbiases.bsky.social
Big thanks to @capetorch.bsky.social and Anish for their work here
Thanks to the authors of the paper! @sh-reya.bsky.social @zamfi.bsky.social, Bjorn Hartmann, @adityagp.bsky.social and Ian Arawjo.
Read it if you haven't: arxiv.org/abs/2404.12272
Vol:17 No:12 → Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching
👥 Authors: Joshua Wu, Dixin Tang, Nithin V Chalapathi, Tristan Chambers, Julie Ciccolini, Cheryl Phillips, Lisa Pickoff-White, Aditya ...
📄 PDF: https://www.vldb.org/pvldb/vol17/p4104-tang.pdf
First post here! I'm currently researching how LLMs can be grounded in tables, bc they're a goldmine for fresh domain data.
We've now introduced the 🎯TARGET benchmark for evaluating table retrieval in RAG pipelines, e.g. QA/factver/text2sql: target-benchmark.github.io
Thinking BM25? Think twice..🧵
🎦 Watch the recording of @madelonhulsebos.bsky.social's seminar at BIDS!
"There is actually a lot of structured data available on the web.... But really, the task here is that we should be able to retrieve that easily."
#datascience #structureddata #dataretrieval
bids.berkeley.edu/news/madelon...
SIC is so much cooler than SIG, wish they hadn’t changed it