
@prakharg.bsky.social

30 Followers  |  163 Following  |  63 Posts  |  Joined: 27.11.2024

Posts by (@prakharg.bsky.social)

🚨 Deadline Extended to Feb 5 (AoE)!
CFP still OPEN for the #AFAA2026 Workshop at @iclr-conf.bsky.social — on fairness across alignment & agentic AI systems.
Full & tiny papers welcome • Interdisciplinary work encouraged!
🔗 afciworkshop.org

#ICLR2026 #AFAA2026

02.02.2026 17:48 — 👍 1    🔁 1    💬 0    📌 1
AFAA 2026 The Algorithmic Fairness Across Alignment Procedures and Agentic Systems (AFAA) workshop aims to spark discussions on rethinking fairness in AI alignment procedures and agentic system development.

🚨 CFP OPEN! We're launching the #AFAA2026 Workshop at @iclr-conf.bsky.social on fairness across alignment and agentic AI systems.
Submit your latest ideas (full or tiny papers!)
Interdisciplinary work especially welcome :D
🗓 Deadline: Jan 31 (AoE) | 🔗 www.afciworkshop.org

#AFAA2026 #ICLR2026

06.01.2026 02:39 — 👍 6    🔁 2    💬 0    📌 4

Four case studies on the gap between how models are used in reality and their sandboxed evaluations in audits... Definitely need to take a deeper dive; great presentation by Emily Black!

25.06.2025 08:52 — 👍 0    🔁 0    💬 0    📌 0

Evaluating models the way they would be deployed vs. only in controlled, unrealistic settings!

25.06.2025 08:52 — 👍 0    🔁 0    💬 1    📌 0

Allowing companies to do isolated audits can lead to D-Hacking!! More robust testing is needed...

25.06.2025 08:52 — 👍 0    🔁 0    💬 1    📌 0

Legal frameworks tend to govern allocative decisions (yes/no outcomes), which fit well with traditional ML systems... but not with GenAI systems

25.06.2025 08:52 — 👍 0    🔁 0    💬 1    📌 0

Zollo et al: Towards Effective Discrimination Testing for Generative AI
#FAccT2025

25.06.2025 08:43 — 👍 1    🔁 0    💬 1    📌 0

The nuance of stereotype errors is so important for understanding their true harms... Insightful presentation by @angelinawang.bsky.social

25.06.2025 08:43 — 👍 0    🔁 0    💬 0    📌 0

Women tend to report stereotype-reinforcing errors as more harmful while men tend to report stereotype-violating errors as more harmful...

25.06.2025 08:43 — 👍 0    🔁 0    💬 1    📌 0

Some items are more associated with men vs women (not surprising), but not all of them are equally harmful!!

25.06.2025 08:43 — 👍 0    🔁 0    💬 1    📌 0

Cognitive beliefs, attitudes and behaviours... Three ways to measure harms ('pragmatic harms')

25.06.2025 08:43 — 👍 0    🔁 0    💬 1    📌 0

Are all errors equally harmful? No! Stereotype-reinforcing errors vs stereotype-violating errors

25.06.2025 08:43 — 👍 0    🔁 0    💬 1    📌 0

Our understanding of stereotypes sometimes isn't indicative of reality... they can appear in both directions, or might exist simply without harm

25.06.2025 08:43 — 👍 0    🔁 0    💬 1    📌 0

Wang et al: Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways
#FAccT2025

25.06.2025 08:34 — 👍 1    🔁 0    💬 1    📌 0

Clear narrative and a great presentation by Cecilia Panigutti

25.06.2025 08:33 — 👍 0    🔁 0    💬 0    📌 0

Risk-measuring studies - Bringing it back to risk measurement, but this time with a clearly defined objective instead of risk-uncovering as before... Not just whether a risk exists, but 'how severe' is it?

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

Interface-design studies - Focus on UI design elements which impact user interaction

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

Reverse-engineering studies - Narrower scope and in-depth studies of how algorithms work... Methodological precision is the key!

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

Risk-uncovering studies - Typically start from anecdotal evidence and help surface new risks

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

A review organized not by data collection technique, but by DSA risk management framework categories

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

Narrative review of algorithmic auditing studies, practical recommendations for best practices, and mapping to DSA obligations...

25.06.2025 08:33 — 👍 0    🔁 0    💬 1    📌 0

Panigutti et al: How to investigate algorithmic-driven risks in online platforms and search engines? A narrative review through the lens of the EU Digital Services Act
#FAccT2025

25.06.2025 08:22 — 👍 0    🔁 0    💬 1    📌 0

Such a broad topic... Excellent presentation by @feliciajing.bsky.social

25.06.2025 08:22 — 👍 0    🔁 0    💬 0    📌 0

Historical methods working alongside many other ways of auditing these models can help us take advantage of the broader scope of historical evaluations...

25.06.2025 08:22 — 👍 0    🔁 0    💬 1    📌 0

AI audits have moved from bottom-up external evaluations to new-age 'auditing companies'. While this has increased speed and scale, it has significantly narrowed the scope of auditing.

25.06.2025 08:22 — 👍 0    🔁 0    💬 1    📌 0

Why the history of AI assessments? A study through the lens of historical methods can help us understand neglected areas of auditing.

25.06.2025 08:22 — 👍 0    🔁 0    💬 1    📌 0

Sandoval and Jing: Historical Methods for AI Evaluations, Assessments, and Audits
#FAccT2025

25.06.2025 08:10 — 👍 0    🔁 0    💬 1    📌 0

Important recommendations on standardizing report creation and storage to allow better meta-analysis in the future... Eye-opening presentation by @mkgerchick.bsky.social

25.06.2025 08:10 — 👍 0    🔁 0    💬 0    📌 0

Applicants impacted by these tools, whose demographic data is missing, are completely removed from these audits!

25.06.2025 08:10 — 👍 0    🔁 0    💬 1    📌 0

Serious issues with the data usage... weirdest for me: 'simulated test data'!

25.06.2025 08:10 — 👍 0    🔁 0    💬 1    📌 0