Four case studies on the gap between the reality of model use and their sandbox evaluations in audits... Definitely need to take a deeper dive, great presentation by Emily Black!
25.06.2025 08:52 · @prakharg.bsky.social
Evaluations in the way the model would be deployed vs evaluations in only controlled, unrealistic settings!
Allowing companies to do isolated audits can lead to D-Hacking!! More robust testing is needed...
Legal frameworks tend to have control over allocative decisions (Yes/No outcomes), which fit well with traditional ML systems... But not with GenAI systems
Zollo et al.: Towards Effective Discrimination Testing for Generative AI
#FAccT2025
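Not from the paper, just my own toy sketch of what "D-hacking" means here: if an auditor is free to choose among many "reasonable" evaluation configurations, they can report the one where the disparity looks smallest, a fairness analogue of p-hacking. All numbers below are hypothetical.

```python
import random

random.seed(0)

TRUE_GAP = 0.08  # hypothetical: the model really is 8 points worse for one group
NOISE = 0.05     # small test sets make each audit measurement noisy

def measured_disparity() -> float:
    """One simulated audit measurement: the true group gap plus sampling noise."""
    return TRUE_GAP + random.gauss(0, NOISE)

# 20 "reasonable" audit configurations (different prompts, subsets, thresholds...)
configs = [measured_disparity() for _ in range(20)]

honest = configs[0]    # pre-registered audit: commit to one configuration up front
hacked = min(configs)  # D-hacked audit: search all configs, report the most favourable

print(f"honest audit reports a gap of {honest:+.3f}")
print(f"D-hacked audit reports a gap of {hacked:+.3f}")
```

By construction the cherry-picked number can never look worse than the pre-registered one, which is why testing the model the way it will actually be deployed, across configurations, beats an isolated sandbox audit.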
The nuance of stereotype errors is so important for understanding their true harms... Insightful presentation by @angelinawang.bsky.social
25.06.2025 08:43
Women tend to report stereotype-reinforcing errors as more harmful while men tend to report stereotype-violating errors as more harmful...
Some items are more associated with men vs women (not surprising), but not all of them are equally harmful!!
Cognitive beliefs, attitudes and behaviours... Three ways to measure harms ('pragmatic harms')
Are all errors equally harmful? No! Stereotype-reinforcing errors vs stereotype-violating errors
Our understanding of stereotypes sometimes isn't indicative of reality... they can appear in both directions, or might exist simply without harm
Wang et al.: Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways
#FAccT2025
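A tiny sketch (my own framing with hypothetical labels, not code from the paper) of the distinction the talk draws: a misprediction is stereotype-reinforcing when it errs in the stereotype-consistent direction, and stereotype-violating when it errs against it.

```python
def classify_error(true_label: str, predicted_label: str, stereotyped_label: str) -> str:
    """Classify one prediction relative to a stereotype.

    stereotyped_label is the label a stereotype would assign
    (e.g. 'nurse' for a photo of a woman in scrubs).
    """
    if predicted_label == true_label:
        return "correct"
    if predicted_label == stereotyped_label:
        return "stereotype-reinforcing"  # error in the stereotype-consistent direction
    return "stereotype-violating"        # error against the stereotype

# A woman who is a doctor, mislabelled as a nurse: reinforcing error.
print(classify_error("doctor", "nurse", stereotyped_label="nurse"))
# A woman who is a nurse, mislabelled as a doctor: violating error.
print(classify_error("nurse", "doctor", stereotyped_label="nurse"))
```

The survey finding in the thread is precisely that the people described by a stereotype do not weight these two error types equally.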
Clear narrative and a great presentation by Cecilia Panigutti
25.06.2025 08:33
Risk-measuring studies - Bringing it back to risk measurement, but this time with a clearly defined objective instead of risk-uncovering as before... Not just whether a risk exists, but 'how severe' is it?
Interface-design studies - Focus on UI design elements which impact user interaction
Reverse-engineering studies - Narrower scope and in-depth studies of how algorithms work... Methodological precision is the key!
Risk-uncovering studies - Typically start from anecdotal evidence and help surface new risks
A review organized not by data collection technique, but by DSA risk management framework categories
Narrative review of algorithmic auditing studies, practical recommendations for best practices, and mapping to DSA obligations...
Panigutti et al.: How to investigate algorithmic-driven risks in online platforms and search engines? A narrative review through the lens of the EU Digital Services Act
#FAccT2025
Such a broad topic... Excellent presentation by @feliciajing.bsky.social
25.06.2025 08:22
Historical methods working alongside many other ways of auditing these models can help us take advantage of the broader scope of historical evaluations...
AI audits have moved from bottom-up external evaluations to new-age 'auditing companies'. While this has increased speed and scale, it has significantly narrowed the scope of auditing.
Why the history of AI assessments? A study through the lens of historical methods can help us understand neglected areas of auditing.
Sandoval and Jing: Historical Methods for AI Evaluations, Assessments, and Audits
#FAccT2025
Important recommendations on standardizing report creation and storage to allow better meta-analysis in the future... Eye-opening presentation by @mkgerchick.bsky.social
25.06.2025 08:10
Applicants impacted by these tools, whose demographic data is missing, are completely removed from these audits!
Serious issues with the data usage... weirdest for me: 'simulated test data'!
More than 98% of Fortune 500 companies use some form of automated hiring, but only about 2% of them have audited these systems!!
NYC LL 144: One of the first enacted laws in the US regulating the use of AI in employment
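For context, a minimal sketch of the selection-rate / impact-ratio arithmetic that LL 144 bias audits report, as I understand it (all numbers hypothetical). The point from the talk: applicants with unknown demographics simply never enter this calculation.

```python
def impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """outcomes maps group -> (selected, total applicants).

    Impact ratio = a group's selection rate divided by the
    highest group's selection rate (1.0 for the top group).
    """
    rates = {g: sel / total for g, (sel, total) in outcomes.items()}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Hypothetical audit data: only applicants with known demographics are counted.
audited = {"group_a": (30, 100), "group_b": (15, 100)}
unknown_demographics = (12, 40)  # 40 applicants who never enter the audit at all

print(impact_ratios(audited))  # group_b sits at 0.5: half of group_a's rate
```

Whether the 40 excluded applicants were treated fairly is exactly what such an audit cannot say, which motivates the standardized-reporting recommendations above.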