✈️ Headed to @iclr-conf.bsky.social — whether you’ll be there in person or tuning in remotely, I’d love to connect!
We’ll be presenting our paper on pre-training stability in language models and the PolyPythias 🧵
🔗 ArXiv: arxiv.org/abs/2503.09543
🤗 PolyPythias: huggingface.co/collections/...
Work in progress -- suggestions for NLPers based in the EU/Europe & already on Bluesky are very welcome!
go.bsky.app/NZDc31B
I would like to be added! 😄
Hi, I'd like to be part of this!
👋
💬Panel discussion with Sally Haslanger and Marjolein Lanzing: A philosophical perspective on algorithmic discrimination
Is discrimination the right way to frame the issues of lang tech? Or should we answer deeper-rooted questions? And how does tech fit into systems of oppression?
📄Undesirable Biases in NLP: Addressing Challenges of Measurement
We also presented our own work on strategies for testing the validity and reliability of LM bias measures:
www.jair.org/index.php/ja...
🔑Keynote by @zeerak.bsky.social: On the promise of equitable machine learning technologies
Can we create equitable ML technologies? Can statistical models faithfully express human language? Or are tokenizers "tokenizing" people—creating a Frankenstein monster of lived experiences?
📄A Capabilities Approach to Studying Bias and Harm in Language Technologies
@hellinanigatu.bsky.social introduced us to the Capabilities Approach and how it can help us better understand the social impact of language technologies—with case studies of failing tech in the Majority World.
📄Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution
Flor Plaza discussed the importance of studying gendered emotional stereotypes in LLMs, and how collaborating with philosophers greatly benefits work on bias evaluation.
🔑Keynote by John Lalor: Should Fairness be a Metric or a Model?
While fairness is often viewed as a metric, using integrated models instead can help with explaining upstream bias, predicting downstream fairness, and capturing intersectional bias.
📄A Decade of Gender Bias in Machine Translation
Eva Vanmassenhove: how has research on gender bias in MT developed over the years? Important issues, like non-binary gender bias, now fortunately get more attention. Yet, fundamental problems (that initially seemed trivial) remain unsolved.
📄MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
Vera Neplenbroek presented a multilingual extension of the BBQ bias benchmark to study bias across English, Dutch, Spanish, and Turkish.
"Multilingual LLMs are not necessarily multicultural!"
🔑Keynote by Dong Nguyen: When LLMs meet language variation: Taking stock and looking forward
Non-standard language is often treated as noisy or incorrect data, but this ignores the reality of language. Variation should play a larger role in LLM development, and sociolinguistics can help!
Last week, we organized the workshop "New Perspectives on Bias and Discrimination in Language Technology" 🤖 @uvahumanities.bsky.social @amsterdamnlp.bsky.social
We're looking back at two inspiring days of talks, posters, and discussions—thanks to everyone who participated!
wai-amsterdam.github.io
This is a friendly reminder that there are 7 days left to submit your extended abstract to this workshop!
(Since the workshop is non-archival, previously published work is welcome too. So consider submitting previous/future work to join the discussion in Amsterdam!)
This workshop is organized by University of Amsterdam researchers Katrin Schulz, Leendert van Maanen, @wzuidema.bsky.social, Dominik Bachmann, and myself.
More information on the workshop can be found on the website, which will be updated regularly.
wai-amsterdam.github.io
🌟The goal of this workshop is to bring together researchers from different fields to discuss the state of the art in bias measurement and mitigation in language technology and to explore new approaches.
One of the central issues discussed in the context of the societal impact of language technology is that ML systems can contribute to discrimination. Despite efforts to address these issues, we are far from solving them.
We're super excited to host Dong Nguyen, John Lalor, @zeerak.bsky.social and @azjacobs.bsky.social as invited speakers at this workshop! Submit an extended abstract to join the discussions; either in a 20min talk or a poster session.
📝Deadline Call for Abstracts: 15 Sep, 2024
Working on #bias & #discrimination in #NLP? Passionate about integrating insights from different disciplines? And do you want to discuss current limitations of #LLM bias mitigation work? 🤖
👋Join the workshop New Perspectives on Bias and Discrimination in Language Technology 4&5 Nov in #Amsterdam!
release day release day 🥳 OLMo 1B + 7B out today, and 65B soon...
OLMo accelerates the study of LMs. We release *everything*, from the toolkit for creating data (Dolma) to training/inference code
Blog: blog.allenai.org/olmo-open-la...
OLMo paper: allenai.org/olmo/olmo-pa...
Dolma paper: allenai.org/olmo/dolma-p...
But it's exciting to see more work dedicated to sharing models, checkpoints, and training data with the (research) community!
Don't forget EleutherAI's Pythia, which came out last year! dl.acm.org/doi/10.5555/...
@michahu.bsky.social did an interview laying out our recent paper, which contains the figures I insist on calling "the Mona Lisa of training visualizations"
I look forward to debates in the philosophy of (techno)science by people more knowledgeable than I am. I'd say we have some philosophical basis for believing people are capable of such tasks. But there is also sufficient reason to believe LLM ≠ human, so any trust in one does not automatically transfer to the other.
That is not to say I am categorically against using LLMs as epistemic tools, but from my own experience as a bias and interpretability researcher I think we should be careful of potential biases/failure modes. If we are transparent about their use and potential issues, I could see LLMs being useful.
I think it all boils down to the reliability and validity of the approach. We don't have good methodologies (yet) to assess these qualities for LLMs, compared to simpler, more interpretable techniques. And intuitively I think we have more reasons to trust (expert) human annotators—see also psychometrics.
A 🧵thread about strategies for improving social bias evaluations of LMs. #blueskAI 🤖
bsky.app/profile/ovdw...
Special thanks go to Dominik Bachmann (shared first author), whose insights from the perspective of psychometrics helped shape not only this paper, but also my views of current AI fairness practices more broadly.