Zara Siddique

@zarasiddique.bsky.social

Working on ethics and bias in NLP @CardiffNLP #NLP #NLProc

143 Followers  |  661 Following  |  32 Posts  |  Joined: 20.11.2024

Latest posts by zarasiddique.bsky.social on Bluesky

Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop. Lots of enthusiastic participants! @cardiffnlp.bsky.social

26.07.2025 07:35 - 👍 2    🔁 2    💬 0    📌 0

#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social

14.07.2025 10:37 - 👍 2    🔁 2    💬 0    📌 0

Pleased to say this has been accepted to ACL System Demos :)

26.05.2025 20:00 - 👍 9    🔁 1    💬 0    📌 0

Come to my hackathon! Last one was super fun I promise

21.05.2025 19:40 - 👍 3    🔁 2    💬 0    📌 0

Shoutout to supervisors Liam Turner and Luis Espinosa-Anke and @cardiffnlp.bsky.social. I'm also interested in future collaborations on the topic so please message if you are interested :)

14.05.2025 10:29 - 👍 0    🔁 0    💬 0    📌 0
Dialz Tutorial - Zara Siddique - KnitTogether 2025.ipynb Colab notebook

I highly encourage people to play around, you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that the results from this Colab won't be the best because it uses a smaller model to reduce loading times. I would recommend using at least a 7B model.

14.05.2025 10:29 - 👍 0    🔁 0    💬 1    📌 0

As part of our validation, we see if we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.

14.05.2025 10:29 - 👍 0    🔁 0    💬 1    📌 0

For those that are new to the topic, steering vectors are constructed using a set of paired sentences, where one elicits a 'positive' activation of neurons and the other elicits a 'negative' activation of neurons - by taking the difference, we isolate activations responsible for a certain 'concept'.

14.05.2025 10:29 - 👍 1    🔁 0    💬 1    📌 0
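
To make the paired-sentence idea in the post above concrete, here is a minimal sketch in plain Python. It is not the Dialz API: the function name `steering_vector` and the toy 3-dimensional "activations" are assumptions for illustration, and it uses the mean of the per-pair differences, which is only one common way to turn the differences into a single vector.

```python
from statistics import fmean


def steering_vector(pos_acts, neg_acts):
    """Average the (positive - negative) activation difference over all pairs.

    pos_acts / neg_acts: lists of equal-length float lists, one per sentence,
    taken from the same layer. Hypothetical helper, not part of any library.
    """
    if not pos_acts or len(pos_acts) != len(neg_acts):
        raise ValueError("need the same non-zero number of positive and negative activations")
    dim = len(pos_acts[0])
    return [
        fmean(p[i] - n[i] for p, n in zip(pos_acts, neg_acts))
        for i in range(dim)
    ]


# Toy activations for two contrastive pairs (made-up numbers, not model output).
pos = [[1.0, 0.5, 2.0], [0.8, 0.4, 2.2]]
neg = [[0.2, 0.5, 1.0], [0.0, 0.6, 1.2]]

vec = steering_vector(pos, neg)
print(vec)  # dimensions where the pairs differ consistently get the largest values
```

In a real setting the activations would come from a chosen layer of the language model for each sentence in the pair, rather than being typed in by hand.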

🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.

14.05.2025 10:29 - 👍 26    🔁 8    💬 2    📌 2

New friends! Old friends! Please register if you'd like 2 whole days packed with NLP fun

13.05.2025 16:37 - 👍 2    🔁 2    💬 0    📌 0

Super interesting!

03.04.2025 13:56 - 👍 1    🔁 0    💬 0    📌 0
Could AI help us build a more racially just society? | Sanmi Koyejo We have an opportunity to build systems that don't just replicate our current inequities. Will we take them?

Love this take: "Society appears far more willing to critically examine and address bias in AI systems than confront human bias directly"

27.03.2025 09:47 - 👍 1    🔁 0    💬 0    📌 0

I'd hire you

25.03.2025 18:19 - 👍 1    🔁 0    💬 1    📌 0

I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity 🙏

25.03.2025 15:18 - 👍 3    🔁 6    💬 0    📌 0

Do it! When interviewers ask me about them it's usually a good sign that it's a nice workplace.

25.03.2025 18:14 - 👍 1    🔁 0    💬 1    📌 0

The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.

13.03.2025 11:44 - 👍 1    🔁 0    💬 0    📌 0

Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.

13.03.2025 11:44 - 👍 1    🔁 0    💬 1    📌 0
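
The averaging step described in the post above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the paper's code: the helper name `ensemble`, the dictionary of per-axis vectors and their values are all made up, and any normalisation or weighting the actual method may apply is left out.

```python
def ensemble(vectors):
    """Element-wise average of several steering vectors of the same length.

    Hypothetical helper for illustration only.
    """
    if not vectors:
        raise ValueError("need at least one steering vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all steering vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# One (made-up) steering vector per bias axis, all taken from the same layer.
axis_vectors = {
    "age": [0.3, -0.6, 0.0],
    "race": [0.0, 0.3, 0.9],
    "gender": [0.6, 0.0, -0.3],
}

sve = ensemble(list(axis_vectors.values()))
print(sve)  # a single vector that blends the three axis-specific directions
```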

When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.

13.03.2025 11:44 - 👍 1    🔁 0    💬 1    📌 0

We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.

13.03.2025 11:44 - 👍 3    🔁 0    💬 1    📌 0
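
As a rough picture of what "modifying model activations in forward passes" means in the post above, the sketch below adds a scaled vector to the hidden state after one layer of a toy model. Everything here is an assumption for illustration: the "layers" are trivial Python functions rather than transformer blocks, and the names `add_steering`, `forward`, `steer_at` and `strength` are hypothetical. The Bayesian optimization step mentioned in the post is not shown.

```python
def add_steering(hidden, vector, strength):
    """Return hidden + strength * vector, element-wise."""
    return [h + strength * v for h, v in zip(hidden, vector)]


def forward(x, layers, steer_at=None, vector=None, strength=0.0):
    """Run x through the layers, optionally steering the output of one layer."""
    h = x
    for idx, layer in enumerate(layers):
        h = layer(h)
        if idx == steer_at:
            h = add_steering(h, vector, strength)
    return h


# Two toy "layers" standing in for real model blocks.
def double(h):
    return [2 * v for v in h]


def shift(h):
    return [v + 1 for v in h]


layers = [double, shift]
x = [1.0, 2.0]

plain = forward(x, layers)
steered = forward(x, layers, steer_at=0, vector=[1.0, -1.0], strength=0.5)
print(plain, steered)
```

With a real model, the same idea is usually implemented by hooking into a chosen layer and adding the vector to its output; the sign and size of `strength` control the direction and intensity of the shift.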

NEW PAPER 📜

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs

ArXiv: arxiv.org/abs/2503.05371
GitHub: github.com/groovychoons...
Extremely Unofficial Blog Post: zarasiddique.com/blog/shiftin...

13.03.2025 11:44 - 👍 17    🔁 5    💬 1    📌 0

Strongly encourage you to register for our free NLP workshop. Previous years had speakers from DeepMind, Microsoft, Amazon and top university NLP labs, and it's looking like it's going to be a great line-up this year too.

If you can't make it, please share with others who may be interested!

10.03.2025 20:53 - 👍 1    🔁 1    💬 0    📌 0

We've created a Cardiff NLP Starter Pack to make it easy to follow #NLP researchers at Cardiff Uni.

05.03.2025 10:46 - 👍 6    🔁 4    💬 0    📌 0

Super interesting work!

03.03.2025 21:56 - 👍 0    🔁 0    💬 0    📌 0

OpenAI furious DeepSeek might have stolen all the data OpenAI stole from us

🔗 www.404media.co/openai-furio...

29.01.2025 15:43 - 👍 3723    🔁 944    💬 112    📌 249
Joy Buolamwini and Sam Altman: Unmasking the Future of AI
YouTube video by Commonwealth Club World Affairs (CCWA)

Severance episode 2, Traitors final AND this

It's a weekend of watching for me 🍿

26.01.2025 15:52 - 👍 1    🔁 0    💬 0    📌 0

Need to be spending less time on deepseek and more time on deep sleep 😴

26.01.2025 15:19 - 👍 2    🔁 0    💬 0    📌 0

Any thoughts on whether this would extend well to more traditional CompSci courses?

23.01.2025 19:42 - 👍 0    🔁 0    💬 1    📌 0

Welcome Wikipedian!

And totally agree.

23.01.2025 19:41 - 👍 1    🔁 0    💬 0    📌 0

+1 I would also like to see this.

23.01.2025 19:39 - 👍 0    🔁 0    💬 1    📌 0

Our paper on extraction of metaphoric analogies from literary texts will be presented at COLING (Wed 11:00, Atrium, poster) by @camachocollados.bsky.social and Luis Espinosa-Anke.

Done with @zarasiddique.bsky.social, @hsuvas.bsky.social and @antypasd.bsky.social

20.01.2025 21:10 - 👍 6    🔁 4    💬 1    📌 1
