Zara Siddique @zarasiddique

Did it.. work?!

23.10.2025 19:12 — 👍 0 🔁 0 💬 1 📌 0

Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop. Lots of enthusiastic participants! @cardiffnlp.bsky.social

26.07.2025 07:35 — 👍 3 🔁 2 💬 0 📌 0

#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social

14.07.2025 10:37 — 👍 3 🔁 2 💬 0 📌 0

Pleased to say this has been accepted to ACL System Demos :)

26.05.2025 20:00 — 👍 10 🔁 1 💬 0 📌 0

Come to my hackathon! Last one was super fun, I promise

21.05.2025 19:40 — 👍 3 🔁 2 💬 0 📌 0

Shoutout to my supervisors Liam Turner and Luis Espinosa-Anke, and to @cardiffnlp.bsky.social. I'm also interested in future collaborations on the topic, so please message me if you'd like to work together :)

14.05.2025 10:29 — 👍 0 🔁 0 💬 0 📌 0

Dialz Tutorial - Zara Siddique - KnitTogether 2025.ipynb Colab notebook

I highly encourage people to play around with it; you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that results from this Colab won't be the best because it uses a smaller model to reduce loading times. I would recommend using at least a 7B model.

14.05.2025 10:29 — 👍 0 🔁 0 💬 1 📌 0

As part of our validation, we test whether we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.

14.05.2025 10:29 — 👍 0 🔁 0 💬 1 📌 0

For those new to the topic: steering vectors are constructed from a set of paired sentences, where one sentence elicits a 'positive' activation of neurons and the other a 'negative' activation. By taking the difference between the two, we isolate the activations responsible for a certain 'concept'.

14.05.2025 10:29 — 👍 1 🔁 0 💬 1 📌 0
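A minimal sketch of the paired-sentence idea above, assuming the simplest variant: average the per-pair activation differences and normalise. The activations here are synthetic NumPy arrays standing in for real hidden states, and `steering_vector` is a hypothetical helper, not the Dialz API; real toolkits may use other estimators (for example PCA over the differences).

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean difference of paired activations, scaled to unit length.

    pos_acts, neg_acts: shape (n_pairs, hidden_dim), row i of each array
    comes from the two sentences of contrastive pair i.
    """
    diffs = pos_acts - neg_acts        # one difference vector per pair
    direction = diffs.mean(axis=0)     # average out pair-specific noise
    return direction / np.linalg.norm(direction)

# Synthetic stand-in for hidden states (assumption): the 'concept' lives
# along dimension 0, and everything else is small random noise.
rng = np.random.default_rng(0)
hidden_dim, n_pairs = 8, 50
concept = np.zeros(hidden_dim)
concept[0] = 1.0

pos_acts = rng.normal(scale=0.1, size=(n_pairs, hidden_dim)) + 2.0 * concept
neg_acts = rng.normal(scale=0.1, size=(n_pairs, hidden_dim)) - 2.0 * concept

vec = steering_vector(pos_acts, neg_acts)
print(np.round(vec, 3))  # dominated by dimension 0, the planted concept
```

With real models, the two arrays would be hidden states collected at a chosen layer for each sentence in a pair; which layer and which token position to read from are further choices this sketch does not cover.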

🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.

14.05.2025 10:29 — 👍 26 🔁 8 💬 2 📌 2

New friends! Old friends! Please register if you’d like 2 whole days packed with NLP fun

13.05.2025 16:37 — 👍 2 🔁 2 💬 0 📌 0

Super interesting!

03.04.2025 13:56 — 👍 1 🔁 0 💬 0 📌 0

Could AI help us build a more racially just society? | Sanmi Koyejo We have an opportunity to build systems that don’t just replicate our current inequities. Will we take them?

Love this take: "Society appears far more willing to critically examine and address bias in AI systems than confront human bias directly"

27.03.2025 09:47 — 👍 1 🔁 0 💬 0 📌 0

I’d hire you

25.03.2025 18:19 — 👍 1 🔁 0 💬 1 📌 0

I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity 🙏

25.03.2025 15:18 — 👍 3 🔁 6 💬 0 📌 0

Do it! When interviewers ask me about them it’s usually a good sign that it’s a nice workplace.

25.03.2025 18:14 — 👍 1 🔁 0 💬 1 📌 0

The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.

13.03.2025 11:44 — 👍 1 🔁 0 💬 0 📌 0

Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.

13.03.2025 11:44 — 👍 1 🔁 0 💬 1 📌 0
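The averaging step described above can be sketched as follows. This is an illustration only: the vectors are toy values, `ensemble` is a hypothetical helper, and whether the paper normalises or weights the vectors before averaging is not stated in this thread, so a plain unweighted mean is assumed.

```python
from typing import Mapping

import numpy as np

def ensemble(vectors: Mapping[str, np.ndarray]) -> np.ndarray:
    """Unweighted mean of per-axis steering vectors (assumed formulation)."""
    stacked = np.stack(list(vectors.values()))  # (n_axes, hidden_dim)
    return stacked.mean(axis=0)

# Toy per-axis vectors (assumption): in practice each would be tuned
# separately on its own contrastive-pair dataset.
axis_vectors = {
    "age": np.array([1.0, 0.0, 0.0, 0.0]),
    "race": np.array([0.0, 1.0, 0.0, 0.0]),
    "gender": np.array([0.0, 0.0, 1.0, 0.0]),
}

sve = ensemble(axis_vectors)
print(sve)
```

Averaging keeps a single vector to apply at inference time, so the cost of steering does not grow with the number of bias axes covered.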

When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.

13.03.2025 11:44 — 👍 1 🔁 0 💬 1 📌 0

We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.

13.03.2025 11:44 — 👍 3 🔁 0 💬 1 📌 0
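As a rough illustration of modifying activations in a forward pass, the sketch below assumes the common additive formulation: a scaled steering vector is added to every token's hidden state at one layer. The function name, the coefficient `alpha`, and the toy values are assumptions, not details taken from the paper.

```python
import numpy as np

def apply_steering(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * vector to each token's hidden state.

    hidden: shape (n_tokens, hidden_dim); vector: shape (hidden_dim,).
    A negative alpha pushes activations away from the concept direction.
    """
    return hidden + alpha * vector

hidden = np.array([[0.5, -1.0, 2.0],
                   [1.5, 0.0, -0.5]])   # toy hidden states for two tokens
vector = np.array([0.0, 1.0, 0.0])      # unit-length steering direction

steered = apply_steering(hidden, vector, alpha=-2.0)

# For a unit vector, the projection onto the direction moves by exactly alpha.
shift = (steered - hidden) @ vector
print(shift)
```

In a real model this addition would typically run inside a hook on a chosen transformer layer, with the layer and coefficient treated as tunable settings.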

NEW PAPER 📜

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs

ArXiv: arxiv.org/abs/2503.05371
GitHub: github.com/groovychoons...
Extremely Unofficial Blog Post: zarasiddique.com/blog/shiftin...

13.03.2025 11:44 — 👍 17 🔁 5 💬 1 📌 0

Strongly encourage you to register for our free NLP workshop. We've previously had speakers from DeepMind, Microsoft, Amazon and top university NLP labs, and it’s looking like a great line-up this year too.

If you can’t make it, please share with others who may be interested!

10.03.2025 20:53 — 👍 1 🔁 1 💬 0 📌 0

We've created a Cardiff NLP Starter Pack to make it easy to follow #NLP researchers at Cardiff Uni.

05.03.2025 10:46 — 👍 6 🔁 4 💬 0 📌 0

Super interesting work!

03.03.2025 21:56 — 👍 0 🔁 0 💬 0 📌 0

OpenAI furious DeepSeek might have stolen all the data OpenAI stole from us

🔗 www.404media.co/openai-furio...

29.01.2025 15:43 — 👍 3711 🔁 940 💬 111 📌 248

YouTube video by Commonwealth Club World Affairs (CCWA) Joy Buolamwini and Sam Altman: Unmasking the Future of AI

Severance episode 2, Traitors final AND this

It's a weekend of watching for me 🍿

26.01.2025 15:52 — 👍 1 🔁 0 💬 0 📌 0

Need to be spending less time on deepseek and more time on deep sleep 😴

26.01.2025 15:19 — 👍 2 🔁 0 💬 0 📌 0

Any thoughts on whether this would extend well to more traditional CompSci courses?

23.01.2025 19:42 — 👍 0 🔁 0 💬 1 📌 0

Welcome Wikipedian!

And totally agree.

23.01.2025 19:41 — 👍 1 🔁 0 💬 0 📌 0

+1 I would also like to see this.

23.01.2025 19:39 — 👍 0 🔁 0 💬 1 📌 0

Latest posts by zarasiddique.bsky.social on Bluesky