Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop Lots of enthusiastic participants! @cardiffnlp.bsky.social
26.07.2025 07:35

@zarasiddique.bsky.social
Working on ethics and bias in NLP @CardiffNLP #NLP #NLProc
#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social
14.07.2025 10:37

Pleased to say this has been accepted to ACL System Demos :)
26.05.2025 20:00

Come to my hackathon! Last one was super fun I promise
21.05.2025 19:40

Shoutout to supervisors Liam Turner and Luis Espinosa-Anke and @cardiffnlp.bsky.social. I'm also interested in future collaborations on the topic so please message if you are interested :)
14.05.2025 10:29

I highly encourage people to play around, you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that the results from this Colab won't be the best because it's using a smaller model to reduce loading times. I would recommend using at least a 7B model.
As part of our validation, we see if we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.
14.05.2025 10:29

For those that are new to the topic, steering vectors are constructed using a set of paired sentences, where one elicits a 'positive' activation of neurons and the other elicits a 'negative' activation of neurons - by taking the difference, we isolate activations responsible for a certain 'concept'.
14.05.2025 10:29

🚨 NEW PAPER ALERT 🚨
Dialz: A Python Toolkit for Steering Vectors
ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...
A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.
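The paired-sentence construction described in this thread can be sketched in a few lines of NumPy. This is a toy illustration and not the Dialz API: the function names `steering_vector` and `apply_steering`, the `strength` parameter, and the synthetic activations are all assumptions made for the example. It also assumes the vector is the plain mean of the pairwise differences, which is only one of several ways to combine them.

```python
import numpy as np


def steering_vector(pos_acts, neg_acts):
    """Mean difference between paired activations (one row per sentence pair).

    Hypothetical helper: a real toolkit would read these activations from a
    chosen layer of a language model rather than take them as arrays.
    """
    pos = np.asarray(pos_acts, dtype=float)
    neg = np.asarray(neg_acts, dtype=float)
    if pos.shape != neg.shape:
        raise ValueError("paired activations must have the same shape")
    # Whatever the two sentences in a pair share cancels out in the difference.
    return (pos - neg).mean(axis=0)


def apply_steering(hidden, vector, strength=1.0):
    """Shift a hidden state along the steering vector (strength is assumed)."""
    return np.asarray(hidden, dtype=float) + strength * vector


# Synthetic stand-in for model activations: 8 sentence pairs, 4 dimensions.
rng = np.random.default_rng(0)
concept = np.array([1.0, 0.0, 0.0, 0.0])   # made-up 'concept' direction
shared = rng.normal(size=(8, 4))           # content common to both sentences
pos_acts = shared + concept                # 'positive' sentence activations
neg_acts = shared - concept                # 'negative' sentence activations

vec = steering_vector(pos_acts, neg_acts)
steered = apply_steering(np.zeros(4), vec, strength=0.5)
```

In this toy setup the shared content cancels, so `vec` points along the made-up concept direction only.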
New friends! Old friends! Please register if you'd like 2 whole days packed with NLP fun
13.05.2025 16:37

Super interesting!
03.04.2025 13:56

Love this take: "Society appears far more willing to critically examine and address bias in AI systems than confront human bias directly"
27.03.2025 09:47

I'd hire you
25.03.2025 18:19

I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity
25.03.2025 15:18

Do it! When interviewers ask me about them it's usually a good sign that it's a nice workplace.
25.03.2025 18:14

The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.
13.03.2025 11:44

Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.
13.03.2025 11:44

When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.
13.03.2025 11:44

We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.
13.03.2025 11:44

NEW PAPER
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
ArXiv: arxiv.org/abs/2503.05371
GitHub: github.com/groovychoons...
Extremely Unofficial Blog Post: zarasiddique.com/blog/shiftin...
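The averaging step behind Steering Vector Ensembles, as described in this thread, might look roughly like the sketch below. It is not the paper's code: `ensemble_vector`, the three-dimensional vectors, and the strength of 1.5 are hypothetical placeholders, and an unweighted mean is assumed since the thread does not say whether the vectors are normalised or weighted first.

```python
import numpy as np


def ensemble_vector(vectors):
    """Unweighted mean of several steering vectors (assumed form of SVE)."""
    stacked = np.stack([np.asarray(v, dtype=float) for v in vectors])
    return stacked.mean(axis=0)


# Made-up per-axis vectors; in practice each would be tuned separately
# on its own bias axis before being combined.
axis_vectors = {
    "age": np.array([0.4, 0.0, -0.2]),
    "race": np.array([0.2, 0.2, 0.0]),
    "gender": np.array([0.0, 0.4, 0.2]),
}

sve = ensemble_vector(axis_vectors.values())

# Apply the ensemble to a placeholder hidden state during a forward pass.
hidden = np.zeros(3)
steered = hidden + 1.5 * sve
```

A single averaged vector keeps the cost at inference time the same as applying one steering vector, which fits the thread's description of the method as computationally efficient.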
Strongly encourage you to register for our free NLP workshop. We've previously had speakers from DeepMind, Microsoft, Amazon and top university NLP labs, and it's looking like it's going to be a great line-up this year too.
If you can't make it, please share with others who may be interested!
We've created a Cardiff NLP Starter Pack to make it easy to follow #NLP researchers at Cardiff Uni.
05.03.2025 10:46

Super interesting work!
03.03.2025 21:56

OpenAI furious DeepSeek might have stolen all the data OpenAI stole from us
www.404media.co/openai-furio...
Severance episode 2, Traitors final AND this
It's a weekend of watching for me 🍿
Need to be spending less time on deepseek and more time on deep sleep 😴
26.01.2025 15:19

Any thoughts on whether this would extend well to more traditional CompSci courses?
23.01.2025 19:42

Welcome Wikipedian!
And totally agree.
+1 I would also like to see this.
23.01.2025 19:39

Our paper on extraction of metaphoric analogies from literary texts will be presented at COLING (Wed 11:00, Atrium, poster) by @camachocollados.bsky.social and Luis Espinosa-Anke.
Done with @zarasiddique.bsky.social, @hsuvas.bsky.social and @antypasd.bsky.social