Jekaterina Novikova @j-novikova-nlp

YouTube video by Women in AI Research WiAIR
Generalization in AI, with Dr. Dieuwke Hupkes

🎧 Hear Dr. Hupkes discuss her work on GenBench and how consistency, generalization, and reasoning shape our understanding of LLMs.
🎬 YouTube: www.youtube.com/watch?v=CuTW...
🎙️ Apple Podcasts: podcasts.apple.com/ca/podcast/w...
🎧 Spotify: open.spotify.com/show/51RJNlZ...
#WiAIR #NLP #WomenInAI

18.07.2025 16:11 — 👍 1 🔁 1 💬 0 📌 0

🎙️ New Episode Out Now!
We’re thrilled to announce that the latest episode of the @wiair.bsky.social podcast is live!
This week, we sit down with Dr. Angelica Lim, Ph.D., to talk about "Robots with Empathy".
#AI #EthicalAI #SocialRobotics #HumanCenteredAI #WiAIR

14.05.2025 15:48 — 👍 3 🔁 1 💬 1 📌 1

Read this if you're new to academic conferences or if you'd just like a bit of helpful advice on how to make friends at conferences (as opposed to formal "networking").

03.05.2025 01:28 — 👍 3 🔁 1 💬 0 📌 0

It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.

30.04.2025 14:55 — 👍 40 🔁 9 💬 3 📌 4

SUPER thrilled that our #NAACL2025 paper got the runner-up BEST paper award 😍😍🏆🏆🏆🚀🚀
We show that people rely 30% more on LLMs when they use emphatic expressions (e.g. "Sure, happy to help") even though the answer is wrong, and 10% more when the task involves math questions 😵

📜 arxiv.org/pdf/2407.07950

30.04.2025 15:16 — 👍 20 🔁 2 💬 0 📌 0

YouTube video by Women in AI Research WiAIR
Responsible AI for Health, with Aparna Balagopalan

🚀 Our new episode is LIVE! 🎙️
In Episode 3, we talk with @aparnabee.bsky.social about:

🏥⚠️ Unique challenges of applying AI in medical contexts
📊🧑🏽‍🤝‍🧑🏻 Data quality and bias
👩‍⚕️🩺 Importance of collaboration with clinicians

Watch and subscribe!
youtu.be/DEdJltlFg4I

#MLforHealth #WiAIR #WomenInAI

23.04.2025 15:40 — 👍 2 🔁 2 💬 1 📌 0

The latest open artifacts (#9): RLHF book draft, where the open reasoning race is going, and unsung heroes of open LM work
Artifacts Log 9.

The latest happenings in open models
- Eagerly awaiting Qwen 3
- Llama 4 uptake is slow
- Reasoning models seem to be saturating
- Multimodal models are being slept on
- China is still dominating
- Oh yeah, and a reminder that version 0 of my RLHF book is done and online!
Artifacts Log #9.
buff.ly/F6lapGF

21.04.2025 16:43 — 👍 35 🔁 7 💬 1 📌 0

Glad to share that our publication was recognized as the Top Viewed Article.

Read it here alz-journals.onlinelibrary.wiley.com/doi/full/10....

16.04.2025 08:45 — 👍 0 🔁 0 💬 0 📌 0

💡 If AI rewrites your voice, is it still your voice?
We had the pleasure of hosting @CurriedAmanda in our latest episode, where she walked us through her impactful research on “Impoverished Language Technology: Social Class in NLP.”

#WiAIR #SocialBias #AIFairness

14.04.2025 18:19 — 👍 2 🔁 2 💬 1 📌 0

Proud to be a part of this multicultural, multi-institutional collaborative project.

10.04.2025 20:42 — 👍 0 🔁 0 💬 0 📌 0

Reasoning models don't always say what they think
Research from Anthropic on the faithfulness of AI models' Chain-of-Thought

OpenAI: "Users have told us that understanding how the model reasons ... helps build trust in its answers."

Anthropic: "Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't."

www.anthropic.com/research/rea...

04.04.2025 22:41 — 👍 136 🔁 33 💬 5 📌 3

Don't miss this episode! It's going to be an interesting discussion about the social and ethical implications of biased AI, and how researchers are working to create fair and inclusive systems.

28.03.2025 15:49 — 👍 0 🔁 0 💬 0 📌 0

Logo of the Women in AI Research WiAIR podcast

Following up on my last post - it's time for the big reveal! 🎉

Thrilled to announce that @malikeh97.bsky.social and I are launching a podcast called Women in AI Research! We're excited to bring you inspiring stories from women in AI.

Follow @wiair.bsky.social for all the updates

#womeninai

05.03.2025 16:37 — 👍 0 🔁 0 💬 0 📌 0

Big announcement coming up! My friend @malikeh97.bsky.social and I have been working on something very special. Can't wait to reveal what we have been up to. Stay tuned for more info! 🚀
#WomenInAI

03.03.2025 14:29 — 👍 0 🔁 0 💬 0 📌 0

Sounds like the way back to the closet..

26.02.2025 03:10 — 👍 1 🔁 0 💬 0 📌 0

I am not into sports and not a hockey fan. But this time, I am very glad about the outcome of this game. Go Canada! 🇨🇦🇨🇦🇨🇦🏒🎉

21.02.2025 17:53 — 👍 1 🔁 0 💬 0 📌 0

CohereForAI/include-base-44 · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Results from evaluating 15 models on INCLUDE reveal stark performance variations among languages, emphasizing the need for equitable AI tools.
Public release of INCLUDE encourages further research on fair and inclusive AI.
Dataset: huggingface.co/datasets/Coh...
/4

23.01.2025 16:07 — 👍 0 🔁 0 💬 0 📌 0

INCLUDE is the largest multilingual benchmark of its kind, containing 197,243 MCQA pairs from 1,926 examinations across 44 languages and 15 scripts coming from 52 countries.
/3

23.01.2025 16:07 — 👍 0 🔁 0 💬 1 📌 0

LLMs hold immense potential, but performance disparities across languages limit their global impact. INCLUDE is a large multilingual language understanding benchmark that includes regional educational, professional, and practical tests collected by native speakers.
/2

23.01.2025 16:07 — 👍 0 🔁 0 💬 1 📌 0

Our paper is accepted to ICLR!
INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799)
A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances.
A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide.
/1

23.01.2025 16:07 — 👍 11 🔁 4 💬 1 📌 0

Very interesting paper about unlearning for AI Safety, a subject that deserves more attention. ⬇️

11.01.2025 15:11 — 👍 50 🔁 7 💬 0 📌 0

Noteworthy AI Research Papers of 2024 (Part One)
Six influential AI papers from January to June

Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article.
It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision:

01.01.2025 14:12 — 👍 67 🔁 17 💬 2 📌 0

Multimodal LLMs | Notion
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Last month I attended the #NeurIPS2024 conference in Vancouver. Now that I'm home, I'd like to reflect on all the interesting work I encountered at the conference.

Part 1 is about multimodal #LLM; the next parts are coming soon.

typhoon-mirror-155.notion.site/Multimodal-L...

03.01.2025 21:40 — 👍 0 🔁 0 💬 0 📌 0

⏳Submission deadline: Feb 17, 2025
🗓️Workshop date: April 26-May 1, 2025 (TBD)
📍 Join us in Yokohama, Japan (also hybrid)

Submit your work and help shape the future of LLMs!

03.01.2025 02:07 — 👍 0 🔁 0 💬 0 📌 0

This year's theme, "Mind the Context", invites participants to explore how LLMs are used and evaluated in specific contexts, such as LLM applications in mental wellness care or translation in high-stakes scenarios.

03.01.2025 02:07 — 👍 0 🔁 0 💬 1 📌 0

Excited to co-organize the HEAL workshop at @acm_chi 2025!
HEAL addresses the "evaluation crisis" in LLM research and brings HCI and AI experts together to develop human-centered approaches to evaluating and auditing LLMs.
🔗 heal-workshop.github.io
#NLProc #LLMeval #LLMsafety

03.01.2025 02:07 — 👍 1 🔁 0 💬 1 📌 0

Latest posts by j-novikova-nlp.bsky.social on Bluesky
