What an incredible week it's been at #NeurIPS2025!
Today is our last day at the booth. We've had a great week connecting with our community in San Diego.
Join our community to stay connected with our research team: https://cohere.com/research/open-science/application
05.12.2025 19:00
What's the story of your legend?
Join ML researchers building their legends with 40 cards that capture our shared journey. Explore and build yours: https://lab-legends.vercel.app/ 🎯
03.12.2025 15:30
Just 1 day left until #NeurIPS2025 kicks off! The Cohere and Cohere Labs teams are ready to dive into a packed week of research, conversations, and community at the San Diego Convention Center ✨
Come visit our booth; we'd love to chat and send you home with some swag!
01.12.2025 11:00
... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social
30.10.2025 17:51
You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.
Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...
30.10.2025 17:51
⚖️ LLM-as-a-judge: mixed reliability.
Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat a coin flip at ~55%.
30.10.2025 17:51
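For readers unfamiliar with the metric: pairwise accuracy is simply how often the judge picks the same output as human annotators do. Here is a minimal sketch; the data layout and the `judge_prefers` callback are hypothetical stand-ins, not the shared task's actual evaluation code.

```python
import random

def pairwise_accuracy(pairs, judge_prefers):
    # pairs: list of (output_a, output_b, human_choice) tuples, where
    # human_choice is "a" or "b" for the output the human annotator preferred.
    # judge_prefers(output_a, output_b) returns "a" or "b".
    correct = sum(1 for a, b, human in pairs if judge_prefers(a, b) == human)
    return correct / len(pairs)

# A judge that guesses at random lands near 0.5 (the coin-flip territory
# mentioned above); the strongest systems reported here reach ~0.95.
coin_flip = lambda a, b: random.choice(["a", "b"])
demo = [("output A", "output B", random.choice(["a", "b"])) for _ in range(10_000)]
print(round(pairwise_accuracy(demo, coin_flip), 2))  # ~0.5
```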
Naturalness is still a significant challenge.
Across open-ended generation and cross-lingual summarization, the biggest weakness isn't coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.
30.10.2025 17:51
🧠 English isn't always easiest.
Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.
30.10.2025 17:51
🧩 Linguistic reasoning remains the toughest nut. 🥥
Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.
30.10.2025 17:51
Language coverage matters.
Models don't support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️
30.10.2025 17:51
🧩 Linguistic reasoning on unseen languages
📝 Open-ended generation testing naturalness and usefulness
📄 Cross-lingual summarization
🌐 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models
All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...
30.10.2025 17:51
How well do LLMs handle multilinguality?
We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task, spanning 30 languages and 5 subtasks.
30.10.2025 17:51
River, Yinhong and I will all be in person and we look forward to the discussions!
29.10.2025 21:12
Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"
Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social
arxiv.org/abs/2506.20544
29.10.2025 18:30
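The core idea, sampling several candidate completions at inference time and keeping the best one, can be illustrated with a generic best-of-n loop. This is a hedged sketch of the general technique, not the paper's exact recipe; `generate` and `score` are hypothetical stand-ins for a multilingual LLM sampler and a quality scorer.

```python
def best_of_n(prompt, generate, score, n=8, temperature=0.7):
    # Draw n stochastic samples and keep the candidate the scorer ranks highest.
    # `generate` and `score` are placeholders; see the paper for the actual
    # sampling and selection strategies studied.
    candidates = [generate(prompt, temperature=temperature) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```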
Cohere Labs x EMNLP 2025 "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"
Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.
arxiv.org/abs/2502.19158
29.10.2025 18:30
Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"
Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.
arxiv.org/abs/2505.24119
29.10.2025 18:30
Cohere Labs x EMNLP 2025: "Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts"
Congrats to authors Nikolas Gritsch, Qizhen Zhang, @acyrl.bsky.social, @sarahooker.bsky.social and Ahmet Üstün.
arxiv.org/abs/2408.15901
29.10.2025 18:30
We're thrilled to announce that some of our research will be presented at @emnlpmeeting.bsky.social next week! 🥳
If you're attending the conference, don't miss the chance to explore our work and connect with our team.
29.10.2025 18:30
We're excited to hear from speakers including Ivan Zhang, Joelle Pineau, Marzieh Fadaee, Shayne Longpre and 20+ other presenters who will share insights on open science, collaborative research, and community-driven innovation.
Learn more and register now: https://tinyurl.com/CohereLabsConnect
24.10.2025 10:00
Join us for inspiring keynotes, lightning talks, and interactive sessions that bring together curious minds from around the world. Throughout the conference, we'll:
🔬 Showcase cutting-edge research
💡 Highlight meaningful collaborations
🤝 Inspire new partnerships
24.10.2025 10:00
"Individually, we are one drop. Together, we are an ocean." - Ryunosuke Satoro ✨
Cohere Labs is excited to announce Connect - a 3-day virtual conference celebrating the power of collaboration in open science!
24.10.2025 10:00
Paper link: arxiv.org/pdf/2510.19806
Led by: David Mora, Viraat Aryabumi, @weiyinko-ml.bsky.social, @sarahooker.bsky.social, @juliakreutzer.bsky.social, and @mziizm.bsky.social.
23.10.2025 14:45
With this work we take a step toward principled approaches to multilingual synthetic data generation, an essential direction for developing adaptive, culturally aware, and globally capable language models.
23.10.2025 14:39
We also evaluated our method on languages not seen during pre-training: while performance is higher for seen languages, our transformations significantly improve both groups over the baseline, and in some cases are competitive with the teacher model (over 3x the student's size).
23.10.2025 14:39
By inspecting the data itself, we see clear gains in quality along the targeted dimensions. Even when the interventions are relatively small, they produce substantial changes in completions, improving their fluency, diversity, and difficulty ✨
23.10.2025 14:39
With these simple transformations, we're able to obtain consistent improvements across our 12 target languages and a diverse set of benchmarks, with particularly pronounced gains on open-ended tasks, our best proxies for real human use.
23.10.2025 14:39
Relying on translation alone often yields unnatural, Western-centric, and linguistically flat prompts.
💡 We propose a simple, easy-to-implement solution to this problem:
Transform translated prompts along three axes: Naturalization, Cultural Adaptation, and Difficulty.
23.10.2025 14:39
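A minimal sketch of the transformation step described above, assuming a generic instruction-following model behind an `llm(text) -> text` callable; the meta-prompt wording is illustrative only, not the paper's actual prompts (see the paper link in this thread).

```python
# Illustrative meta-prompts for the three axes; placeholders, not the
# paper's actual prompts.
AXES = {
    "naturalization": (
        "Rewrite the following prompt so it reads as if it were originally "
        "written by a native {lang} speaker rather than translated."
    ),
    "cultural_adaptation": (
        "Adapt names, places, and cultural references in the following "
        "prompt so they feel natural to {lang}-speaking communities."
    ),
    "difficulty": (
        "Rewrite the following prompt to be more challenging while keeping "
        "the same task and staying in {lang}."
    ),
}

def transform_prompt(prompt: str, lang: str, axis: str, llm) -> str:
    # Apply one transformation axis to a translated prompt.
    instruction = AXES[axis].format(lang=lang)
    return llm(f"{instruction}\n\n{prompt}")
```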
Most multilingual instruction data starts as English, and translation can't capture cultural nuance or linguistic richness.
What if we optimized prompts instead of completions?
That's the focus of our most recent work on prompt space optimization for multilingual synthetic data.
23.10.2025 14:39
Global MMLU Lite Leaderboard | Kaggle
Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
Global MMLU Lite is now live on Kaggle Benchmarks!
Developed by @cohereforai.bsky.social, it spans 16 languages with both Culturally Sensitive & Agnostic samples - helping researchers uncover cultural & linguistic biases in multilingual evaluation.
17.10.2025 16:17
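To poke at the data directly, here is a hedged sketch using the Hugging Face `datasets` library; the dataset id, config name, and column names below are assumptions based on the Global-MMLU release, so check the dataset card and the notebook linked below before relying on them.

```python
from datasets import load_dataset

# Assumed dataset id, language config ("de"), and column names; verify
# against the dataset card before use.
ds = load_dataset("CohereLabs/Global-MMLU-Lite", "de", split="test")
for row in ds.select(range(3)):
    print(row["question"])
    print("gold answer:", row["answer"])
    # Expected label distinguishing Culturally Sensitive vs. Agnostic samples.
    print("label:", row.get("cultural_sensitivity_label"))
```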
Leaderboard: https://www.kaggle.com/benchmarks/cohere-labs/global-mmlu-lite/leaderboard
Notebook: https://www.kaggle.com/code/shivalikasingh95/global-mmlu-lite-sample-notebook
17.10.2025 16:00