Tom Sherborne @tomsherborne

Latest posts by tomsherborne.bsky.social on Bluesky

Member of Technical Staff, Agent Infrastructure Engineer
At Cohere, we have one of the highest compute-to-engineers ratios in the world. We do not delineate strongly between engineering and research: everyone contributes to writing production code and condu...

We are hiring at @cohere.com for an Agent Infrastructure Engineer! If you want to work on building the next generation of agent models for #RAG, #ToolUse, #Code, #Reasoning and more, then apply here. DM me if you have any Qs.

jobs.ashbyhq.com/cohere/3f797...

21.02.2025 11:31 — 👍 0 🔁 0 💬 0 📌 0

I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦

03.12.2024 15:17 — 👍 2 🔁 0 💬 0 📌 0

My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉

31.01.2024 21:14 — 👍 2 🔁 1 💬 0 📌 0

TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social

Vision experiments, more discussion and visuals coming soon to the camera ready!

16.01.2024 15:36 — 👍 1 🔁 0 💬 0 📌 1

Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)

12.10.2023 15:38 — 👍 8 🔁 3 💬 1 📌 0

TRAM is part of my intern project with Hao Peng and Pradeep Dasigi at Allen AI with invaluable contributions from @nsaphra.bsky.social

11.10.2023 09:33 — 👍 0 🔁 0 💬 0 📌 0

TRAM also improves the OOD epsilon sharpness (where SAM has little effect) with a stronger ID and OOD sharpness correlation. This suggests that SAM is only sharpness-aware within the training distribution.

11.10.2023 09:32 — 👍 0 🔁 0 💬 1 📌 0

TRAM is a SAM-style optimizer that uses an alternative to the rho hyperparameter: it adapts to the trust region in the function space. TRAM strengthens the connection between task-specific performance and pre-trained structure for better zero-shot domain transfer and cross-lingual transfer.

11.10.2023 09:32 — 👍 1 🔁 0 💬 1 📌 0

🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? We propose Trust Region Aware Minimization for smoothness in parameters+representations.

TL;DR representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng

11.10.2023 09:31 — 👍 10 🔁 1 💬 1 📌 2

@tomsherborne is following 19 prominent accounts

Jasmijn Bastings
@jasmijn.bastings.me

Senior Research Scientist at Google DeepMind. Equitable AI, language, gender, society. She/her. 🌐 jasmijn.bastings.me

Jay Alammar
@jayalammar

Writer http://jalammar.github.io. O'Reilly Author http://LLM-book.com. LLM Builder Cohere.com.

Tom Kocmi
@kocmitom

Researcher at Cohere | Multilingual LLM evaluation

Andreas Grivas
@andreasgrv

Postdoc in ML/NLP at the University of Edinburgh. Interested in Bottlenecks in Neural Networks; Unargmaxable Outputs. https://grv.unargmaxable.ai/

Kaj Bostrom
@bostromk.net

I hate slop and yet I work on generative models. PhD from UT Austin, applied scientist @ AWS. He/him • https://bostromk.net

Maria Gorinova
@mgorinova

Shaping the future of programming @tessl.io 🚀 | ex-@TwitterCortex @Birdwatch 💙 | PhD in probabilistic machine learning, loyal servant to a cat, collector of random variables, and lover of well-placed puns. https://mgorinova.github.io/

Ivana Balazevic
@ibalazevic

Senior Research Scientist at Google DeepMind, working on Gemini. PhD from University of Edinburgh. ibalazevic.github.io

Mubashara Akhtar @Neurips
@akhtarmubashara

PhD @ King’s College London • prev CambridgeNLP, TU Wien, intern GoogleDeepmind • NLP, Data-centric ML, Multimodality http://mubasharaakhtar.com

Agostina Calabrese @EMNLP 🐼
@agostinacal

PhD student in NLP at the University of Edinburgh, working on online abuse detection 👩🏻‍💻 | ex Intern @MetaAI @Snap | Intersectional feminist 🌻 | (she/her)

Arkil Patel
@arkil

PhD Student at Mila and McGill | Research in ML and NLP | Past: AI2, MSFTResearch arkilpatel.github.io

Dennis Ulmer @EMNLP
@dnnslmr

Postdoctoral researcher at the Institute for Logic, Language and Computation at the University of Amsterdam. Previously PhD Student at NLPNorth at the IT University of Copenhagen, with internships at AWS, Parameter Lab, Pacmed. dennisulmer.eu

Sara Hooker
@sarahooker

I lead Cohere For AI. Formerly Research Google Brain. ML Efficiency, LLMs, @trustworthy_ml.

Barry Haddow
@bazril

Cohere
@cohere.com

We build secure, scalable, and private enterprise-grade AI technology to solve real-world business problems. Join us: http://cohere.com/careers

NeurIPS Conference
@neuripsconf

San Diego Dec 2-7, 25 and Mexico City Nov 30-Dec 5, 25. Comments to this account are not monitored. Please send feedback to townhall@neurips.cc.

Dennis Aumiller
@daumiller

Getting paid to complain about LLM Evaluation at Cohere. #NLP #NLProc https://dennis-aumiller.de

Laurie Burchell
@very-laurie

Senior Research Engineer with the Common Crawl Foundation. (languages ∪ tech) in Dùn Èideann

Verna Dankers
@vernadankers

Postdoc at Mila & McGill University 🇨🇦 with a PhD in NLP from the University of Edinburgh 🏴󠁧󠁢󠁳󠁣󠁴󠁿 memorization vs generalization x (non-)compositionality. she/her 👩‍💻 🇳🇱

Tom Hosking
@tomhosking

NLP @ Cohere. Prev University of Edinburgh