
Tom Sherborne

@tomsherborne.bsky.social

MTS @ Cohere on code. Views not my employer’s.

100 Followers  |  41 Following  |  8 Posts  |  Joined: 20.09.2023

Latest posts by tomsherborne.bsky.social on Bluesky

Member of Technical Staff, Agent Infrastructure Engineer At Cohere, we have one of the highest compute-to-engineers ratios in the world. We do not delineate strongly between engineering and research: everyone contributes to writing production code and condu...

We are hiring @cohere.com for an Agent Infrastructure Engineer! If you want to work on building the next generation of agent models for #RAG, #ToolUse, #Code, #Reasoning and more, then apply here. DM me if you have any Qs.

jobs.ashbyhq.com/cohere/3f797...

21.02.2025 11:31 — 👍 0    🔁 0    💬 0    📌 0

I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦

03.12.2024 15:17 — 👍 2    🔁 0    💬 0    📌 0

My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉

31.01.2024 21:14 — 👍 2    🔁 1    💬 0    📌 0

TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social

Vision experiments, more discussion and visuals coming soon to the camera ready!

16.01.2024 15:36 — 👍 1    🔁 0    💬 0    📌 1

Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)

12.10.2023 15:38 — 👍 8    🔁 3    💬 1    📌 0

TRAM is part of my intern project with Hao Peng and Pradeep Dasigi at Allen AI with invaluable contributions from @nsaphra.bsky.social

11.10.2023 09:33 — 👍 0    🔁 0    💬 0    📌 0

TRAM also improves the OOD epsilon sharpness (where SAM has little effect) with a stronger ID and OOD sharpness correlation. This suggests that SAM is only sharpness-aware within the training distribution.

11.10.2023 09:32 — 👍 0    🔁 0    💬 1    📌 0

TRAM is a SAM-style optimizer using an alternative to the rho hyperparameter. TRAM instead adapts to the trust region in the function space. TRAM strengthens the connection between task-specific performance and pre-trained structure for better zero-shot domain transfer and cross-lingual transfer.

11.10.2023 09:32 — 👍 1    🔁 0    💬 1    📌 0
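
For readers unfamiliar with the baseline mentioned above, here is a minimal sketch of one standard SAM (Sharpness-Aware Minimization) update, where rho is the fixed perturbation radius. This is not TRAM: per the post, TRAM replaces the fixed rho with a quantity adapted to the trust region in function space, and that part is not implemented here. The names `sam_step` and `grad_fn`, the toy quadratic loss, and all numeric values are assumptions made for illustration only.

```python
import math

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One plain SAM update on a list of floats (illustrative sketch only)."""
    # 1) Gradient at the current weights.
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # 2) Ascent step: move to the approximate worst-case point within radius rho.
    eps = [rho * gi / (norm + 1e-12) for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    # 3) Gradient at the perturbed weights, applied to the ORIGINAL weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy loss L(w) = 0.5 * sum(a_i * w_i^2), so grad_i = a_i * w_i (hypothetical example).
CURVATURE = [2.0, 1.0]

def quad_grad(w):
    return [a * wi for a, wi in zip(CURVATURE, w)]

w_next = sam_step([1.0, 0.0], quad_grad, rho=0.1, lr=0.1)
```

In this sketch rho is a hand-set constant; the post describes TRAM as replacing that choice, so the ascent step in (2) is where the two methods would differ.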

🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? 

We propose Trust Region Aware Minimization for smoothness in parameters+representations.

TL;DR representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng

11.10.2023 09:31 — 👍 10    🔁 1    💬 1    📌 2
