Kyle Lo

@kylelo.bsky.social

language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him, 🧋 kyleclo.com

6,663 Followers  |  597 Following  |  560 Posts  |  Joined: 17.02.2023  |  1.9793

Latest posts by kylelo.bsky.social on Bluesky

learning how to do something is a first-order use case for LMs, the development bottleneck has been collecting data covering a wide diversity of topics, until now ✌🏻

10.02.2026 20:34 | 👍 2    🔁 0    💬 0    📌 0

incredibly fun project led by our intern yapei chang

we mined the web for thousands of real-world "how to do X" step by step instructions and turned it into a dataset, synth data training procedure, eval suite, etc.

10.02.2026 20:34 | 👍 29    🔁 3    💬 1    📌 0

lol rip 😮‍💨

It's like a score calculated against gold reference citations in generated lit review, so even humans don't score high. i think the eval is saturated cuz so much subjectivity in what counts as appropriate citation. better phrasing is maybe that the citations are sensible up to some X

05.02.2026 03:02 | 👍 4    🔁 0    💬 0    📌 0

they're separate poorly named systems lol 😂 Separate projects approaching same problem from different angles. Scholar QA approach from agentic system design, use whatever model. OpenScholar approach from model-first, very light on system. The teams are working together to fuse ideas

05.02.2026 02:33 | 👍 1    🔁 0    💬 0    📌 0

our open model proving out specialized rag LMs over scientific literature has been published in nature ✌🏻

congrats to our lead @akariasai.bsky.social & team of students and Ai2 researchers/engineers

www.nature.com/articles/s41...

04.02.2026 22:43 | 👍 44    🔁 10    💬 2    📌 2

0 days since last mixup of eval results between "copa" (choice of plausible alternatives) & "coqa" (conversational QA) tasks

03.02.2026 20:01 | 👍 4    🔁 0    💬 0    📌 0

The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!

Call for papers is out. Topics include:
- LMs as evaluators
- Living benchmarks
- Eval with humans
and more

New for 2026: Opinion & Statement Papers!

Full CFP: gem-workshop.com/call-for-pap...

27.01.2026 19:17 | 👍 21    🔁 7    💬 0    📌 1

mm yea i think that's always the case w productivity tools.

imo ability to adopt new tools is core part of the job. just like transition from plain text editors to IDEs, from sending files via FTP to using git for collab, from ad hoc Makefiles to package managers, etc. AI is just the latest thing

21.01.2026 18:17 | 👍 3    🔁 0    💬 0    📌 0

my concern is the growing pool of "unknown unknowns" as i interact less with code directly.

imo probably why i subconsciously have been leaning toward cursor over claude code or similar agents, even if the latter has a higher code-to-keystrokes ratio

21.01.2026 17:31 | 👍 7    🔁 0    💬 0    📌 0

i dont feel worse at this even if im not writing papers from-scratch as much as during early career

but coding feels different due to mismatch between what i express to the system (english) and what the system returns (code). i've already realized some gaps in libraries I used to know well.

21.01.2026 17:31 | 👍 8    🔁 0    💬 1    📌 0

whether my ability to review code will degrade as I offload increasingly larger workloads to AI

of course, this shift is present in other forms of generation, like paper writing, where my role has shifted to reviewing/editing (student's) drafts.

21.01.2026 17:31 | 👍 6    🔁 0    💬 1    📌 0

some thoughts about skill degradation w/ AI coding

im onboard w views that "english is the new programming language" & "software engineering", translating ambiguous goals to technical specs/execution, is still a skill.

im more concerned w shift from my role as a writer to a reviewer and

21.01.2026 17:31 | 👍 15    🔁 0    💬 2    📌 0

lucky to chat w sen. patty murray about olmo & importance of fully open AI

18.01.2026 03:09 | 👍 52    🔁 1    💬 2    📌 0

using opus to extract research topics from papers & it was giving me useless words like "training", "datasets", and "evaluation"

kept prompting it w examples of more informative topics and it ended up with "LLM training", "LLM datasets", and "LLM evaluation"

thx

17.01.2026 01:07 | 👍 13    🔁 0    💬 3    📌 0

yo endorse me for python skills

16.01.2026 18:37 | 👍 0    🔁 0    💬 1    📌 0

just realized ive had food on my face all day & nobody at office told me, thx ai2 frens 😫

16.01.2026 00:05 | 👍 6    🔁 0    💬 0    📌 0

u gotta shitpost more maria, ur content too informative 😆

15.01.2026 21:54 | 👍 7    🔁 0    💬 3    📌 0

i appreciate bsky has less AI product advertising; i do want to see more memes/shitposting/fun stuff and insights from industry/open source sphere, even if they dont have an attached paper

15.01.2026 17:50 | 👍 7    🔁 0    💬 1    📌 0

amaazinggg thxx 🙏🙏🙏

14.01.2026 21:31 | 👍 0    🔁 0    💬 0    📌 0

ive been clicking around in UI but i cant find it 😭 pls help

14.01.2026 21:27 | 👍 0    🔁 0    💬 1    📌 0

bsky wish list

i like the idea of different feeds but i actually want my subscription to select feeds to be taken as a preference signal ("more like this") that informs a "home/default" feed.

i really dislike the UX of having to tab through each subscribed feed, esp when there's also post overlap

14.01.2026 21:06 | 👍 8    🔁 1    💬 3    📌 0

some notion of 'views/impressions'? it kinda sucks to post and only see a couple of likes & no replies. if there's some intermediate signal that shows people at least read the post, that'd incentivize more imo

14.01.2026 20:15 | 👍 3    🔁 0    💬 1    📌 0

sports 🏈

09.01.2026 22:05 | 👍 1    🔁 0    💬 0    📌 0

nope just an admirer of room 003

08.01.2026 01:57 | 👍 1    🔁 0    💬 0    📌 0

just in case it wasn't clear which room this is

07.01.2026 17:58 | 👍 2    🔁 0    💬 1    📌 0

some citation graphs data pipelines will create new "paper" nodes based on extracted bibstrings from PDFs

so in 2026 the papers we hallucinated in 2025 might end up being "real" papers on gscholar or sthn lol

06.01.2026 01:07 | 👍 5    🔁 0    💬 1    📌 1

ya ur rite! we'll update it ✌️

20.12.2025 00:34 | 👍 3    🔁 0    💬 0    📌 0

just had hechalou's yin yang milk tea and i think i've transcended 🤤

14.12.2025 01:43 | 👍 0    🔁 0    💬 0    📌 0
Olmo 3.1 - a allenai Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets...

new olmo 3.1 artifacts: huggingface.co/collections/...
paper (arxiv soon): allenai.org/papers/olmo3
demo: playground.allenai.org

12.12.2025 18:03 | 👍 2    🔁 0    💬 0    📌 0

paper has:
- more on our eval ideology
- more baselines
- more about RL Zero
etc

we picked final model (internally called moonlit surfer 🌛🏄) not just on bench scores but good vibes 🥰

12.12.2025 18:03 | 👍 1    🔁 0    💬 1    📌 0

@kylelo is following 19 prominent accounts