
Jack Hessel

@jmhessel.bsky.social

jmhessel.com NLP PhD; Seattle bike lane enjoyer; posts about machine learning, language processing, computer vision, transit

2,964 Followers  |  217 Following  |  25 Posts  |  Joined: 28.04.2023

Latest posts by jmhessel.bsky.social on Bluesky


It is a major policy failure that the US cannot accommodate top AI conferences due to visa issues.
buff.ly/DRJOGrB

16.07.2025 23:53 — 👍 85    🔁 21    💬 4    📌 2

bring back 8 page neurips papers

24.06.2025 19:04 — 👍 1    🔁 0    💬 0    📌 0

m̶e̶n̶ Americans will literally l̶e̶a̶r̶n̶ ̶e̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ ̶a̶b̶o̶u̶t̶ ̶a̶n̶c̶i̶e̶n̶t̶ ̶R̶o̶m̶e̶ invest billions into self driving cars instead of g̶o̶i̶n̶g̶ ̶t̶o̶ ̶t̶h̶e̶r̶a̶p̶y̶ building transit

20.06.2025 20:26 — 👍 6    🔁 0    💬 0    📌 0

Wulti wodal wodels

10.06.2025 00:03 — 👍 9    🔁 0    💬 2    📌 0

bring back length limits for author responses

06.06.2025 17:56 — 👍 9    🔁 0    💬 1    📌 0
What makes multi-agent LLM systems multi-agent? (GitHub Gist)

in llm-land, what is a tool, a function, an agent, and (most elusive of all): a "multi-agent system"? (This had been bothering me recently; are all these the same?)

@yoavgo.bsky.social's blog is a clarifying read on the topic -- I plan to adopt his terminology :-)

gist.github.com/yoavg/9142e5...

04.06.2025 22:07 — 👍 3    🔁 0    💬 0    📌 0

👍

28.03.2025 07:59 — 👍 1    🔁 0    💬 0    📌 0

If you're in WA and think imposing new taxes on things we want more of (e.g., bikes, transit) is a bad idea, consider contacting your reps using this simple form! <3

27.03.2025 18:35 — 👍 5    🔁 0    💬 0    📌 0

I pinged editors@ about it, they are working on it

02.03.2025 21:12 — 👍 1    🔁 0    💬 0    📌 0
Songlin Yang

Should you delete softmax from your attention layers? Check out Songlin Yang's (sustcsonglin.github.io) tutorial, moderated by @srushnlp.bsky.social, for a beginner-friendly intro to the why/how/beauty of linear attention :-) www.youtube.com/watch?v=d0HJ...

24.02.2025 20:02 — 👍 4    🔁 1    💬 0    📌 0
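
For intuition, here's a minimal sketch (mine, not from the tutorial) of the core trick: with a positive feature map in place of softmax, the matmuls reassociate so you never materialize the n x n attention matrix.

```python
import torch
import torch.nn.functional as F

def softmax_attention(Q, K, V):
    # standard attention: builds an n x n score matrix, O(n^2) in length n
    scores = Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ V

def linear_attention(Q, K, V, eps=1e-6):
    # "delete the softmax": with feature map phi, reassociate
    # (phi(Q) phi(K)^T) V into phi(Q) (phi(K)^T V), which is O(n d^2)
    phi = lambda x: F.elu(x) + 1                 # simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.transpose(-2, -1) @ V                # d x d summary of keys/values
    z = Qp @ Kp.sum(dim=-2).unsqueeze(-1) + eps  # per-query normalizer
    return (Qp @ kv) / z

n, d = 128, 64
Q, K, V = torch.randn(3, n, d).unbind(0)
print(softmax_attention(Q, K, V).shape)  # torch.Size([128, 64])
print(linear_attention(Q, K, V).shape)   # torch.Size([128, 64])
```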
Journal of Educational Data Mining
Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, their flexibility creates risks: inaccura...

I've spent the last two years trying to understand how LLMs might improve middle-school math education. I just published an article in the Journal of Educational Data Mining describing some of that work: "Designing Safe and Relevant Generative Chats for Math Learning in Intelligent Tutoring Systems"

30.01.2025 23:41 — 👍 7    🔁 2    💬 0    📌 0
How has DeepSeek improved the Transformer architecture? This Gradient Updates issue goes over the major changes that went into DeepSeek’s most recent model.

Very good (technical) explainer answering "How has DeepSeek improved the Transformer architecture?". Aimed at readers already familiar with Transformers.

epoch.ai/gradient-upd...

30.01.2025 21:07 — 👍 283    🔁 64    💬 6    📌 5

...... I can't decide if this is better or worse than growing alfalfa in the desert

14.01.2025 22:26 — 👍 3    🔁 0    💬 1    📌 0
Finding linguistic structure in large language models (YouTube video by Chris Potts)

I've posted the practice run of my LSA keynote. My core claim is that LLMs can be useful tools for doing close linguistic analysis. I illustrate with a detailed case study, drawing on corpus evidence, targeted syntactic evaluations, and causal intervention-based analyses: youtu.be/DBorepHuKDM

13.01.2025 02:41 — 👍 74    🔁 21    💬 1    📌 3

Here's my end-of-year review of things we learned about LLMs in 2024 - we learned a LOT of things simonwillison.net/2024/Dec/31/...

Table of contents:

    The GPT-4 barrier was comprehensively broken
    Some of those GPT-4 models run on my laptop
    LLM prices crashed, thanks to competition and increased efficiency
    Multimodal vision is common, audio and video are starting to emerge
    Voice and live camera mode are science fiction come to life
    Prompt driven app generation is a commodity already
    Universal access to the best models lasted for just a few short months
    “Agents” still haven’t really happened yet
    Evals really matter
    Apple Intelligence is bad, Apple’s MLX library is excellent
    The rise of inference-scaling “reasoning” models
    Was the best currently available LLM trained in China for less than $6m?
    The environmental impact got better
    The environmental impact got much, much worse
    The year of slop
    Synthetic training data works great
    LLMs somehow got even harder to use
    Knowledge is incredibly unevenly distributed
    LLMs need better criticism
    Everything tagged “llms” on my blog in 2024

31.12.2024 18:10 — 👍 659    🔁 150    💬 28    📌 47

It's ready! 💫

A new blog post in which I list all the tools and apps I've been using for work, plus all my opinions about them.

maria-antoniak.github.io/2024/12/30/o...

Featuring @kagi.com, @warp.dev, @paperpile.bsky.social, @are.na, Fantastical, @obsidian.md, Claude, and more.

31.12.2024 05:38 — 👍 215    🔁 25    💬 36    📌 4
Did OpenAI Just Solve Abstract Reasoning? OpenAI’s o3 model aces the "Abstraction and Reasoning Corpus" — but what does it mean?

Some of my thoughts on OpenAI's o3 and the ARC-AGI benchmark

aiguide.substack.com/p/did-openai...

23.12.2024 14:38 — 👍 342    🔁 99    💬 17    📌 27

Sample and verify go brr

21.12.2024 19:17 — 👍 6    🔁 0    💬 0    📌 0

Check out our new encoder model, ModernBERT! 🤖

Super grateful to have been part of such an awesome team effort and very excited about the gains for retrieval/RAG! 🚀

19.12.2024 21:28 — 👍 17    🔁 2    💬 1    📌 0

I'm not an """ AGI """ person or anything, but, I do think process reward model RL/scaling inference compute is quite promising for problems with easily verified solutions like (some) math/coding/ARC problems.

20.12.2024 20:26 — 👍 4    🔁 0    💬 0    📌 0
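
A minimal sketch of the sample-and-verify pattern this is pointing at; candidate_solver and verify here are hypothetical stand-ins for an LLM sampled at temperature > 0 and an automatic checker (unit tests for code, a math verifier, an ARC grader).

```python
import random

def candidate_solver(rng):
    # stand-in for sampling a candidate answer/program from a model
    return rng.randint(0, 100)

def verify(answer):
    # cheap, objective check -- the "easily verified solution" part
    return answer * answer == 64

def sample_and_verify(n_samples=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(n_samples):
        answer = candidate_solver(rng)
        if verify(answer):
            return answer  # keep the first verified sample
    return None            # spend more inference compute, or give up

print(sample_and_verify())  # more samples => better chance one verifies
```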

Announcement #1: our call for papers is up! 🎉
colmweb.org/cfp.html
And excited to announce the COLM 2025 program chairs @yoavartzi.com @eunsol.bsky.social @ranjaykrishna.bsky.social and @adtraghunathan.bsky.social

17.12.2024 15:48 — 👍 67    🔁 24    💬 0    📌 1

Imo, the reason you don't see more of this is because 1) it's very hard to set up objective, interesting, fair, non-game-able, meaningful, expert-level evals and 2) the incentive for doing this type of careful dataset/environment curation work is not as high as it should be.

16.12.2024 02:38 — 👍 3    🔁 0    💬 0    📌 0
A picture of a transit sign with 4 minute frequencies

Meanwhile in my neighborhood in Seattle we've been fighting 5 years for (1) bus lane and 30 years for a (1) mile bike path

14.12.2024 06:38 — 👍 14    🔁 0    💬 0    📌 0

excited to come to #neurips2024 workshops this weekend --- I'll be around sat/sun to say hi to folks :-)

13.12.2024 01:52 — 👍 9    🔁 0    💬 0    📌 0

🚨 I’m on the academic job market!
j-min.io

I work on ✨Multimodal AI✨, advancing reasoning in understanding & generation by:
1⃣ Making it scalable
2⃣ Making it faithful
3⃣ Evaluating + refining it

Completing my PhD at UNC (w/ @mohitbansal.bsky.social).
Happy to connect (will be at #NeurIPS2024)!

👇🧵

07.12.2024 22:32 — 👍 30    🔁 10    💬 2    📌 6

“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two trillion tokens set from Common Corpus.

05.12.2024 16:39 — 👍 251    🔁 85    💬 12    📌 19
06.12.2024 02:28 — 👍 31    🔁 3    💬 5    📌 0
Outlines: Structured text generation with LLMs

Here's outlines if you haven't checked it out --- would highly recommend it if you want structured outputs and are not latency constrained :-)

dottxt-ai.github.io/outlines/lat...

03.12.2024 22:28 — 👍 1    🔁 0    💬 0    📌 0

Blue skies 🦋, hot (?) takes 🔥

Constrained output for LLMs, e.g., the outlines library for vLLM, which forces models to output JSON/pydantic schemas, is cool!

But, because output tokens cost much more latency than input tokens, if speed matters: bespoke, low-token output formats are often better.

03.12.2024 22:25 — 👍 8    🔁 1    💬 2    📌 0
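
A rough illustration of that latency point (my own sketch; tiktoken's cl100k_base encoding is a stand-in tokenizer, and exact counts vary by model): the same payload takes several times more decode steps as schema-constrained JSON than in a bespoke compact format.

```python
import json
import tiktoken

# stand-in tokenizer; counts are illustrative, not model-specific
enc = tiktoken.get_encoding("cl100k_base")

record = {"name": "alice", "score": 91, "passed": True}
json_out = json.dumps(record)  # schema-constrained JSON style
compact_out = "alice|91|1"     # bespoke pipe-delimited style

print(len(enc.encode(json_out)), json_out)        # roughly 3x the tokens
print(len(enc.encode(compact_out)), compact_out)
# decoding is autoregressive: each output token is one forward pass, so a
# compact format directly cuts generation latency when the schema is fixed
```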
Screenshot of the paper's title/author list.

Drowning in Documents: Consequences of Scaling Reranker Inference

Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, and Andrew Drozdov

Databricks and University of Illinois Urbana-Champaign


Awesome work from Jacob et al. (+ collaborators who I could find on bluesky: @mrdrozdov.com @matei-zaharia.bsky.social @mcarbin.bsky.social @lateinteraction.bsky.social ; apologies if I missed anyone!)

27.11.2024 21:59 — 👍 6    🔁 1    💬 1    📌 0
