
Ed Henry

@edhenry.bsky.social

Distinguished Scientist and Engineer @DellTech. Things I like: Mathematics, Machine Learning, Causality, Networks, and Philosophy.

114 Followers  |  252 Following  |  12 Posts  |  Joined: 23.11.2024

Posts by Ed Henry (@edhenry.bsky.social)

Post image

The transformer was invented at Google. RLHF was not invented in industry labs, but came to prominence at OpenAI and DeepMind. I took 5 of the most influential papers (black dots) and visualized their references. Blue dots are papers that acknowledge federal funding (DARPA, NSF).

12.04.2025 02:35 — 👍 109    🔁 24    💬 2    📌 0
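The tallying behind a plot like the one above can be sketched in a few lines: given each seed paper's reference list and a set of references that acknowledge federal funding, compute the funded fraction per seed. This is a minimal illustrative sketch only — the paper names, reference IDs, and funding flags below are hypothetical, and the original analysis's actual data pipeline is not described in the post.

```python
# Hypothetical citation metadata: seed papers mapped to their reference IDs.
# All names and flags here are made up for illustration.
references = {
    "Attention Is All You Need": ["ref_a", "ref_b", "ref_c"],
    "Deep RL from Human Preferences": ["ref_b", "ref_d"],
}

# References whose papers acknowledge federal funding (e.g. DARPA, NSF).
federally_funded = {"ref_a", "ref_d"}

def funded_fraction(seed: str) -> float:
    """Fraction of a seed paper's references that acknowledge federal funding."""
    refs = references[seed]
    return sum(r in federally_funded for r in refs) / len(refs)

for seed in references:
    print(f"{seed}: {funded_fraction(seed):.0%} federally funded references")
```

In a real version, the funding flags would come from acknowledgment-section text in the cited papers, and the per-seed fractions would color the nodes of the reference graph.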

New Preprint! Interested in learning about how working memory is subserved by both compositional and generative mechanisms? Read on!

14.04.2025 02:24 — 👍 31    🔁 8    💬 1    📌 0
Post image

To be clear, I believe the interesting areas of modularity that warrant investigation are world modeling, reasoning, planning, and memory, generally. Bring back fuzzy memory modules! [6]

[6] arxiv.org/abs/1410.5401

14.04.2025 05:12 — 👍 0    🔁 0    💬 0    📌 0

Ugh, sorry for the gross LinkedIn links on that cross post. I'll do better next time. 🫠

14.04.2025 05:05 — 👍 0    🔁 0    💬 0    📌 0

After a short era in which people questioned the value of academia in ML, its value is more obvious than ever. Big labs stopped publishing the minute commercial incentives showed up and are relentlessly focused on a singular vision of scaling. Academia is a meaningful complement, bringing...
1/2

14.04.2025 01:04 — 👍 214    🔁 41    💬 2    📌 2

Full disclosure: I'm a bit biased, as I've recently developed a similar proprietary set of protocols. Kudos to the communities for recognizing the need for standardization to support further scaling!

[1] lnkd.in/eUqVV5SQ
[2] lnkd.in/gf6e6XH6
[3] lnkd.in/gx4EbSFR
[4] lnkd.in/g_42revZ

14.04.2025 05:00 — 👍 0    🔁 0    💬 2    📌 0

My intuition is that standardization of message passing protocols, like MCP[3] and A2A[4], will further enable both research and engineering for these kinds of approaches.

14.04.2025 05:00 — 👍 0    🔁 0    💬 1    📌 0

One of the reasons I'm happy to see a disaggregation effect in the departure from "one massive model to rule them all" is the modularity and, frankly, the accessibility it brings for those without much compute available. I hope to see more Socratic-style [2] approaches in the coming years!

14.04.2025 05:00 — 👍 0    🔁 0    💬 1    📌 0

The AI research scene is dealing with a second-order hardware lottery [1] effect right now (GPUs being the first): many papers being published are based on pretrained models trained on large research clusters available to only a few labs.

14.04.2025 05:00 — 👍 0    🔁 0    💬 1    📌 0

“Philosophy would render us entirely skeptics, were not nature too strong for it.”

— David Hume, An Enquiry Concerning Human Understanding

#philosophy #philsky

21.03.2025 03:06 — 👍 37    🔁 4    💬 0    📌 1
Post image

How it started / how it's going.....

18.03.2025 02:44 — 👍 145    🔁 24    💬 9    📌 7

we released olmo 32b today! ☺️

🐟 our largest & best fully open model to date
🐠 right up there w/ similar-size weights-only models from big companies on popular benchmarks
🐑 but we used way less compute & all our data, ckpts, code, recipe are free & open

made a nice plot of our post-trained results! ✌️

13.03.2025 20:42 — 👍 40    🔁 7    💬 2    📌 1
Post image

Some of his readers have asked Mike Masnick @mmasnick.bsky.social why his technology news site, Techdirt, has been covering politics so intensely lately. www.techdirt.com/2025/03/04/w...

I cannot recommend Mike's reply enough. It's exactly what readers need to hear, what journalists need to do.

07.03.2025 00:09 — 👍 4564    🔁 1820    💬 86    📌 114
Preview
How do we know how smart AI systems are? In 1967, Marvin Minsky, a founder of the field of artificial intelligence (AI), made a bold prediction: “Within a generation…the problem of creating ‘artificial intelligence’ will be substantially sol...

Just FYI: I wrote about this and other issues involving AI benchmarks in this piece for Science: www.science.org/doi/10.1126/...

05.03.2025 20:29 — 👍 31    🔁 11    💬 0    📌 0
Post image

My new paper "Deep Learning is Not So Mysterious or Different": arxiv.org/abs/2503.02113. Generalization behaviours in deep learning can be intuitively understood through a notion of soft inductive biases, and formally characterized with countable hypothesis bounds! 1/12

05.03.2025 15:37 — 👍 209    🔁 49    💬 6    📌 9
Post image

Awesome LLM Post-training

This repository is a curated collection of the most influential papers, code implementations, benchmarks, and resources related to Large Language Model (LLM) post-training methodologies.

github.com/mbzuai-oryx/...

04.03.2025 00:03 — 👍 44    🔁 10    💬 1    📌 0
Preview
Locke, John | Internet Encyclopedia of Philosophy John Locke was born in 1632 in Wrington, a small village in southwestern England. His father, also named John, was a legal clerk and served with the Parliamentary forces in the English Civil War. His family was well-to-do, but not of particularly high social or economic standing. Locke spent his childhood in the West Country and as a teenager was sent to Westminster School in London.

Thinking about John Locke lately. His core principles:
1) government serves the people,
2) they have the right to remove corrupt governments, and
3) checks and balances are necessary.

Finally: when these are violated, the people have the right of revolution.

iep.utm.edu/locke/#:~:te....

27.02.2025 02:52 — 👍 113    🔁 19    💬 6    📌 0

Starlink embedded in the FAA.
Grok used by the OPM.
Tesla contracts from the DoD.
SpaceX taking over NASA tasks.

We are "NeuraLink requirements for Social Security payments" away from a complete governmental parasitic symbiosis.

25.02.2025 15:43 — 👍 3    🔁 5    💬 1    📌 0
Preview
GitHub - ARBORproject/arborproject.github.io Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.

Today we're launching a multi-lab open collaboration, the ARBOR project, to accelerate AI interpretability research for reasoning models. Please join us!

github.com/ARBORproject...

(ARBOR = Analysis of Reasoning Behavior through Open Research)

20.02.2025 19:55 — 👍 44    🔁 9    💬 1    📌 0
Post image

JUST IN: NASA says there's now a 3.1% chance an asteroid will hit Earth in 2032, up from 2.6% yesterday.

This is the highest risk assessment an asteroid has ever received, surpassing the previous record of 2.7% set in 2004.

18.02.2025 19:24 — 👍 799    🔁 165    💬 168    📌 683
Post image

Forget “tapestry” or “delve”: these are the actual unique giveaway words for each model, relative to each other. arxiv.org/pdf/2502.12150

19.02.2025 03:04 — 👍 102    🔁 17    💬 6    📌 9
Preview
perplexity-ai/r1-1776 Β· Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

An uncensored version of R1 is released 🔥

“R1 1776 is a DeepSeek-R1 reasoning model that has been post-trained by Perplexity AI to remove CCP censorship. The model provides unbiased, accurate, and factual information while maintaining high reasoning capabilities.”

huggingface.co/perplexity-a...

19.02.2025 03:22 — 👍 58    🔁 11    💬 2    📌 7
Preview
Cuts to NSF and CISE Directorate Jeopardize American Leadership in Computing A statement from the Computing Research Association (CRA) The reported termination today of 10 percent of the National Science Foundation’s (NSF) workforce — including significant cuts to the Compu…

CRA statement about NSF firings cra.org/cuts-to-nsf-...

18.02.2025 23:52 — 👍 28    🔁 21    💬 0    📌 1
Preview
Why reasoning models will generalize People underestimate the long-term potential of “reasoning.”

Why reasoning models will generalize
DeepSeek R1 is just the tip of the iceberg of rapid progress.
People underestimate the long-term potential of “reasoning.”

28.01.2025 21:04 — 👍 51    🔁 8    💬 5    📌 1

Current me: It's only one more project/talk/paper/review...
Future me: Don't do this, I beg you.
Current me: Super interesting, could find a way to fit it in...
Future me: C'mon, remember the rule, just say no!
Current me: & loads of time before the deadline...
Future me: Wait, can you even hear me?

17.01.2025 16:01 — 👍 58    🔁 10    💬 1    📌 3
Preview
Search-o1: Agentic Search-Enhanced Large Reasoning Models | alphaXiv View recent discussion. Abstract: Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, th...

Super interesting extension of the o1 approach here, adding search over external RAG-obtained documents for "reasoning within documents".
#LLM #AI #ML
Search-o1: Agentic Search-Enhanced Large Reasoning Models

www.alphaxiv.org/abs/2501.05366

11.01.2025 16:17 — 👍 7    🔁 2    💬 0    📌 0
Preview
The state of post-training in 2025 A re-record of my NeurIPS tutorial on language modeling (plus some added content).

The state of post-training in 2025: a tutorial on modern post-training
A re-record of my NeurIPS tutorial on language modeling (plus some added content on the high level state of things)
Blog + extra context: https://buff.ly/424VvLm
YouTube: https://buff.ly/40808l5
Slides: https://buff.ly/404jGa9

08.01.2025 15:38 — 👍 80    🔁 17    💬 4    📌 0
Post image

In Solidarity with Ann Telnaes. ✊
“Democracy Dies in Darkness.”
anntelnaes.substack.com/p/why-im-qui...

@anntelnaes.bsky.social

05.01.2025 16:36 — 👍 123    🔁 32    💬 0    📌 2