David Hall

@dlwh.bsky.social

Research Engineering Lead at @StanfordCRFM. I do NLP and foundation model things with JAX. Previously Semantic Machines, Microsoft, Berkeley, Breeze

835 Followers  |  200 Following  |  8 Posts  |  Joined: 24.11.2023

Latest posts by dlwh.bsky.social on Bluesky

I think a lot of federal money is tied to accreditation like Pell grants and research funds and stuff. So while Harvard has lots of money in the endowment, it would still be a pretty big hit to the budget.

09.07.2025 23:15  |  👍 6    🔁 0    💬 1    📌 0

Many thanks to the Google TPU Research Cloud program for providing the much needed compute for this project, and to all the other great open efforts: @ai2.bsky.social @eleutherai.bsky.social and more!

19.05.2025 19:51  |  👍 2    🔁 0    💬 0    📌 0
Introducing Marin: An Open Lab for Building Foundation Models
Open-source software is a success story: It powers the world's digital infrastructure. It allows anyone in the world to contribute based on merit. It leads to greater innovation, collaboration, and se...

You can read more in our:

- Website: marin.community
- GitHub: github.com/marin-commun...
- Discord: discord.gg/J9CTk7pqcM
- Documentation: marin.readthedocs.io
- Announcement: marin.community/blog/2025/05/1

19.05.2025 19:51  |  👍 1    🔁 0    💬 1    📌 0
Explanation of Datashop: a prompt or sample data comes in, an LLM finds more data, a cheap model is trained to find even more, train --> LLM

Have a specific use case? Come to our Datashop to curate data and train models.
Here's how we curated more math data:
github.com/marin-commun...
Check out the data:
marin.community/data-browser/

19.05.2025 19:51  |  👍 1    🔁 0    💬 1    📌 0
Pareto frontier of FLOPs vs. bits-per-byte

Have a new algorithm for training? Choose your compute budget and get on the speedrun leaderboard: how fast can you drive down validation loss?
marin.community/speedrun/

19.05.2025 19:51  |  👍 0    🔁 0    💬 1    📌 0
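The speedrun image is described as a Pareto frontier of FLOPs vs. bits-per-byte. As a minimal sketch of how such a frontier could be computed (this is not Marin's leaderboard code; the `Run` type, its field names, and the sample numbers are all made up for illustration), a run stays on the frontier only if no cheaper-or-equal run already reaches a bits-per-byte at least as low:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    flops: float  # total training compute (hypothetical values)
    bpb: float    # validation bits-per-byte, lower is better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs not dominated in (flops, bpb), sorted by increasing FLOPs."""
    frontier: list[Run] = []
    best_bpb = float("inf")
    # Sort by compute (ties broken by loss), then keep each run that
    # strictly improves on the best loss seen at lower-or-equal compute.
    for run in sorted(runs, key=lambda r: (r.flops, r.bpb)):
        if run.bpb < best_bpb:
            frontier.append(run)
            best_bpb = run.bpb
    return frontier


runs = [
    Run("a", 1e18, 1.20),
    Run("b", 2e18, 1.25),  # dominated by "a": more compute, worse loss
    Run("c", 4e18, 1.05),
    Run("d", 4e18, 1.10),  # dominated by "c": same compute, worse loss
    Run("e", 8e18, 1.00),
]
frontier = pareto_frontier(runs)
print([r.name for r in frontier])  # ['a', 'c', 'e']
```

A single sorted sweep is enough here because there are only two objectives; with more than two metrics a pairwise dominance check would be needed instead.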
Flowchart showing GitHub issue (preregistration) -> pull request (experiment.py) -> execution (watch it live) -> WandB report (analysis)

Marin (marin.community) repurposes GitHub, which has been successful for open-source *software*, for AI:
1. Preregister an experiment as a GitHub issue
2. Submit a PR, which implements the experiment in code
3. PR is reviewed by experts in the community
4. Watch the execution of the experiment live!

19.05.2025 19:51  |  👍 0    🔁 0    💬 1    📌 0
open weights vs open source (weights + code + recipe) vs open development (+ process, anyone can contribute)

Marin is a new "open lab" for developing foundation models. More than open weights, and even open source, with Marin we're committing to "open development": everything is documented and traceable, and anyone can contribute.

19.05.2025 19:51  |  👍 1    🔁 0    💬 1    📌 0

Learn more about the project in Percy's blog post: marin.community/blog/2025/05...

And about the Models we are releasing in @dlwh.bsky.social's training retro: marin.readthedocs.io/en/latest/re...

19.05.2025 19:11  |  👍 0    🔁 1    💬 0    📌 0
Percy Liang on X: "What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision: https://t.co/racsvmhyA3"

Super excited Marin is finally out! Come see what we've been building! Code/platform for training fully reproducible models end-to-end, from data to evals. Plus a new high quality 8B base model. Percy did a good job explaining it on the other place. marin.community

x.com/percyliang/s...

19.05.2025 19:35  |  👍 19    🔁 6    💬 1    📌 0
