LLM360 @llm360 - Bluesky Profile

LLM360

@llm360.bsky.social

Working on fully open-source LLMs and training data. We believe in community-owned AI. https://www.llm360.ai

1,190 Followers | 32 Following | 14 Posts | Joined: 19.11.2024

Latest posts by llm360.bsky.social on Bluesky

GitHub - allenai/awesome-open-source-lms: Friends of OLMo and their links.

Made a list of resources for open source language models with @soldaini.net ahead of the tutorial tomorrow at 9:30 AM.
github.com/allenai/awes...

10.12.2024 01:25 — 👍 112 🔁 20 💬 2 📌 0

We've added you to the list!

02.12.2024 07:31 — 👍 6 🔁 0 💬 0 📌 0

We've added you to the list!

25.11.2024 09:30 — 👍 7 🔁 0 💬 0 📌 0

Can we join your list?

22.11.2024 01:28 — 👍 1 🔁 0 💬 1 📌 0

We've added you to the list!

22.11.2024 01:27 — 👍 0 🔁 0 💬 0 📌 0

Great, yes, added!

22.11.2024 01:26 — 👍 1 🔁 0 💬 0 📌 0

Thanks Stella! We've added eleuther to the list.

21.11.2024 02:15 — 👍 0 🔁 0 💬 0 📌 0

Thanks! We've added you to the list.

21.11.2024 02:15 — 👍 1 🔁 0 💬 0 📌 0

Open-source LLMs: Join the conversation

We've made a starter pack for researchers/organizations working on open-source LLMs.

Please let us know if we missed you or if you'd like to be added!

go.bsky.app/FELkyDr

20.11.2024 01:33 — 👍 41 🔁 14 💬 6 📌 0

Thank you!

19.11.2024 23:00 — 👍 0 🔁 0 💬 0 📌 0

TxT360: Trillion Extracted Text - a Hugging Face Space by LLM360

🌍🌎The global deduplication process was hairy 🙈 - and we want to share every detail.

The TxT360 dedup pipeline can be recreated and used for other datasets. We include our tips and tricks in a tell-all write-up in the release blog:
llm360-txt360.hf.space
huggingface.co/spaces/LLM36...

19.11.2024 22:42 — 👍 1 🔁 1 💬 0 📌 0

Building on FineWeb’s global deduplication findings, we introduce a strategic upsampling recipe which outperforms FineWeb using TxT360. Full details are in the Upsampling Experiment section of the release blog.

19.11.2024 22:42 — 👍 3 🔁 1 💬 1 📌 0

🪟🛠️LLM360 is committed to making open source AI accessible, transparent, and reproducible.

High-quality data is the first step toward better open source models...and we are excited to join the party contributing the first globally deduplicated dataset containing 5.7T tokens!

19.11.2024 22:42 — 👍 1 🔁 1 💬 1 📌 0

📢📢 Check out:

TxT360: a globally deduplicated dataset for LLM pretraining

🌐 99 Common Crawls
📘 14 Curated Sources
👨‍🍳 recipe to easily adjust data weighting and train the most performant models

Dataset:
huggingface.co/datasets/LLM...

Blog:
llm360-txt360.hf.space

19.11.2024 22:42 — 👍 5 🔁 1 💬 1 📌 0

Can we join?

19.11.2024 22:30 — 👍 1 🔁 0 💬 1 📌 0

@llm360 is following 20 prominent accounts

Yoav Goldberg
@yoavgo

@preslavnakov

Hector Liu
@hectorliu

Working on @llm360.bsky.social, https://www.llm360.ai/ Graduated from @ltiatcmu.bsky.social https://hunterhector.github.io/

@eleutherai

Mihai Chirculescu
@mchirculescu

I make open source projects related to GenAI https://github.com/Mihaiii

Willie Neiswanger
@willieneis

Assistant Professor in CS + AI at USC. Previously at Stanford, CMU. Machine Learning, Decision Making, AI-for-Science, Generative AI, ML Systems, LLMs. https://willieneis.github.io

Justine Tunney
@justine.lol

I built a C library that lets you compile 12kb static binaries that run natively on Linux, Mac, Windows, FreeBSD, OpenBSD, NetBSD and BIOS using just GCC/Clang.

Sasha Rush
@srushnlp

Professor, Programmer in NYC. Cornell, Hugging Face 🤗

Emma Strubell
@strubell

assistant professor, @ltiatcmu.bsky.social. machine learning: LLMs and climate. 🏳️‍🌈🏳️‍⚧️ they/them/dad (2 dogs). pro-AI, anti-capitalist, anti-fascist. Website: strubell.github.io

Ian Magnusson
@ianmagnusson

Science of language models @uwnlp.bsky.social and @ai2.bsky.social with @PangWeiKoh and @nlpnoah.bsky.social. https://ianmagnusson.github.io

Leshem (Legend) Choshen @EMNLP
@lchoshen

🥇 LLMs together (co-created model merging, BabyLM, textArena.ai) 🥈 Spreading science over hype in #ML & #NLP Proud shareLM💬 Donor @IBMResearch & @MIT_CSAIL

Luca Soldaini 🎀
@soldaini.net

I like tokens! Lead for OLMo data at @ai2.bsky.social (Dolma 🍇) w @kylelo.bsky.social. Open source is fun 🤖☕️🍕🏳️‍🌈 Opinions are sampled from my own stochastic parrot more at https://soldaini.net

Sara Hooker
@sarahooker

I lead Cohere For AI. Formerly Research Google Brain. ML Efficiency, LLMs, @trustworthy_ml.

Soumith Chintala
@soumithchintala

Cofounded and lead PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source. http://soumith.ch

Swabha
@swabhs

Assistant Professor of CS, University of Southern California. NLP / ML.

Tegan Maharaj - moving slow and fixing things
@teganmaharaj

AI prof at Mila (HEC) trying to make the future more cooperative and cool 😎🌍️ Deep learning, real-world generalization, responsible AI, safety, risk, climate, ecology, artscience, opensource, anticolonial AI they/she teganmaharaj.neocities.org

Yoav Artzi
@yoavartzi.com

LM/NLP/ML researcher ¯\_(ツ)_/¯ yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io

Greg Leppert
@leppert.me

Working on AI and access to knowledge at Harvard. Executive Director of the Institutional Data Initiative; Chief Technologist of the Berkman Klein Center.

karpathy
@karpathy

AI @ OpenAI, Tesla, Stanford

Omar Sanseviero
@osanseviero

Llama Farmer Ex CLO Hugging Face, Xoogler