
LLM360

@llm360.bsky.social

Working on fully open-source LLMs and training data. We believe in community-owned AI. https://www.llm360.ai

1,162 Followers  |  32 Following  |  14 Posts  |  Joined: 19.11.2024  |  1.7466

Latest posts by llm360.bsky.social on Bluesky

GitHub - allenai/awesome-open-source-lms: Friends of OLMo and their links. Contribute to allenai/awesome-open-source-lms development by creating an account on GitHub.

Made a list of resources for open-source language models with @soldaini.net ahead of the tutorial tomorrow at 9:30 AM.
github.com/allenai/awes...

10.12.2024 01:25 | 👍 112    🔁 20    💬 2    📌 0

We've added you to the list!

02.12.2024 07:31 | 👍 6    🔁 0    💬 0    📌 0

We've added you to the list!

25.11.2024 09:30 | 👍 7    🔁 0    💬 0    📌 0

Can we join your list?

22.11.2024 01:28 | 👍 1    🔁 0    💬 1    📌 0

We've added you to the list!

22.11.2024 01:27 | 👍 0    🔁 0    💬 0    📌 0

Great, yes, added!

22.11.2024 01:26 | 👍 1    🔁 0    💬 0    📌 0

Thanks Stella! We've added eleuther to the list.

21.11.2024 02:15 | 👍 0    🔁 0    💬 0    📌 0

Thanks! We've added you to the list.

21.11.2024 02:15 | 👍 1    🔁 0    💬 0    📌 0
Open-source LLMs

We've made a starter pack for researchers and organizations working on open-source LLMs.

Please let us know if we missed you or if you'd like to be added!

go.bsky.app/FELkyDr

20.11.2024 01:33 | 👍 42    🔁 14    💬 6    📌 0

Thank you!

19.11.2024 23:00 | 👍 0    🔁 0    💬 0    📌 0
TxT360: Trillion Extracted Text - a Hugging Face Space by LLM360

🌍🌎 The global deduplication process was hairy 🙈, and we want to share every detail.

The TxT360 dedup pipeline can be recreated and applied to other datasets. We share our tips and tricks in a tell-all write-up in the release blog:
llm360-txt360.hf.space
huggingface.co/spaces/LLM36...

19.11.2024 22:42 | 👍 1    🔁 1    💬 0    📌 0

Building on FineWeb’s global deduplication findings, we introduce a strategic upsampling recipe for TxT360 that outperforms FineWeb. Full details are in the Upsampling Experiment section of the release blog.

19.11.2024 22:42 | 👍 3    🔁 1    💬 1    📌 0

🪟🛠️ LLM360 is committed to making open source AI accessible, transparent, and reproducible.

High-quality data is the first step toward better open source models... and we are excited to join the party by contributing the first globally deduplicated dataset containing 5.7T tokens!

19.11.2024 22:42 | 👍 1    🔁 1    💬 1    📌 0
Banner image showing the TxT360 project.

📢📢 Check out:

TxT360: a globally deduplicated dataset for LLM pretraining

🌐 99 Common Crawls
📘 14 Curated Sources
👨‍🍳 A recipe to easily adjust data weighting and train the most performant models

Dataset:
huggingface.co/datasets/LLM...

Blog:
llm360-txt360.hf.space

19.11.2024 22:42 | 👍 5    🔁 1    💬 1    📌 0

Can we join?

19.11.2024 22:30 | 👍 1    🔁 0    💬 1    📌 0

@llm360 is following 20 prominent accounts