Delip Rao deliprao - Bluesky Statics

Thrilled to release Gaperon, an open LLM suite for French, English and Coding 🧀

We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data

(TLDR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social

07.11.2025 21:11 — 👍 35 🔁 18 💬 1 📌 4

Yeah, posting something that big for us 2 min before the weekend in the US and late in the evening in France, right before a 4-day weekend here, is so not ideal, lol. So we'll redo it and tell you guys much more... #TrainingTragedy
Tbh the only visual allegory possible is this...

07.11.2025 22:51 — 👍 7 🔁 6 💬 1 📌 0

Thank you for the interest in our work. Look forward to any feedback.

16.12.2024 19:15 — 👍 1 🔁 0 💬 0 📌 0

WithdrarXiv: A Large-Scale Dataset for Retraction Study

Retractions play a vital role in maintaining scientific integrity, yet systematic studies of retractions in computer science and other STEM fields remain scarce. We present WithdrarXiv, the first larg...

😳 WithdrarXiv 🙏

- Dataset of 14K+ withdrawn arXiv papers
- associated retraction comments
- entire history through 09/24
- taxonomy of retraction reasons, from critical errors to policy violations
- WithdrarXiv-SciFy, enriched version w/ scripts for parsed full-text PDFs

arxiv.org/abs/2412.03775

15.12.2024 18:34 — 👍 158 🔁 46 💬 5 📌 4

Juicy Research Ideas and How to Find them?

How do people come up with research ideas in AI? Will the "AI Scientist" finally make me work full-time on my chicken farm?

Stumbled across this post on Substack by @deliprao.bsky.social today that I really appreciated as someone trying to break into the field. Simple categorizations can seem trite at times, but they can be deceptively profound in breaking down complex problems.

substack.com/home/post/p-...

09.12.2024 01:04 — 👍 1 🔁 1 💬 2 📌 0

anyone on my TL can endorse me for cs.DL (digital libraries) on arXiv? 🙏

04.12.2024 22:56 — 👍 1 🔁 0 💬 0 📌 0

Releasing: a dataset of two million Bluesky posts.

This dataset has been collected using Bluesky's API, and I hope it will be useful for all the researchers out there!

27.11.2024 19:13 — 👍 475 🔁 54 💬 249 📌 136

Slack knows you have given up on the rest 😏

27.11.2024 18:47 — 👍 2 🔁 0 💬 1 📌 0

Nice crown molding

25.11.2024 15:20 — 👍 0 🔁 0 💬 0 📌 0

Are you rich enough to use compute as a noun?

23.11.2024 02:37 — 👍 0 🔁 0 💬 0 📌 0

May I propose beets

23.11.2024 02:35 — 👍 11 🔁 0 💬 0 📌 0

but you can run oogabooga

19.11.2024 16:17 — 👍 2 🔁 0 💬 0 📌 0

Did you just get your Bluesky invite? Great! Now, help me complete my Threads graph. 😘

https://www.threads.net/@delip.rao

06.07.2023 03:09 — 👍 0 🔁 0 💬 0 📌 0

Posts here are called beets. I don’t make the rules.

28.04.2023 04:31 — 👍 4 🔁 0 💬 1 📌 0

get in loser

we’re re-territorializing the hilbert space

28.04.2023 01:17 — 👍 14 🔁 4 💬 1 📌 0

New stage, new tune

28.04.2023 02:09 — 👍 0 🔁 0 💬 0 📌 0

Testing

25.04.2023 19:58 — 👍 1 🔁 0 💬 1 📌 0

Posts by Delip Rao (@deliprao.bsky.social)