Quentin Anthony

@quentinanthon15.bsky.social

I make models more efficient. Google Scholar: https://scholar.google.com/citations?user=GDm6BIAAAAAJ&hl=en

411 Followers  |  43 Following  |  11 Posts  |  Joined: 17.11.2024

Posts by Quentin Anthony (@quentinanthon15.bsky.social)

Congrats, Kyle! Well deserved.

07.06.2025 01:28 | 👍 1  🔁 0  💬 0  📌 0
GitHub - Quentin-Anthony/nanoMPI: Simple MPI implementation for prototyping or learning

Available at github.com/Quentin-Anth...

Contributions are welcome! I'll be slowly tackling roadmap items myself during my off time.

04.06.2025 18:10 | 👍 2  🔁 0  💬 0  📌 0

I basically want functional pseudocode that students and self-learners can quickly run and play around with. How does latency increase with message size? How do collective algorithms differ? What's the effect of warmup? Find out for yourself!

04.06.2025 18:10 | 👍 2  🔁 0  💬 1  📌 0
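
The questions in the post above (latency versus message size, the effect of warmup) can be explored on a laptop even without an MPI library. The following is a minimal sketch, not nanoMPI code: it assumes a local socket pair stands in for two ranks, and the names `recv_exact`, `echo_peer`, and `pingpong` are hypothetical helpers introduced only for this illustration.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Receive exactly n bytes (a single recv may return fewer)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, sizes, rounds):
    """Stand-in for the second rank: echo every message straight back."""
    for size in sizes:
        for _ in range(rounds):
            sock.sendall(recv_exact(sock, size))


def pingpong(sizes, warmup=10, iters=100):
    """Return {message size in bytes: mean round-trip time in seconds}."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, sizes, warmup + iters))
    peer.start()
    results = {}
    for size in sizes:
        payload = bytes(size)
        for i in range(warmup + iters):
            if i == warmup:
                # Start timing only after the warmup round trips are done.
                start = time.perf_counter()
            a.sendall(payload)
            recv_exact(a, size)
        results[size] = (time.perf_counter() - start) / iters
    peer.join()
    a.close()
    b.close()
    return results


if __name__ == "__main__":
    for size, seconds in pingpong([1, 1024, 65536, 1048576]).items():
        print(f"{size:>8} B  {seconds * 1e6:10.1f} us per round trip")
```

Rerunning with `warmup=0` and comparing the numbers is one way to see what warmup changes. The absolute timings depend on the machine and say nothing about a real interconnect.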

nanoMPI's design is based on OpenMPI. OpenMPI's shortcoming here is that it's built for a different purpose (modularity and performance), so quick answers and results are harder to get from it than from nanoMPI, which is built for clarity and easy installation.

04.06.2025 18:10 | 👍 0  🔁 0  💬 1  📌 0

I consider nanoMPI to be a companion piece to conceptual MPI material (e.g. you read a description and see a visual of a ring allreduce, but what does this actually look like in code?)

04.06.2025 18:10 | 👍 0  🔁 0  💬 1  📌 0
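
As an illustration of the question in the post above, here is what a ring allreduce can look like in code. This is a minimal single-process sketch that simulates the ranks with Python lists; it is not taken from nanoMPI or OpenMPI, and `ring_allreduce` is a hypothetical name. It assumes every rank contributes a vector of the same length and that the reduction is a sum.

```python
def ring_allreduce(rank_data):
    """Sum equal-length vectors across simulated ranks arranged in a ring.

    rank_data[r] is the input vector of rank r. Returns one result vector
    per rank; after the allreduce they are all identical.
    """
    p = len(rank_data)
    n = len(rank_data[0])
    bufs = [list(v) for v in rank_data]  # each rank's private buffer

    # Split the vector into p chunks; chunk c covers [bounds[c], bounds[c+1]).
    bounds = [c * n // p for c in range(p + 1)]

    def chunk(c):
        c %= p
        return range(bounds[c], bounds[c + 1])

    def ring_step(chunk_to_send, combine):
        # Snapshot all outgoing messages first so the sends are simultaneous.
        msgs = []
        for r in range(p):
            c = chunk_to_send(r)
            msgs.append((c, [bufs[r][i] for i in chunk(c)]))
        # Each rank r delivers its message to its right neighbour.
        for r in range(p):
            c, payload = msgs[r]
            dst = (r + 1) % p
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] = combine(bufs[dst][i], v)

    # Phase 1, reduce-scatter: after p-1 steps rank r holds the fully
    # summed chunk (r + 1) % p.
    for s in range(p - 1):
        ring_step(lambda r: (r - s) % p, lambda old, new: old + new)

    # Phase 2, allgather: circulate the finished chunks around the ring,
    # overwriting the stale partial sums.
    for s in range(p - 1):
        ring_step(lambda r: (r + 1 - s) % p, lambda old, new: new)

    return bufs


if __name__ == "__main__":
    print(ring_allreduce([[1, 2], [3, 4]]))  # prints [[4, 6], [4, 6]]
```

Each rank sends and receives only one chunk per step, so the data moved per rank stays roughly constant as the number of ranks grows. That is the property usually highlighted when the ring algorithm is drawn as a diagram.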

nanoMPI serves the dual purpose of:
- Providing a minimal implementation for HPC education
- Testing distributed code on offline, local machines (I just wanna code on my laptop on a plane, not a remote HPC system)

04.06.2025 18:10 | 👍 1  🔁 0  💬 1  📌 0

Inspired by "minimal implementation" projects in AI such as @karpathy.bsky.social's nanoGPT, I worked to bring this concept to the HPC world!

I've built a minimal implementation of an MPI library called nanoMPI, which focuses on clarity, simplicity, and easy installation.

04.06.2025 18:10 | 👍 4  🔁 0  💬 1  📌 0

We are the first to demonstrate higher training kernel throughput (both transformers and SSM hybrids) on AMD MI300X compared to H100!

- rocm.blogs.amd.com/ecosystems-a...
- www.zyphra.com/post/trainin...

10.12.2024 21:35 | 👍 3  🔁 1  💬 1  📌 0

C R A C K E D

26.11.2024 21:50 | 👍 1  🔁 0  💬 0  📌 0

We dropped the Zamba2 and Zyda2 tech reports on arxiv!
- Zamba2 models of size 1.2B, 2.7B, 7.4B
- Zyda-2 5T token dataset
- We discuss more specifics on model architecture, training process, dataset creation, etc.

Links:
- Zamba2: arxiv.org/abs/2411.15242
- Zyda-2: arxiv.org/abs/2411.06068

26.11.2024 20:23 | 👍 5  🔁 0  💬 0  📌 0
Interstellar Official Soundtrack | Full Album – Hans Zimmer | WaterTower (YouTube video by WaterTower Music)

I keep coming back to Interstellar: youtu.be/YF1eYbfbH5k?...

26.11.2024 06:24 | 👍 1  🔁 0  💬 1  📌 0