
@narsilou.bsky.social

321 Followers  |  17 Following  |  14 Posts  |  Joined: 31.10.2024

Posts by @narsilou.bsky.social

Please, the world needs more videos like the color one. I really enjoyed it, learned a ton of stuff from it.

Rust is just a tool, don't care too much about the views, the algorithm sucks and all platforms are filled with bots anyway.

02.07.2025 07:56 — 👍 4    🔁 0    💬 0    📌 0

Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.

Me: *Slacking off with colleagues*
Cursor: Ping.

Me: 🤯

11.06.2025 14:18 — 👍 0    🔁 0    💬 0    📌 0

That's for sure. And I think it aligns with my thinking. LLMs are good at guessing the "unspecified" part of what we ask of them. But we need the compiler/type checker, which is mathematical and rigorous, alongside to guide them.

09.05.2025 15:49 — 👍 1    🔁 0    💬 1    📌 0

Cursor with Claude has been surprisingly good at getting things right in two or three shots. I don't have enough experience with the others to judge.

09.05.2025 15:47 — 👍 1    🔁 0    💬 0    📌 0
Link preview: Release v3.3.0 · huggingface/text-generation-inference. Notable changes: Prefill chunking for VLMs. What's Changed: Fixing Qwen 2.5 VL (32B) by @Narsil in #3157; Fixing tokenization like https://github.com/huggingface/text-embeddin… by @Narsil in #3156...

We just released text-generation-inference 3.3.0. This release adds prefill chunking for VLMs 🚀. We have also made Gemma 3 faster and reduced its VRAM usage by switching to flashinfer for prefills with images.

github.com/huggingface/...

09.05.2025 15:39 — 👍 2    🔁 1    💬 0    📌 0
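Prefill chunking, loosely: instead of running one giant forward pass over an entire long prompt, the server processes it in bounded chunks, extending the KV cache step by step. A minimal Rust sketch of the idea; the names `chunked_prefill`, `chunk_size` and `run_prefill_step` are mine for illustration, not TGI's API:

```rust
/// Illustrative only: split a long prompt's prefill into fixed-size chunks
/// so each step has a bounded cost, instead of one huge forward pass.
fn chunked_prefill(prompt_tokens: &[u32], chunk_size: usize) -> usize {
    let mut steps = 0;
    for chunk in prompt_tokens.chunks(chunk_size) {
        // In a real server this would be a model forward pass that extends
        // the KV cache built by earlier chunks.
        run_prefill_step(chunk);
        steps += 1;
    }
    steps
}

/// Stand-in for the model forward pass (hypothetical helper).
fn run_prefill_step(_chunk: &[u32]) {}

fn main() {
    let prompt: Vec<u32> = (0..10_000).collect();
    // 10_000 tokens in chunks of 4096 take 3 bounded steps.
    println!("steps={}", chunked_prefill(&prompt, 4096));
}
```

Between chunks, a scheduler can interleave decode steps for other requests, which is what keeps latency reasonable for everyone else while a huge (e.g. image-heavy) prompt is being ingested.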

I had the same observation.

Even in Rust, if I don't carefully hand-hold the LLM, it tends to spiral out of control and produce random crap.

But, not unlike a junior, if you explain carefully what you want, it tends to get it correct, or self-correct reasonably well. Just don't ask for too much at once.

08.05.2025 14:51 — 👍 1    🔁 0    💬 0    📌 0

Well, TS only goes so far: any use of `any` (which is unfortunately quite common) and you lose all the benefits.
Also, there are no runtime checks for the types. It has happened too many times that the culprit was not my codebase but my sanitization of browser data (which TS doesn't protect against).

08.05.2025 14:49 — 👍 0    🔁 0    💬 1    📌 0

Hot take: Rust is really good for vibe coding, much better than Python or JS. Why? The compiler will not let crap pass.

Yes, the LLM can still get it wrong and fail.

The elegant error messages will nudge the LLM, so I don't have to do it constantly.

08.05.2025 14:13 — 👍 2    🔁 0    💬 3    📌 0
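A toy illustration of the point (my example, not from the thread): code that would be a runtime surprise in Python or JS simply fails to compile in Rust, and the compiler's message points toward the fix.

```rust
// If an LLM emits something like `ids[0] + 1` on a possibly-empty list,
// Python/JS only fail at runtime. In Rust, `ids.first() + 1` doesn't even
// compile (an Option has no `+`), and rustc's error nudges the model (or
// the human) toward handling the empty case explicitly:
fn first_plus_one(ids: &[u32]) -> Option<u32> {
    // The type system forces the "no first element" case to be handled.
    ids.first().map(|id| id + 1)
}

fn main() {
    assert_eq!(first_plus_one(&[41, 7]), Some(42));
    assert_eq!(first_plus_one(&[]), None);
    println!("ok");
}
```

The useful part for vibe coding is that the error text itself is feedback you can paste back to (or pipe through) the LLM, instead of writing the nudge yourself.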
Link preview: GitHub - Narsil/whispering. Contribute to Narsil/whispering development by creating an account on GitHub.

Dipping my toe into vibe coding myself to get a feel for it.

My first project is writing something like superwhisper, because I couldn't find anything that worked well enough for Wayland.

github.com/Narsil/whisp...
It also works on Mac, for kicks (backed by whisper.cpp).

08.05.2025 14:12 — 👍 1    🔁 0    💬 0    📌 0

Want to run DeepSeek R1?

Text-generation-inference v3.1.0 is out and supports it out of the box.

Both on AMD and Nvidia!

31.01.2025 14:25 — 👍 5    🔁 1    💬 0    📌 1
Link preview: Release v3.0.2 · huggingface/text-generation-inference. Tl;dr: New transformers backend supporting flash attention at roughly the same performance as pure TGI, for all non-officially-supported models, directly in TGI. Congrats @Cyrilvallez. New models unlocked: ...

Text-generation-inference v3.0.2 is out.

Basically, we can run transformers models (those that support flash attention) at roughly the same speed as native TGI ones.
What this means is broader model support.

Today it unlocks Cohere2, Olmo, Olmo2 and Helium.

Congrats Cyril Vallez

github.com/huggingface/...

24.01.2025 14:55 — 👍 6    🔁 3    💬 0    📌 0

Zero config

That’s it. Remove all the flags you are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI automatically selects values that give the best performance. In production, we don’t have any flags anymore in our deployments.

10.12.2024 10:10 — 👍 3    🔁 0    💬 0    📌 0

13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead is ~5µs. Thanks to Daniel de Kok for the beast of a data structure.

10.12.2024 10:10 — 👍 4    🔁 0    💬 1    📌 0
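The "keep the initial conversation around" trick above amounts to a prefix cache: match an incoming prompt against tokens that were already processed, and prefill only the fresh suffix. A naive HashMap sketch of the idea, purely for illustration; TGI's actual structure (Daniel de Kok's work) is far more sophisticated:

```rust
use std::collections::HashMap;

/// Naive prefix cache: conversation id -> tokens already prefilled.
/// (Illustrative sketch only; not TGI's real data structure.)
struct PrefixCache {
    entries: HashMap<u64, Vec<u32>>,
}

impl PrefixCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Remember the tokens processed for a conversation.
    fn insert(&mut self, id: u64, tokens: Vec<u32>) {
        self.entries.insert(id, tokens);
    }

    /// How many leading tokens of `prompt` are already cached?
    /// Only the remaining suffix needs a fresh prefill.
    fn reusable_prefix(&self, id: u64, prompt: &[u32]) -> usize {
        self.entries
            .get(&id)
            .map(|cached| {
                cached.iter().zip(prompt).take_while(|(a, b)| a == b).count()
            })
            .unwrap_or(0)
    }
}

fn main() {
    let mut cache = PrefixCache::new();

    // First turn: the long conversation is prefilled once and cached.
    let history: Vec<u32> = (0..200_000).collect();
    cache.insert(42, history.clone());

    // Next turn: same history plus a short new reply.
    let mut prompt = history;
    prompt.extend([7, 8, 9]);

    let reused = cache.reusable_prefix(42, &prompt);
    // Only the 3 new tokens need work instead of 200_003.
    println!("reused={} fresh={}", reused, prompt.len() - reused);
}
```

This is where the quoted numbers come from: the 200k-token history is paid for once, so the follow-up reply only pays for its own few tokens plus a tiny lookup overhead.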

3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens, and more dynamically than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely gets 10k. A lot of work went into reducing the footprint of the runtime.

10.12.2024 10:09 — 👍 3    🔁 0    💬 1    📌 0

Performance leap: TGI v3 is out. It processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!

10.12.2024 10:08 — 👍 19    🔁 6    💬 1    📌 1
Link preview: Qwen/QwQ-32B-Preview - HuggingChat. Use Qwen/QwQ-32B-Preview with HuggingChat.

We just deployed Qwen/QwQ-32B-Preview on HuggingChat! It's Qwen's latest experimental reasoning model.

It's super interesting to see the reasoning steps, and with really impressive results too. Feel free to try it out here: huggingface.co/chat/models/...

I'd love to get your feedback on it!

28.11.2024 20:20 — 👍 39    🔁 7    💬 1    📌 0

I'm disheartened by how toxic and violent some responses were here.

There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years, and he is one of the people most preoccupied with the ethical implications of AI. Some replies are Reddit-toxic level. We need empathy.

27.11.2024 11:09 — 👍 333    🔁 37    💬 29    📌 8

It's pretty sad to see the negative sentiment towards Hugging Face on this platform due to a dataset posted by one of the employees. I want to write a small piece. 🧵

Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI; it's a hosting platform above all.

27.11.2024 15:23 — 👍 456    🔁 70    💬 29    📌 8