
Xenova

@xenova.bsky.social

Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)

1,867 Followers  |  92 Following  |  32 Posts  |  Joined: 03.06.2023  |  2.1661

Latest posts by xenova.bsky.social on Bluesky

LFM2 WebGPU – In-browser tool calling - a Hugging Face Space by LiquidAI In-browser tool calling, powered by Transformers.js

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀

🔗 Link to demo: huggingface.co/spaces/Liqui...

06.08.2025 17:56 – 👍 3    🔁 1    💬 0    📌 0

The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js.

06.08.2025 17:56 – 👍 7    🔁 2    💬 1    📌 0
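
To make the idea of in-browser tool calling concrete, here is a minimal sketch of the dispatch step: the page takes the model's reply, checks whether it is a tool call, and runs the matching function. The tool names, the `dispatchToolCall` helper, and the JSON shape `{"name": ..., "arguments": ...}` are all assumptions for illustration. They are not taken from the demo, and LFM2 or Transformers.js may use a different format.

```javascript
// Hypothetical tool registry: names and behavior are illustrative only.
const tools = {
  add: ({ a, b }) => a + b,
  to_upper: ({ text }) => String(text).toUpperCase(),
};

// Interpret a model reply. If it parses as a tool call, run the tool;
// otherwise treat it as a plain-text answer.
// Assumed (not documented) format: {"name": "...", "arguments": {...}}
function dispatchToolCall(reply, registry = tools) {
  let call;
  try {
    call = JSON.parse(reply);
  } catch {
    return { type: "text", content: reply }; // not JSON: ordinary text
  }
  if (call === null || typeof call !== "object" || typeof call.name !== "string") {
    return { type: "text", content: reply };
  }
  // Only accept tools that were registered explicitly.
  if (!Object.prototype.hasOwnProperty.call(registry, call.name)) {
    return { type: "error", content: `Unknown tool: ${call.name}` };
  }
  const result = registry[call.name](call.arguments ?? {});
  return { type: "tool_result", name: call.name, content: result };
}

console.log(dispatchToolCall('{"name":"add","arguments":{"a":2,"b":3}}'));
```

In a real page the registered tools would read or modify the DOM, and the tool result would be fed back to the model as the next message.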
Voxtral WebGPU - a Hugging Face Space by webml-community State-of-the-art audio transcription in your browser

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
huggingface.co/spaces/webml...

24.07.2025 15:43 – 👍 2    🔁 0    💬 0    📌 0

Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯

🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

24.07.2025 15:43 – 👍 4    🔁 3    💬 1    📌 0
lazy-guy12/chess-llama · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Model: huggingface.co/lazy-guy12/c...

Online demo: lazy-guy.github.io/chess-llama/

22.07.2025 19:00 – 👍 3    🔁 0    💬 0    📌 0

A community member trained a tiny Llama model (23M parameters) on 3 million high-quality @lichess.org games, then deployed it to run entirely in-browser with 🤗 Transformers.js! Super cool! 🔥

It has an estimated Elo of ~1400... can you beat it? 👀
(runs on both mobile and desktop)

22.07.2025 19:00 – 👍 11    🔁 3    💬 1    📌 3
Kokoro Text-to-Speech (WebGPU) - a Hugging Face Space by webml-community High-quality speech synthesis powered by Kokoro TTS

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, enabling streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help? 🤗
huggingface.co/spaces/webml...

07.02.2025 17:03 – 👍 8    🔁 0    💬 0    📌 0
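
The sentence-splitting step mentioned above could look roughly like the sketch below: split the input at sentence boundaries, then synthesize and yield one sentence at a time so playback can start before the full text is processed. This is an illustration under stated assumptions, not the approach the project adopted. `splitSentences` and `streamSpeech` are hypothetical names, and `synthesize` is a caller-supplied async function (for example, a wrapper around a TTS model).

```javascript
// Split text on '.', '!' or '?' followed by whitespace.
// Simplification: abbreviations such as "e.g." are split incorrectly.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Yield one audio chunk per sentence instead of waiting for the whole text.
// `synthesize` is a hypothetical async callback: (sentence) => audio.
async function* streamSpeech(text, synthesize) {
  for (const sentence of splitSentences(text)) {
    yield { sentence, audio: await synthesize(sentence) };
  }
}

const parts = splitSentences(
  "Life is like a box of chocolates. You never know what you're gonna get."
);
console.log(parts.length); // 2
```

A consumer would iterate with `for await (const chunk of streamSpeech(text, synthesize))` and queue each chunk for playback as it arrives.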

We did it! Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥

07.02.2025 17:03 – 👍 22    🔁 6    💬 1    📌 1

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯

Link to models/samples: huggingface.co/onnx-communi...

16.01.2025 15:05 – 👍 4    🔁 0    💬 0    📌 0
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");


You can get started in just a few lines of code! 🧑‍💻

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

Try it out yourself: huggingface.co/spaces/webml...

16.01.2025 15:05 – 👍 2    🔁 0    💬 1    📌 0

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!

👉 npm i kokoro-js 👈

Link to demo (+ sample code) in 🧵

16.01.2025 15:05 – 👍 19    🔁 3    💬 1    📌 0
Llama 3.2 Reasoning WebGPU - a Hugging Face Space by webml-community Small and powerful reasoning LLM that runs in your browser

For the AI builders out there: imagine what could be achieved with a browser extension that (1) uses a powerful reasoning LLM, (2) runs 100% locally & privately, and (3) can directly access/manipulate the DOM! 👀

💻 Source code: github.com/huggingface/...
🔗 Online demo: huggingface.co/spaces/webml...

10.01.2025 12:19 – 👍 3    🔁 1    💬 1    📌 0

Is this the future of AI browser agents? 👀 WebGPU-accelerated reasoning LLMs are now supported in Transformers.js! 🤯

Here's MiniThinky-v2 (1B) running 100% locally in the browser at ~60 tps (no API calls)! I can't wait to see what you build with it!

Demo + source code in 🧵👇

10.01.2025 12:19 – 👍 32    🔁 7    💬 1    📌 0
Attention Visualization - a Hugging Face Space by webml-community Vision Transformer Attention Visualization

This project was greatly inspired by Brendan Bycroft's amazing LLM Visualization tool – check it out if you haven't already! Also, thanks to Niels Rogge for adding DINOv2 w/ Registers to transformers! 🤗

Source code: github.com/huggingface/...

Online demo: huggingface.co/spaces/webml...

01.01.2025 15:37 – 👍 4    🔁 2    💬 0    📌 0

Another interesting thing to see is how the attention maps become far more refined in later layers of the transformer. For example,

First layer (1) – noisy and diffuse, capturing broad general patterns.
Last layer (12) – focused and precise, highlighting specific features.

01.01.2025 15:37 – 👍 3    🔁 0    💬 1    📌 0

Vision Transformers work by dividing images into fixed-size patches (e.g., 14 × 14), flattening each patch into a vector and treating each as a token.

It's fascinating to see what each attention head learns to "focus on". For example, layer 11, head 1 seems to identify eyes. Spooky! 👀

01.01.2025 15:37 – 👍 2    🔁 0    💬 1    📌 0
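
The patch-splitting step described above can be sketched in a few lines. This is a simplified illustration, not code from the app: it assumes a single-channel image stored row-major in a flat array, and `patchify` is a hypothetical helper name. A real ViT additionally projects each flattened patch through a learned linear layer before it is used as a token.

```javascript
// Split a row-major grayscale image (height x width) into non-overlapping
// patchSize x patchSize patches, each flattened into one vector ("token").
function patchify(pixels, height, width, patchSize) {
  if (height % patchSize !== 0 || width % patchSize !== 0) {
    throw new Error("Image dimensions must be divisible by patchSize");
  }
  const patches = [];
  for (let py = 0; py < height; py += patchSize) {
    for (let px = 0; px < width; px += patchSize) {
      const patch = [];
      for (let y = py; y < py + patchSize; y++) {
        for (let x = px; x < px + patchSize; x++) {
          patch.push(pixels[y * width + x]);
        }
      }
      patches.push(patch);
    }
  }
  return patches;
}

// A 4x4 image with 2x2 patches yields 4 tokens of length 4.
const image = Array.from({ length: 16 }, (_, i) => i);
const tokens = patchify(image, 4, 4, 2);
console.log(tokens.length, tokens[0]); // 4 tokens; the first is [0, 1, 4, 5]
```

With 14 × 14 patches and, for example, a 224 × 224 input (an assumed size), the same procedure gives 16 × 16 = 256 patch tokens.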

The app loads a small DINOv2 model into the user's browser and runs it locally using Transformers.js! 🤗

This means you can analyze your own images for free: simply click the image to open the file dialog.

E.g., the model recognizes that long necks and fluffy ears are defining features of llamas! 🦙

01.01.2025 15:37 – 👍 2    🔁 1    💬 1    📌 0

First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯

Try it out yourself! 👇

01.01.2025 15:37 – 👍 9    🔁 3    💬 1    📌 0
1725336 - Error in AudioContext.createMediaStreamSource when AudioContext is constructed with a custom sample rate UNCONFIRMED (nobody) in Core - Web Audio. Last updated 2023-12-29.

Yeah, I ran into this during development, and it's unfortunately a bug in Firefox:
- bugzilla.mozilla.org/show_bug.cgi...
- bugzilla.mozilla.org/show_bug.cgi...

18.12.2024 17:14 – 👍 1    🔁 0    💬 0    📌 0
Moonshine Web - a Hugging Face Space by webml-community Real-time in-browser speech recognition

Huge shout-out to the Useful Sensors team for such an amazing model and to Wael Yasmina for his 3D audio visualizer tutorial! 🤗

💻 Source code: github.com/huggingface/...
🔗 Online demo: huggingface.co/spaces/webml...

18.12.2024 16:51 – 👍 4    🔁 0    💬 1    📌 0

Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo + source code below! 👇

18.12.2024 16:51 – 👍 28    🔁 4    💬 1    📌 0
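
A "WebGPU with WASM fallback" setup usually starts with a feature check like the sketch below. `navigator.gpu` is the standard WebGPU entry point and is undefined in browsers without WebGPU support. The `pickDevice` helper and the `"webgpu"` / `"wasm"` labels are assumptions for illustration, not code from the demo.

```javascript
// Return "webgpu" when the navigator-like object exposes WebGPU,
// otherwise fall back to "wasm".
function pickDevice(nav) {
  return nav && "gpu" in nav && nav.gpu ? "webgpu" : "wasm";
}

// In a browser, pass the real navigator; outside one, fall back safely.
const device = pickDevice(typeof navigator !== "undefined" ? navigator : undefined);
console.log(`Using backend: ${device}`);
```

The presence of `navigator.gpu` does not guarantee a usable GPU: `navigator.gpu.requestAdapter()` can still resolve to `null`, so a more careful version would await that call and fall back to WASM if no adapter is returned.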
This journalist wants you to try open-source AI: “AI is shiny, but value comes from the ideas people have to use it” Hugging Face’s Florent Daudens on what open-source AI is, how journalists can use it and why he thinks they should.

🤗 NEW PIECE:

‘Open-source’ is becoming a buzzword for many aspects of modern journalism, including open-source AI. But what is it, and how can journalists benefit from it?

@marinaadami.bsky.social spoke to @fdaudens.bsky.social to find out.

reutersinstitute.politics.ox.ac.uk/news/journal...

10.12.2024 12:56 – 👍 12    🔁 5    💬 2    📌 1
Text-to-Speech WebGPU - a Hugging Face Space by webml-community WebGPU text-to-Speech powered by OuteTTS and Transformers.js

Huge shout-out to OuteAI for their amazing model (OuteTTS-0.2-500M) and for helping us bring it to the web! 🤗 Together, we released the outetts NPM package, which you can install with `npm i outetts`.

💻 Source code: github.com/huggingface/...

🔗 Demo: huggingface.co/spaces/webml...

08.12.2024 19:38 – 👍 7    🔁 0    💬 0    📌 0

The model is multilingual (English, Chinese, Korean & Japanese) and even supports zero-shot voice cloning! 🤯 Stay tuned for an update that will add these features to the UI!

More samples:
bsky.app/profile/reac...

08.12.2024 19:38 – 👍 6    🔁 0    💬 2    📌 0

Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥

High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo + source code below 👇

08.12.2024 19:38 – 👍 45    🔁 12    💬 2    📌 1
Release 3.1.0 · huggingface/transformers.js 🚀 Transformers.js v3.1 – any-to-any, text-to-image, image-to-text, pose estimation, time series forecasting, and more! Table of contents: 🤖 New models: Janus, Qwen2-VL, JinaCLIP, LLaVA-OneVision, ...

6. MGP-STR for optical character recognition (OCR)
7. PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
github.com/huggingface/...

28.11.2024 15:13 – 👍 2    🔁 1    💬 0    📌 0

2. Qwen2-VL from Qwen for dynamic-resolution image understanding
3. JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
4. LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
5. ViTPose for pose estimation

28.11.2024 15:13 – 👍 4    🔁 1    💬 1    📌 0

We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:

1. Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)

Demo (+ source code): hf.co/spaces/webml...

28.11.2024 15:13 – 👍 23    🔁 5    💬 2    📌 0
onnx/model_q4f16.onnx · HuggingFaceTB/SmolLM2-1.7B-Instruct at main We’re on a journey to advance and democratize artificial intelligence through open source and open science.

~1.1GB 😇 huggingface.co/HuggingFaceT...

27.11.2024 23:22 – 👍 2    🔁 0    💬 0    📌 0
SmolLM2 1.7B Instruct WebGPU - a Hugging Face Space by HuggingFaceTB A blazingly fast & powerful AI chatbot that runs in-browser!

🤏 Learn more about SmolLM2: github.com/huggingface/...
🔗 Online WebGPU demo: huggingface.co/spaces/Huggi...

27.11.2024 13:51 – 👍 5    🔁 0    💬 0    📌 0

@xenova is following 20 prominent accounts