
Embedded LLM

@embeddedllm.bsky.social

vLLM, JamAI Base

230 Followers  |  2,238 Following  |  11 Posts  |  Joined: 17.11.2024

Latest posts by embeddedllm.bsky.social on Bluesky


Why PTPC-FP8 rocks:
- Per-Token Activation Scaling: Each token gets its own scaling factor
- Per-Channel Weight Scaling: Each weight column (output channel) gets its own scaling factor

Delivers FP8 speed with accuracy closer to BF16 – the best FP8 option for ROCm! [2/2]
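
For intuition, here's a minimal NumPy sketch of those two scaling granularities (illustrative only – the fp8 cast itself is omitted, and vLLM's real kernels fuse the scaling into the GEMM):

import numpy as np

FP8_MAX = 448.0  # largest finite value in fp8 e4m3

def ptpc_matmul(x, w):
    # x: activations [num_tokens, in_features]; w: weights [in_features, out_features]
    # Per-token activation scaling: one scale per row (token) of x.
    x_scale = np.abs(x).max(axis=1, keepdims=True) / FP8_MAX  # [num_tokens, 1]
    # Per-channel weight scaling: one scale per output column of w.
    w_scale = np.abs(w).max(axis=0, keepdims=True) / FP8_MAX  # [1, out_features]
    # A real kernel casts x / x_scale and w / w_scale to fp8 here; the cast is
    # where precision is lost, and finer-grained scales lose less of it.
    x_q = x / x_scale
    w_q = w / w_scale
    # Low-precision GEMM, then undo both scales on the output.
    return (x_q @ w_q) * x_scale * w_scale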

22.03.2025 11:47 – 👍 0    🔁 0    💬 0    📌 0
PTPC-FP8: Boosting vLLM Performance on AMD ROCm. TL;DR: vLLM on AMD ROCm now has better FP8 performance!

vLLM Blog Alert! vLLM introduces PTPC-FP8 quantization on AMD ROCm, delivering near-BF16 accuracy at FP8 speeds. Run LLMs faster on @AMD MI300X GPUs – no pre-quantization required!

Get started: pip install -U vllm, add --quantization ptpc_fp8.

Full details: blog.vllm.ai/2025/02/24/p...
[1/2]
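
Concretely, the two steps look like this (the model name is just an example – any BF16 checkpoint works, since quantization happens on the fly):

pip install -U vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --quantization ptpc_fp8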

22.03.2025 11:47 – 👍 0    🔁 0    💬 1    📌 0

Recapping 2024: we embraced open source, contributing 211 PRs and 65K+ LOC to vLLM and expanding its VLM support. We launched #JamAIBase, an AI spreadsheet with 620+ stars, and on 🤗 we're at 1.75M+. We collaborated with Liger Kernel & Infinity on AMD GPU support. Let's make 2025 even more impactful together!

30.12.2024 16:01 – 👍 1    🔁 0    💬 0    📌 0
Preview
Liger-Kernel: Empowering an open source ecosystem of Triton Kernels for Efficient LLM Training

🚀 Liger-Kernel is making waves! Check out the latest LinkedIn Eng blog post on how Liger improves #LLM training efficiency with Triton kernels.

20% throughput boost & 60% memory reduction for models like Llama, Gemma & Qwen with just one line of code! Works on AMD!
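
For reference, the one-liner is a monkey patch from Liger-Kernel's documented patching API (model-specific; check the repo for your architecture):

from liger_kernel.transformers import apply_liger_kernel_to_llama

# Swap Hugging Face Llama modules for Liger's Triton kernels
# (RMSNorm, RoPE, SwiGLU, fused cross-entropy) before loading the model.
apply_liger_kernel_to_llama()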

www.linkedin.com/blog/enginee...

06.12.2024 02:26 – 👍 2    🔁 0    💬 0    📌 0
Accelerating Embedding & Reranking Models on AMD Using Infinity – a blog post by Michael on Hugging Face

🔥 Big thanks to Michael Feil for the epic collab on supercharging embedding & reranking on AMD GPUs with Infinity♾!

Check out the guide on 🤗 Hugging Face on how to leverage this high-throughput embedding inference engine!
huggingface.co/blog/michael...
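
To try it, Infinity ships a small CLI (a sketch – the model ID and port are examples; see the blog for the ROCm-specific install path):

pip install "infinity-emb[all]"
infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997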

04.12.2024 14:16 – 👍 2    🔁 0    💬 0    📌 0

vLLM now supports running GGUF models on AMD Radeon GPUs, with impressive performance on RX 7900XTX. Outperforms Ollama at batch size 1, with 62.66 tok/s vs 58.05 tok/s.
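
Serving a GGUF checkpoint looks roughly like this (file and repo names are placeholders; pointing --tokenizer at the original model avoids vLLM's slower GGUF tokenizer conversion):

vllm serve ./Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  --tokenizer meta-llama/Llama-3.2-3B-Instruct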

Check it out: embeddedllm.com/blog/vllm-no...

What's your experience with vLLM on AMD? Any features you want to see next?

02.12.2024 03:47 – 👍 1    🔁 0    💬 0    📌 0

🚨 GPUs wasting 75% of training time on communication 🤯 Not anymore!

DeepSpeed Domino, with a new tensor parallelism engine, minimizes communication overhead for faster LLM training. 🚀

✅ Near-complete communication hiding
✅ Multi-node scalable solution

Blog: github.com/microsoft/De...
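
This is not Domino's actual engine, but the trick it builds on is overlapping communication with independent compute. A minimal PyTorch sketch of hiding an async all-reduce:

import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called.
def overlapped_chunk(grad_bucket, x, w):
    # Launch the all-reduce without blocking; returns a work handle.
    work = dist.all_reduce(grad_bucket, async_op=True)
    out = x @ w  # independent compute runs while communication is in flight
    work.wait()  # block only where the reduced gradients are needed
    return out, grad_bucket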

26.11.2024 14:35 – 👍 2    🔁 1    💬 0    📌 0
Liger Kernels Leap the CUDA Moat: A Case Study with Liger, LinkedIn's SOTA Training Kernels, on AMD GPUs. This guide shows the impact of Liger-Kernel's training kernels on AMD MI300X. The build has been verified for ROCm 6.2.

Liger Kernels v0.4.0 ROARS onto AMD GPUs! 🚀 Faster LLM training, less memory, LONGER context lengths! Check out the benchmarks! embeddedllm.com/blog/cuda-to...
@hotaisle.bsky.social

24.11.2024 10:52 – 👍 2    🔁 0    💬 0    📌 0
Pixtral Large benchmarks

🔥 Pixtral Large is now supported on vLLM! 🔥

Run Pixtral Large with multiple input images from day 0 using vLLM.

Install vLLM:
pip install -U vllm

Run Pixtral Large:
vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
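
Once the server is up, you can pass several images per prompt through vLLM's OpenAI-compatible API (URLs and prompt below are placeholders):

from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint (port 8000 by default).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mistralai/Pixtral-Large-Instruct-2411",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two charts."},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)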

19.11.2024 12:38 – 👍 4    🔁 0    💬 0    📌 0

New Models:
- Idefics3 (VLM)
- H2OVL-Mississippi (VLM for OCR/docs!)
- Qwen2-Audio (Audio LLM)
- FalconMamba
- Florence-2 (VLM)
Plus new encoder-only embedding models like BERT, RoBERTa, and XLM-RoBERTa.

17.11.2024 08:58 – 👍 3    🔁 0    💬 0    📌 0
Release v0.6.4 · vllm-project/vllm – Highlights: Significant progress in V1 engine core refactor (#9826, #10135, #10288, #10211, #10225, #10228, #10268, #9954, #10272, #9971, #10224, #10166, #9289, #10058, #9888, #9972, #10059, #9945,...

🔥 vLLM v0.6.4 is live! This release delivers significant advancements in model compatibility, hardware acceleration, and core engine optimizations. 🔥

🤯 Expanded model support? ✅
💪 Intel Gaudi integration? ✅
🚀 Major engine & torch.compile boosts? ✅
🔗 github.com/vllm-project...

17.11.2024 08:58 – 👍 1    🔁 0    💬 1    📌 0
