This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron, and TPU).
We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤩!
We are introducing multi-backend support in Hugging Face 🤗 Text Generation Inference!
With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and the available hardware.
huggingface.co/blog/tgi-mul...
When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.
The magic? Versioning chunks, not files, giving rise to:
🧠 Smarter storage
⏩ Faster uploads
📥 Efficient downloads
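The "versioning chunks, not files" idea can be sketched as a content-addressed chunk store: each chunk is keyed by its hash, so chunks shared between file versions are stored only once. This toy sketch uses fixed-size chunks purely for illustration (a real system like Xet determines chunk boundaries from the content itself), and all names here are hypothetical:

```python
import hashlib

CHUNK_SIZE = 4  # tiny chunk size, just for the demo

def chunk_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

# Two versions of a "file" that differ only in the last chunk.
v1 = b"AAAABBBBCCCC"
v2 = b"AAAABBBBDDDD"

store: dict[str, bytes] = {}  # content-addressed chunk store
for data in (v1, v2):
    for i, h in enumerate(chunk_hashes(data)):
        store.setdefault(h, data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE])

# Shared chunks are deduplicated: 4 unique chunks instead of 6.
print(len(store))  # → 4
```

Because a new file version only adds the chunks that actually changed, uploads and downloads can transfer just the deltas instead of whole files.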
Curious? Read the blog and let us know how it could help your workflows!