Hugging Face partners with NVIDIA to democratise AI inference

Hugging Face has partnered with NVIDIA to introduce inference-as-a-service capabilities to one of the largest AI communities globally. Announced at the SIGGRAPH conference, this collaboration will provide Hugging Face’s four million developers with streamlined access to NVIDIA-accelerated inference on popular AI models.

This new service allows developers to quickly deploy leading large language models, such as the Llama 3 family and Mistral AI models, optimized through NVIDIA NIM microservices running on NVIDIA DGX Cloud. The integration aims to simplify the prototyping process with open-source AI models hosted on the Hugging Face Hub and facilitate their deployment in production environments.

For users of the Enterprise Hub, the service includes serverless inference, which offers enhanced flexibility, reduced infrastructure overhead, and optimized performance through NVIDIA NIM. This offering complements the existing Train on DGX Cloud AI training service available on Hugging Face, creating a comprehensive ecosystem for AI development and deployment.
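To make the workflow concrete, here is a minimal sketch of querying a hosted model through the `huggingface_hub` client, the library Hugging Face provides for its inference services. The model ID and token are illustrative placeholders, and the routing of this request to an NVIDIA-accelerated backend is assumed rather than documented here.

```python
# Minimal sketch: query a hosted chat model via Hugging Face's
# InferenceClient. The model ID and token are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # illustrative model ID
    token="hf_...",  # replace with your own access token
)

# The hosted backend handles GPU provisioning; the caller only sends
# a chat-style request and reads back the generated message.
response = client.chat_completion(
    messages=[{"role": "user", "content": "What do NIM microservices do?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```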

The new tools are specifically designed to address the challenges developers face when navigating the expanding landscape of open-source models. By providing a centralized hub for model comparison and experimentation, Hugging Face and NVIDIA are lowering the barriers to entry for cutting-edge AI development. Accessibility is a major focus, with new features available via straightforward “Train” and “Deploy” drop-down menus on Hugging Face model cards, allowing users to begin with minimal friction.

Central to this offering is NVIDIA NIM, a suite of AI microservices that includes both NVIDIA AI foundation models and open-source community models. These microservices are optimized for inference and exposed through industry-standard APIs, yielding higher token throughput, a metric that directly governs the responsiveness and serving cost of language models.
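Because NIM endpoints follow industry-standard, OpenAI-compatible HTTP APIs, an existing client can typically target one simply by overriding the base URL. The sketch below assumes such an endpoint; the URL, credential, and model name are placeholders, not documented values.

```python
# Hypothetical sketch: point a generic OpenAI-compatible client at a
# NIM endpoint. URL, API key, and model name are all placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-nim-endpoint>/v1",  # assumed NIM endpoint URL
    api_key="<your-api-key>",                   # placeholder credential
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # illustrative NIM model name
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```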

These optimizations translate into measurable gains. Deployed as a NIM, a model such as the 70-billion-parameter version of Llama 3 can achieve up to five times the throughput of an off-the-shelf deployment on NVIDIA H100 Tensor Core GPU systems. That performance headroom means faster and more consistent results for developers, potentially shortening the AI application development cycle.
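As a rough way to sanity-check throughput figures like these against your own endpoint, one can time a generation request and divide the number of generated tokens by the elapsed time. The sketch below reuses the placeholder client from earlier; a single unbatched call is only a crude proxy for the sustained throughput vendors report.

```python
# Rough sketch for estimating output-token throughput of one request.
# A single call is a crude proxy for sustained, batched throughput.
import time

from huggingface_hub import InferenceClient

def tokens_per_second(client: InferenceClient, prompt: str, max_tokens: int = 256) -> float:
    """Time one chat completion and estimate generated tokens per second."""
    start = time.perf_counter()
    response = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    # usage.completion_tokens counts only generated tokens, not the prompt.
    return response.usage.completion_tokens / elapsed

client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct", token="hf_...")
print(f"{tokens_per_second(client, 'Explain batched GPU inference.'):.1f} tokens/s")
```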

At the foundation of this service is NVIDIA DGX Cloud, a platform specifically designed for generative AI. It provides developers with scalable GPU resources that support every stage of AI development, from prototyping to production, without requiring long-term infrastructure commitments. This flexibility is especially beneficial for developers and organizations looking to experiment with AI without incurring significant upfront costs.

As AI technology continues to advance and expand into various industries, tools that simplify development and deployment will be essential for driving adoption. The collaboration between NVIDIA and Hugging Face equips developers with the necessary resources to push the limits of what is possible with AI.