
Pruna AI Open Sources Its AI Optimization Engine

Mar 20, 2025

Bertrand Charpentier

Cofounder, President & Chief Scientist

John Rachwan

Cofounder & CTO

Rayan Nait Mazi

Cofounder & CEO

Stephan Günnemann

Cofounder & Chief Strategy Officer

Everyone Can Make AI Models Faster, Smaller, Cheaper, Greener

  • Pruna AI opens up its cutting-edge AI optimization engine to the global developer community, fostering collaboration and accelerating the pace of innovation in machine learning.

  • The open-sourced Pruna package offers a powerful suite of tools to streamline AI model optimization, enabling developers to easily enhance their models' speed, efficiency, and sustainability. By offering seamless integration with existing workflows, Pruna reduces the need for extensive manual tuning, allowing engineers to focus on creating impactful AI applications.

[Munich and Paris, 20 March, 2025] Pruna AI, the company behind the advanced AI optimization engine that simplifies the development, deployment, and scaling of machine learning models, is excited to announce the open-sourcing of its flagship Pruna package. This move is aimed at empowering developers, engineers, and researchers to leverage the benefits of Pruna’s AI optimization techniques in their own projects. By sharing this technology with the wider AI community, Pruna AI hopes to accelerate the adoption of sustainable practices and open collaboration within the field of AI.

While training costs matter chiefly to model makers like DeepSeek, Meta, or Google, AI inference represents up to 90% of a model's compute costs, making it a critical barrier to AI adoption for most companies and individuals. Addressing this challenge head-on, the European company Pruna AI openly shows how to significantly cut the cost of AI inference, setting a global standard for efficiency in AI.

Empowering Developers with Unmatched Efficiency

Pruna's full optimization engine delivers significant speed and memory improvements in AI models. For instance:

  • state-of-the-art image generation models, like Flux, run 4x faster or are 4x smaller after being compressed with Pruna AI. 

  • state-of-the-art large language models, like Llama, run 3x faster or are 8x smaller after being compressed with Pruna AI.

These unmatched efficiency gains are made possible by the wide range of compression algorithms integrated into the Pruna AI optimization engine. Integrating all of these compression algorithms by hand would have taken a team of ML engineers months; with Pruna, it can now be done in minutes.
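To make concrete what combining compression algorithms means, here is a minimal, self-contained sketch (toy NumPy code, not Pruna's actual API) that chains two of the methods mentioned below, magnitude pruning and int8 quantization, on a random weight matrix:

```python
import numpy as np

# Toy illustration: chain two compression methods on one weight matrix.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

# 1) Magnitude pruning: zero out the 50% of weights with smallest magnitude.
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

# 2) Symmetric int8 quantization: map floats to [-127, 127] with one scale.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

# Dequantize to check how much fidelity the combination lost.
restored = quantized.astype(np.float32) * scale
error = np.abs(restored - pruned).max()

# int8 storage is already 4x smaller than float32, before any sparsity savings.
print(quantized.nbytes, weights.nbytes)  # 65536 262144
print(error <= scale / 2 + 1e-6)         # True: error bounded by half a step
```

In a real engine the same idea applies to actual model layers, and the value of automation is exactly in choosing compatible combinations and their settings without weeks of manual experimentation.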

“At Pruna AI, we've always believed that efficient AI should be sustainable and accessible to everyone. By open-sourcing Pruna, we're empowering researchers, developers, and businesses to build high-performance and sustainable AI models with minimal cost and complexity,” says John Rachwan, Cofounder & CTO at Pruna AI. 

A Community-Driven Approach

Having published 10k+ compressed AI models on the Hugging Face platform, making it the top organization there by number of models, Pruna AI will continue to lead the efficient and sustainable AI community. Pruna AI is committed to growing its community of contributors by sharing unique models, evaluations, and knowledge through platforms like Reddit, Discord, Hugging Face, Dev.to, and GitHub (see, e.g., the awesome-ai-efficiency list). By making the engine open-source, Pruna hopes to inspire developers and researchers to collaborate, innovate, and contribute new ideas and improvements, further advancing the state of AI optimization.

Key Features of Pruna’s Open-Source Package:

  • Compression Methods: The engine supports various compression methods, such as pruning, quantization, distillation, and caching, which can be combined with one another.

  • Evaluation Agents: These agents assess efficiency gains and quality retention after compression, based on curated and widely used metrics.

  • Community Support: Developers can engage with the growing Pruna community on platforms like Reddit, Discord, and Hugging Face to share insights and receive support.

  • High Performance: The engine allows models to run faster and more efficiently across a wide range of hardware setups, from GPUs to CPUs.
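To illustrate the kind of check an evaluation step performs, here is a minimal, self-contained sketch (toy NumPy code, not Pruna's evaluation agents) that reports one efficiency metric and one quality metric for a quantized linear layer:

```python
import numpy as np

# Toy illustration: evaluate a compressed layer on efficiency and quality.
rng = np.random.default_rng(1)
weights = rng.normal(size=(64, 64)).astype(np.float32)
inputs = rng.normal(size=(16, 64)).astype(np.float32)

# "Compress" the layer with symmetric int8 quantization.
scale = np.abs(weights).max() / 127.0
w_int8 = np.round(weights / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale

base_out = inputs @ weights
compressed_out = inputs @ w_restored

# Efficiency metric: bytes before vs. after compression.
size_ratio = weights.nbytes / w_int8.nbytes

# Quality metric: per-sample cosine similarity of base vs. compressed outputs.
cos = np.sum(base_out * compressed_out, axis=1) / (
    np.linalg.norm(base_out, axis=1) * np.linalg.norm(compressed_out, axis=1)
)
print(size_ratio)         # 4.0
print(cos.mean() > 0.99)  # True: outputs stay close to the base model's
```

Real evaluation stacks measure the same two axes, efficiency (latency, memory, energy) and quality (task metrics, output fidelity), just with richer metrics and on real workloads.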

While the core parts of Pruna are now openly published under the Apache 2.0 license, users who need more advanced capabilities can subscribe to Pruna Pro to access additional compression methods, efficiency/quality evaluations, and support. It includes, for example:

  • Compression Methods: More compression methods with state-of-the-art efficiency gains are available in Pruna Pro.

  • Optimization Agents: Instead of requiring manual experimentation, optimization agents automatically search for the best compression configuration, bringing efficiency and productivity gains. This feature is accessible in Pruna Pro.

  • Recovery Methods: Compression can sometimes alter a model's outputs. Pruna Pro includes recovery methods that restore the fidelity of the compressed model with respect to the base model.

  • Dedicated Support: Teams get dedicated communication channels with fast response times.

A Vision for Accessible AI Optimization

In a world where AI technology continues to evolve at a rapid pace, Pruna AI seeks to level the playing field for developers and engineers, especially those working with limited resources. Pruna’s open-source initiative is a step toward making powerful AI tools accessible to everyone, regardless of the scale of their operation.

About Pruna AI 

Pruna AI is the AI optimization engine for ML teams looking to simplify their work. It makes running AI models cheaper, faster, and greener, empowering ML engineers to focus on innovation. The company’s solution allows users to optimize and run deep learning models in a snap across any hardware setup. It can be integrated directly into companies’ existing AI systems, leaving their infrastructure untouched and providing immediate benefits without additional overhead. Created by a team of leading ML efficiency and reliability researchers, Pruna AI’s platform has been proven to cut carbon emissions by up to 91%, compressing models to deliver results quicker, reduce time-to-market, and help businesses reach their sustainability goals. The team of 14+ raised a $6.5M seed round in December 2024. | Pruna Reddit | Pruna Discord | Pruna website | Pruna LinkedIn | Pruna X | Pruna Hugging Face | Pruna dev.to | Pruna Replicate |

Press contacts: 

Pruna AI spokesperson: Bertrand Charpentier – bertrand.charpentier@pruna.ai

Louis Leconte
Amine Hamouda
Angelos Nikitaras


© 2025 Pruna AI - Built with Pretzels & Croissants 🥨 🥐