The Modular AI Engine is the world's fastest unified AI inference engine, providing significant usability, portability, and performance gains for leading AI frameworks like PyTorch and TensorFlow. However, running inference on a model is only part of the deployment story; optimizing AI application performance in production also requires additional software infrastructure and system design. At Modular, we have integrated the Modular AI Engine into NVIDIA's Triton Inference Server and TensorFlow Serving for seamless deployment. This makes it incredibly simple to roll out the fastest AI inference engine, with support for multiple frameworks, multiple hardware backends, and dynamic batching.

We analyzed the performance benefits of the Modular AI Engine by integrating it into the Triton Inference Server on various hardware backends, and showcased the results on a binary text classification problem using BERT-base, a popular transformer language model from HuggingFace.
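To make the deployment flow concrete, here is a minimal client-side sketch of querying a BERT-base classification model served through Triton, using the standard `tritonclient` HTTP API. The model name (`bert-base`), tensor names, sequence length, and port are illustrative assumptions; the actual values depend on how the model repository is configured.

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

# Tokenize the input text with the standard BERT-base tokenizer.
# The max_length of 128 is an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "This movie was great!",
    return_tensors="np",
    padding="max_length",
    max_length=128,
)

# Connect to a Triton server on its default HTTP port (assumed here).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build one InferInput per model input tensor; the tensor names below
# are assumptions and must match the served model's configuration.
inputs = []
for name in ("input_ids", "attention_mask", "token_type_ids"):
    arr = encoded[name].astype(np.int64)
    inp = httpclient.InferInput(name, arr.shape, "INT64")
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

# Request the classification logits ("logits" is also an assumed name).
outputs = [httpclient.InferRequestedOutput("logits")]

result = client.infer(model_name="bert-base", inputs=inputs, outputs=outputs)
logits = result.as_numpy("logits")
print("Predicted class:", int(np.argmax(logits, axis=-1)[0]))
```

On the server side, Triton's dynamic batching is enabled per model with a `dynamic_batching` stanza in that model's `config.pbtxt`, which lets the server coalesce concurrent requests into larger batches before they reach the inference engine.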