F5 and NVIDIA Strengthen AI Inference Efficiency with Enhanced BIG-IP Next for Kubernetes Integration


F5 has unveiled significant advancements in its collaboration with NVIDIA, aimed at enhancing AI inference infrastructures. This integration merges F5’s BIG-IP Next for Kubernetes with NVIDIA’s BlueField-3 Data Processing Units (DPUs), establishing a sophisticated, telemetry-aware infrastructure layer. This development is designed to improve token throughput, optimize GPU utilization, reduce latency, and facilitate secure multi-tenant AI platforms at scale.

In the context of AI systems, tokens serve as the fundamental units of output, encompassing words, symbols, or data fragments generated during inference. The efficiency and speed of token production are crucial, influencing user experience, infrastructure performance, and revenue generation per accelerator.

As enterprises and GPU-as-a-Service (GPUaaS) providers transition from AI experimentation to monetized services, the efficiency of infrastructure has emerged as a critical metric. Success is increasingly evaluated not merely by the amount of deployed GPU capacity, but by factors such as token economics, sustained token throughput, time to first token (TTFT), cost per token, and revenue per GPU accelerator. The joint solution from F5 and NVIDIA aims to address these essential metrics directly.
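The metrics named above can be made concrete with a little arithmetic. The sketch below is a minimal, illustrative calculator for these per-accelerator figures; the formulas and field names are our own assumptions, not part of the F5/NVIDIA solution.

```python
from dataclasses import dataclass

@dataclass
class GpuWindow:
    tokens_generated: int               # tokens produced in the window
    window_seconds: float               # length of the measurement window
    first_token_latencies: list[float]  # TTFT samples, in seconds
    gpu_cost_per_hour: float            # fully loaded accelerator cost
    revenue_per_1k_tokens: float        # what the service bills

def token_metrics(w: GpuWindow) -> dict[str, float]:
    """Compute sustained throughput, average TTFT, cost per token,
    and revenue per GPU-hour for one accelerator over one window."""
    throughput = w.tokens_generated / w.window_seconds  # tokens/sec
    avg_ttft = sum(w.first_token_latencies) / len(w.first_token_latencies)
    window_cost = w.gpu_cost_per_hour * (w.window_seconds / 3600.0)
    return {
        "tokens_per_second": throughput,
        "avg_ttft_seconds": avg_ttft,
        "cost_per_token": window_cost / w.tokens_generated,
        "revenue_per_gpu_hour": (w.tokens_generated / 1000.0)
                                * w.revenue_per_1k_tokens
                                * (3600.0 / w.window_seconds),
    }
```

For example, a GPU producing 360,000 tokens in an hour at $2.00/hour and billed at $0.01 per thousand tokens yields 100 tokens/sec and $3.60 of revenue per GPU-hour.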

Optimizing Tokenomics through Intelligent AI Infrastructure

The evolution from application-centric inference to agent-driven AI workflows necessitates innovative architectural strategies to enhance token throughput and minimize costs. The latest iteration of BIG-IP Next for Kubernetes utilizes NVIDIA’s NIM statistics, Dynamo runtime signals, and GPU telemetry to make informed, inference-aware routing decisions prior to execution. This real-time workload matching to the most suitable accelerators enhances sustained utilization while decreasing latency and re-compute requirements.
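To illustrate the idea of telemetry-informed routing, here is a toy Python sketch. The scoring heuristic, field names, and weights are entirely hypothetical and are not F5's or NVIDIA's actual routing logic; it only shows the general shape of picking a replica from runtime signals before dispatching a request.

```python
from dataclasses import dataclass

@dataclass
class BackendTelemetry:
    name: str
    queue_depth: int        # pending requests reported by the runtime
    gpu_utilization: float  # 0.0-1.0, from GPU telemetry
    kv_cache_free: float    # fraction of KV cache still available

def pick_backend(backends: list[BackendTelemetry],
                 prompt_tokens: int) -> BackendTelemetry:
    """Route to the replica expected to start generating soonest.

    Toy heuristic: shorter queues and freer KV cache win, and longer
    prompts weight cache headroom more heavily. Lower score is better.
    """
    def score(b: BackendTelemetry) -> float:
        cache_weight = 1.0 + prompt_tokens / 4096.0
        return (b.queue_depth
                + b.gpu_utilization * 2.0
                - b.kv_cache_free * cache_weight)
    return min(backends, key=score)
```

A production router would, of course, consume live telemetry streams and re-score continuously rather than on a per-call snapshot.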

Kunal Anand, Chief Product Officer at F5, emphasized that “AI infrastructure is no longer just about access to GPUs or scaling their deployments. It has evolved into maximizing economic output per accelerator.” He noted that the collaboration with NVIDIA enables AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes is positioned to provide the necessary intelligence and governance to enhance GPU yield, lower cost per token, and confidently scale shared AI platforms.

Validated Infrastructure Efficiency: A Structural Uplift

Performance metrics from testing conducted by The Tolly Group illustrate the impact of this integration. BIG-IP Next for Kubernetes, powered by NVIDIA BlueField-3 DPUs, achieved up to a 40% increase in token throughput, a 61% faster TTFT, and a 34% reduction in overall request latency.

These improvements are substantial. By offloading networking, TLS/encryption, AI-aware load balancing, and traffic management to NVIDIA BlueField-3 DPUs, BIG-IP Next for Kubernetes preserves host CPU capacity, allowing GPUs to focus on sustained, high-throughput inference. This results in enhanced GPU utilization, reduced queuing delays, and increased token yield, thereby lowering the cost per token within a fixed infrastructure footprint. Notably, these gains can be implemented without requiring modifications to existing models, making them immediately applicable across current AI factory infrastructures. This distinction is vital for enterprises and NeoCloud providers competing in the realm of token economics.
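The link between throughput and cost per token is simple division, and the Tolly Group's 40% throughput figure quoted above translates directly into a cost reduction. The snippet below works through that arithmetic; the $2.50/hour GPU cost and one-million-tokens-per-hour baseline are assumed placeholders, not figures from the report.

```python
# Illustrative arithmetic only: the 1.40x throughput uplift is the
# Tolly Group figure quoted above; the GPU cost and baseline
# throughput are assumed placeholders.
gpu_cost_per_hour = 2.50
baseline_tokens_per_hour = 1_000_000

baseline_cost_per_token = gpu_cost_per_hour / baseline_tokens_per_hour
uplifted_cost_per_token = gpu_cost_per_hour / (baseline_tokens_per_hour * 1.40)

saving = 1 - uplifted_cost_per_token / baseline_cost_per_token
print(f"cost per token falls by {saving:.1%}")  # roughly 28.6%
```

Note that a 40% throughput gain lowers cost per token by about 28.6% (1 − 1/1.4), not 40%, since the same hourly spend is spread over more tokens.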

Kevin Deierling, Senior Vice President of Networking at NVIDIA, stated that “NVIDIA’s accelerated computing infrastructure coupled with F5’s AI-aware Application Delivery and Security Platform unlocks superior AI factory tokenomics—delivering scalable and cost-effective inference without making any changes to the models.” He highlighted that the partnership between F5 and NVIDIA empowers enterprises to scale AI factory inference both efficiently and economically.

Built for Agent-Driven AI and Multi-Tenant AI Platforms

Modern AI workloads are increasingly characterized by agent-driven, persistent, and context-aware processes. These workloads necessitate intelligent traffic control that traditional load balancing solutions cannot adequately provide. The enhanced BIG-IP Next for Kubernetes solution now supports several advanced capabilities:

  • Inference-aware routing tailored for agentic AI workflows.
  • Integration with the NVIDIA DOCA Platform Framework (DPF) to streamline the deployment and lifecycle management of NVIDIA BlueField DPUs.
  • EVPN-VXLAN with dynamic Virtual Routing and Forwarding (VRF) instances to ensure secure network-level multi-tenancy.
  • Integrated security, token governance, and observability within Kubernetes AI environments.

These enhancements enable enterprises and NeoCloud providers to securely share GPU infrastructure across various business units or external clients while maintaining performance isolation and predictable service levels.

A Control Plane for AI Factory Economics

F5 and NVIDIA are equipping enterprises with validated tools and best practices to optimize their inference architecture. With these advancements, BIG-IP Next for Kubernetes is poised to serve as a strategic control plane for AI factory economics, overseeing token consumption, optimizing traffic flows, and maximizing the return on investment for infrastructure.

Organizations can now derive greater economic value from every GPU already in operation, rather than resorting to overprovisioning to mitigate inefficiencies. This shift results in improved revenue per GPU, reduced operational overhead, and scalable AI services designed for sustained growth. By merging NVIDIA’s infrastructure telemetry and DPU acceleration with F5’s traffic intelligence and security capabilities, the partnership is facilitating the transformation of AI factories into efficient, monetizable platforms prepared for the evolving agent-driven landscape.

According to publicly available securitymea.com reporting, these developments mark a significant step forward in the integration of AI technologies and infrastructure efficiency.
