Security Flaws Discovered in NVIDIA’s Triton Inference Server
Overview of Vulnerabilities
Recently, a significant set of security vulnerabilities has been uncovered in NVIDIA’s Triton Inference Server, an open-source platform used for efficiently running artificial intelligence (AI) models across various systems. These flaws could potentially allow malicious actors to exploit the system and take complete control of affected servers. According to researchers Ronen Shustin and Nir Ohfeld from Wiz, when these vulnerabilities are combined, they can enable unauthorized, remote attackers to achieve remote code execution (RCE).
Vulnerabilities Breakdown
The vulnerabilities affecting the Triton Inference Server include:
- CVE-2025-23319: This issue carries a CVSS score of 8.1 and originates in the Python backend. It permits an attacker to trigger an out-of-bounds write by sending a specially crafted request.
- CVE-2025-23320: With a CVSS score of 7.5, this vulnerability also resides in the Python backend. It allows an attacker to exceed the shared memory limit by sending an exceptionally large request.
- CVE-2025-23334: Scoring 5.9 on the CVSS, this vulnerability can enable an out-of-bounds read, again through crafted requests sent to the server.
Successful exploitation, particularly of CVE-2025-23319, could lead to critical outcomes, including information disclosure, remote code execution, denial of service, and data tampering. All of these issues have been addressed in the latest Triton Inference Server release, version 25.07.
Combined Exploitation Risks
Wiz’s report points out that the three vulnerabilities can be chained together, transforming a simple information leak into a full-blown system compromise without requiring any user credentials. This raises significant concerns for organizations employing the Triton platform for AI and machine learning (ML) applications.
Specific Mechanisms of Attack
The vulnerabilities are specifically rooted in the Python backend, which handles inference requests for models built on popular AI frameworks like PyTorch and TensorFlow. For instance, an attacker could exploit CVE-2025-23320 to obtain the full and unique name of the backend’s internal IPC shared memory region—information that should be kept confidential. This initial breach sets the stage for utilizing the other two vulnerabilities to gain comprehensive control over the inference server.
Impact on Organizations
The risks presented by these vulnerabilities can have severe implications for organizations that depend on Triton for AI and ML initiatives. A successful attack could lead not just to the theft of sensitive AI models but also to the leakage of confidential data. Beyond that, attackers could manipulate AI model responses, enabling deeper infiltration into organizational networks.
Additional Security Advisory
In conjunction with these newly reported vulnerabilities, NVIDIA’s August security bulletin also disclosed three further vulnerabilities (CVE-2025-23310, CVE-2025-23311, and CVE-2025-23317) that could lead to similarly severe consequences, including remote code execution, denial of service, and unauthorized data access.
Importance of Timely Updates
Though there are currently no indications that these vulnerabilities are being actively exploited in real-world scenarios, organizations using the Triton Inference Server are strongly advised to implement the latest updates. Staying up-to-date with security patches is crucial in safeguarding against potential threats and ensuring the ongoing security of AI applications.
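When auditing deployments, the release tag of each running Triton container (NVIDIA uses a YY.MM scheme, so the patched release is tagged 25.07) can be compared against the fixed version. The sketch below is a hypothetical helper for such an audit, not part of any NVIDIA tooling; the function and constant names are illustrative:

```python
# Hypothetical audit helper: flag Triton release tags that predate the
# patched 25.07 release named in NVIDIA's advisory.

PATCHED_RELEASE = (25, 7)  # Triton 25.07 contains the fixes


def parse_release(tag: str) -> tuple[int, int]:
    """Parse a 'YY.MM' Triton release tag into a comparable (year, month) tuple."""
    year, month = tag.strip().split(".")
    return int(year), int(month)


def is_patched(tag: str) -> bool:
    """Return True if the given release tag is 25.07 or later."""
    return parse_release(tag) >= PATCHED_RELEASE


print(is_patched("25.06"))  # earlier release, still vulnerable -> False
print(is_patched("25.07"))  # patched release -> True
```

Inventorying tags this way is only a first pass; the authoritative check remains confirming that every server has actually been redeployed on the 25.07 (or later) release.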
By maintaining vigilance and applying recommended updates, organizations can significantly mitigate the risks associated with these vulnerabilities.