U.S. Government Launches National Security Evaluations for Frontier AI Models Before Public Release
In the span of just four days, the U.S. government has unveiled two sets of agreements with leading frontier AI companies. Together they establish a dual-track approach: one track assesses AI for national security risks before public release, while the other deploys AI directly within the military's most classified networks.
CAISI’s New Agreements with AI Leaders
The Center for AI Standards and Innovation (CAISI), part of the Department of Commerce’s National Institute of Standards and Technology, has announced new agreements with major players in the AI industry, including Google DeepMind, Microsoft, and Elon Musk’s xAI. These agreements build on previously established partnerships with Anthropic and OpenAI, which have been updated to align with directives from Commerce Secretary Howard Lutnick and the broader AI Action Plan for the United States.
Under these agreements, the involved companies are required to submit their frontier AI models to government evaluators before these models are publicly launched. The evaluations will focus on identifying capabilities and risks that are relevant to national security.
Evaluation Process and Government Coordination
To ensure comprehensive assessments, developers often provide CAISI with models that have reduced or removed safety guardrails. This practice lets evaluators probe a model's full capabilities rather than its behavior under commercial safety constraints. Evaluators from various federal agencies participate in the process, coordinated through the TRAINS Taskforce, an interagency body dedicated to AI national security concerns.
CAISI has reported that it has completed over 40 evaluations to date. The agreements explicitly allow for testing in classified environments and have been designed with the flexibility to adapt as AI technologies continue to evolve.
“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” stated CAISI Director Chris Fall. “These expanded industry collaborations help us scale our work in the public interest at a critical moment.”
Recent Developments in Military AI Deployment
The latest announcements come shortly after the Department of War (formerly the Department of Defense) revealed agreements with eight frontier AI companies to integrate their models directly into the military’s classified networks for operational use. The companies involved include SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, Amazon Web Services, and Oracle. The networks in question are classified at Impact Level 6, which pertains to secret-level data, and Impact Level 7, which encompasses the most highly restricted national security systems. The stated goals of these deployments include data synthesis, enhancing situational awareness, and supporting warfighter decision-making.
However, the Department of War’s announcement has drawn attention for one notable absence: Anthropic. This company, which was the first to deploy AI models on Pentagon classified systems through a Palantir integration under the Maven Smart System contract, has been excluded following a dispute regarding the guardrails governing military and surveillance use of its AI technologies.
Strategic Implications of Anthropic’s Exclusion
The Pentagon had previously classified Anthropic as a “supply chain risk,” a designation typically reserved for foreign entities that pose national security threats. Although a federal injunction in March 2026 reversed this designation, it did not restore Anthropic’s status as a Pentagon AI vendor. Consequently, Palantir has removed its Claude models from Department of Defense platforms.
The exclusion of Anthropic has broader strategic implications beyond the company’s contract status. Its recently released Mythos model has garnered significant attention from U.S. officials and financial sector leaders, who view it as a potential game-changer in adversarial cyber operations. Treasury Secretary Scott Bessent has described Mythos as representing a significant advancement in large language model capabilities.
Mythos is not among the models being evaluated for classified military use, even as senior officials cite it as a milestone warranting concern. That gap raises questions about the government's stated AI security posture and presents a policy contradiction with potentially far-reaching national security implications.