This week, thousands of carrier executives, infrastructure operators, and AI infrastructure builders are gathering at International Telecoms Week in National Harbor. The agenda is full of sessions on network automation, RAN AI, sovereign AI infrastructure, and edge deployment. The conversations are real and the stakes are high.

But there's a question that isn't on the agenda — the one that will define which operators survive the next major AI incident and which ones spend eighteen months explaining it to regulators:

The Question Nobody Is Asking

When your network automation agent pushes a bad configuration to 3,400 base stations simultaneously — what stops it?

Not "what alerts on it afterward." Not "what logs it for the post-incident review." What stops it — in under 100 milliseconds, before the cascade reaches the edge.

For most carriers and datacenter operators today, the honest answer is: nothing.

The Scale of What's Already Running

Modern 5G cores, AI-optimized data centers, and edge compute facilities are already running AI at a scale most governance frameworks weren't designed for.

5,000+
Autonomous AI actions per day in a modern 5G core — routing, spectrum, RAN parameters
20+
AI agent categories on a single AI factory floor — cooling, power, workload, maintenance, robotics
1,000+
Edge sites per tier-1 carrier running AI inference for enterprise tenants, none of them governed today

None of these agents have declared identities. None have behavioral baselines. None have a kill switch. They are making decisions that affect millions of subscribers, billions of dollars of infrastructure, and the liability exposure of publicly traded companies — and they are doing it in a governance vacuum.

What the Incident Record Looks Like

This isn't hypothetical. The incidents are already happening. They're just not being framed as AI governance failures yet — because the industry doesn't have the vocabulary for it.

Carrier Network
RAN Cascade — 6-Hour Regional Outage
Optimization agent pushed misconfigured parameters to 3,400 base stations simultaneously. 6-hour outage. $22M in SLA penalties. Rollback required 23 teams. No pre-action policy check existed for high-blast-radius changes.
Carrier Network
Shadow AI in OSS — 18 Months Undetected
LLM automation agent deployed without IT approval had unrestricted access to subscriber data and billing APIs. Discovered 18 months later during SOC 2 audit. No AI agent inventory existed across OSS/BSS systems.
AI Datacenter
Runaway Training Job — $2.1M Overnight
Misconfigured training agent consumed 100% of GPU cluster for 18 hours. $2.1M in unbudgeted compute. No alert fired until billing spike detected next morning. No per-agent budget cap or anomaly detection existed.
AI Datacenter
Cooling Agent Overshoot — $3.4M Penalty
Autonomous cooling agent modified thermal setpoints during peak load. Safety systems tripped. 4-hour outage. $3.4M in customer SLA penalties. No behavioral baseline or pre-action approval existed for critical setpoint changes.

Each of these incidents has the same root cause: an AI agent operating with no identity, no behavioral boundary, and no stop control. Each could have been prevented — or stopped mid-execution — with a governance layer. None had one.

And Then There Are the Confirmed, Public Incidents

The AI automation failures above illustrate an emerging, under-reported category. What is fully documented and on the public record (from regulators, courts, and government agencies) tells the same story at scale:

SK Telecom · April 2025
Core Network Breach — 4 Years Undetected · $97M Fine
BPFDoor malware ran inside SK Telecom's Home Subscriber Server from 2021 to 2025 — undetected. 26 million USIM authentication keys exfiltrated, stored unencrypted with no access controls. South Korea's PIPC issued a record $97M fine. The HSS is the authentication anchor for every AI-driven network slice.
AT&T · July 2024
50 Billion Call Records via Unmonitored Cloud Pipeline
Attackers accessed AT&T's Snowflake cloud environment using stolen credentials — no MFA enforced on a data pipeline handling call and text metadata for 110 million customers. 50 billion records exfiltrated. $177M class-action settlement. The Snowflake environment is functionally identical to an unguarded AI data pipeline.
Salt Typhoon · Late 2024
Inside 9 US Carriers Simultaneously — Months Undetected
Chinese state actors compromised at least 9 US telecom carriers via CALEA lawful-intercept infrastructure, the legally mandated "backdoors" for law enforcement. Geolocation capability for millions, call metadata for 1M+ users in Washington DC. Access persisted for months. Congressional Research Service formally briefed.
Syniverse · 2016–2021
5-Year Silent Breach · 95 of Top 100 Carriers Exposed
Syniverse routes calls and texts for 95 of the world's top 100 carriers, over 1 trillion messages per year. A breach in May 2016 went undetected until May 2021: 5 years of access to call records, device identifiers, location data, and OTP codes used across Google, Facebook, and Microsoft. 235 carrier-customers confirmed impacted.

The Pattern

Every incident above — illustrative and named — failed on the same three controls: no identity on what was accessing the network, no behavioral monitoring to detect anomalies, and no stop control to halt the damage mid-execution.
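The behavioral-monitoring control named above can be sketched as a simple per-agent baseline. This is a minimal illustration under stated assumptions, not a description of any vendor's implementation: the `AgentBaseline` class, the z-score cutoff of 3, the five-sample warm-up, and the sample call counts are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class AgentBaseline:
    """Baseline of one metric (e.g. API calls per minute) for one agent."""
    agent_id: str
    threshold: float = 3.0   # hypothetical z-score cutoff
    min_samples: int = 5     # do not judge until a baseline exists
    history: list = field(default_factory=list)

    def observe(self, value: float) -> bool:
        """Return True if `value` deviates from the baseline.

        Only non-anomalous samples are added to the baseline, so a spike
        cannot quietly become the new normal.
        """
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma == 0:
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        if not anomalous:
            self.history.append(value)
        return anomalous


# Hypothetical agent whose normal behavior is about 100 calls per minute.
baseline = AgentBaseline("ran-optimizer-01")
for calls in [100, 104, 98, 101, 97, 103]:
    baseline.observe(calls)

spike = baseline.observe(3400)    # far outside the learned range
normal = baseline.observe(102)    # within the learned range
```

A production system would track several metrics per agent and use a more robust statistic than a plain z-score. The point of the sketch is only that the check runs on every observation rather than at audit time.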

Two Sectors, One Governance Gap

The governance problem manifests differently for telecom operators and AI datacenter operators, but the underlying gap is identical.

For Telecom Carriers

Network automation agents are modifying routing tables, managing network slices, reallocating spectrum, and reconfiguring RAN parameters continuously — often across thousands of base stations simultaneously. The agents making these decisions don't have registered identities. There's no behavioral baseline that would flag an anomaly before it becomes a cascade. And regulators in 40+ jurisdictions are now writing AI governance requirements for critical infrastructure that carriers have no current way to satisfy.

Edge AI compounds the problem. A tier-1 carrier operating 1,000+ edge sites for enterprise tenants has AI inference workloads running on every one of them. Tenant isolation is an SLA claim, not a continuously attested fact. When a workload crosses a tenant boundary — as has already happened — the carrier carries the liability.

For AI Datacenter Operators

The facility itself runs AI to manage the AI it hosts. Cooling optimization, power management, workload scheduling, predictive maintenance, autonomous rack robotics: these are all AI agents making decisions that affect uptime, cost, and customer commitments. GPU clusters can cost $50K–$500K per day to run. Finance teams often have no visibility into which agent consumed which GPU-hours, which tenant's training run overran its budget, or which inference workload caused the spike.

And the liability question is identical: when a training data set from Tenant A surfaces in Tenant B's model outputs — 11 million records, $8 million settlement — the operator is liable even if the SLA language says otherwise.
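The cost-visibility gap described for GPU clusters comes down to attribution arithmetic: GPUs × hours × rate, recorded per tenant and per agent, and checked against a cap. The sketch below assumes a hypothetical `CostLedger` class, a made-up rate of $2.50 per GPU-hour, and a made-up $10,000 cap. None of these figures come from a real deployment.

```python
from collections import defaultdict


class CostLedger:
    """Attributes GPU spend to (tenant, agent) pairs and enforces per-agent caps."""

    def __init__(self, usd_per_gpu_hour: float, caps_usd: dict):
        self.rate = usd_per_gpu_hour
        self.caps = caps_usd               # agent_id -> budget cap in USD
        self.spend = defaultdict(float)    # (tenant, agent_id) -> USD spent

    def agent_total(self, agent_id: str) -> float:
        """Total spend for one agent across all tenants."""
        return sum(v for (_, a), v in self.spend.items() if a == agent_id)

    def record(self, tenant: str, agent_id: str, gpus: int, hours: float) -> str:
        """Record usage; return "ok", or "throttle" once the cap is reached."""
        self.spend[(tenant, agent_id)] += gpus * hours * self.rate
        cap = self.caps.get(agent_id)
        # Fail closed: an agent with no declared cap is throttled.
        if cap is None or self.agent_total(agent_id) >= cap:
            return "throttle"
        return "ok"


ledger = CostLedger(usd_per_gpu_hour=2.5, caps_usd={"trainer-7": 10_000.0})
first = ledger.record("tenant-a", "trainer-7", gpus=512, hours=4)    # 5,120 USD so far
second = ledger.record("tenant-a", "trainer-7", gpus=512, hours=4)   # 10,240 USD so far
unknown = ledger.record("tenant-b", "unregistered-agent", gpus=1, hours=1)
```

In this sketch the second four-hour window pushes the agent past its cap, so the caller is told to throttle after eight hours instead of discovering the overrun on the next morning's bill.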

What Governance Actually Looks Like

The carriers and datacenter operators ahead of this problem have eight controls in place. Not aspirationally, but operationally. And they got there in weeks, not months.

01 · Discover
Know Your Agent (KYA)
Every AI agent (cooling bots, RAN agents, inference nodes, OSS automation) has a registered identity, declared capability scope, and issued credential. Shadow AI discovery runs continuously. If it wasn't registered, it gets flagged. This is the control that would have caught the SK Telecom and Syniverse intrusions early.
02 · Govern
Policy Engine <5ms
Every agent action is evaluated against declared policy before it executes, not logged after. Fail-closed. High-blast-radius actions (like reconfiguring 3,400 base stations) require explicit pre-approval. Sub-5ms latency means governance creates zero operational friction.
03 · Stop
Kill Switch L1 · L2 · L3
Three graduated levels: stop a specific agent (L1), stop an entire class of network function (L2), stop all AI operations across the environment (L3). All three fire in under 100ms. All three produce a signed, tamper-evident audit record of exactly what was stopped and when.
04 · Cost
Cost Intelligence
Per-agent GPU and compute cost attribution down to the job, tenant, model, and time window. Budget caps with real-time alerts. Automatic throttle at the cap. Solves the $2.1M overnight GPU job pattern and gives finance teams the visibility they need to challenge AI cost growth.
05 · Monitor
Behavioral Baseline & Drift Detection
Every agent builds a behavioral baseline from day one: query volume, data access patterns, API call frequency, execution timing. When behavior deviates, an alert fires before it becomes an incident. This is the control that would have surfaced the AT&T pipeline anomaly and the BPFDoor intrusion within days, not years.
06 · Isolate
Tenant Isolation Attestation
Tenant isolation is a continuously attested cryptographic fact, not an SLA claim. Every agent's data access, model memory, and output is monitored and bounded per tenant in real time. Regulators and enterprise customers can request isolation proof on demand without waiting for a post-incident forensic review.
07 · Override
Human Oversight & Override
Any operator, from NOC engineer to C-suite, can halt, override, or escalate any AI decision at any time with a full audit trail of who acted and why. Configurable oversight triggers for critical operations. Required by EU AI Act Article 14, NIS2, and emerging carrier AI governance mandates in 40+ jurisdictions.
08 · Prove
PQ Audit Platform
Immutable, post-quantum signed audit trail for every agent action. Continuous compliance evidence for NIS2, DORA, EU AI Act, GDPR, FedRAMP, SOC 2, ISO 27001. Not assembled the week before an audit, but generated continuously from what the system actually does.
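To make the "evaluate before execution, fail closed" idea from the Govern control concrete, here is a minimal sketch of a pre-action policy check. Everything in it is an assumption made for illustration: the `Action` type, the `POLICY` table, the agent names, and the blast-radius limit of 50 devices. It does not attempt to demonstrate the sub-5ms latency figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Action:
    agent_id: str
    operation: str
    target_count: int                  # blast radius: devices the change touches
    approved_by: Optional[str] = None  # human who pre-approved, if any


# Hypothetical declared policy: agent -> operations it may perform.
POLICY = {"ran-optimizer-01": {"set_ran_parameters"}}
BLAST_RADIUS_LIMIT = 50  # hypothetical threshold above which approval is required


def evaluate(action: Action) -> Tuple[bool, str]:
    """Decide before execution. Anything not explicitly allowed is denied."""
    try:
        allowed_ops = POLICY.get(action.agent_id)
        if allowed_ops is None:
            return False, "unregistered agent"
        if action.operation not in allowed_ops:
            return False, "operation outside declared scope"
        if action.target_count > BLAST_RADIUS_LIMIT and not action.approved_by:
            return False, "high blast radius requires pre-approval"
        return True, "allowed"
    except Exception:
        # Fail closed: an evaluation error must never turn into an allow.
        return False, "policy evaluation error"


small = evaluate(Action("ran-optimizer-01", "set_ran_parameters", 12))
bulk = evaluate(Action("ran-optimizer-01", "set_ran_parameters", 3400))
approved = evaluate(
    Action("ran-optimizer-01", "set_ran_parameters", 3400, approved_by="noc-lead")
)
shadow = evaluate(Action("unknown-llm-agent", "read_billing", 1))
```

The design choice worth noting is the default: the function returns a deny on every path except the one where identity, scope, and blast radius all check out.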

Live in Weeks, Not Months

The most common objection we hear is that deploying a governance control plane sounds like a 12-month infrastructure project. It isn't. Here's what the actual deployment timeline looks like:

Week 1
Agent discovery sweep complete. Every AI in the environment registered, named, and scoped. Shadow AI flagged.
Week 2
Policy engine live. Kill switch armed. Cost attribution running. Behavioral baselines collecting.
Week 3
Tenant isolation attested. Human override wired. Audit trail generating compliance evidence from day one.

RuntimeAI deploys as a control plane overlay — not a rip-and-replace. It integrates with your existing OSS/BSS, cloud infrastructure, and orchestration layer via API. No forklift. No downtime. The governance layer goes live while your network keeps running.

The Governance Revenue Opportunity

The most important reframe for carriers and datacenter operators is this: governance isn't a compliance cost. It's a product differentiator and a new revenue layer.

A colocation operator or GPU cloud that can offer "governed AI tenancy" — with per-tenant isolation attestation, signed audit trails, and compliance evidence the enterprise can show its own auditors — can charge a premium for that capacity. The same governance platform that reduces your liability exposure becomes the basis for a premium SKU your enterprise customers will pay for.

Carriers who can prove that their edge AI is governed, that tenant workloads are isolated at the agent layer (not just the infrastructure layer), and that regulatory mandates are continuously satisfied in real time have a commercial advantage over carriers who are still assembling audit evidence the week before a regulator arrives.

The Bottom Line

The carriers and datacenter operators who survive the first major AI governance incident in their sector will be the ones who already had a kill switch. The ones who didn't will spend the next two years explaining why not.

What to Do This Week

If you're at ITW this week — or talking to carriers and infrastructure operators who are — here are three questions worth asking every vendor on the floor:

  1. Can you show me a live kill switch demo? Not a diagram. Not a whitepaper. A working demo that stops a specific agent in under 100ms and produces a signed audit record.
  2. How do you handle tenant isolation at the agent layer? Infrastructure-level isolation isn't enough. The question is whether AI agent outputs, memory, and model access are continuously monitored and attested across tenant boundaries.
  3. What does your compliance evidence look like 30 days before an audit? If the answer involves manual assembly, you already have a governance gap.
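For readers who want a feel for what the first question is probing, the sketch below shows the shape of a graduated stop control that emits a signed record. It is a toy under explicit assumptions: the `KillSwitch` class, the agent names, and the hard-coded key are hypothetical, and HMAC-SHA256 from the Python standard library stands in for a real signature scheme (it is not post-quantum, and a shared key is not how a production audit trail would be signed). It makes no claim about the 100ms figure.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; use managed keys in practice


class KillSwitch:
    """Graduated stop: L1 = one agent, L2 = one agent class, L3 = everything."""

    def __init__(self, agents: dict):
        self.agents = agents    # agent_id -> agent class, e.g. "ran"
        self.stopped = set()

    def fire(self, level: int, target: str = "") -> dict:
        if level == 1:
            hit = {a for a in self.agents if a == target}
        elif level == 2:
            hit = {a for a, cls in self.agents.items() if cls == target}
        elif level == 3:
            hit = set(self.agents)
        else:
            raise ValueError("level must be 1, 2, or 3")
        self.stopped |= hit
        record = {"level": level, "target": target,
                  "stopped": sorted(hit), "ts": time.time()}
        payload = json.dumps(record, sort_keys=True).encode()
        record["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        return record

    def is_allowed(self, agent_id: str) -> bool:
        """Agents consult this before acting; stopped agents get False."""
        return agent_id not in self.stopped


def verify(record: dict) -> bool:
    """Recompute the signature over everything except `sig` and compare."""
    body = {k: v for k, v in record.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])


ks = KillSwitch({"ran-optimizer-01": "ran",
                 "ran-optimizer-02": "ran",
                 "cooling-01": "cooling"})
rec = ks.fire(2, "ran")               # L2: stop every agent in the "ran" class
tampered = dict(rec, stopped=[])      # someone edits the record after the fact
```

Here `verify(rec)` succeeds and `verify(tampered)` fails, which is the property to look for in a vendor demo: the record of what was stopped cannot be altered without detection.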

RuntimeAI is the AI Governance Control Plane built for the operators running AI at carrier and facility scale. On-premises, air-gapped sovereign, or SaaS. Post-quantum cryptography built in from day one. The only platform with all eight control pillars — and a kill switch that actually fires.

#ITW2026 #Telecom #NetworkAutomation #AIGovernance #5G #RANAutomation #AIDatacenter #GPUCloud #SovereignAI #KillSwitch #NetworkSecurity #RuntimeAI