The Last Millisecond: Why the AI Economy Runs on Edge Compute
Exploring opportunities to build compute for highest-importance use cases at the metro-edge
A single AI agent executing a multi-step workflow incurs 200 milliseconds to 2 seconds of latency per inference call. Across ten steps, that adds up to between 2 and 20 seconds of delay. For an agentic system routing through a centralized cloud, those delays are structural, baked into the architecture. For autonomous systems making real-time decisions about physical infrastructure, supply chains, or human safety, where errors can cost millions and carry serious real-world consequences, these compounding delays are unacceptable.
The most transformative applications being built today (agentic systems, physical AI, and real-time decision engines) depend on inference that is fast, private, and positioned close to where decisions need to be made. The dominant infrastructure model places that inference hundreds of milliseconds away. For an expanding set of use cases, that distance is no longer acceptable. We see the opportunity to build what we call the critical inference fabric: an ultra-low-latency execution layer that sits close to where data and physical systems live.
At Montauk Capital, we’re actively exploring this space: if you are interested in joining us to explore the future of inference compute at the metro-edge, reach out.
The hyperscale playbook is stalling
The Stargate AI infrastructure project was announced with $500B in commitments and presidential fanfare. Seven months later, SoftBank’s CFO was publicly acknowledging that the project is “taking a little longer than our initial timeline” and proceeding “slower than usual.” What’s clear from the Stargate example is that the bottlenecks behind strained inference capacity are not compute hardware or capital – rather, they are land, energy, and stakeholder orchestration.
Bloom Energy’s 2025 Data Center Power Report notes that 84% of respondents ranked availability of power among their top three considerations. For data centers and other large-scale projects today, interconnection to the grid is the greatest hurdle a project must clear – almost a de facto market-clearing constraint for new builds. According to Sightline Climate, 110 data center projects were slated to come online last year, but 26% of them were delayed, and 10% quietly pushed back their commercial operation dates (CODs) as power, permitting, and construction constraints slowed ambitious timelines.
Accelerating time-to-power thus becomes our highest-leverage point for value creation in our energy ecosystem. Edge compute is one of the most direct expressions of that thesis. Where hyperscale buildout is gated by interconnection queues and multi-year permitting cycles, metro-edge inference clusters deploy within existing infrastructure, drawing on available electricity headroom at sites already connected to the grid. The constraint that stalls a gigawatt-scale data center, power access, is precisely what edge compute is architected to sidestep.
The organizations that need inference now (running multi-step agentic workflows, deploying surgical robots, automating factory floors) cannot wait. The time-to-power conundrum makes a clear case for compute at the edge, close to the point of use, that can meet needs hyperscale facilities cannot.
The market is already moving to the edge
Per Gartner, 75% of enterprise data was expected to be captured at the edge by 2025, up from 25% in 2018, and, by 2028, more than half of enterprise-generated data will be processed outside the data center or cloud. What started with AI video (surveillance, smart city applications) is expanding rapidly to drones, robotics, autonomous systems, and industrial manufacturing. The edge story is also bigger than physical systems alone: enterprises deploying AI in production for operations, compliance, and customer service can’t tolerate cloud-routed latency either.
The clearest signal may be coming from what cloud providers and AI companies are discussing in public forums. At AWS re:Invent in December 2025, for instance, multiple sessions were dedicated to deploying generative AI at the edge and to agentic AI for industrial automation. When the world’s largest cloud providers are designing conference tracks around edge inference, the direction of travel is clear: inference is moving to the edge.
Enterprise demand for that shift comes down to edge compute’s core value propositions – of which lower latency, greater data sovereignty, and closer proximity are most top-of-mind based on where the market is moving today. Together, these value propositions underpin what makes edge compute the fundamental enablement layer of our physical AI revolution.
On Latency
Low latency is no longer optional for agentic systems. When a single AI agent executes a multi-step workflow (searching, reasoning, calling tools, generating outputs), each hop incurs latency. A 40 to 60 millisecond penalty per hop adds up to roughly half a second across ten steps, and longer workflows push the total into seconds. Because every step waits on the one before it, agentic architectures make inference latency far more consequential than it is for single-turn queries. For physical AI systems, the requirement is even more acute. Surgical robots, autonomous systems, and industrial controllers depend on response times that cloud-routed inference cannot reliably deliver.
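The per-hop arithmetic is easy to check. The sketch below is a minimal illustration, assuming hops run strictly one after another so that their delays simply add; the function name and the ten-step workflow are illustrative choices, and the ranges are the per-hop and per-call figures quoted in this piece.

```python
def cumulative_latency_ms(per_hop_ms: float, hops: int) -> float:
    """Total added delay when `hops` sequential steps each incur `per_hop_ms`."""
    if per_hop_ms < 0 or hops < 0:
        raise ValueError("per_hop_ms and hops must be non-negative")
    # Assumption: steps are strictly sequential, so delays add linearly.
    return per_hop_ms * hops


HOPS = 10  # illustrative ten-step agentic workflow

# (label, low end in ms, high end in ms) for the ranges quoted in the text
RANGES = [
    ("per-hop penalty", 40, 60),
    ("full inference call", 200, 2000),
]

for label, low_ms, high_ms in RANGES:
    low_s = cumulative_latency_ms(low_ms, HOPS) / 1000
    high_s = cumulative_latency_ms(high_ms, HOPS) / 1000
    print(f"{label}: {low_s:.1f} s to {high_s:.1f} s over {HOPS} steps")
```

Under these assumptions the totals scale linearly with the number of steps, so trimming the per-hop figure is the only lever that shortens every step of the workflow at once.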
On Sovereignty
Data sovereignty is becoming a hard requirement, not a preference. NVIDIA and Palantir recently announced a partnership on sovereign AI data center architecture specifically for sensitive use cases where public data centers are not a viable option. The drivers are well understood: regulatory compliance, reduced attack surface, and data localization mandates. For defense, national security, and other high-sensitivity verticals, sovereign inference infrastructure is increasingly non-negotiable.
Physical AI systems combine both demands simultaneously. Surgical robotics offers a clear illustration: it requires latency-sensitive inference to minimize disruption, while operating on healthcare data that demands the highest standards of privacy. Meeting that combination requires inference that is fast and sovereign at the point of action.
On the Metro-Edge
Meeting these demands at scale requires rethinking where inference compute lives. Metro-edge deployment (positioning infrastructure within city centers, close to end users and physical systems) closes the latency gap while enabling the sovereignty and proximity that enterprise use cases require.
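A rough way to see why proximity matters is to look at propagation delay alone. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes light in optical fiber travels at about two-thirds of its vacuum speed (roughly 200 km per millisecond) and ignores routing, queuing, and model execution time, all of which add to real-world latency. The three distances are hypothetical examples.

```python
# Assumption: light in optical fiber covers roughly 200 km per millisecond
# (about two-thirds of the speed of light in a vacuum).
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber for a one-way distance."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical one-way distances, chosen only for illustration.
SITES = [
    ("metro-edge site", 20),
    ("regional data center", 500),
    ("cross-country cloud region", 4000),
]

for label, km in SITES:
    print(f"{label} ({km} km): {round_trip_ms(km):.1f} ms round trip")
```

Even this best-case floor differs by two orders of magnitude between the metro and cross-country examples, before any network or compute overhead is counted.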
The challenge is that today’s approaches require significant adaptation. Existing solutions demand new power infrastructure, large footprints, thermal retrofits, or deep integration with carrier networks. Most are optimized for deployment scales that do not fit the sub-megawatt, distributed clusters that metro-edge inference requires.
The right model inverts this logic. Metro-edge inference compute should meet organizations where they are, minimizing change management burden while delivering the performance in speed, privacy, and reliability that these use cases demand.
An Example: Agentic Trading
Algorithmic and agentic trading makes the case in concrete terms. These systems require end-to-end execution latency in single-digit milliseconds — and for certain strategies, microseconds. The window between signal generation and order execution is where alpha is made or lost, and that precision is what separates a profitable fill from a costly one.
At the same time, those systems operate on some of the most sensitive data in existence, subject to SEC, FINRA, and institutional data governance requirements that make centralized, cloud-routed processing untenable.
Here, it is clear that speed and sovereignty are both non-negotiable, and they must be satisfied at the same point of action. Metro-edge inference is the only deployment model that satisfies both simultaneously, while also extending consistent performance to regional and mid-market participants that lack the co-location access that centralized compute assumes.
What the market is building
The market is full of innovators solving for metro-edge deployment use cases.
A few select players building out edge inference:
Crusoe Spark is building modular edge compute with low-latency, sovereign AI infrastructure. Operating at hundreds of kW per deployment, their model is closer to distributed generation than traditional edge, enabling true metro-edge placement at scale.
NVIDIA is partnering with telecom leaders through its AI Grids initiative to build distributed edge inference networks. Activating telco real estate at scale, however, requires operators to retrofit sites around compute: new power infrastructure, thermal management, and multi-tenant isolation layered on top of live carrier networks, representing substantial change management.
Cologix operates purpose-built hyperscale edge facilities across 12 North American markets and has moved toward managed AI compute through partnerships with Lambda and Supermicro. Their model is optimized for large-scale deployments: their Columbus facility alone spans 500,000 square feet and 80 MW of power. The sub-megawatt, distributed inference cluster sized to existing infrastructure sits outside the scale their model is designed to serve.
The opportunity
The organizations that win the next phase of the AI economy will be those that close the gap between where inference runs and where its outputs need to act. Building the metro-edge infrastructure to close that gap at scale is one of the most compelling opportunities in enterprise AI infrastructure today.
At Montauk Capital, we're actively investing in the infrastructure layer that makes that possible – reach out if you’re keen to compare notes.





