Matilda Bailey, a veteran networking specialist with an eagle eye for the evolving architecture of cellular and wireless solutions, has spent the last decade navigating the shifting tides between local hardware and the boundless cloud. As enterprises scramble to integrate artificial intelligence into their core operations, Matilda provides a critical perspective on the “where” and “why” of model deployment. Her expertise lies in the physical and virtual plumbing that allows data to flow, or prevents it from doing so, and she stresses that the most sophisticated AI model is only as effective as the environment in which it lives.
The conversation centers on the strategic reshuffling of AI workloads, moving beyond the simple binary of cloud versus on-premises. Key themes include the sobering reality of AI’s financial impact, where only a tiny fraction of companies are seeing real returns, and the rising popularity of hybrid deployment patterns. We explore the tactical reasons why a bank might keep its data locked in a private data center while a marketing firm scales rapidly on public infrastructure. Throughout, we address the sensory and operational realities of managing GPUs, the critical nature of latency in manufacturing, and the delicate balance of data sovereignty in an increasingly multi-cloud world.
The 2025 State of AI report by McKinsey suggests a staggering disconnect in the industry, noting that only 6% of organizations have reached “high performer” status with an EBIT impact of over 5%. Based on your observations of infrastructure strategy, why is such a small fraction of businesses managing to turn AI into a genuine financial engine?
The disconnect usually stems from a fundamental misalignment between the AI workload and its physical home. Many organizations treat AI deployment as a “one size fits all” cloud migration, but when you look at that 6% of high performers, they are the ones treating infrastructure as a strategic lever rather than a utility bill. If you are pushing massive datasets through a public cloud without considering the egress costs or the governance constraints of sensitive data, your margins evaporate before the model even finishes its first inference. High performers are meticulously matching their workloads to environments that minimize latency and maximize control, ensuring they aren’t just running AI for the sake of it, but running it where it is most cost-efficient. We often see projects stall because the business didn’t account for the 2025 reality: AI success isn’t just about the code, it’s about the governance, performance, and scalability of the underlying pipes.
When we talk about bringing AI on-premises, people often picture dusty server rooms, but you describe a much more sophisticated spectrum including private clouds and edge sites. What are the tangible, sensory realities and operational burdens a company faces when they decide to own their AI hardware?
Deciding to go on-premises is a major commitment to the “tactile” side of technology, where you are suddenly responsible for the physical hum and heat of high-performance GPU clusters. You aren’t just clicking a button in a console; you are managing server procurement, selecting specific GPU architectures, and designing cooling systems that can handle the intense thermal output of deep learning. There is a certain weight to it—a literal weight in the racks—and an operational responsibility that involves local data integration, resilience planning, and constant hardware refresh cycles. You also have to staff up for MLOps and patching lifecycle management, which can feel like a heavy lift compared to the abstraction of the cloud. However, for a business sitting on proprietary data or factory equipment, that physical proximity provides a level of control and low-latency operational flow that no multi-tenant cloud environment can replicate.
Cloud-based AI is often praised for its elasticity, yet it introduces what you call “cloud concentration risks” and cost complexities. How should a business navigate the trade-off between the ease of SaaS-based AI and the potential for vendor lock-in?
The cloud is an incredible playground for experimentation because it offers immediate access to foundation model APIs and managed PaaS tiers without the upfront capital investment of a data center. You can feel the speed of innovation when your team can spin up a thousand GPU instances for a weekend of testing and then shut them down on Monday. But that convenience comes with a hidden complexity in identity and access management, as well as the looming shadow of cloud concentration where your entire AI strategy is tied to a single provider’s roadmap. To navigate this, businesses are increasingly looking at multi-cloud AI strategies to maintain a sense of “data portability” and resilience against outages. It requires a more mature operational model—often needing new hires specifically to oversee multiple platforms—but it prevents that trapped feeling where your costs spiral and you have no leverage to move your workloads elsewhere.
Hybrid AI architecture seems to be the “middle way” for many enterprises. Could you walk us through a scenario where a company might use the cloud for the “brain work” of training but keep the “reflexes” of inference local?
This is one of the most exciting patterns we see, particularly in the “Cloud training with local inference” model. Imagine a company building a custom model: they use the massive, elastic compute of a hyperscaler to handle the heavy lifting of training and deep fine-tuning, where thousands of chips are grinding through data for weeks. Once that model’s weights are finalized, they export it and deploy it onto local edge hardware or on-premises servers for daily production. This gives them the power of the cloud for the “heavy lift” but ensures that their daily operations are fast, private, and don’t require a constant, expensive heartbeat to a distant data center. We also see “Cloud bursting,” where a business runs its baseline training on its own dedicated HPC clusters but taps into the public cloud only when computation demands spike beyond their physical capacity. It’s a very fluid, pragmatic way to manage resources that balances the steady-state cost of ownership with the peak-demand flexibility of the cloud.
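The cloud-bursting pattern described above can be sketched as a simple placement rule: fill owned capacity first, and send only the overflow to the public cloud. This is a minimal illustration, not a scheduler; the names `Placement` and `place_demand` and the GPU-hour figures are hypothetical and do not come from the interview.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """How one period's training demand is split (units: GPU-hours)."""
    on_prem_gpu_hours: float
    cloud_gpu_hours: float


def place_demand(demand_gpu_hours: float, local_capacity_gpu_hours: float) -> Placement:
    """Fill the owned cluster first; burst only the overflow to the cloud."""
    if demand_gpu_hours < 0 or local_capacity_gpu_hours < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_prem = min(demand_gpu_hours, local_capacity_gpu_hours)
    overflow = max(0.0, demand_gpu_hours - local_capacity_gpu_hours)
    return Placement(on_prem_gpu_hours=on_prem, cloud_gpu_hours=overflow)


# Hypothetical week: a quiet period stays local, a spike bursts to the cloud.
quiet = place_demand(demand_gpu_hours=80.0, local_capacity_gpu_hours=100.0)
spike = place_demand(demand_gpu_hours=150.0, local_capacity_gpu_hours=100.0)
```

In the quiet period nothing leaves the owned cluster; in the spike, only the 50 GPU-hours above local capacity go to the cloud, which is the steady-state versus peak-demand balance the answer describes.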
In sectors like manufacturing or finance, the choice of deployment isn’t just about cost—it’s about milliseconds and mandates. Why is the physical location of the AI so critical for a factory floor or a credit-scoring system?
In a high-speed manufacturing environment, latency isn’t just a metric; it’s a physical constraint that determines if a defect detection system can stop a moving assembly line in time. If you are using computer vision to spot a hairline fracture in a part, a three-second delay for a round trip to the cloud could mean twenty more defective parts have already passed through the station. By running AI inference close to the machine, you eliminate the dependency on external connectivity and ensure the system is as reliable as the mechanical parts it monitors. Similarly, in banking, a real-time fraud detection engine needs to process personally identifiable information and transaction histories against internal risk models within milliseconds. Keeping that workload on-premises or in a private cloud isn’t just about speed; it’s about data sovereignty and the strict governance required when dealing with sensitive, regulated data that simply cannot be exposed to a multi-tenant environment.
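The arithmetic behind the assembly-line example can be made explicit. Twenty parts passing in three seconds implies a line rate of roughly 6.7 parts per second, and the same rate shows how few parts slip past when the detection delay shrinks. The sketch below is illustrative only: the function names are hypothetical, and the 20 ms local-inference delay is an assumed figure, not one given in the interview.

```python
def implied_line_rate(parts_passed: float, delay_seconds: float) -> float:
    """Parts per second implied by how many parts pass during a known delay."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return parts_passed / delay_seconds


def parts_passed_during_delay(parts_per_second: float, delay_seconds: float) -> float:
    """Expected number of parts that pass the station before the line can react."""
    return parts_per_second * delay_seconds


# From the interview: about twenty parts pass during a three-second cloud round trip.
rate = implied_line_rate(parts_passed=20, delay_seconds=3.0)

# Assumed for illustration: on-site inference that responds in 20 ms.
local_exposure = parts_passed_during_delay(rate, delay_seconds=0.020)
```

Under these assumptions the local path lets well under one part through before the line can react, compared with twenty for the cloud round trip.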
For a mid-sized company that doesn’t have a massive internal engineering team, the prospect of managing their own AI infrastructure can be daunting. Is the cloud the only viable path for them, or is there a way for them to achieve high-performance AI without the overhead?
For companies that lack deep operational maturity, the cloud is often the most logical starting point because it abstracts away the “drudgery” of power, cooling, and hardware maintenance. Managed cloud services allow a marketing or sales team to add AI to their workflows—like customer service bots or sales analytics—without needing a single rack in their office. However, “avoiding overhead” shouldn’t mean “avoiding strategy.” Even a mid-sized firm should be looking at how they can maintain control over their data pipelines so they aren’t completely at the mercy of a single SaaS provider. They can start with managed MLOps tools and APIs, but they should keep a keen eye on their data lifecycle management. The goal is to get the results of AI without building a data center, while still ensuring that if they need to move their “intelligence” to a different environment later, the data isn’t so deeply entangled that migration becomes impossible.
What is your forecast for the evolution of the “Edge-to-Cloud” continuum over the next few years?
I expect we will see a significant shift toward “decentralized governance,” where the cloud acts less as the primary engine and more as the orchestrator for thousands of distributed local sites. We are moving toward a world where MLOps teams will use a centralized cloud platform to track model drift and manage updates, but the actual “thinking” of the AI will happen on the factory floor, in the hospital room, or inside the retail store. This will be driven by the need for ultra-low latency and continuous offline operation—ensuring that if the internet goes down, the AI doesn’t go dark. Businesses will stop asking “Cloud or On-prem?” and start asking “How do I build a seamless fabric that spans both?” This transition will require a new generation of networking specialists who can manage these hybrid flows, ensuring that data is always exactly where it needs to be to deliver the most value at the lowest possible risk.
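One way to picture the split between local “thinking” and central oversight is an edge node that always answers from its on-site model and buffers telemetry until the link to the central platform is available. This is a minimal sketch under stated assumptions: `EdgeNode`, its methods, and the `upload` callback are hypothetical names, and the stand-in model is a toy threshold rather than a real network.

```python
from collections import deque
from typing import Any, Callable


class EdgeNode:
    """Runs inference locally; forwards telemetry to a central platform when online."""

    def __init__(self, model: Callable[[Any], Any], max_buffered: int = 1000):
        self.model = model
        # Bounded buffer: the oldest records are dropped if the outage outlasts capacity.
        self.pending = deque(maxlen=max_buffered)

    def infer(self, sample: Any) -> Any:
        """Always answer locally, whether or not the network is up."""
        prediction = self.model(sample)
        self.pending.append((sample, prediction))
        return prediction

    def sync(self, upload: Callable[[Any], None], online: bool) -> int:
        """Flush buffered records to the central platform; return how many were sent."""
        if not online:
            return 0
        sent = 0
        while self.pending:
            upload(self.pending[0])   # send first, so a failed upload keeps the record
            self.pending.popleft()
            sent += 1
        return sent


# Toy stand-in model: flag readings above a threshold.
node = EdgeNode(model=lambda reading: reading > 0.5)
flagged = node.infer(0.9)                                   # works with no connectivity
sent_offline = node.sync(upload=print, online=False)        # nothing leaves the site

central_log = []
sent_online = node.sync(upload=central_log.append, online=True)
```

The central side would then use the uploaded records for drift tracking and update decisions, while the inference path never depends on the connection.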
