Matilda Bailey is a distinguished networking specialist who has spent her career at the forefront of cellular, wireless, and next-generation infrastructure transitions. With a deep technical background in high-performance fabrics, she has become a leading voice on how the integration of advanced routing and automated solutions is reshaping the modern data center. Her insights are particularly valuable as enterprises navigate the complex shift from traditional computing to massive, GPU-accelerated AI clusters that demand unprecedented levels of connectivity.
In this conversation, we explore the strategic evolution of networking from a support function to a core component of the AI stack. We discuss the operational nuances of major infrastructure acquisitions, the deliberate shift toward sovereign and enterprise AI markets, and the role of “self-driving” technologies in reducing total cost of ownership.
Networking is shifting from a support role to a core component of the AI stack. How are high-performance fabrics fundamentally changing cluster scalability, and what specific technical hurdles must enterprises clear when moving from hundreds to tens of thousands of GPUs?
In the past, we viewed networking as the plumbing of the data center, but in the era of large-scale AI it has become either the bottleneck or the enabler of performance. When you scale from a few hundred GPUs to tens of thousands, the sheer volume of east-west traffic can overwhelm traditional leaf-spine architectures, introducing latency that stalls expensive compute cycles. High-performance fabrics provide non-blocking, high-speed switching that keeps data moving across these massive clusters without friction. The technical hurdles are immense, particularly around tail latency and congestion management, because a single delayed packet can slow down an entire training job. To sustain the 18%-or-higher efficiency gains modern infrastructure demands, organizations must now treat the fabric as a unified system rather than a collection of independent switches.
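The tail-latency point can be made concrete with a toy model: in synchronous training, a step finishes only when the slowest worker does, so step time is the maximum across workers, and the odds of hitting at least one straggler grow with cluster size. A minimal Python sketch (all probabilities and latencies here are illustrative values, not measurements):

```python
import random

random.seed(0)

def step_time(num_workers, base_ms=10.0, straggler_prob=0.001, straggler_ms=90.0):
    """One synchronous training step: it completes only when the slowest worker does."""
    times = [
        base_ms + (straggler_ms if random.random() < straggler_prob else 0.0)
        for _ in range(num_workers)
    ]
    return max(times)

def avg_step(num_workers, steps=500):
    """Average step time over many simulated steps."""
    return sum(step_time(num_workers) for _ in range(steps)) / steps

# Few hundred GPUs: a straggler appears in only some steps.
print(f"100 workers:    {avg_step(100):.1f} ms/step")
# Tens of thousands: almost every step hits at least one straggler.
print(f"10,000 workers: {avg_step(10_000):.1f} ms/step")
```

Even a 0.1% per-worker straggler rate, nearly harmless at 100 workers, dominates almost every step at 10,000 workers, which is why tail latency rather than average latency governs fabric design at scale.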
Integrating a massive routing and data center portfolio into an existing infrastructure business often presents operational friction. What steps are necessary to synchronize diverse product lines with legacy systems, and how does this integration specifically impact the overall margin profile during the transition?
The synchronization process requires a disciplined approach to finding synergies, such as the Catalyst synergies we’ve seen recently, to ensure that disparate hardware and software stacks can communicate through a single pane of glass. You have to move quickly to integrate security and routing, which in some cases have seen revenue jump from just $1 million to $780 million almost overnight, without alienating the existing customer base. During this transition, the margin profile often takes a temporary hit; for instance, we’ve seen networking operating margins at 23.7%, slightly lower than in previous periods due to these heavy integration costs. However, the long-term goal is to stabilize these margins by phasing out redundant legacy systems and leaning into high-margin software-defined networking and security products that now grow by 114% or more.
While many providers prioritize hyperscale volume, focusing on sovereign and enterprise AI often yields better capital allocation. What are the primary trade-offs of this margin-first approach, and how can a company effectively maintain market share against competitors chasing pure volume?
The primary trade-off is that you won’t see the eye-popping, multi-billion-dollar single-client revenue spikes associated with the largest hyperscalers, but you gain much higher margin quality and lower customer concentration risk. With more than two-thirds of the AI backlog now sovereign and enterprise-based, a company can insulate itself from the “race to the bottom” on hardware pricing. To maintain market share, you have to offer a specialized, end-to-end stack that addresses the specific compliance and data-sovereignty needs that generic volume-chasers often ignore. This strategy allows for more sustainable capital allocation, ensuring that every dollar spent on R&D is targeted at high-value enterprise features rather than subsidizing massive, low-margin server shipments for a handful of giants.
Consolidating server, storage, and financial services into a single reporting segment can streamline operations but may mask individual fluctuations. How does this unified structure improve the delivery of end-to-end AI solutions, and what metrics should be used to gauge its long-term success?
This unified structure, which we see generating roughly $6.3 billion in revenue, allows an organization to stop selling boxes and start selling outcomes, providing a seamless experience from the initial hardware purchase to the financing of the deployment. By grouping servers, which might pull in $4.2 billion, with $1.1 billion in storage and $900 million in financial services, the company can better manage the entire lifecycle of an AI cluster. To gauge long-term success, you shouldn’t just look at top-line revenue, which can fluctuate—it actually dipped about 2.7% in some segments recently—but rather at operating margins and free cash flow. When operating margins improve from 8.4% to 10.2% within that consolidated group, it’s a clear sign that the internal efficiencies are outweighing the challenges of a combined reporting structure.
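The arithmetic behind that consolidated view can be checked directly from the figures above (a back-of-the-envelope sketch; the components are rounded, so the sum lands near, not exactly on, the cited total):

```python
# Segment figures quoted in the discussion above, in billions of dollars.
segments_bn = {"servers": 4.2, "storage": 1.1, "financial_services": 0.9}

revenue_bn = sum(segments_bn.values())
# Components sum to ~$6.2B, in line with the "roughly $6.3 billion" cited.
print(f"Combined revenue: ~${revenue_bn:.1f}B")

# Operating income implied by the margin improvement on that combined base.
margin_before, margin_after = 0.084, 0.102
income_before = revenue_bn * margin_before
income_after = revenue_bn * margin_after
print(f"Operating income: ${income_before:.2f}B -> ${income_after:.2f}B "
      f"(+${(income_after - income_before) * 1000:.0f}M)")
```

On a roughly $6.2 billion base, moving margins from 8.4% to 10.2% adds on the order of $110 million of operating income, which is the kind of efficiency signal the consolidated reporting is meant to surface.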
Networking growth expectations are currently surging by more than 70% in some sectors. What specific enterprise market demands are driving this massive acceleration, and how can organizations ensure their supply chains remain resilient enough to meet such rapid, triple-digit surges in demand?
The acceleration is being driven by the realization that AI is useless without the ability to move data at scale, leading to a staggering 380% increase in data center networking demand for high-performance fabrics. Enterprises are no longer just experimenting; they are deploying production-grade AI that requires campus and branch networking to grow by 42% just to keep up with the distributed nature of modern workloads. To remain resilient, organizations must diversify their component sourcing and maintain “effective operational discipline” in what remains a very dynamic commodity supply environment. This means moving away from single-source dependencies and using their growing free cash flow—now projected to hit $2 billion or more—to secure long-term supply agreements for critical silicon and optics.
High-speed switching and “self-driving” networking solutions are becoming essential for managing modern data center traffic. How do these automated technologies reduce the total cost of ownership for AI clusters, and what is the practical process for implementing them in a hybrid cloud environment?
“Self-driving” solutions leverage AI and machine learning to automate the configuration and troubleshooting of the network, which significantly reduces the human capital required to manage complex clusters. By proactively identifying and fixing congestion before it leads to a “gray failure,” these systems maximize the uptime of GPU clusters that represent millions of dollars in investment, thereby lowering the total cost of ownership per training run. Implementation in a hybrid environment starts with deploying an AI-native management layer that can see across both on-premises hardware and public cloud instances. It’s a practical, phased rollout where you first automate the most repetitive tasks, like VLAN provisioning, before moving into full-scale automated traffic engineering across the entire fabric.
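As an illustration of that first phase, automating the most repetitive task, here is a hypothetical intent-reconciliation sketch for VLAN provisioning. `SwitchClient` is a stand-in class invented for this example, not a real vendor API; an actual deployment would back it with a NETCONF or gNMI driver:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VlanIntent:
    """Declarative statement of a VLAN that should exist on the switch."""
    vlan_id: int
    name: str

class SwitchClient:
    """Placeholder driver for this sketch; not a real vendor API."""
    def __init__(self):
        self._vlans = {}  # vlan_id -> name

    def existing_vlans(self):
        return dict(self._vlans)

    def create_vlan(self, vlan_id, name):
        self._vlans[vlan_id] = name

def reconcile(switch, desired):
    """Create any VLANs present in the intent but missing from the switch."""
    have = switch.existing_vlans()
    created = []
    for intent in desired:
        if intent.vlan_id not in have:
            switch.create_vlan(intent.vlan_id, intent.name)
            created.append(intent.vlan_id)
    return created

switch = SwitchClient()
desired = [VlanIntent(100, "gpu-fabric"), VlanIntent(200, "storage")]
print(reconcile(switch, desired))  # first run provisions the missing VLANs
print(reconcile(switch, desired))  # second run is a no-op: reconciliation is idempotent
```

Because the loop acts only on the difference between intent and observed state, re-running it is always safe; that idempotence is the property that lets automation expand gradually from provisioning toward full automated traffic engineering.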
What is your forecast for the AI networking market?
I expect we are moving into a period of sustained, high-intensity growth where networking revenue will likely expand by 68% to 73% annually as the focus shifts from just buying GPUs to actually making them work together. We will see a massive surge in the “middle mile” of the data center, where high-speed switching and routing become the primary differentiators for enterprise performance. As sovereign AI initiatives take hold globally, the demand for secure, high-performance fabrics will outpace the growth of standard server hardware. Ultimately, the winners in this space won’t just be the ones with the fastest chips, but the ones who can provide a self-healing, automated fabric that allows an enterprise to scale from ten to ten thousand accelerators without a hitch.
