Matilda Bailey is a veteran networking specialist at the forefront of the AI infrastructure revolution. With a career dedicated to the evolution of cellular and next-gen wireless solutions, she has witnessed firsthand the transition from general-purpose data centers to the tightly integrated compute systems of the 2026 era. Her work focuses on the delicate balance between low-latency, tightly coupled training clusters and the distributed demands of inference, providing a unique vantage point on how AI is fundamentally reshaping the physical and logical layers of the modern facility.
Our discussion explores the bimodal nature of AI workloads and the resulting strain on traditional power and cooling models. We delve into the shift from rack-level design to system-level integration, the necessity of energy storage for stabilizing grid interactions, and the transition toward liquid cooling as a baseline standard. Finally, we examine how prefabrication and campus-wide productization are compressing deployment timelines in an industry where speed is now a primary design requirement.
AI workloads are splitting into tightly coupled training clusters and distributed inference systems. How does this bimodal demand change your floor layout strategies, and what specific networking protocols or proximity requirements are now non-negotiable for these different patterns?
We are no longer building a one-size-fits-all room; we are creating a split-brain architecture. For training, we are clustering tens of thousands of GPUs into massive, high-density blocks where the speed of light makes the physical distance between chips a hard floor on data-transfer latency. The networking protocols must prioritize zero packet loss and extremely low latency, which often requires us to rethink row layouts and keep fiber runs as short as possible so these synchronized systems don’t stall. Meanwhile, inference nodes are being distributed more broadly within the facility, or across different sites, to prioritize availability and responsiveness for the end user. This calls for a zoned layout strategy in which we isolate the heavy-duty, synchronized training clusters from the more resilient, distributed inference systems to manage both heat and traffic flow effectively.
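To put a number on that proximity constraint, here is a minimal back-of-the-envelope sketch in Python; the fiber-run lengths are illustrative assumptions rather than figures from the interview. Light in optical fiber travels at roughly two-thirds of its vacuum speed, so every meter of fiber adds about 5 ns of one-way delay that a tightly synchronized cluster pays on every collective operation.

```python
# Rough sketch: propagation delay over a fiber run.
# ~2.0e8 m/s is the approximate speed of light in optical fiber
# (about two-thirds of c in vacuum); the run lengths below are invented.

SPEED_IN_FIBER_M_PER_S = 2.0e8

def one_way_delay_ns(fiber_run_m: float) -> float:
    """One-way propagation delay over a fiber run, in nanoseconds."""
    return fiber_run_m / SPEED_IN_FIBER_M_PER_S * 1e9

# Illustrative runs: same row, same hall, across the building.
for run_m in (3, 30, 300):
    print(f"{run_m:>4} m fiber -> {one_way_delay_ns(run_m):7.1f} ns one-way")

# 3 m adds ~15 ns while 300 m adds ~1,500 ns, and a synchronized all-reduce
# pays that cost on every step, which is why rows are laid out to keep runs short.
```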
With rack densities climbing from 40 kW toward megawatt levels, facilities must support both legacy hardware and high-density AI systems simultaneously. How are you managing this bimodal power delivery, and what higher-voltage distribution models are proving most effective?
The jump from the 30–40 kW racks that defined high density until recently to today’s megawatt-scale designs is nothing short of a seismic shift in engineering. Managing this bimodal power delivery feels like running a vintage lightbulb and a particle accelerator on the same circuit. To handle it, we are increasingly looking at higher-voltage distribution models that bring power much closer to the chip to reduce losses and bulk. It is no longer just about cooling a rack; we are designing integrated systems where power delivery is baked into the very fabric of the compute and network layers. This necessitates a hybrid approach in which legacy systems sit adjacent to AI clusters that draw more power than an entire small office building.
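The physics behind that higher-voltage push can be sketched in a few lines; the feed resistance and voltage tiers below are assumed for illustration, not taken from the interview. For a fixed conductor resistance, resistive loss scales with the square of the current, so doubling the distribution voltage halves the current and cuts the loss by a factor of four.

```python
# Rough sketch: resistive distribution loss for a megawatt-scale rack.
# P_loss = I^2 * R, with I = P / V under a simplified DC / unity-power-factor model.

def distribution_loss_kw(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss (kW) in a feed delivering power_w at voltage_v through resistance_ohm."""
    current_a = power_w / voltage_v
    return current_a ** 2 * resistance_ohm / 1e3

RACK_POWER_W = 1_000_000      # a 1 MW rack, per the megawatt-scale designs discussed
FEED_RESISTANCE_OHM = 0.002   # assumed busway resistance, purely illustrative

for voltage_v in (415, 800, 1500):
    loss = distribution_loss_kw(RACK_POWER_W, voltage_v, FEED_RESISTANCE_OHM)
    print(f"{voltage_v:>5} V feed -> {loss:6.1f} kW lost in distribution")

# 415 V -> ~11.6 kW lost; 800 V -> ~3.1 kW; 1500 V -> ~0.9 kW. Less current also
# means less copper, which is the "bulk" reduction mentioned above.
```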
AI training clusters create sharp, dynamic load patterns that can impact utility power plants. How is energy storage being used to smooth these fluctuations and maintain power quality, and what specific grid requirements are becoming mandatory for new builds?
AI training is not a steady hum; it’s a series of sharp, synchronized load swings that can be felt by the utility power plant miles away. When tens of thousands of GPUs ramp up simultaneously, the sudden surge in demand can significantly destabilize local power quality. To mitigate this, we are integrating massive energy storage systems that act as a buffer, soaking up excess energy and discharging it instantly to smooth out the spikes. These systems are also becoming mandatory for “ride-through” capability during voltage events, ensuring the grid stays stable while the data center operates at peak performance. It’s a continuous process of monitoring real-time workload behavior and adjusting power intake to protect the broader electrical infrastructure from the ripple effects of high-intensity compute.
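The buffering behavior described here can be modeled as a simple ramp-rate limiter. A minimal sketch follows; the load profile and the grid ramp limit are invented numbers, and real battery energy management systems are far more sophisticated. The idea is that the grid sees a bounded ramp while the battery covers the residual.

```python
# Rough sketch: a battery smoothing a spiky training load into a bounded grid ramp.
# Positive battery power = discharging into the load; negative = charging.

GRID_RAMP_LIMIT_MW_PER_STEP = 2.0   # assumed maximum change in grid draw per interval

def smooth_with_battery(load_mw: list[float]) -> list[tuple[float, float]]:
    """Return (grid_draw_mw, battery_mw) per step for a ramp-rate-limited grid feed."""
    grid_draw = load_mw[0]
    schedule = []
    for load in load_mw:
        # Move the grid draw toward the actual load, but no faster than the ramp limit.
        delta = max(-GRID_RAMP_LIMIT_MW_PER_STEP,
                    min(GRID_RAMP_LIMIT_MW_PER_STEP, load - grid_draw))
        grid_draw += delta
        schedule.append((grid_draw, load - grid_draw))  # battery covers the residual
    return schedule

# An invented profile: idle, hard ramp, checkpoint dip, ramp again, wind down.
spiky_load_mw = [5, 5, 25, 25, 25, 8, 8, 25, 25, 5]
for step, (grid, batt) in enumerate(smooth_with_battery(spiky_load_mw)):
    print(f"t={step}: load={spiky_load_mw[step]:4.1f} MW  grid={grid:4.1f} MW  battery={batt:+5.1f} MW")
```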
Liquid cooling has shifted from a niche option to a baseline requirement, yet many facilities must still operate hybrid air-cooled environments. What are the primary hurdles in standardizing these cooling systems, and how can operators engineer out water usage while scaling?
The debate over liquid cooling is effectively over; it is now the baseline requirement for anything involving high-performance AI clusters. However, the transition is messy because we are still managing hybrid environments where liquid-cooled racks sit right next to traditional air-cooled infrastructure. One of the biggest hurdles is the sheer number of connections required inside these dense racks and the immense strain it puts on the component supply chain. We are also under intense pressure to engineer out water usage entirely because scaling evaporative cooling at hyperscale is becoming a major sustainability and operational risk. This forces us to move toward more complex closed-loop systems that can reject massive amounts of heat without draining local municipal supplies.
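To give a sense of the scale those closed loops must handle, here is a back-of-the-envelope calculation; the rack powers and the 10 K temperature rise are assumptions for illustration. It uses the standard heat-transport relation Q = m_dot * c_p * delta_T to estimate the coolant flow a given heat load requires.

```python
# Rough sketch: coolant flow needed to carry away a rack's heat load.
# Q = m_dot * c_p * delta_T, solved for m_dot; numbers below are illustrative.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # c_p of water
WATER_DENSITY_KG_PER_L = 1.0

def required_flow_lpm(heat_load_kw: float, delta_t_k: float) -> float:
    """Coolant flow (liters/minute) to absorb heat_load_kw with a delta_t_k temperature rise."""
    mass_flow_kg_s = heat_load_kw * 1e3 / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60

# From an air-era rack to today's liquid-cooled densities.
for rack_kw in (40, 120, 1000):
    print(f"{rack_kw:>5} kW rack -> {required_flow_lpm(rack_kw, delta_t_k=10):8.1f} L/min at a 10 K rise")

# Because the loop is closed, this coolant circulates instead of evaporating;
# the heat is ultimately rejected by dry coolers, trading water use for fan power.
```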
The demand for AI capacity is forcing much shorter deployment timelines despite increased infrastructure complexity. How does shifting work off-site through prefabrication change the commissioning process, and what modular design choices allow you to stay flexible across future GPU generations?
With the demand for AI capacity exploding, we’ve had to throw the old construction playbooks out the window in favor of extreme speed and standardization. We are shifting as much work as possible off-site through prefabrication and factory integration, which allows us to commission systems in a controlled environment before they even reach the site. This modular approach isn’t just about speed; it’s about front-loading the design to ensure we have flexibility across future GPU generations that might have entirely different power or thermal signatures. By using factory-integrated components, we can reduce the chaotic on-site labor and assemble a fully functional data center in a fraction of the time it used to take. It is a total workflow transition from traditional construction to a model of rapid assembly and optionality.
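One way to picture that front-loaded flexibility is as an envelope check at specification time: each prefabricated module publishes the power, heat, and cooling interface it can deliver per rack position, and candidate GPU generations are tested against it. The sketch below is hypothetical; the module figures and generation names are invented for illustration.

```python
# Hypothetical sketch: checking future GPU generations against a prefabricated
# module's power and thermal envelope. All names and numbers are invented.

from dataclasses import dataclass

@dataclass
class ModuleEnvelope:
    power_kw: float   # deliverable power per rack position
    heat_kw: float    # heat the cooling loop can reject per rack position
    coolant: str      # cooling interface the module ships with

@dataclass
class GpuGeneration:
    name: str
    rack_power_kw: float
    cooling: str

def fits(module: ModuleEnvelope, gen: GpuGeneration) -> bool:
    """A generation fits if the module covers its power draw, heat output, and cooling interface."""
    return (gen.rack_power_kw <= module.power_kw
            and gen.rack_power_kw <= module.heat_kw
            and gen.cooling == module.coolant)

module = ModuleEnvelope(power_kw=250, heat_kw=250, coolant="direct-liquid")
for gen in (GpuGeneration("gen-n", 120, "direct-liquid"),
            GpuGeneration("gen-n+1", 200, "direct-liquid"),
            GpuGeneration("gen-n+2", 400, "direct-liquid")):
    print(f"{gen.name}: {'fits the envelope' if fits(module, gen) else 'needs a module redesign'}")
```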
Large-scale developments are moving away from individual building optimizations toward treating the entire campus as a single integrated product. How does this shift affect supply chain coordination, and what are the trade-offs when syncing infrastructure and compute deployment?
We have stopped looking at the individual data center building as the final product and are now treating the entire campus as a single, integrated machine. This shift requires a level of supply chain coordination that is unprecedented, as we have to sync the delivery of infrastructure with the arrival of the compute hardware itself in very tight increments. The trade-off is that any delay in one area can bottleneck the entire campus, but the benefit is a synchronized rollout where power, cooling, and chips come online together rather than in phased cycles. It forces us to balance flexibility with scale, ensuring that every acre of the development is optimized for the specific demands of both training and inference workloads. The operational reality is that we are managing a complex, high-pressure ecosystem rather than just a collection of server rooms.
What is your forecast for AI infrastructure?
I believe we are entering an era of systemic redesign where the traditional boundaries between the grid, the facility, and the chip will vanish. We will see the industry move away from incremental improvements and toward radical new architectures that question every legacy assumption we’ve held for the last 20 years. Facilities will become even more autonomous and tightly integrated with the power grid, serving as both massive compute hubs and essential stabilizers for renewable energy. For the next generation of engineers, the mandate is clear: if a traditional design choice doesn’t make sense for the scale of AI, discard it and build something that does.
