Can Upscale AI Democratize Networking with $100M Seed?

I’m thrilled to sit down with Matilda Bailey, a renowned networking specialist whose expertise in cellular, wireless, and next-gen solutions has positioned her as a thought leader in the rapidly evolving world of AI infrastructure. Today, we’re diving into her insights on the groundbreaking work happening at Upscale AI, a company that’s challenging the status quo in AI networking with an open, standards-based approach. Our conversation explores the unique demands of AI-driven workloads, the power of leveraging open standards, and the strategic vision behind a full-stack integration model that aims to democratize access to high-performance networking. Let’s get started.

Can you tell us what sparked the idea behind a company like Upscale AI and the journey that led to identifying a gap in AI networking?

Absolutely. The inspiration for Upscale AI came from observing a seismic shift in data center demands, especially with the rise of AI workloads. My background working on cutting-edge infrastructure projects exposed me to the limitations of existing networking solutions. While at previous ventures, we kept hearing from customers about their struggles with scalability and latency in AI-driven environments. They needed something purpose-built for GPU-to-GPU communication, not just the traditional CPU-centric setups. That feedback, combined with seeing how proprietary systems locked users into specific ecosystems, made it clear there was a need for an open, flexible solution that could handle the unique challenges of AI.

How do AI workloads fundamentally change the requirements for networking compared to traditional setups?

AI workloads are a game-changer. Unlike traditional CPU-to-CPU or CPU-to-storage communication, AI relies heavily on GPU-to-GPU interactions, which demand incredibly high bandwidth and ultra-low, predictable latency. It’s not just about moving data quickly; it’s about ensuring that every packet arrives with consistency, because even small delays can bottleneck training or inference. These workloads often involve memory-semantic load-store operations, which are far more intensive than older communication patterns. So the network has to act almost like an extension of the compute fabric itself, something traditional Ethernet or legacy systems just weren’t designed for.
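
To put rough numbers on the bandwidth side of that claim, here is a back-of-envelope sketch. The model size, GPU count, and step budget are illustrative assumptions, not Upscale AI figures; the 2(N-1)/N factor is the standard per-GPU traffic cost of a ring all-reduce.

```python
# Back-of-envelope: per-GPU network traffic for one ring all-reduce of the gradients.
# All numbers are illustrative assumptions, not vendor figures.

def ring_allreduce_bytes_per_gpu(payload_bytes: float, num_gpus: int) -> float:
    """Each GPU sends (and receives) roughly 2 * (N - 1) / N * payload bytes per all-reduce."""
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes

gradient_bytes = 70e9 * 2          # e.g. a 70B-parameter model in 16-bit precision
gpus = 1024
per_gpu = ring_allreduce_bytes_per_gpu(gradient_bytes, gpus)

print(f"{per_gpu / 1e9:.1f} GB sent per GPU per training step")
print(f"~{per_gpu * 8 / 1e9:.0f} Gb/s sustained if the step budget is 1 second")
```

Even with generous assumptions, every GPU needs terabit-class sustained throughput just for gradient exchange, which is why GPU-to-GPU fabrics are sized so differently from CPU-era networks.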

Why is predictable latency such a critical factor for AI networks compared to older architectures?

Predictable latency is everything in AI because these workloads are highly synchronized. When you’re training a massive model across thousands of GPUs, any variability in latency can cause inefficiencies, slowing down the entire process. In older CPU-to-storage setups, occasional delays might not be a big deal—you’re often just waiting for data retrieval. But with AI, GPUs are constantly exchanging data in real-time, and unpredictability can throw off the whole computation. It’s like trying to conduct an orchestra where every musician needs to hit their note at the exact same moment; even a tiny delay disrupts the harmony.
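
The orchestra point can be made concrete with a small simulation: when thousands of GPUs synchronize each step, the step finishes only when the slowest exchange does, so the tail of the latency distribution, not the mean, sets the pace. The GPU count, mean latency, and jitter values below are hypothetical.

```python
# Minimal sketch: latency *variance*, not just mean latency, gates a synchronized step.
# Hypothetical numbers; the step completes only when the slowest per-GPU exchange completes.

import random

def step_time_us(num_gpus: int, mean_us: float, jitter_us: float) -> float:
    samples = [random.gauss(mean_us, jitter_us) for _ in range(num_gpus)]
    return max(samples)  # the straggler defines the step

random.seed(0)
for jitter in (0.5, 5.0, 50.0):
    avg = sum(step_time_us(4096, mean_us=10.0, jitter_us=jitter) for _ in range(100)) / 100
    print(f"jitter {jitter:>5.1f} us -> average step time ~{avg:6.1f} us (mean latency is still 10 us)")
```

With the same 10-microsecond average, increasing jitter alone inflates the effective step time many times over, which is exactly the inefficiency a predictable-latency fabric is meant to remove.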

Upscale AI is taking on giants in the AI networking space with an open, standards-based approach. What sets your strategy apart from closed, proprietary systems?

Our core differentiator is openness. Proprietary systems, while effective within their own ecosystem, often limit customer choice and flexibility. They lock you into a specific vendor’s hardware or protocols, which can stifle innovation and drive up costs. Our approach at Upscale AI is to build on open standards, creating a network that any compute vendor or hyperscaler can plug into seamlessly. This means customers aren’t tied to one provider—they can mix and match components as needed. It’s about fostering an ecosystem where innovation happens faster because barriers to entry are lower, and collaboration is easier.

Can you walk us through how open standards are being leveraged in Upscale AI’s technology stack?

Certainly. We’re building on several key initiatives to create a robust foundation. For instance, we use SONiC, which is a network operating system designed for open networking in the cloud, allowing for customizable and scalable solutions. Then there’s the Ultra Ethernet Consortium specs, which enhance Ethernet for AI by tackling issues like congestion and latency with advanced telemetry. We also incorporate the Ultra Accelerator Link, which standardizes accelerator interconnects to break away from proprietary dependencies. Finally, the Switch Abstraction Interface plays a crucial role by providing hardware abstraction, making it easier to integrate diverse hardware into our systems. Together, these standards form a powerful, flexible base for AI networking.
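
The Switch Abstraction Interface is, in practice, a C API; purely to illustrate the abstraction idea it provides, here is a toy Python sketch. The class and method names are hypothetical, not actual SAI calls.

```python
# Toy illustration of the hardware-abstraction idea behind SAI.
# The real interface is a C API; these names are hypothetical.

from abc import ABC, abstractmethod

class SwitchAbstraction(ABC):
    """The NOS (e.g. SONiC) programs this interface; each ASIC vendor supplies an implementation."""

    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> None: ...

    @abstractmethod
    def set_ecn_marking(self, port: int, threshold_cells: int) -> None: ...

class VendorXAsic(SwitchAbstraction):
    def create_route(self, prefix: str, next_hop: str) -> None:
        print(f"[vendor-x] program route {prefix} via {next_hop} into hardware tables")

    def set_ecn_marking(self, port: int, threshold_cells: int) -> None:
        print(f"[vendor-x] port {port}: mark ECN above {threshold_cells} cells")

# The same NOS logic runs unchanged on any ASIC that implements the interface.
asic: SwitchAbstraction = VendorXAsic()
asic.create_route("10.0.0.0/24", "10.0.1.1")
asic.set_ecn_marking(port=7, threshold_cells=1000)
```

The point of the abstraction is the last three lines: the operating system above it never changes when the silicon below it does, which is what keeps the stack open to multiple hardware vendors.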

What specific enhancements are you making to these open standards to meet the demands of AI scale-up?

We’re not just adopting these standards; we’re actively enhancing them for the unique needs of AI. With SONiC and the Switch Abstraction Interface, we’re focusing on optimizations for scale-up scenarios, where you need to handle massive, concentrated workloads. This includes improving how the stack manages high bandwidth and ensures ultra-low latency under extreme conditions. We’re fine-tuning congestion management and telemetry features to make sure the network can predict and mitigate bottlenecks before they impact performance. These upgrades are all about ensuring the infrastructure can keep pace with the exponential growth of AI compute demands.
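
As a rough picture of what "predict and mitigate bottlenecks" looks like in code, here is a simplified, hypothetical rate controller in the spirit of ECN-based congestion reaction on Ethernet fabrics. The constants and behavior are illustrative assumptions, not Upscale AI's algorithm or a Ultra Ethernet Consortium specification.

```python
# Simplified sketch of telemetry-driven congestion reaction (ECN-style).
# Constants and behavior are illustrative assumptions only.

def adjust_rate(current_gbps: float, ecn_marked_fraction: float,
                line_rate_gbps: float = 400.0) -> float:
    """Cut the send rate multiplicatively when marks arrive; recover additively when clean."""
    if ecn_marked_fraction > 0:
        # Back off in proportion to how congested the telemetry says the path is.
        return max(1.0, current_gbps * (1 - 0.5 * ecn_marked_fraction))
    # No congestion signal: probe back toward line rate.
    return min(line_rate_gbps, current_gbps + 10.0)

rate = 400.0
for marks in (0.0, 0.2, 0.6, 0.0, 0.0):
    rate = adjust_rate(rate, marks)
    print(f"marked fraction {marks:.1f} -> new rate {rate:.0f} Gb/s")
```

The faster and finer-grained the telemetry, the earlier a controller like this can react, which is why congestion signaling is such a focus of the scale-up work.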

Why did Upscale AI choose a full-stack integration approach, covering everything from silicon to software?

We went with a full-stack approach because AI networking isn’t a problem you can solve by focusing on just one layer. If you only build software, you’re at the mercy of hardware limitations. If you only design silicon, you miss out on optimizing the system as a whole. By being vertically integrated, we control the entire pipeline—from custom ASICs for high-performance scale-up to the software that ties it all together. This lets us fine-tune every component for maximum efficiency and performance, while also ensuring that thermal management and power usage are optimized. It’s a holistic strategy that gives us the ability to deliver a truly tailored solution for AI workloads.

What advantages do you see in this vertically integrated model compared to specializing in just one area of the stack?

The biggest advantage is control over the end-to-end experience. When you’re vertically integrated, you can eliminate inefficiencies that crop up when different vendors’ components don’t play nicely together. For example, we can design our silicon with specific software optimizations in mind, or vice versa, ensuring everything works in sync. It also speeds up innovation—there’s no waiting on a third party to update their part of the stack. For customers, this translates to a more reliable, high-performing solution that’s built specifically for AI, rather than a patchwork of disparate technologies that might not align perfectly.

What’s your forecast for the future of AI networking, especially with the push toward open standards?

I believe we’re on the cusp of a major transformation in AI networking. Open standards are going to be the catalyst for widespread adoption and innovation in this space. Over the next few years, I expect to see a significant shift away from closed, proprietary systems as more companies and hyperscalers realize the cost and flexibility benefits of open solutions. We’ll likely see networks become even more integral to compute, almost blurring the line between the two, as AI workloads grow in complexity. My forecast is that democratization through open standards will drive faster advancements, making high-performance AI infrastructure accessible to a broader range of organizations, not just the tech giants.
