Enterprise migration from on-premises environments to the public cloud is fast becoming a reality rather than an aspirational goal. After embarking on this cloud adoption journey, enterprises soon realize the need to evaluate multiple public clouds, due to 1) the fear of being locked into a single provider, for resiliency and cost reasons, 2) the ability to choose the cloud best suited to a particular application, or 3) competitive business objectives.
Application developers and architects are primarily leading the transition to multi-cloud. They are usually keen to adopt cloud-based solutions based on how well they fit the application's requirements, and they are often able to convince internal stakeholders to leverage multiple clouds. For application developers and architects, life is still somewhat simple: they only need to learn the functionality of a particular cloud. Network engineers and architects, on the other hand, not only have to learn the functionality, but also have to figure out how it affects and integrates with their existing cloud and on-prem network architectures. The constant influx of new networking features across public cloud providers makes it even harder to stay current.
In light of this, we are starting this Alkira technical blog series, where we will review, analyse, and share our perspective on different topics related to multi-cloud network architectures. We are kicking off the series with this post, in which I will cover cloud networking basics and share cloud networking concepts and architectures that apply to all the major public cloud providers.
Cloud Networking Basics
The most fundamental concept in cloud networking is the virtual representation of a physical data center inside a single public cloud or across multiple public clouds. This virtual construct can host compute nodes, IP address subnets, firewalls, access lists, and routing elements. In essence, you can treat it as a data center in the cloud. For instance, you can set up subnets for different tiers of an application, such as web services, databases, and backend systems. These subnets can be private or public based on application requirements.
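As a quick illustration, here is a minimal sketch of the kind of address plan you might carve out for those tiers using Python's standard `ipaddress` module. The CIDR range and tier names are hypothetical, chosen only to show the idea:

```python
import ipaddress

# Hypothetical address space assigned to one VPC/VNet
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 subnets, one per application tier
subnets = list(vpc_cidr.subnets(new_prefix=24))
tiers = {
    "web (public)":  subnets[0],  # internet-facing web services
    "app (private)": subnets[1],  # backend systems
    "db (private)":  subnets[2],  # databases
}

for name, net in tiers.items():
    print(f"{name:14} {net}")
```

Whether a tier's subnet is public or private is then a matter of routing and gateway configuration in the specific cloud, not of the addressing itself.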
Amazon Web Services (AWS) and Google Cloud Platform (GCP) call this construct a Virtual Private Cloud (VPC), whereas Microsoft Azure calls it a Virtual Network (VNet). For AWS and Microsoft Azure, the scope of this construct is limited to a single cloud region, while for GCP the scope is global. With GCP you can have multiple subnets, each in a different region, all part of the same VPC; a given subnet, however, does not span regions. With global routing enabled, subnets in different regions within the same VPC can communicate using GCP’s backbone. The figure below illustrates the global VPC concept.
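To make the global-VPC idea concrete, here is a rough sketch of how such a VPC might be created with the `gcloud` CLI. The network and subnet names and CIDR ranges are made up for illustration:

```shell
# Create a VPC with global dynamic routing
# (routes are shared across regions over GCP's backbone)
gcloud compute networks create demo-vpc \
    --subnet-mode=custom \
    --bgp-routing-mode=global

# Subnets live in specific regions, but all belong to the same VPC
gcloud compute networks subnets create subnet-us \
    --network=demo-vpc --region=us-east1 --range=10.10.1.0/24

gcloud compute networks subnets create subnet-eu \
    --network=demo-vpc --region=europe-west1 --range=10.10.2.0/24
```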
How Do VPC/VNets Interconnect?
Initially, the idea behind offering the VPC and VNet constructs was to provide a public cloud equivalent of the physical on-premises data center, where system administrators and application developers can manage compute and storage resources. They are assigned a subnet by the network team in which to deploy virtual machines, run their code, and host applications. The network team is also responsible for managing security and internetworking of the subnets for intra- and inter-VPC/VNet communication.
To interconnect VPCs/VNets, the cloud providers offer a peering service. If a customer has VPCs/VNets in different regions, they can be interconnected in a full-mesh fashion so they can communicate with each other over the cloud provider's backbone. This architecture works really well if the VPCs/VNets are treated as cloud data centers, which is how cloud providers initially envisioned the service. In reality, though, developers find it easy to create VPCs/VNets whenever and wherever they need them. That is fine in an isolated environment, but more often than not these environments end up needing to interconnect with other VPCs/VNets within the enterprise. As a result, network engineers are asked to figure out a solution to interconnect these private cloud environments.
The problem with the peering service is that it does not scale to a large number of connections. You can run into peering limits, and the sheer number of peering connections to manage makes the design nearly impossible to operationalize. To put things in perspective: if you have to interconnect 50 VPCs, you will end up with 1,225 peering connections. You can calculate the number of peering connections using the formula n(n-1)/2, where ‘n’ is the number of VPCs/VNets.
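The formula is easy to sanity-check with a few lines of Python (a throwaway sketch, not tied to any cloud SDK):

```python
def full_mesh_peerings(n: int) -> int:
    """Peering connections needed to fully mesh n VPCs/VNets: n(n-1)/2."""
    return n * (n - 1) // 2

for n in (5, 10, 25, 50):
    print(f"{n:3} VPCs -> {full_mesh_peerings(n):4} peerings")
# 5 -> 10, 10 -> 45, 25 -> 300, 50 -> 1225
```

The quadratic growth is the problem: every new VPC must peer with every existing one.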
This use case gave birth to another cloud network architecture: use some kind of transit connectivity per region and have the regions connect through these transits. Think of it as regional hub-and-spoke networks, where spokes connect only to their respective hubs, and the hubs are connected to each other in a full mesh. Before the AWS Transit Gateway and Azure transit VNet services, public cloud providers did not offer a cloud-native service for transit-style connectivity between VPCs/VNets. This allowed traditional networking, security, and SD-WAN vendors to offer solutions using virtual routers, firewalls, or SD-WAN appliances that sit in the transit VPC/VNet and control all the routing. Network engineers can download the image from the cloud marketplace, install it inside the transit VPC/VNet, and have all the VPCs/VNets within the region terminate into it using either IPSec or VPC peering. This can be extended to other regions by running IPSec and BGP between the transit routers. Many enterprises use this approach to connect regions within the same cloud or across clouds.
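To see why the regional hub-and-spoke design scales so much better, compare the connection counts. This is a back-of-the-envelope sketch; the region and spoke counts are hypothetical:

```python
def full_mesh(n: int) -> int:
    """Connections needed to fully mesh n nodes."""
    return n * (n - 1) // 2

def hub_and_spoke(regions: int, spokes_per_region: int) -> int:
    """Each spoke attaches to its regional hub; the hubs are fully meshed."""
    spoke_links = regions * spokes_per_region
    hub_links = full_mesh(regions)
    return spoke_links + hub_links

# 5 regions with 10 spoke VPCs each = 50 VPCs total
print(full_mesh(50))         # 1225 connections in a flat full mesh
print(hub_and_spoke(5, 10))  # 60 connections with regional transits
```

Adding a VPC now costs one connection to its regional hub instead of one connection to every other VPC.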
How To Connect Public Clouds to On-Prem?
No matter what phase of cloud adoption an enterprise is in, the need for on-prem connectivity will always be there. Branches, data centers, and remote sites still require a secure connection into the cloud environment.
There are two types of connectivity from the cloud to on-prem: (i) public and (ii) private.
Public Connection From On-Prem
A public connection from on-prem into the cloud is made over the Internet using IPSec and BGP. IPSec connections usually terminate directly on the VPN gateway inside the VPC/VNet. They can also terminate on a cloud transit, such as the Transit Gateway for AWS or a transit VNet for Azure. For transit-type connectivity, you can also install a third-party appliance or router and transit through it into the cloud. A single IPSec connection in this scenario supports up to 1.25 Gbps; however, you can build multiple IPSec tunnels with ECMP to achieve higher throughput.
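For capacity planning, here is a quick sketch of how many ECMP tunnels a target throughput would require, assuming the roughly 1.25 Gbps per-tunnel limit mentioned above:

```python
import math

def tunnels_for_throughput(target_gbps: float,
                           per_tunnel_gbps: float = 1.25) -> int:
    """ECMP IPSec tunnels needed to reach a target aggregate throughput."""
    return math.ceil(target_gbps / per_tunnel_gbps)

print(tunnels_for_throughput(1.0))   # 1
print(tunnels_for_throughput(5.0))   # 4
print(tunnels_for_throughput(10.0))  # 8
```

Keep in mind that ECMP balances traffic per flow, so any single flow is still limited to one tunnel's throughput; the aggregate figure assumes many parallel flows.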
Private Connection From On-Prem
Private connectivity into the cloud is usually extended from a customer router sitting in a colocation facility or inside a physical data center. All the major public cloud providers have partnerships with service providers and colocation facilities to offer this type of service. The service is very similar across AWS, Azure, and GCP, but each has named it differently: it is called Direct Connect for AWS, ExpressRoute for Azure, and Dedicated Interconnect for GCP.
There are two main use cases where an enterprise would need this type of connectivity.
- Extending private MPLS connection into the cloud so that the enterprise can have the same level of SLA and QoS extended into the cloud.
- High bandwidth connectivity from on-prem data center to the cloud for backups and data replication.
After reading this blog, you may have noticed that conceptually there are more similarities than differences between the public cloud providers. However, the architecture and implementation details differ significantly, which can easily overwhelm network engineers tasked with designing and integrating multi-cloud environments. Any enterprise looking to embrace a multi-cloud network architecture will either need in-house experts for each cloud or a service, like Alkira, which can abstract the complexities and provide a unified way of connecting into different public clouds.