Mesh Infrastructure Tier
Cloud Clustering and Service Mesh concepts
Cloud native computing has led to the rise of two key concepts for scaling and managing the complexity of literally millions of compute services running in Cloud data centres today. They are Cloud native Clustering (Kubernetes) and the Service Mesh.
On the face of it, the two technologies are distinct.
Clustering is used for scale within organisational boundaries, performance, failover, continuity and resilience across multiple data centres. Kubernetes has become the de facto standard for enterprise clusters beyond the CSP serverless offerings. Clustering infrastructure is typically maintained within an organisation by k8s Cluster administrators.
Service Mesh is for complex service interactions between clusters and data centres, with dynamic discovery and routing to endpoints, and with workload identity / authentication and mTLS transport security built in as a default. In the case of openElectric, the service mesh infrastructure, whilst it operates in multiple clusters, is maintained by Service Mesh administrators.
Whilst there are a number of areas of duplication in these tech stacks (e.g. credentials management and service discovery), the openElectric approach is to treat them as addressing separate concerns that must nonetheless work together when building compute and networking infrastructure for the broader electricity supply chain architecture.
The Infrastructure planes
OpenElectric has defined a Cloud and IoT infrastructure layer that enables a collection of applications and microservices across organisations in trusted supply chains to discover other services and communicate securely with them.
The first four sets of functions refer to existing Cloud native implementations that are widely used in production today: Kubernetes clustering, Envoy proxy for the mesh data plane, HashiCorp, Istio and Kong for the mesh control plane, and so on.
-
The mesh data plane is the machinery that performs the secure routing of traffic (data) between microservices, through any conceivable topology across different IP networks, data centres and geographies.
-
The mesh control plane manages the infrastructure for traffic delivery; namely identity, access, discovery, reliability, redundancy, cyber, resilience, as declared by the mesh administrators.
-
The cluster data plane is the machinery that runs the compute resources for applications and associated tooling infrastructure, scaling horizontally and dynamically as required.
-
The cluster control plane represents the infrastructure needed to dynamically manage the compute cluster, including management of current and target deployment state, autoscaling, instrumentation, and audit of changes.
The two new concepts that are unique to openElectric relate to the deep integration of IoT resources into the Cloud security model across the infrastructure, and a structured policy language that defines tighter controls around the use of Cloud workloads and IoT, and also for control of versions and change management within its components. They are;
-
The policy plane is the security authorisation model that all services in the supply chain must meet in order to be authorised to operate, as declared by both applications and mesh administrators.
-
The identity plane defines the machine identity systems that identify workloads and IoT endpoints, along with the verified claims that become the central reference for policy rules applied across the infrastructure.
In today's Cloud services development processes, there is a desire to separate the infrastructure layer from the application microservices. This separation enhances portability, separates responsibilities and simplifies development, testing and deployment processes.
The mesh data plane
The Data Plane moves and routes requests and events from one application workload to another. It is the traffic that drives on the roads - the functional parts of a service mesh.
Traffic Management
The key objective of the mesh architecture is to take the networking software previously found in proxies, firewalls, routers and load balancers, and move it into highly dynamic and configurable software infrastructure.
Important data traffic features are carried out by data plane proxies, which include;
-
request routing of data to other service endpoints
-
ingress and egress gateways
-
load balancing
-
traffic shifting
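As a rough illustration of traffic shifting, the sketch below splits requests between two versions of a service in proportion to a weight. It is not taken from any particular proxy implementation; the function `pick_backend` and the backend names `billing-v1` and `billing-v2` are hypothetical.

```python
import random

def pick_backend(weights, rng=random):
    """Pick one backend name at random, in proportion to its weight.

    `weights` maps a (hypothetical) backend name to a non-negative weight,
    for example the percentage of traffic it should receive.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Shift roughly 10% of requests to the new version, keep 90% on the old one.
split = {"billing-v1": 90, "billing-v2": 10}
rng = random.Random(42)  # seeded only so that the sketch is repeatable
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[pick_backend(split, rng)] += 1
print(counts)
```

In a real mesh the weights would be declared by the mesh administrators and pushed to the proxies by the control plane, so that the split can change without redeploying either version.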
Transport security
The second key criterion of the service mesh architecture is that all communications between microservices are conducted over mutually authenticated Transport Layer Security (mTLS): both ends of a connection present certificates, and the traffic between them is encrypted.
Certificate management and rotation is typically handled by the control plane, either via credential rolling processes or ephemeral certificate generation (e.g. SPIFFE).
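To make the "mutual" part concrete, the sketch below configures a server-side TLS context with Python's standard `ssl` module so that clients must also present a certificate. The function name and its file-path parameters are assumptions for illustration. In a mesh this configuration normally lives in the data plane proxy, not in application code.

```python
import ssl

def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that also demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED on the server side is what makes the TLS "mutual":
    # the handshake fails unless the client presents a trusted certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # Certificate and key that identify this workload to its peers.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # Certificate authority that signs the peer workloads' certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

# No files are passed here, so this only demonstrates the configuration.
ctx = make_mtls_server_context()
print(ctx.verify_mode)
```

With short-lived certificates, a context like this would be rebuilt or reloaded each time the control plane issues new credentials.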
Observability
Horizontally scaled, distributed microservices are complex, and require robust systems for the recording and publishing of operational data.
A key requirement for all components operating in the mesh is to continuously publish operational information to preconfigured locations, such as;
-
access logging
-
performance metrics
-
request tracing
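The three kinds of information listed above often appear together in a single structured record. The sketch below is one hypothetical shape for such a record, using only the Python standard library; the field names and the example path are assumptions, not an openElectric format.

```python
import json
import time
import uuid

def access_log_record(method, path, status, started_at, trace_id=None):
    """Return one access-log entry as a JSON string."""
    return json.dumps({
        # The trace id ties this hop to the rest of a distributed request trace.
        "trace_id": trace_id or uuid.uuid4().hex,
        "method": method,
        "path": path,
        "status": status,
        # Elapsed time since the request started, as a simple performance metric.
        "duration_ms": round((time.monotonic() - started_at) * 1000, 3),
    })

start = time.monotonic()
line = access_log_record("GET", "/meters/42", 200, start, trace_id="abc123")
print(line)
```

Passing the same `trace_id` along with the request to the next service is what allows the individual log lines to be joined into one trace later.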
The mesh control plane
The Control Plane is the underlying infrastructure that operates the mesh, and its configuration. It is the road infrastructure that traffic can then drive on.
Global Configuration
The control plane is responsible for managing the state of a wide area network of distributed services, both the current and target configuration state of the mesh infrastructure.
This includes the configuration of data plane services, including traffic request routing and load balancing, credentials for transport security, and configuration of logs and metrics for observability of the system.
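Managing current and target state usually comes down to comparing the two and working out what has to change. A minimal sketch of that comparison follows; the setting names are invented for illustration and do not correspond to any real control plane schema.

```python
def plan_changes(current, target):
    """Compare current and target configuration and list the changes needed.

    Both arguments map a (hypothetical) setting name to its value.
    """
    return {
        "add": {k: target[k] for k in target.keys() - current.keys()},
        "remove": sorted(current.keys() - target.keys()),
        "update": {k: target[k] for k in target.keys() & current.keys()
                   if current[k] != target[k]},
    }

current = {"route:/billing": "billing-v1", "log_level": "info"}
target = {"route:/billing": "billing-v2", "log_level": "info", "mtls": "strict"}
plan = plan_changes(current, target)
print(plan)
```

A control plane would then apply the plan to the data plane proxies and repeat the comparison until the current state matches the target.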
Service Registration
The control plane is responsible for the configuration of microservices for their discoverability and routing across data centres and separate organisations.
The registration of service workloads serves three important functions;
-
to inform the workload identity system of the data centre and host provider, so that the workload's identity can be attested.
-
to register service metadata into a directory, enabling lookup and routing to the service.
-
to register authorisation policies that determine who will be granted access to the service.
This also works in reverse - services may be deregistered, identity and certificates revoked, and policies constrained.
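The registration functions above, and their reversal, can be sketched as a small in-memory directory. This is only an illustration under simplifying assumptions: the class and field names are hypothetical, attestation is reduced to a recorded string, and the policy is a plain set of permitted callers.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    name: str
    endpoint: str       # where the service can be reached
    attested_by: str    # data centre / host provider that attests its identity
    allowed_callers: set = field(default_factory=set)  # authorisation policy

class Registry:
    def __init__(self):
        self._services = {}

    def register(self, record):
        self._services[record.name] = record

    def deregister(self, name):
        # Removing the record removes its policy too, so lookup and access both stop.
        self._services.pop(name, None)

    def lookup(self, name, caller):
        """Return the endpoint only if the service exists and the caller is permitted."""
        record = self._services.get(name)
        if record is None or caller not in record.allowed_callers:
            return None
        return record.endpoint

registry = Registry()
registry.register(ServiceRecord("metering", "10.0.0.5:8443", "dc-east", {"billing"}))
print(registry.lookup("metering", "billing"))
registry.deregister("metering")
print(registry.lookup("metering", "billing"))
```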
IoT Registration
Similar to the registration of workloads, to maintain a zero trust policy with regard to IoT devices, devices must be registered - and deregistered - within the IoT identity systems.
In implementations today, these activities are conducted at the Domain layer. In time, abstractions at the Mesh Infrastructure layer should allow for registration and deregistration, IoT identity attestation, and IoT authorisation policies via domain layer exposed APIs.
The cluster data plane
OpenElectric defaults to Kubernetes, the most widely used clustering technology (~70% of enterprises around the world), as the container and clustering environment that provides the compute resources (CPU, memory, storage) for running openElectric components.
The data plane in Kubernetes refers to the Nodes and Pods that manage the containers in each physical machine, and containers for the most part refer to Docker containers - the container image format of choice for openElectric.
For more detailed information on Kubernetes and the operation of the data plane, see the k8s documentation site.
The cluster control plane
The k8s control plane refers to the control mechanisms that are used to orchestrate the containers to establish clusters, and to autoscale them as more horizontal scalability is required.
The core components are;
-
the cluster API that controls the current and target deployment states via programmatic API or CLI interfaces (e.g. kubectl)
-
the workload autoscaling and scheduling functions that manage the horizontal scaling of workloads across nodes and pods based on resource requirements.
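For horizontal autoscaling, the Kubernetes documentation describes the core calculation as scaling the replica count by the ratio of the observed metric to its target, rounded up. The sketch below reproduces only that ratio and a clamp to minimum and maximum replica counts. It leaves out the tolerance band, stabilisation windows and multi-metric handling of the real autoscaler, and the function name and default limits are assumptions.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count suggested by the ratio of observed metric to target metric."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, raw))

# 4 replicas running at 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))
# 4 replicas running at 30% CPU against a 60% target: scale in.
print(desired_replicas(4, 30, 60))
```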
For more detailed information on the use of Kubernetes for deploying and scaling applications, see the k8s documentation site.
The identity plane
In the openElectric model, there are two types of machine identities that are used to identify both Cloud workloads and IoT endpoints. In some cases, the two are interrelated.
Workload Identity
For workloads to communicate with mutual authentication, they require certificates at both ends of the communication channel.
One option is to install and rotate these certificates as often as practical, which usually means each time the system is restarted. The second option is to use ephemeral identity certificates, as defined in the SPIFFE workload identity specification. Both options are possible in the openElectric architecture, which favours the latter as the simpler, more secure option in the long term.
Some SPIFFE configurations allow the use of JWT-based identity documents (JWT-SVIDs), which also make it possible to embed claims or metadata in the identity document. These can carry useful information, such as the software version, that is important in more stringent forms of authentication and authorisation.
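A SPIFFE identity takes the form of a URI, `spiffe://<trust domain>/<workload path>`. The sketch below splits such an identifier into its two parts with the standard library. It is a simplified check, not a full implementation of the SPIFFE ID validation rules, and the example trust domain and path are invented.

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID into (trust_domain, workload_path)."""
    parts = urlparse(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.netloc, parts.path

trust_domain, path = parse_spiffe_id("spiffe://retailer.example/billing/api")
print(trust_domain, path)
```

The trust domain identifies the issuing organisation or environment, which makes it a natural first field for policy rules that span organisations.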
IoT Identity
IoT identity is a substantially immature space, with any number of techniques being used, and in most cases IoT to Cloud identity and authentication is done within an organisation.
Cases where an installed IoT device can authenticate with multiple Clouds are unusual in general, but are unusually prevalent in the energy and transport sectors.
The policy plane
In practice, the security model for the mesh infrastructure tier is centred around the policies that control authorisations of any form: use of IoT devices, workload services, individual functions, software versions, data routes between data centres and clusters, and so on.
The policy plane is a unique feature of the openElectric design at the mesh infrastructure layer. It is responsible for taking policies from the Domain Infrastructure API, and ensuring their activation, and providing a compliance audit trail across the runtime infrastructure.
Policies are maintained using a policy markup language that enables the expression of rules that allow and deny access to elements across the infrastructure layers. These policies can be modified at runtime and delivered via the mesh to the data plane, where they are enforced immediately without requiring a restart.
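The openElectric policy language itself is not shown here, so the following is only a sketch of how allow and deny rules might be evaluated. It assumes a default-deny stance and that a matching deny rule overrides any allow rule; the `Rule` fields and the example identities are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    effect: str    # "allow" or "deny"
    subject: str   # identity the rule applies to, or "*" for any
    resource: str  # element being accessed, or "*" for any

def is_authorised(rules, subject, resource):
    """Default deny; any matching deny rule overrides matching allow rules."""
    matches = [r for r in rules
               if r.subject in (subject, "*") and r.resource in (resource, "*")]
    if any(r.effect == "deny" for r in matches):
        return False
    return any(r.effect == "allow" for r in matches)

caller = "spiffe://retailer.example/billing"
rules = [Rule("allow", caller, "meter-data")]
before = is_authorised(rules, caller, "meter-data")
# A rule added at runtime takes effect on the next check, with no restart.
rules.append(Rule("deny", "*", "meter-data"))
after = is_authorised(rules, caller, "meter-data")
print(before, after)
```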
IoT Access Policy
The introduction of IoT identities within the Cloud ecosystem requires the corresponding establishment of authorisation policies that govern their use. These are typically implemented at the ingress and egress points for control services.
The key functions of an IoT policy are to manage which workloads are able to read data, and send control signals or configuration information to the device. In the energy domain, writing of any state is an action that needs to be closely monitored and contained to least privilege.
Workload Access Policy
Workload access policies are implemented by the data plane proxies, and determine whether certain workloads are allowed to route to one another. This applies to north-south traffic in the case of mesh services, or east-west traffic in the case of cluster workloads.
Finer-grained policies for access at the function / method level of workloads can be achieved using path-based policies. Additional header information, such as a flag marking suspicious requests, can also be used to tighten policy rulesets.
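A path-based policy can be sketched as a list of permitted caller, method and path pattern combinations. Everything in the example is illustrative: the caller names, the paths and the `x-suspicious` header are assumptions, and a real data plane proxy would use its own matching syntax, not Python's `fnmatch`.

```python
from fnmatch import fnmatchcase

# Hypothetical (caller, HTTP method, path pattern) combinations that are permitted.
ALLOWED = [
    ("billing", "GET", "/meters/*/readings"),
    ("operations", "POST", "/meters/*/config"),
]

def request_allowed(caller, method, path, headers=None):
    """Allow a request only if it matches a permitted combination."""
    headers = headers or {}
    # Hypothetical header set by an upstream component to flag a request.
    if headers.get("x-suspicious") == "true":
        return False
    # Note: "*" in fnmatch also matches "/" characters, so these patterns are loose.
    return any(caller == c and method == m and fnmatchcase(path, pattern)
               for c, m, pattern in ALLOWED)

print(request_allowed("billing", "GET", "/meters/42/readings"))
print(request_allowed("billing", "POST", "/meters/42/config"))
```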
Version Usage Policy
Version policy is a concept that is unique to openElectric, which is focused on the registration and validation of software version identifiers in communications between workloads, and workloads to IoT devices.
The principle of version tracking comes from the requirement for a rigorous change management process at the mesh infrastructure layer. This means that a service may be authorised to communicate only with specific versions of software from a third party workload or IoT device, providing additional protection against unverified changes to firmware in the field.
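In its simplest form, a version usage policy is an allowlist of approved versions per component, checked against the version a peer reports (for example, via a claim in its identity document). The sketch below assumes exactly that; the component names and version strings are invented for illustration.

```python
# Hypothetical register of approved versions for each component.
APPROVED_VERSIONS = {
    "inverter-firmware": {"2.4.1", "2.4.2"},
    "billing-service": {"1.9.0"},
}

def version_authorised(component, version, approved=APPROVED_VERSIONS):
    """Return True only if this exact version is registered for the component."""
    return version in approved.get(component, set())

print(version_authorised("inverter-firmware", "2.4.2"))
# A firmware version that was never registered is rejected.
print(version_authorised("inverter-firmware", "2.5.0"))
```

Unknown components are rejected as well, which keeps the check consistent with a least-privilege approach.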