The value of a Service Mesh — Istio metrics
These days, if you’re using Kubernetes, it’s almost impossible that you haven’t interacted in one way or another with the concept of a service mesh.
Kubernetes already ships with a very rudimentary “service mesh” out of the box, provided by the Service resource, which offers basic service discovery and round-robin load balancing of requests but lacks retries and back-off logic.
What is a service mesh?
A service mesh is a dedicated infrastructure layer that handles service-to-service communication, deployed alongside your application rather than built into it. As the complexity of the solution grows, a service mesh becomes more valuable because it separates service-to-service communication from business logic.
Examples of service mesh implementations are Linkerd, Conduit, Envoy, and Istio (built on top of Envoy).
Ingress usually pops up when talking about service meshes, so I will share this literally one-minute guide, Ingress in one minute; for a more in-depth understanding of the purpose and implementation of Istio, check what is a service mesh.
How to deploy a service mesh?
A service mesh can be deployed as a DaemonSet or as a sidecar container, with the latter being the more widespread approach.
The sidecar pattern is a design pattern in which an additional container is deployed alongside your application container in order to provide extra functionality such as tracing, security (SSL/TLS), or service mesh features (Istio).
Istio metrics
Istio metrics are a good entry point when it comes to understanding what’s happening within the service mesh.
Istio generates telemetry for all service communications within a mesh. An Envoy proxy sidecar is deployed along with each service, which subsequently adds the workload to the mesh and automatically provides metrics, logs, and traces.
I’m using the Prometheus and Grafana stack in order to scrape and visualize the metrics. In an Istio mesh, each component exposes an endpoint that emits metrics; we’re going to focus on the out-of-the-box Istio standard metrics.
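As a quick sanity check that the standard metrics are actually flowing into Prometheus, a minimal query (a sketch, assuming the default istio_requests_total metric name) shows the overall request rate across the mesh:

```promql
# Overall request rate across the mesh, averaged over the last 5 minutes
sum(rate(istio_requests_total[5m]))
```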
- Leveraging the istio_requests_total service-level counter metric, we can use the response_code and destination_app labels to create a simple pie chart or bar gauge visualization, giving a global view of our services with regard to status codes.
# rate of 5xx error responses per destination app
sum(rate(istio_requests_total{response_code=~"5.."}[10m])) by (destination_app)

# check the response codes for a specific app
sum(rate(istio_requests_total{destination_app="$destination"}[10m])) by (response_code)
We can verify the duration of the requests using istio_request_duration_milliseconds_sum{source_app=~"$source",destination_app=~"$destination"}
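Since the raw _sum value is a cumulative counter, latency percentiles are usually more useful in a dashboard. A sketch, assuming the companion istio_request_duration_milliseconds_bucket histogram and a Grafana $destination variable:

```promql
# p95 request latency per destination app over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_app=~"$destination"}[5m]))
  by (le, destination_app)
)
```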
Even more, we can dig further into the proxy-level metrics: Envoy sidecar proxies generate a rich set of metrics about the inbound and outbound traffic passing through the proxy, so we can use envoy_cluster_internal_upstream_rq.
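For instance, a sketch of a proxy-level query (assuming the response_code_class and cluster_name labels that Envoy attaches to this metric) to surface upstream errors:

```promql
# Rate of internal upstream requests that returned a 5xx, per Envoy cluster
sum(rate(envoy_cluster_internal_upstream_rq{response_code_class="5xx"}[5m])) by (cluster_name)
```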
This is only the tip of the iceberg: the service mesh provides distributed traces for the services within the cluster and can also generate access logs for service traffic, not to mention that a service mesh is crucial when it comes to establishing SLAs.