The value of a Service Mesh — Istio metrics

Dejanu Alex
3 min read · Oct 6, 2022


In this day and age, if you’re using Kubernetes, it’s rather unlikely that you haven’t interacted in one way or another with the concept of a service mesh.

Kubernetes already ships with a very rudimentary “service mesh” out of the box: the Service resource, which provides basic service discovery and round-robin balancing of requests, but lacks features such as retries and back-off logic.

What is a service mesh?

A service mesh is a dedicated infrastructure layer that you add to your applications, which handles service-to-service communication. As the complexity of the solution grows, a service mesh becomes more valuable because it separates service-to-service communication concerns from the business logic.

Examples of service mesh implementations are Linkerd, Conduit (now part of Linkerd 2.x), and Istio (built on top of the Envoy proxy).

Usually, Ingress pops up when talking about service meshes, so I will share this literally one-minute guide: Ingress in one minute. For a more in-depth understanding of the purpose and implementation of Istio, check what is a service mesh.

How to deploy a service mesh?

A service mesh can be deployed as a DaemonSet or as a sidecar container, with the latter being the more widespread approach.

The sidecar pattern is a design pattern in which an additional container is deployed alongside your application container in order to provide functionality such as tracing, security (SSL), or the service mesh proxy itself (Istio’s Envoy sidecar).

Istio metrics

Istio metrics are a good entry point when it comes to understanding what’s happening within the service mesh.

Istio generates telemetry for all service communications within a mesh. An Envoy proxy sidecar is deployed alongside each service, which adds the workload to the mesh and automatically provides metrics, logs, and traces.

I’m using the Prometheus and Grafana stack to scrape and visualize the metrics. In an Istio mesh, each component exposes an endpoint that emits metrics; here we’re going to focus on the out-of-the-box Istio standard metrics.
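As a quick sanity check that the standard metrics are actually reaching Prometheus, we can start with a mesh-wide request rate query (a minimal sketch; destination_workload and destination_workload_namespace are standard Istio metric labels):

# overall request rate per destination workload across the mesh
sum(rate(istio_requests_total[5m])) by (destination_workload, destination_workload_namespace)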

HTTP, HTTP/2, and GRPC traffic metrics
  • Leveraging the istio_requests_total service-level counter metric, we can use the response_code and destination_app labels to create a simple pie chart or bar gauge visualization and get a global view of our services with regard to status codes.
# rate of 5xx error responses per destination app
sum(rate(istio_requests_total{response_code=~"5.."}[10m])) by (destination_app)
# breakdown of response codes for a specific app
sum(rate(istio_requests_total{destination_app="$destination"}[10m])) by (response_code)
Grafana dashboard

We can verify the duration of requests using istio_request_duration_milliseconds_sum{source_app=~"$source", destination_app=~"$destination"}.

Grafana dashboard
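Since istio_request_duration_milliseconds is a histogram, the _sum series becomes much more useful when combined with its _count and _bucket counterparts. A minimal sketch, reusing the same $source/$destination Grafana template variables:

# average request duration (ms) between source and destination
sum(rate(istio_request_duration_milliseconds_sum{source_app=~"$source", destination_app=~"$destination"}[10m]))
/
sum(rate(istio_request_duration_milliseconds_count{source_app=~"$source", destination_app=~"$destination"}[10m]))
# 95th percentile request duration (ms) per destination app
histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket{destination_app=~"$destination"}[10m])) by (le, destination_app))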

We can dig even further into the proxy-level metrics: Envoy sidecar proxies generate a rich set of statistics about the inbound and outbound traffic passing through them, so we can use, for example, envoy_cluster_internal_upstream_rq.

Upstream Requests
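Proxy-level stats for application clusters are not all exported by default, so treat the following as a sketch: it assumes the relevant Envoy stats are included (for example via proxyStatsMatcher) and that response_code and cluster_name are the tag names extracted in your setup; label names vary with the Envoy/Istio stats configuration.

# upstream request rate per response code and upstream cluster
# (the response_code and cluster_name labels are assumptions; check your sidecar's /stats/prometheus output)
sum(rate(envoy_cluster_internal_upstream_rq[5m])) by (response_code, cluster_name)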

This is only the tip of the iceberg: the service mesh provides distributed traces for the services within the cluster and can also generate access logs for service traffic, not to mention that a service mesh is crucial when it comes to establishing SLAs.
