Observability: Observing your containerized workloads on public clouds

Jaydeep Ayachit
Jun 9, 2021

What is observability?

“Observability is the ability to infer internal states of a system based on the system’s external outputs.”

Over the past few years, with the growing adoption of microservices, organizations have developed a critical need to monitor their applications. Monitoring is not only about tracking application health but also about understanding complex system behaviour. Additionally, applications and infrastructure generate audit logs and events for key operations performed at the data, management and control planes. For an organization with hundreds of applications deployed as microservices, a significant challenge is getting a holistic view of application health.

“Observability” is a popular term for understanding how systems are performing and what data they expose. Observability helps answer the “why” part of the equation: Why is the system behaving this way? Why do I see errors reported across services? Why is my system not performant? Observability helps cross-functional teams understand and answer specific questions about what is happening in highly distributed systems.

The more observable a system, the more quickly and accurately you can identify an issue and its root cause.

In this article we look into various options available across public clouds to meet your observability requirements.

Pillars of observability

Logs, metrics and traces are the three pillars of observability. Additionally, end-user experience data is important for understanding how your applications perform in the real world.

By adhering to these pillars, observability solutions can give you a clear picture of an individual application in an architecture or of the underlying infrastructure itself.

Metrics: Metrics can originate from a variety of sources, including infrastructure, hosts, services, applications, cloud platforms and external sources. Metrics typically represent key performance indicators (KPIs) for your systems, such as CPU and memory usage, page faults and HTTP errors. Metrics are represented either as counts or as values aggregated over a period of time. With Application Performance Monitoring (APM) tools, you can instrument your code to make additional key information available for monitoring.
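
To make the distinction between counts and time-aggregated values concrete, here is a minimal sketch in Python. It is not the API of any APM product; the `MetricsRegistry` class, its method names and the metric names `http_errors` and `cpu_percent` are all hypothetical and exist only for illustration.

```python
from collections import Counter
from statistics import mean


class MetricsRegistry:
    """Hypothetical in-process registry (illustration only, not a real APM API)."""

    def __init__(self):
        self.counters = Counter()   # metrics represented as counts
        self.samples = {}           # metric name -> list of (timestamp, value)

    def increment(self, name, value=1):
        """Record a count-style metric, e.g. the number of HTTP errors."""
        self.counters[name] += value

    def observe(self, name, timestamp, value):
        """Record a point-in-time sample, e.g. CPU usage at a given second."""
        self.samples.setdefault(name, []).append((timestamp, value))

    def average_per_window(self, name, window_seconds):
        """Aggregate samples into fixed windows and return the mean of each."""
        buckets = {}
        for timestamp, value in self.samples.get(name, []):
            window_start = int(timestamp // window_seconds) * window_seconds
            buckets.setdefault(window_start, []).append(value)
        return {start: mean(values) for start, values in sorted(buckets.items())}


registry = MetricsRegistry()

# Count-style metric: three HTTP errors were observed.
for _ in range(3):
    registry.increment("http_errors")

# Sampled metric: CPU usage (percent) at seconds 0, 30, 60 and 90.
for ts, cpu in [(0, 40.0), (30, 60.0), (60, 80.0), (90, 90.0)]:
    registry.observe("cpu_percent", ts, cpu)

print(registry.counters["http_errors"])
print(registry.average_per_window("cpu_percent", 60))
```

In practice an agent or client library would do this collection and ship the results to a monitoring backend; the sketch only shows the shape of the data.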

Logs: Structured or unstructured text that records events that occurred at a specific time. Similar to metrics, logs can originate from a variety of sources including infrastructure, hosts, services, applications and external sources. Audit and security events are also represented as logs.
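
As an illustration of what "structured" means here, the following sketch uses Python's standard `logging` and `json` modules to emit each event as one JSON object. The `JsonFormatter` class, the `context` attribute and the field names are assumptions made for this example, not a prescribed schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each log record as a JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is an assumed convention: extra key/value pairs
        # attached by the caller through the `extra` argument.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The service name and order id below are made-up example values.
logger.error(
    "payment failed",
    extra={"context": {"service": "orders", "order_id": "A-1001"}},
)
```

Because every line is machine-parseable, a log backend can filter or group on fields such as `service` or `order_id` instead of searching free text.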

Tracing: Tracing captures requests and builds a view of the entire chain of calls, all the way from the user request through the interactions between hundreds of services.
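
The chain of calls is typically reconstructed by passing a shared trace identifier from service to service. The sketch below shows the idea in plain Python under simplifying assumptions: the `Span` type, the helper functions and the header names `x-trace-id` and `x-parent-span-id` are hypothetical and do not correspond to a specific tracing product or standard.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Span:
    trace_id: str              # shared by every span in one request
    span_id: str               # unique to this unit of work
    parent_id: Optional[str]   # span that called this one, None at the root
    service: str


def start_span(service: str, headers: Optional[Dict[str, str]] = None) -> Span:
    """Start a span, continuing the caller's trace if headers carry one."""
    headers = headers or {}
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    parent_id = headers.get("x-parent-span-id")
    return Span(trace_id, uuid.uuid4().hex[:16], parent_id, service)


def outgoing_headers(span: Span) -> Dict[str, str]:
    """Headers (hypothetical names) a service attaches to its downstream calls."""
    return {"x-trace-id": span.trace_id, "x-parent-span-id": span.span_id}


# Simulate a user request flowing through three made-up services.
frontend = start_span("frontend")
orders = start_span("orders", outgoing_headers(frontend))
payments = start_span("payments", outgoing_headers(orders))

for span in (frontend, orders, payments):
    print(span.service, span.trace_id, span.span_id, span.parent_id)
```

All three spans share one `trace_id`, and each `parent_id` points at the caller's span, which is what lets a tracing backend rebuild the call chain.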

User experience: User experience is most commonly measured through Real User Monitoring (RUM), a type of performance monitoring that captures and analyses each transaction made by users of a website or application. It is also known as real user metrics or end-user experience monitoring. It is used to gauge user experience, including key metrics such as load time and transaction paths.

The goal of collecting this telemetry is to improve end-user experience and business outcomes.

Observability in a Microservices architecture

In a microservice-based architecture you can have thousands of components communicating with each other. Observability tools give developers, engineers and architects the ability to observe the way services interact. Observability is essential during all phases of a project, from development through production deployment and support.

Observability capabilities in public clouds

This section summarizes observability capabilities for various clouds, focusing on containerized workloads only. As an architect, when you design the system and its deployment topology, you also need to start thinking about how to make the system observable and which tools to use to provide the required observability to various teams. You can always start small, perhaps with a pilot, and improve observability over time. By the time your application is in production, your teams should have complete visibility and the ability to quickly identify problems, troubleshoot issues and keep the system running.

Note: This article is also published on LinkedIn as “Observability strategies for various clouds for containerized workloads”.

If you like this, please follow me and give a clap.
