Looking Into Observability

Hi, I am Sunny, a technical writer at WhaTap Labs. The title of this article comes from a Korean word that means to scour, rummage through, and search carefully. 'Look into' is the closest English phrase.

In this article, we are going to look into observability, what it is, how it differs from monitoring, and why you need it.

What is Observability?

Before we talk about observability in software systems, let's look at the origin of the word. The concept of observability was first introduced by the engineer Rudolf E. Kálmán. Kálmán defined observability as follows.

The ability to understand the current state of a system using only its external outputs

Kálmán used the concept of observability in control theory, a branch of engineering. What does observability mean for software systems?

For a software application to be observable, you must be able to do the following:

Understand the inner workings of your application
Understand any system state your application may have gotten itself into
Understand the things above solely by observing the application with external tools
Understand that state, no matter how extreme or unusual

In plain terms, observability for software systems can be described as follows.

A measure of how well you can understand and explain any state your system can get into, no matter how novel or bizarre.

In modern software systems, observability is not just about data types or inputs. It's about how we interact with and try to understand the complex systems we manage.

What is the difference between monitoring and observability?

With monitoring, you already have an established baseline of what constitutes a problem. The monitoring system collects metrics, aggregates them, and analyzes them. In the process, it detects patterns that match predefined issues. You can see the results in real time on a dashboard or receive alert notifications when necessary.
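
As a rough illustration of this predefined-baseline approach, the sketch below averages collected metric samples and checks them against fixed thresholds. The metric names, threshold values, and function names are hypothetical and are not taken from any particular monitoring product.

```python
from statistics import mean

# Hypothetical thresholds: the "established baseline" of what counts as a problem.
THRESHOLDS = {"cpu_percent": 90.0, "error_rate": 0.05}


def aggregate(samples):
    """Average each metric over the collected samples."""
    names = {name for sample in samples for name in sample}
    return {
        name: mean(s[name] for s in samples if name in s)
        for name in names
    }


def check_alerts(samples, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose average exceeds its threshold."""
    averages = aggregate(samples)
    return sorted(
        name for name, limit in thresholds.items()
        if averages.get(name, 0.0) > limit
    )


samples = [
    {"cpu_percent": 95.0, "error_rate": 0.01},
    {"cpu_percent": 93.0, "error_rate": 0.02},
]
print(check_alerts(samples))
```

A check like this can only catch the failure modes that someone thought to encode in the thresholds ahead of time.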

Sometimes we rely on our gut instinct as experienced engineers. We debug based on patterns we have seen and knowledge we have built up. When an issue is detected, we compare it to previously known issues and look for similarities.

Static dashboards, which are common in monitoring, usually aggregate data for each service. This is great when engineers are trying to understand some aspect of a system. After all, dashboards were originally designed to outline combinations of metrics and notable trends. However, when it comes to debugging new issues, dashboards can be limiting.

But what if you do not even know what kind of problems your system will encounter? This situation is known as "unknown unknowns." In modern distributed system architectures, troubleshooting is even more challenging. A single request may pass through multiple services and travel across multiple networks. No one can predict every problem that will arise, and you may run into situations you have never seen before.

Observability is all about uncovering hidden issues. To do this, you need to bring together disparate pieces of information (logs, metrics, events, traces, and more) in one place. You need to be able to contextualize and correlate them. When you can get this kind of bird's-eye view of your system, we say you have visibility.
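
One way to picture this correlation is to group different kinds of telemetry by a shared request identifier. The sketch below is a minimal illustration only: the record shapes and the "trace_id" field name are assumptions, not the format of any specific tool.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names such as "trace_id" are assumptions.
logs = [
    {"trace_id": "t1", "service": "checkout", "message": "payment timeout"},
    {"trace_id": "t2", "service": "cart", "message": "item added"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 5200},
    {"trace_id": "t1", "service": "payment", "duration_ms": 5000},
    {"trace_id": "t2", "service": "cart", "duration_ms": 12},
]


def correlate(logs, spans):
    """Group logs and spans that share a trace ID into one view per request."""
    view = defaultdict(lambda: {"logs": [], "spans": []})
    for record in logs:
        view[record["trace_id"]]["logs"].append(record)
    for span in spans:
        view[span["trace_id"]]["spans"].append(span)
    return dict(view)


request = correlate(logs, spans)["t1"]
print([s["service"] for s in request["spans"]], request["logs"][0]["message"])
```

With the pieces side by side, the slow payment span and the timeout log line can be read as parts of the same request rather than as unrelated signals.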

Why do we need observability?

Aren't metrics and monitoring enough? No, they are not, and here is the bottom line: traditional monitoring cannot keep up with the increased complexity of modern systems. Modern systems are far more sophisticated and fragmented than before, and software developers cannot fully understand them with traditional monitoring alone.

Mainframe vs. microservices

To understand why we need observability, it helps to get a sense of how IT systems have evolved. Ten years ago, systems and platforms only needed to support a few tens of millions of users. Since then, the number of users has grown dramatically, from millions to billions. Consider Amazon.com, for example. The Amazon website is used not only by consumers in the United States but also by shoppers around the world who buy directly from overseas. During sale seasons, traffic can explode. Netflix is a similar story: when a popular season of a show is released, the volume can exceed what the existing system was built to handle.

In short, the number of transactions and users that need to be handled has increased like never before. To keep up with this change in scale, a new kind of system had to be introduced. This is where the cloud comes in. In the cloud, you rent the computing resources you need (servers, databases, etc.) on an as-needed basis, which gives you more flexibility to respond to changing demand.

But moving to the cloud was not easy. Putting one big application (a monolithic architecture) in the cloud in its entirety was simply too much to manage. So teams started breaking applications down into smaller pieces before moving them to the cloud, and that is what we call Microservice Architecture (MSA).

Legacy mainframe applications are huge, complex, monolithic systems made up of many programs, each of which is highly dependent on the others. To fix a bug in one program, the entire application must be tested.

Microservice Architecture is an architecture that breaks down one large application into many smaller service units. The microservices communicate with one another, and each can be changed and combined with the others. Together, they compose the entire service.

Many architectures have moved from mainframes to microservices. Services have gotten smaller, and the number of services to manage has grown much larger. The problem starts here. Let's say a single transaction comes in. This transaction goes through multiple services to be processed. It is important to be able to see the sequence of steps a transaction takes, so that when something goes wrong, you can figure out where it went wrong and what needs to be improved.
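
The idea of following a transaction step by step can be sketched as follows. This is a simplified illustration under assumed names: the Span class, its fields, and the service names are hypothetical and do not reflect a particular tracing system.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One step of a transaction inside a single service (hypothetical shape)."""
    service: str
    start_ms: int
    duration_ms: int
    ok: bool


def transaction_path(spans):
    """Return the service names in the order the transaction visited them."""
    return [s.service for s in sorted(spans, key=lambda s: s.start_ms)]


def first_failure(spans):
    """Return the earliest failing service, or None if every step succeeded."""
    for span in sorted(spans, key=lambda s: s.start_ms):
        if not span.ok:
            return span.service
    return None


# Spans may arrive out of order, so they are sorted by start time before use.
spans = [
    Span("inventory", start_ms=40, duration_ms=15, ok=True),
    Span("gateway", start_ms=0, duration_ms=120, ok=True),
    Span("payment", start_ms=60, duration_ms=50, ok=False),
]
print(transaction_path(spans))
print(first_failure(spans))
```

Ordering the steps by start time reconstructs the path of the transaction, and scanning that path points to the first service where something went wrong.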

At the end of the day, you need visibility to manage the complexity of microservices. You need to quickly detect potential risks, wherever they may be, and respond in real time. Cloud and microservice environments are changing very quickly, and traditional tools cannot keep up with the explosion in volume and velocity. Observability brings visibility to microservice architectures.
