How I review a codebase: Prometheus

Kenzan + Sourced
7 min read · Jun 15, 2018

by Nicholas Sledgianowski

This post is a walkthrough of what I do when I start on a new project and want to figure out what's going on. We will go through the Prometheus project: https://github.com/prometheus/prometheus.

First, I start by scanning the root and skimming the README.md to get a general sense of what’s going on.

The first thing I see is a folder called .circleci with the commit comment “Add promtool to docker build.” This tells us that the Prometheus project uses CircleCI and Docker as part of its continuous integration setup. Next, we have a bunch of Go packages: config, consoles, etc.

Then there is a documentation folder. Inside it are things like the /dev/api/swagger.json spec for the Prometheus API and configuration examples for users.

Next are the root-level configuration files. We have the Dockerfile, .travis.yml, Makefiles, license, and *.md files.

At this point we know a few things about Prometheus. The project uses CircleCI and Travis for build and continuous integration. Docker is involved somehow, and the build seems to use Makefiles. What we know can be summarized as:

CI/CD: CircleCI + Travis (https://circleci.com, https://travis-ci.org)

Language: Golang

Builds: Make

Runtime: Docker

API Spec: Swagger (prometheus/documentation/dev/api/swagger.json)
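The same root-level scan can be roughed out in code. This is only a sketch of the heuristic, not a tool I actually use: the `markers` table and the `inferTooling` helper are made up for illustration, and the file names in it are assumptions about what sits in a typical repository root.

```go
package main

import (
	"fmt"
	"sort"
)

// markers maps a root-level file or folder name to the tooling it hints at.
// The table is an illustrative assumption, not an exhaustive rule set.
var markers = map[string]string{
	".circleci":   "CI/CD: CircleCI",
	".travis.yml": "CI/CD: Travis",
	"Makefile":    "Builds: Make",
	"Dockerfile":  "Runtime: Docker",
}

// inferTooling returns a sorted, de-duplicated list of hints for the
// given root entries. Entries without a known marker are ignored.
func inferTooling(entries []string) []string {
	seen := map[string]bool{}
	for _, e := range entries {
		if hint, ok := markers[e]; ok {
			seen[hint] = true
		}
	}
	hints := make([]string, 0, len(seen))
	for h := range seen {
		hints = append(hints, h)
	}
	sort.Strings(hints)
	return hints
}

func main() {
	root := []string{".circleci", ".travis.yml", "Dockerfile", "Makefile", "README.md"}
	for _, h := range inferTooling(root) {
		fmt.Println(h)
	}
}
```

A lookup table like this only catches the obvious cases, which is all the first pass over a repository is meant to do.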

Next we will read through the README.md file and take a few notes.

There is a website at prometheus.io.

Prometheus is a service monitoring system that can be configured via rules, can trigger alerts, and is a CNCF (cncf.io) sponsored project. Then we have a list of distinguishing features and an architecture diagram.

Source: Prometheus GitHub (https://github.com/prometheus/prometheus/blob/master/README.md)

Architecture diagrams are great because they show which components are important and give you an idea of what the project does. In this diagram, we see core components like ‘Pushgateway’, ‘Alertmanager’, ‘Web UI’, and ‘Prometheus Server’. Then in blue we have ‘Service Discovery’. Service discovery looks important and has a bullet list with names of other software projects like Kubernetes and Consul. A good guess here is that Service Discovery involves integration with third-party projects and allows you to make custom integrations if you need them. Now we have enough information to put together a list of core components to track down in the codebase.

Core components:

  • Pushgateway
  • Prometheus Server
  • Alertmanager
  • Web UI
  • Service Discovery

A quick scan of the top level folders finds us the ‘discovery’ package.

The discovery folder has several subfolders with names we recognize like azure, consul, ec2, dns, gce, kubernetes, etc. These match what we saw under Service Discovery in the diagram, and are things we would expect to see in an integration context.

We find a README.md and a manager.go file here. The next step is to skim through the manager.go file and its related tests in manager_test.go, then skim a few of the third-party integration packages to get a feel for how the integration API works.

There is a lot going on in prometheus/discovery/manager.go so I am going to keep it on the shorter side.

Right off the bat we can see that this manager imports all of the other discovery packages in the same folder. Then below we have some good comments on the Discoverer interface and how it works.

My take on the Discoverer interface is that each discovery provider (Kubernetes, Consul, etc.) is handed a channel and is expected to send updates to its target groups along that channel. So if we are using an EC2 discovery provider, whenever that provider sees a change in the EC2 instances it is watching, it will send the updated target groups along the channel.
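To make that reading concrete, here is a minimal sketch of the interface shape. The `Run` signature follows the one quoted later for the ec2 provider. The `Group` type is a simplified stand-in for `targetgroup.Group`, and `staticProvider` and `firstUpdate` are hypothetical names invented for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Group is a simplified stand-in for Prometheus' targetgroup.Group.
type Group struct {
	Source  string
	Targets []string
}

// Discoverer has the shape described above: a provider is handed a
// send-only channel and pushes updated target groups into it.
type Discoverer interface {
	Run(ctx context.Context, ch chan<- []*Group)
}

// staticProvider is a hypothetical provider that sends one fixed update.
type staticProvider struct {
	groups []*Group
}

// Run sends the fixed groups once, unless the context is cancelled first.
func (p staticProvider) Run(ctx context.Context, ch chan<- []*Group) {
	select {
	case ch <- p.groups:
	case <-ctx.Done():
	}
}

// firstUpdate runs a provider in a goroutine and returns its first update.
func firstUpdate(d Discoverer) []*Group {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch := make(chan []*Group)
	go d.Run(ctx, ch)
	return <-ch
}

func main() {
	p := staticProvider{groups: []*Group{
		{Source: "example", Targets: []string{"10.0.0.1:80"}},
	}}
	for _, g := range firstUpdate(p) {
		fmt.Println(g.Source, g.Targets)
	}
}
```

The receiving side never needs to know which provider it is talking to, which is what lets one manager handle every integration in the folder.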

Here we have the Manager struct, and there is a NewManager(ctx context.Context, logger log.Logger) *Manager constructor function. Constructors are normal functions in Go. From the comments we can see that this struct is intended to keep a map of messages from discovery providers.

This function takes a struct that meets the Discoverer interface and starts it as a goroutine via the go keyword.
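A much-reduced sketch of that idea: a manager that starts each provider with the go keyword and keeps a map of the latest update per provider. Everything here (`miniManager`, `startProvider`, `runAll`, `oneShot`) is a hypothetical name, and the real manager does considerably more. Real providers also run until their context is cancelled, whereas this toy one returns after a single update so the program can finish.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Group is a simplified stand-in for targetgroup.Group.
type Group struct {
	Source  string
	Targets []string
}

// Discoverer has the same shape as the interface discussed above.
type Discoverer interface {
	Run(ctx context.Context, ch chan<- []*Group)
}

// miniManager is a hypothetical manager that remembers the latest
// groups reported by each named provider.
type miniManager struct {
	mu     sync.Mutex
	wg     sync.WaitGroup
	latest map[string][]*Group
}

// startProvider launches the provider with the go keyword, plus a second
// goroutine that records every update the provider sends.
func (m *miniManager) startProvider(ctx context.Context, name string, d Discoverer) {
	updates := make(chan []*Group)
	m.wg.Add(2)
	go func() {
		defer m.wg.Done()
		defer close(updates) // lets the recorder stop once Run returns
		d.Run(ctx, updates)
	}()
	go func() {
		defer m.wg.Done()
		for groups := range updates {
			m.mu.Lock()
			m.latest[name] = groups
			m.mu.Unlock()
		}
	}()
}

// runAll starts every provider, waits for all of them to return, and
// hands back the recorded state.
func runAll(providers map[string]Discoverer) map[string][]*Group {
	m := &miniManager{latest: make(map[string][]*Group)}
	for name, d := range providers {
		m.startProvider(context.Background(), name, d)
	}
	m.wg.Wait()
	return m.latest
}

// oneShot is a hypothetical provider that sends a single update and returns.
type oneShot struct{ groups []*Group }

func (p oneShot) Run(ctx context.Context, ch chan<- []*Group) {
	select {
	case ch <- p.groups:
	case <-ctx.Done():
	}
}

func main() {
	state := runAll(map[string]Discoverer{
		"static/0": oneShot{groups: []*Group{{Source: "a", Targets: []string{"10.0.0.1:80"}}}},
	})
	fmt.Println(len(state["static/0"]), state["static/0"][0].Targets)
}
```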

The func (m *Manager) providersFromConfig(cfg sd_config.ServiceDiscoveryConfig) map[string]Discoverer method is what pulls the list of providers out of configuration. It’s a long function with a lot of if blocks in it which check for providers of each type and add them to the list. There is a bit more going on here, but we will skip forward into the provider code.
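The pattern is easier to see in miniature. In this sketch the config struct, the `provider` placeholder, and the "kind/index" key format are all assumptions made for illustration. Only the overall shape (check each provider type in turn, add what is configured to a map) comes from the walkthrough above.

```go
package main

import (
	"fmt"
	"sort"
)

// sdConfig is a hypothetical, trimmed-down service discovery config
// with one slice per provider type.
type sdConfig struct {
	EC2Regions    []string
	ConsulServers []string
}

// provider is a placeholder for a constructed Discoverer.
type provider struct {
	kind string
	arg  string
}

// providersFromConfig checks each provider type in turn and registers
// one entry per configured instance, keyed by "kind/index".
func providersFromConfig(cfg sdConfig) map[string]provider {
	providers := map[string]provider{}
	add := func(kind string, i int, arg string) {
		providers[fmt.Sprintf("%s/%d", kind, i)] = provider{kind: kind, arg: arg}
	}
	if len(cfg.EC2Regions) > 0 {
		for i, region := range cfg.EC2Regions {
			add("ec2", i, region)
		}
	}
	if len(cfg.ConsulServers) > 0 {
		for i, server := range cfg.ConsulServers {
			add("consul", i, server)
		}
	}
	return providers
}

func main() {
	got := providersFromConfig(sdConfig{
		EC2Regions:    []string{"us-east-1", "eu-west-1"},
		ConsulServers: []string{"localhost:8500"},
	})
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)
}
```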

EC2 Discovery Provider
prometheus/discovery/ec2/ec2.go

If we head into the ec2 folder we find just one file, ec2.go.

Looking at the imports we find the aws-sdk, which we would expect since that is the standard way to interface with AWS services from code.

Here we can see that the ec2 discovery provider is responsible for parsing its own configuration from YAML.
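A common Go way for a type to own its parsing is a custom unmarshal method that applies defaults, decodes, then validates. The sketch below shows that pattern with encoding/json from the standard library as a stand-in for YAML, so it runs without extra dependencies. The field names, the default port, and the "region is required" rule are assumptions for illustration, not a copy of the provider's code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sdConfig is a hypothetical provider configuration.
type sdConfig struct {
	Region string `json:"region"`
	Port   int    `json:"port"`
}

// defaultSDConfig holds assumed defaults applied before decoding.
var defaultSDConfig = sdConfig{Port: 80}

// UnmarshalJSON lets the config type parse and validate itself.
func (c *sdConfig) UnmarshalJSON(data []byte) error {
	*c = defaultSDConfig
	// plain has the same fields but no methods, so decoding into it
	// does not call UnmarshalJSON again.
	type plain sdConfig
	if err := json.Unmarshal(data, (*plain)(c)); err != nil {
		return err
	}
	if c.Region == "" {
		return errors.New("config requires a region")
	}
	return nil
}

// parseConfig decodes raw JSON into an sdConfig.
func parseConfig(raw string) (sdConfig, error) {
	var c sdConfig
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	c, err := parseConfig(`{"region": "us-east-1"}`)
	fmt.Println(c.Region, c.Port, err)

	_, err = parseConfig(`{"port": 9100}`)
	fmt.Println(err != nil)
}
```

Keeping parsing next to the provider means the manager never has to know what a valid EC2 configuration looks like.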

Next we see an init hook that registers some counter hooks that I skipped when I saw them earlier in the program.

These counters look like they keep track of failures to pull data from AWS and the time it takes to pull the data when it works.

Here we have the ec2 Discovery struct, which meets the Discoverer interface in manager.go, and its constructor function. Notice that the constructor sets up the AWS credentials for the Discovery struct.

Now we have two methods left in the ec2.go file. The first is the func (d *Discovery) Run(ctx context.Context, ch chan<- []*targetgroup.Group) method, which I think is run as a goroutine inside the manager. Then we have the workhorse of the package, refresh, which looks like it polls AWS for data on the target groups.
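A polling provider of this kind usually boils down to a ticker loop. The following is a sketch of that loop under my own assumptions, not the ec2 code itself: `poller`, `newPoller`, and `collect` are hypothetical names, `Group` is a simplified stand-in for `targetgroup.Group`, and the refresh function is passed in so that no AWS calls are involved.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Group is a simplified stand-in for targetgroup.Group.
type Group struct {
	Source  string
	Targets []string
}

// poller is a hypothetical provider that calls refresh on a fixed interval.
type poller struct {
	interval time.Duration
	refresh  func() (*Group, error)
}

// newPoller builds a poller with an interval given in milliseconds.
func newPoller(intervalMillis int, refresh func() (*Group, error)) *poller {
	return &poller{
		interval: time.Duration(intervalMillis) * time.Millisecond,
		refresh:  refresh,
	}
}

// Run sends an initial result right away, then one per tick, until the
// context is cancelled. Failed refreshes are skipped.
func (p *poller) Run(ctx context.Context, ch chan<- []*Group) {
	ticker := time.NewTicker(p.interval)
	defer ticker.Stop()
	for {
		if tg, err := p.refresh(); err == nil {
			select {
			case ch <- []*Group{tg}:
			case <-ctx.Done():
				return
			}
		}
		select {
		case <-ticker.C:
		case <-ctx.Done():
			return
		}
	}
}

// collect runs the poller as a goroutine and gathers the first n updates.
func collect(p *poller, n int) [][]*Group {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch := make(chan []*Group)
	go p.Run(ctx, ch)
	var out [][]*Group
	for len(out) < n {
		out = append(out, <-ch)
	}
	return out
}

func main() {
	p := newPoller(10, func() (*Group, error) {
		return &Group{Source: "us-east-1", Targets: []string{"10.0.0.1:80"}}, nil
	})
	for _, update := range collect(p, 3) {
		fmt.Println(update[0].Source, update[0].Targets)
	}
}
```

Every send is paired with a check on ctx.Done(), so cancelling the context stops the provider even when nobody is reading from the channel.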

We can see some interesting stuff in the first few lines of the refresh method. First, it looks like a new AWS session is created for every invocation of the refresh method. Usually, I would expect the AWS session to be cached and not created over and over. My first thought was that this refresh method was called pretty frequently, but if it's creating its own AWS session, maybe it is not called very often.

We also see a timer started with t0 := time.Now() to keep track of how long the refresh method takes to run. This could imply that the run time of the refresh method is important for Prometheus’ service quality.
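The timing and failure counting can be sketched with a deferred function. Here a plain struct stands in for the registered Prometheus metrics so the example needs nothing beyond the standard library. `refreshStats`, `timedRefresh`, and `errBoom` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBoom is a sample failure used by the demo below.
var errBoom = errors.New("boom")

// refreshStats is a stdlib stand-in for a failure counter and a
// duration metric.
type refreshStats struct {
	failures int
	last     time.Duration
}

// timedRefresh runs fn, records how long it took, and counts a failure
// if fn returned an error. The named return value lets the deferred
// function see the final error.
func (s *refreshStats) timedRefresh(fn func() error) (err error) {
	t0 := time.Now()
	defer func() {
		s.last = time.Since(t0)
		if err != nil {
			s.failures++
		}
	}()
	return fn()
}

func main() {
	s := &refreshStats{}
	_ = s.timedRefresh(func() error { return nil })
	_ = s.timedRefresh(func() error { return errBoom })
	fmt.Println("failures:", s.failures)
	fmt.Println("last refresh took:", s.last)
}
```

Putting the bookkeeping in a defer means every exit path from the refresh is measured, including early returns on error.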

Still in the refresh method, we have this loop which iterates through AWS API responses and filters out the EC2 instances we care about. This list is what the Discoverer is using to update the manager’s state on target groups.
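A filter loop of that kind might look roughly like the sketch below. The `instance` type, the paged input, and the rule of skipping instances without a private IP are all assumptions chosen for illustration. The real loop works on AWS SDK response types.

```go
package main

import "fmt"

// instance is a hypothetical, flattened view of one EC2 instance.
type instance struct {
	ID        string
	PrivateIP string
}

// targetsFrom walks every page of results and builds "ip:port" targets,
// skipping instances that have no private IP (an assumed filter rule).
func targetsFrom(pages [][]instance, port int) []string {
	var targets []string
	for _, page := range pages {
		for _, inst := range page {
			if inst.PrivateIP == "" {
				continue
			}
			targets = append(targets, fmt.Sprintf("%s:%d", inst.PrivateIP, port))
		}
	}
	return targets
}

func main() {
	pages := [][]instance{
		{{ID: "i-1", PrivateIP: "10.0.0.1"}, {ID: "i-2"}},
		{{ID: "i-3", PrivateIP: "10.0.0.3"}},
	}
	fmt.Println(targetsFrom(pages, 80))
}
```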

After reading through a bit of the discovery folder we have an idea of how Prometheus does service discovery and which interfaces we would need to implement our own discovery provider.

Some of the general things we did along the way in our walkthrough include:

  1. Look at READMEs and any documentation to get an overview of the project, particularly architecture diagrams.
  2. Look at the root level folders and configuration files to determine what the code uses for things like CI/CD, builds, runtime, and API specs.
  3. Look at interfaces that might be implemented for the codebase, and any example implementations.

In the next post I will dig around and try to find where the Alert Manager functionality is inside the codebase.

Nicholas Sledgianowski is a software engineer working on scalable software systems in the cloud for Kenzan. The original version of this blog post can be found on his website: https://www.sledgianowski.com/
