Handling Logs with Graphite

Graphite is an enterprise-ready infrastructure monitoring tool that plugs into existing infrastructure and addresses time-series data storage, performance measurement, and data visualization. It can be deployed both in the cloud and on-premises. It is a mature, reliable open source project that numerous large companies rely on for monitoring. With a large number of integrations and tools available, Graphite can be adapted to your needs with different storage backends, data collection agents, visualization tools, anomaly detection, and alerting.

Logs and Monitoring

It is important for a business to monitor its infrastructure in real time. Effective real-time monitoring helps a business resolve outages, track performance, and provision resources efficiently. Infrastructure today consists of virtual machines, containers, and applications communicating over various protocols and generating millions of lines of logs. Important metrics and conclusions need to be extracted from these logs.

Graphite acts as the central storage mechanism for metrics generated from logs, applications, and systems. Its platform-agnostic nature, and the fact that metrics can be published over a text-based protocol, make it accessible to all kinds of systems and applications in the infrastructure. Graphite is not a collection agent. It is only responsible for storing time-series data and rendering the data as graphs.

  • Graphite can be used to store Linux performance metrics such as CPU utilization and memory usage for all systems in a cluster.
  • Data collection agents can extract important data like HTTP status codes from a web server’s log files and push them to a Graphite instance.
  • Instrumented applications can publish application-specific custom metrics to Graphite.
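
To make the text-based protocol concrete, here is a minimal Python sketch. It assumes a Carbon plaintext listener on localhost port 2003 (the same host and port used later in the tutorial); the metric path servers.web01.cpu.user is a made-up example. Each datapoint is one line of the form "path value timestamp".

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>' plus newline."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send already formatted lines to a Carbon plaintext listener over TCP.

    The host and port are assumptions; adjust them to your Graphite instance.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example (hypothetical metric path):
# send_metrics([format_metric("servers.web01.cpu.user", 12.5)])
```

Any language that can open a socket and write a line of text can publish metrics this way, which is what makes the protocol platform agnostic.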

Graphite logging is powerful for building custom monitoring infrastructure. Because Graphite itself is only responsible for storing time-series data, alerting and visualization can be designed as plug-and-play systems using Graphite’s accessible Metrics and Render HTTP APIs.

Large companies like GitHub have used Graphite as their foundational monitoring solution for analyzing large volumes of time-series data, anomaly detection, trend analysis, and more. Graphite powers both technical and business metric monitoring at GitHub.

Infrastructure Monitoring with Graphite

Let’s take a look at the kinds of metrics we can monitor effectively with Graphite, from bare-metal machines to production services.

Bare Metal Systems

Bare-metal systems are the fundamental resource of IT infrastructure. Monitoring their performance and health is important for maintaining the reliability, integrity, and agility of the applications running on top of them. Bare-metal systems most often run Linux, and data can be collected from a Linux system for Graphite using the extensive library of available tools and integrations.

Some important metrics we can collect from bare-metal systems:

  • CPU Utilization
  • Network Throughput
  • Bandwidth Utilization
  • System Integrity Indicators
  • Disk Health and Available Storage

Applications

Applications contain the logic that powers the business. Each application is different from the others and generates different logs and metrics. An HTTP service exposed to customers for order entry will have different metrics from an internal gRPC service for email queuing. These applications can be instrumented to take advantage of Graphite logging. Using third-party libraries, an application can publish its internal counters and metrics, integrating Graphite logging into the application itself. This eliminates the need for an external data collection agent.

Let’s take the case of an HTTP-based service responsible for handling order creation. Internal counters for the number of orders, request throughput, and erroneous orders handled can be stored in Graphite by instrumenting the application itself to regularly collect and publish these metrics.
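
As a rough illustration of this kind of instrumentation, the sketch below keeps counters in memory and turns them into Graphite plaintext lines when flushed. The class name OrderMetrics, the prefix shop.orders, and the counter names are all hypothetical; a real service would more likely use an existing client library and flush on a timer.

```python
import threading
import time
from collections import Counter


class OrderMetrics:
    """Thread-safe in-process counters for a hypothetical order service."""

    def __init__(self, prefix="shop.orders"):
        self.prefix = prefix
        self._counts = Counter()
        self._lock = threading.Lock()

    def incr(self, name, amount=1):
        """Increase the named counter, e.g. 'created' or 'errors'."""
        with self._lock:
            self._counts[name] += amount

    def flush(self, timestamp=None):
        """Return one plaintext line per counter, then reset all counters."""
        ts = int(time.time() if timestamp is None else timestamp)
        with self._lock:
            snapshot = dict(self._counts)
            self._counts.clear()
        return [
            f"{self.prefix}.{name} {value} {ts}\n"
            for name, value in sorted(snapshot.items())
        ]
```

The request handlers would call incr("created") or incr("errors"), and a background task would periodically send the result of flush() to Graphite over the plaintext protocol.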

Aggregation

Metrics collected from different sources can be aggregated to generate new metrics. Different configurations of Graphite make this possible, with the developer only required to build the aggregation logic. Aggregation can be used for both business and technical monitoring.

Financial metrics from different departments being stored in Graphite could be aggregated together to generate a cumulative status of finances. Different machines and applications in a cluster storing CPU utilization and System health indicators in Graphite could be used to create aggregated cluster-wide system health metrics. 
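
The cluster-wide case can be sketched in a few lines of Python. This is only an illustration of the aggregation logic a developer might write, assuming per-host samples are already available as (timestamp, value) pairs; the host names and the metric path cluster.cpu.avg are invented for the example.

```python
from statistics import mean


def cluster_average(samples, cluster_path):
    """Average per-host values into one cluster-wide series.

    samples: dict mapping host name -> list of (timestamp, value) pairs.
    Returns a list of (cluster_path, timestamp, average) sorted by timestamp.
    """
    by_timestamp = {}
    for points in samples.values():
        for ts, value in points:
            by_timestamp.setdefault(ts, []).append(value)
    return [
        (cluster_path, ts, mean(values))
        for ts, values in sorted(by_timestamp.items())
    ]
```

The resulting tuples can be written back to Graphite as a new metric, so dashboards can show the cluster as a whole alongside individual machines.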

Open Source Tools for Graphite

Graphite is an open source project under the Apache 2.0 license and has one of the largest ecosystems of integrations and third-party tools. This includes language bindings, data collection agents, visualization solutions, alerting systems, storage backends, and more.

StatsD

StatsD is a network daemon written in Node.js. It listens over TCP and UDP to collect counters and timers and sends aggregated data to storage backends. Graphite is one of the available backends for StatsD.

It is useful for aggregating numeric values and pushing them to Graphite. The application only has to make TCP or UDP connections to the StatsD server and submit metrics conforming to the text-based protocol. It supports several built-in metric types: counters (with optional sampling), timers, gauges, etc. You can find more information here.
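
For illustration, a StatsD metric is a short text line of the form "name:value|type", optionally followed by a sample rate. The Python sketch below formats such lines and sends them over UDP. It assumes a StatsD server on localhost port 8125; the metric names are made up.

```python
import socket


def statsd_line(name, value, metric_type, sample_rate=None):
    """Format one StatsD payload: '<name>:<value>|<type>[|@<sample_rate>]'.

    metric_type is 'c' for counters, 'ms' for timers, 'g' for gauges.
    """
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_udp(line, host="localhost", port=8125):
    """Fire-and-forget a single metric line to a StatsD server (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Example (hypothetical metric name):
# send_udp(statsd_line("orders.created", 1, "c"))
```

Because UDP is connectionless, the application does not block or fail if the StatsD server is briefly unavailable, at the cost of possibly losing some datapoints.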

CollectD

CollectD is an open-source system statistics collection daemon. It runs in the background and, using its extensive selection of over 100 plugins, collects metrics from various subsystems and publishes them to different backends like Graphite, Syslog, and more. Plugins cover widely used open-source applications like Apache and Nginx as well as CPU metrics, memory consumption, network connections, etc.

The write_graphite plugin can send collectd statistics to the Graphite backend for storage and analysis. More information on collectd can be found here.

Logster

Logster is a utility for parsing log files and generating output that can be stored in different storage backends, like Graphite. Logster runs on a client, parses log files through parsing scripts, and sends the generated metrics to Graphite. There are parsers available for various applications, and it is easy to integrate into your own application with its Python library. One example is a parser that analyzes Apache web server logs, counts the HTTP status codes, and publishes them as metrics to Graphite. Logster and its parsers can be found here.
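
The idea behind such a parser can be shown with a standalone Python sketch. This is not Logster’s own parser API, only an illustration of the logic: a regular expression (an assumption about the usual Apache access log layout) pulls out the status code, and each code is counted under its class, from http_1xx to http_5xx.

```python
import re
from collections import Counter

# Finds the three-digit status code that follows the quoted request line,
# e.g. ... "GET /index.html HTTP/1.1" 200 512 ...
STATUS_RE = re.compile(r'HTTP/\d(?:\.\d)?" (?P<status>\d{3}) ')


def count_status_classes(lines):
    """Count access-log lines per HTTP status class (http_1xx .. http_5xx)."""
    counts = Counter({f"http_{digit}xx": 0 for digit in "12345"})
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        key = f"http_{match.group('status')[0]}xx"
        if key in counts:
            counts[key] += 1
    return dict(counts)
```

Each count can then be published to Graphite as one metric per status class, which is the shape of the data the tutorial below produces.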

Grafana

It is important to have powerful visualization tools to visualize massive amounts of data and metrics generated by applications. Grafana is an open-source visualization tool which can be easily integrated with Graphite using the time-series datastore as its source of information. Grafana can generate graphs and charts from the data stored in Graphite. An example could be aggregated Cluster capacity or CPU Utilization stored as a numeric counter in Graphite’s time-series database visualized as a CPU Utilization vs Time graph with Grafana.

Carbon

Carbon is the set of daemons on a Graphite instance that listen for time-series data published over the network using a common set of protocols and pass it on to the storage backend. There are four daemons with different characteristics.

carbon-cache

It accepts metrics over the various protocols, caches them in RAM, and periodically flushes them to disk. It also provides a query service that serves cached “hot data” directly from memory.

carbon-relay

It is used for replication and sharding by the Graphite instance. A single carbon-relay can work on a single server and forward all the metrics to multiple backend carbon-cache servers. This can efficiently scale Graphite across a cluster. The load balancing is done using consistent hashing. 
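
To show why consistent hashing suits this job, here is a small Python hash ring. It is an illustrative sketch only and not Carbon’s actual implementation; the node names, the use of MD5, and the number of replicas per node are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps metric paths to backend nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times to spread the load evenly.
        self._ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        """Map a string to an integer position on the ring."""
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_node(self, metric_path):
        """Return the node owning the first ring position at or after the key."""
        idx = bisect.bisect_left(self._positions, self._position(metric_path))
        return self._ring[idx % len(self._ring)][1]
```

The useful property is that when a node is removed, only the metrics that lived on that node move elsewhere; every other metric keeps being routed to the same backend.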

carbon-aggregator

It can run in front of a carbon-cache instance, buffer received metrics in memory over time, and aggregate them before forwarding them to carbon-cache. This reduces the I/O load on the disk.
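
The buffering step can be pictured as grouping datapoints into fixed time windows and forwarding one combined value per window. The Python sketch below assumes a sum rule and a 60-second window, both chosen purely for illustration; it does not reproduce how Carbon is configured.

```python
from collections import defaultdict


def aggregate_sum(datapoints, interval=60):
    """Sum (timestamp, value) pairs into buckets of `interval` seconds.

    Returns a sorted list of (bucket_start_timestamp, total) pairs.
    """
    buckets = defaultdict(float)
    for ts, value in datapoints:
        bucket_start = ts - ts % interval
        buckets[bucket_start] += value
    return sorted(buckets.items())
```

Four incoming datapoints spread over two minutes become two writes instead of four, which is where the reduction in disk I/O comes from.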

carbon-aggregator-cache

It combines the functionality of both carbon-cache and carbon-aggregator in a single daemon. It is useful for conserving resources and reducing overhead.

A Graphite cluster could be created easily by taking advantage of all the carbon daemons. Graphite can be easily made highly available by using multiple carbon-relay nodes forwarding the data to carbon-aggregator daemons running on multiple systems feeding into carbon-cache daemons for storage.

Tutorial: Setting up Graphite and Logster 

In this tutorial we will cover the basics of setting up Graphite and Logster for Apache logs. The first step is to set up an instance of Graphite. Read the following tutorial to install Graphite on your system and set it up. Later, we will use the same instance with Logster to implement logging.

Apache is one of the most popular web servers, deployed by numerous organizations to serve documents to users. Apache logs every HTTP request in its access logs, which in a usual installation can be found at /var/log/httpd/access_log on RHEL or /var/log/apache2/access.log on Ubuntu. Access logs contain the client IP address, user agent information, time of request, HTTP status code, HTTP endpoint, etc. In this tutorial we will focus on identifying the significant classes of HTTP status codes found in Apache access logs, parsing them with Logster, and feeding them into Graphite as metrics to be stored.

  • Step 1

Logster and its installation instructions can be found at https://github.com/etsy/logster.

To install Logster, first clone the Logster repository from GitHub.

git clone https://github.com/etsy/logster.git

  • Step 2

Install the pygtail module.

pip install pygtail

  • Step 3

Install the Logster module by running the setup script from inside the cloned logster directory.

sudo python setup.py install

  • Step 4

After Logster is installed, download some sample Apache access logs from https://github.com/elastic/examples/tree/master/Common%20Data%20Formats/apache_logs. These logs will be parsed by Logster.

wget https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/apache_logs

  • Step 5

Run Logster with the --dry-run flag. This does a dry run and writes the parsed output to stdout.

sudo /usr/bin/logster --dry-run --output=graphite --graphite-host=localhost:2003 SampleLogster apache_logs

  • Step 6

Executing the command without --dry-run will send the parsed metrics to Graphite running on localhost.

sudo /usr/bin/logster --output=graphite --graphite-host=localhost:2003 SampleLogster apache_logs

The newly added http_xxx metrics can be seen in the Graphite web app and visualized using Graphite Composer.

Once published to Graphite, the metrics are stored in Whisper, Graphite’s storage backend, and are available via the Metrics API for raw metric access or the Render API for generating graphs. Other third-party tools like Grafana use these APIs to visualize metrics stored in the Graphite time-series database on their own platforms.
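
As a small illustration of these two APIs, the Python sketch below builds the request URLs. It assumes the Graphite web app is reachable at a base URL such as http://localhost (an assumption about your setup) and uses http_2xx as an example metric name; the code only constructs the URLs and leaves the actual HTTP request to the reader.

```python
from urllib.parse import urlencode


def render_url(base_url, target, time_from="-1h", fmt="json"):
    """Build a Render API URL that returns datapoints for one target."""
    query = urlencode({"target": target, "from": time_from, "format": fmt})
    return f"{base_url.rstrip('/')}/render?{query}"


def find_url(base_url, pattern):
    """Build a Metrics API URL that lists metric names matching a pattern."""
    query = urlencode({"query": pattern})
    return f"{base_url.rstrip('/')}/metrics/find?{query}"
```

Opening the render URL in a browser without the format parameter set to json returns an image of the graph, which is a quick way to check that the Logster metrics arrived.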

Conclusion

Graphite is an effective solution for storing metrics collected from a variety of applications and systems. It is a powerful and mature solution, battle-tested by many organizations for storing and visualizing time-series data. With the large number of integrations available, it is easy to make use of Graphite in your own systems and applications, as in this case of extracting metrics from the logs of applications like Apache.

About The Author

Aarush Ahuja is an enthusiast of DevOps and Distributed Systems. He is proficient in various programming languages and has a curiosity towards messaging, monitoring and virtualization. He also plays CTF with the team @alcapwnctf.
