Aggregate OpenShift logs into enterprise logging system

Jaydeep Ayachit
Jun 21, 2021 · 10 min read

Introduction

Red Hat OpenShift is an enterprise-ready Kubernetes container platform with full-stack automated operations to manage hybrid cloud, multicloud, and edge deployments. Red Hat OpenShift is optimized to improve developer productivity and promote innovation.

OpenShift provides some convenient mechanisms for viewing application logs. First, you can view a pod's logs directly from the web console or via the command line. Second, OpenShift ships with an out-of-the-box logging stack consisting of Elasticsearch, Fluentd, and Kibana (EFK). The logging stack is responsible for log collection, aggregation, and visualization.

However, the OpenShift Logging Elasticsearch instance is optimized and tested for short-term storage, approximately seven days. If you want to retain your logs over a longer term, it is recommended that you move the data to a third-party storage system. In addition, many organizations already have an enterprise log collection solution in place, and they need logs from OpenShift, and from the workloads running on OpenShift, to be available in that same system for monitoring, correlation, and analytics.

In this article we will look at OpenShift's out-of-the-box support for integrating with external log collection systems. We will also look at various enterprise log collection systems that you can use to collect logs from OpenShift.

Source: About Logging | Logging | OpenShift Container Platform 4.7

If you are interested in centralized monitoring for your OpenShift clusters, take a look at Centralized monitoring for your OpenShift clusters | by Jaydeep Ayachit | Jul, 2021 | Medium

Red Hat OpenShift Logging

The OpenShift Logging components include a collector deployed to each node in the OpenShift cluster that collects all node and container logs and writes them to a log store. You can use Kibana to create rich visualizations and dashboards with the aggregated data.

The major components of OpenShift Logging are:

  • collection — Collects logs from the cluster, formats them, and forwards them to the log store. The current implementation is Fluentd.
  • log store — This is where the logs are stored. The default implementation is Elasticsearch.
  • visualization — UI component used to view logs, graphs, charts, and so forth. The current implementation is Kibana.

As a cluster administrator, you can deploy OpenShift Logging to aggregate all the logs from your OpenShift cluster, such as node system audit logs, application container logs, and infrastructure logs.

OpenShift Logging aggregates the following types of logs:

  • application — Container logs generated by user applications running in the cluster, except infrastructure container applications.
  • infrastructure — Logs generated by infrastructure components running in the cluster and OpenShift Container Platform nodes, such as journal logs. Infrastructure components are pods that run in the openshift*, kube*, or default projects.
  • audit — Logs generated by the node audit system (auditd), which are stored in the /var/log/audit/audit.log file, and the audit logs from the Kubernetes apiserver and the OpenShift apiserver.

Source: About Logging | Logging | OpenShift Container Platform 4.7
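
For illustration, the logging stack is deployed and configured through a ClusterLogging custom resource. A minimal sketch, with illustrative storage class and sizing values, might look like this:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      managementState: Managed
      logStore:
        type: elasticsearch              # internal log store
        retentionPolicy:
          application:
            maxAge: 7d                   # short-term retention, as noted above
        elasticsearch:
          nodeCount: 3
          storage:
            storageClassName: gp2        # illustrative storage class
            size: 200G
          redundancyPolicy: SingleRedundancy
      visualization:
        type: kibana                     # UI component
        kibana:
          replicas: 1
      collection:
        logs:
          type: fluentd                  # node-level collector DaemonSet
          fluentd: {}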

Need for centralized logging

“Observability” is a popular term used nowadays for understanding how systems are performing and what data they expose. The more observable a system is, the more quickly and accurately you can identify an issue and its root cause. One of the key pillars of observability is logs. Logs are structured or unstructured text records of events that occurred at a specific time. Logs can originate from a variety of sources, including infrastructure, hosts, services, applications, and external sources. Audit and security events can also be represented as logs.

In traditional infrastructures, applications run on relatively static instances, so monitoring applications and aggregating their logs is quite manageable. Containerized workloads, in contrast, run across a fleet of underlying hosts, where multiple instances of an application may be running at any given time. The ability to collect the logs emitted by these applications is important for understanding their current operating state.

In many cases, organizations already have an enterprise log collection system in place that acts as the centralized system for log collection, aggregation, analytics, and visualization. Such organizations want logs from OpenShift, and from the workloads running in OpenShift, to be made available in that centralized logging solution.

The following sections explain various enterprise log collection systems and how your OpenShift clusters can be administered to take advantage of a centralized logging solution.

OpenShift Logging Integrations

The following diagram depicts:

  • How logs originating from OpenShift and from workloads running in OpenShift can be sent to external systems
  • The various enterprise systems that can collect logs from an OpenShift cluster for centralized storage, aggregation, and analytics

OpenShift out of the box support

By default, OpenShift Logging sends container and infrastructure logs to the default internal Elasticsearch log store. It does not send audit logs to the internal store, because the internal store does not provide secure storage for audit data.

To send logs to other log aggregators, you use the OpenShift Cluster Log Forwarder. This API enables you to send container, infrastructure, and audit logs to specific endpoints within or outside your cluster. You can send different types of logs to different systems.

Forwarding cluster logs to external third-party systems requires a combination of outputs and pipelines specified in a ClusterLogForwarder custom resource (CR) to send logs to specific endpoints inside and outside of your OpenShift cluster.

An output is the destination for log data that you define, or where you want the logs sent. An output can be one of the following types:

  • elasticsearch: An external Elasticsearch 6 (all releases) instance. The elasticsearch output can use a TLS connection.
  • fluentdForward: An external log aggregation solution that supports Fluentd. This option uses the Fluentd forward protocols.
  • syslog: An external log aggregation solution that supports the syslog RFC3164 or RFC5424 protocols. The syslog output can use a UDP, TCP, or TLS connection.
  • kafka: A Kafka broker. The kafka output can use a TCP or TLS connection.
  • default: The internal OpenShift Container Platform Elasticsearch instance. You are not required to configure the default output.

A pipeline defines simple routing from one log type to one or more outputs, or which logs you want to send. The log types are one of the following:

  • application: Container logs generated by user applications running in the cluster, except infrastructure container applications.
  • infrastructure: Container logs from pods that run in the openshift*, kube*, or default projects, and journal logs sourced from the node file system.
  • audit: Logs generated by auditd, the node audit system, and the audit logs from the Kubernetes API server and the OpenShift API server.

You can also use the Cluster Log Forwarder to send a copy of the application logs from specific projects to an external log aggregator instead of, or in addition to, the default Elasticsearch log store.
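
Putting outputs, inputs, and pipelines together, a ClusterLogForwarder custom resource might look like the sketch below. The endpoint URLs, secret names, and project name are illustrative placeholders:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogForwarder
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      outputs:
        - name: external-es
          type: elasticsearch
          url: https://elasticsearch.example.com:9200    # illustrative endpoint
          secret:
            name: es-secret                              # TLS certificates/credentials
        - name: audit-kafka
          type: kafka
          url: tls://kafka.example.com:9093/audit-topic  # illustrative broker and topic
      inputs:
        - name: my-app-logs
          application:
            namespaces:
              - my-project                               # illustrative project
      pipelines:
        - name: selected-apps-to-es
          inputRefs:
            - my-app-logs                                # only application logs from my-project
          outputRefs:
            - external-es
            - default                                    # keep a copy in the internal store
        - name: infra-to-es
          inputRefs:
            - infrastructure
          outputRefs:
            - external-es
        - name: audit-to-kafka
          inputRefs:
            - audit
          outputRefs:
            - audit-kafka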

Source: Forwarding logs to third party systems | Logging | OpenShift Container Platform 4.7

Leverage Kafka connectors

Kafka connectors enable you to copy data between Apache Kafka and other systems that you want to pull data from or push data to. The connectors are available on Confluent Hub; see Kafka Connectors | Confluent Documentation

From a log aggregation perspective, the following connectors are interesting:

Elasticsearch Service Sink

The Kafka Connect Elasticsearch Service Sink connector moves data from Apache Kafka to Elasticsearch. It writes data from a topic in Kafka to an index in Elasticsearch. See Elasticsearch Service Sink Connector for Confluent Platform | Confluent Documentation
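
If Kafka runs on OpenShift under the AMQ Streams (Strimzi) operator, the connector can be declared as a KafkaConnector custom resource. The sketch below assumes a Kafka Connect cluster named my-connect with the Confluent Elasticsearch sink plugin installed; the topic name and Elasticsearch endpoint are illustrative:

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnector
    metadata:
      name: elasticsearch-sink
      labels:
        strimzi.io/cluster: my-connect        # assumed Kafka Connect cluster name
    spec:
      class: io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
      tasksMax: 2
      config:
        topics: openshift-logs                # illustrative topic written by the log forwarder
        connection.url: https://elasticsearch.example.com:9200
        key.ignore: "true"
        schema.ignore: "true"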

Splunk Sink

The Kafka Connect Splunk Sink connector moves messages from Apache Kafka to Splunk. See Splunk Sink Connector for Confluent Platform | Confluent Documentation
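
A similar sketch for the Splunk sink, again assuming a Strimzi-managed Connect cluster with the Splunk connector plugin installed; the HEC endpoint and token are placeholders:

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnector
    metadata:
      name: splunk-sink
      labels:
        strimzi.io/cluster: my-connect                   # assumed Kafka Connect cluster name
    spec:
      class: com.splunk.kafka.connect.SplunkSinkConnector
      tasksMax: 2
      config:
        topics: openshift-logs                           # illustrative topic
        splunk.hec.uri: https://splunk.example.com:8088  # illustrative HEC endpoint
        splunk.hec.token: "<HEC_TOKEN>"                  # placeholder; store real tokens securely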

Enterprise logging systems integrations

In this section, we will look at various enterprise products that can collect logs from OpenShift clusters for centralized storage, aggregation and analytics.

Splunk

Splunk Connect for Kubernetes provides a way to import and search your OpenShift or Kubernetes logging, objects, and metrics data in Splunk. Splunk Connect leverages many of the same technologies as the out-of-the-box EFK stack included with OpenShift, such as a DaemonSet of containers that collects logs from each underlying host and transmits them to Splunk.

The second option is to use the OpenShift Log Forwarding API. To send logs from OpenShift to Splunk, the Log Forwarding API first forwards them to an external Fluentd server, which then sends the logs on to Splunk using Splunk's HTTP Event Collector (HEC) API.
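
On the OpenShift side this amounts to a fluentdForward output in the ClusterLogForwarder that points at the intermediate Fluentd server; the Fluentd server itself is then configured with Splunk's HEC output plugin. A sketch of the forwarder side, with an illustrative endpoint and secret name:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogForwarder
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      outputs:
        - name: fluentd-for-splunk
          type: fluentdForward
          url: tls://fluentd.example.com:24224       # external Fluentd server (illustrative)
          secret:
            name: fluentd-forward-secret             # TLS material for the forward protocol
      pipelines:
        - name: all-logs-to-splunk
          inputRefs:
            - application
            - infrastructure
            - audit
          outputRefs:
            - fluentd-for-splunk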

Source: Splunk Connect for OpenShift — Logging

Datadog

Datadog simplifies monitoring in OpenShift environments with a Red Hat-certified Operator that enables teams to deploy the Datadog Agent using a single Kubernetes manifest. Datadog's built-in OpenShift and Kubernetes integrations allow teams to monitor all of their application components side by side and easily pivot between critical application metrics, logs, and traces. Teams can use Datadog's out-of-the-box Kubernetes dashboards to keep track of the nodes, deployments, and pods deployed to their OpenShift environment in one place.

One key thing to note is that Datadog is a SaaS service. It does not support on-premises deployment in your infrastructure or data center.

The Datadog Agent can collect information from several different sources in your OpenShift cluster, including the Kubernetes API server, each node’s kubelet process, and the hosts themselves. Datadog provides three general levels of data collection based on what permissions are required:

  • Restricted for basic metric collection
  • Host network for APM, container logs, and custom metrics
  • Custom for full Datadog monitoring

Restricted

Restricted access essentially allows Datadog to access only the API server and kubelet processes. With this level of access, you can collect most of the key metrics and cluster events.

Host network

OpenShift’s default SCC configuration does not allow pods to directly access their host nodes’ ports. The Datadog Agent needs access to host ports in order to collect custom metrics (via the DogStatsD protocol), APM traces, and logs.
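
To illustrate what host-network access involves, the sketch below shows a minimal custom SecurityContextConstraints object granting host port access to an assumed datadog-agent service account in an assumed datadog namespace. The official Datadog setup for OpenShift ships its own SCC definitions, so treat this only as an outline:

    apiVersion: security.openshift.io/v1
    kind: SecurityContextConstraints
    metadata:
      name: datadog-agent-hostports                  # illustrative name
    allowHostPorts: true                             # allow binding DogStatsD/APM/log ports on the node
    allowHostNetwork: true
    runAsUser:
      type: RunAsAny
    seLinuxContext:
      type: RunAsAny
    fsGroup:
      type: RunAsAny
    supplementalGroups:
      type: RunAsAny
    users:
      - system:serviceaccount:datadog:datadog-agent  # assumed namespace and service account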

Custom

You can collect even more information about your OpenShift environment by applying a custom SCC to the Datadog Agent. This means granting the Datadog Agent pods access to host ports and, in addition, super privileged status (spc_t), which allows them to collect system information at the container and process levels.

Datadog agents

Datadog provides two types of agents: node level and cluster level. Depending on your use case, you can choose to go with one or both.

The Datadog Agent

The Datadog Agent is open source software that collects and reports metrics, distributed traces, and logs from each of your nodes, so you can view and monitor your entire infrastructure in one place. In addition to collecting telemetry data from Kubernetes, Docker, CRI-O, and other infrastructure technologies, the Agent automatically collects and reports resource metrics (such as CPU, memory, and network traffic) from your nodes, regardless of whether they’re running in the cloud or on-prem infrastructure.

The Datadog Cluster Agent

The Datadog Cluster Agent provides several additional benefits over using the node-based DaemonSet alone for large-scale, production use cases. For instance, the Cluster Agent:

  • reduces the load on the Kubernetes API server for gathering cluster-level data by serving as a proxy between the API server and the node-based Agents
  • provides additional security by reducing the permissions needed for the node-based Agents
  • enables auto-scaling of Kubernetes workloads using any metric collected by Datadog

The Datadog Operator

The Datadog Operator simplifies the task of configuring and managing the Agents monitoring your cluster. You can deploy the node-based Agent DaemonSet and the Cluster Agent using a single Custom Resource Definition (CRD).

The Datadog Operator is available on the community operator hub and has received Red Hat OpenShift Operator Certification.
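
As an illustration, and assuming a recent Datadog Operator with the v2alpha1 DatadogAgent schema, a minimal resource that enables log collection might look like the sketch below. The API key, namespace, and CRI socket path are placeholders or assumptions:

    apiVersion: datadoghq.com/v2alpha1
    kind: DatadogAgent
    metadata:
      name: datadog
      namespace: datadog                             # assumed namespace
    spec:
      global:
        credentials:
          apiKey: <DATADOG_API_KEY>                  # placeholder; a secret reference can be used instead
        criSocketPath: /var/run/crio/crio.sock       # OpenShift nodes run CRI-O
      features:
        logCollection:
          enabled: true
          containerCollectAll: true                  # collect logs from all containers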

Source:

Red Hat OpenShift | Datadog (datadoghq.com)

OpenShift Monitoring With Datadog | Datadog (datadoghq.com)

AppDynamics

AppDynamics provides an Analytics Agent to collect log data from your OpenShift cluster and from the workloads running in the cluster. To capture and present log records as analytics data, one or more log sources are configured for the Analytics Agent. The Analytics Agent uses the log source configuration to:

  • Capture records from the log file
  • Structure the log data according to your configuration
  • Send the data to the Analytics Processor.

Source: Deploy Analytics in Kubernetes (appdynamics.com)

Dynatrace

Dynatrace automatically collects log and event data from a vast array of technologies in hybrid and multicloud environments at enterprise-scale.

  • Native support for Kubernetes logs and events for K8s platforms, workloads and applications running inside K8s.
  • Native support for multicloud environments, including AWS, GCP, Microsoft Azure, and Red Hat OpenShift.
  • Support for open-source log data frameworks, including FluentD and Logstash.
  • Automatic ingestion of logs, metrics, and traces, and continuous dependency mapping with precise context across hybrid and multicloud environments with Smartscape®.

Dynatrace OneAgent acts as a single entity to collect monitoring and logging data from OpenShift clusters. Dynatrace OneAgent is container-aware and comes with built-in support for out-of-the-box monitoring of OpenShift. Dynatrace supports full-stack monitoring for OpenShift, from the application down to the infrastructure layer.

Dynatrace also provides the Dynatrace Operator, which manages the lifecycle of several Dynatrace components such as OneAgent and the Kubernetes API Monitor. The Dynatrace Operator manages:

  • The OneAgent pod, deployed as a DaemonSet, which collects host metrics and logs from Kubernetes nodes. It also detects new containers and injects OneAgent code modules into application pods.
  • The Dynatrace Kubernetes monitor pod, which collects cluster and workload metrics, events, and status from the Kubernetes API.
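
For illustration, the operator reconciles a DynaKube custom resource. A minimal sketch, assuming the v1beta1 schema with classic full-stack OneAgent injection; the environment URL and token secret name are placeholders:

    apiVersion: dynatrace.com/v1beta1
    kind: DynaKube
    metadata:
      name: dynakube
      namespace: dynatrace
    spec:
      apiUrl: https://<environment-id>.live.dynatrace.com/api   # placeholder environment URL
      tokens: dynakube                   # assumed name of the secret holding API/PaaS tokens
      oneAgent:
        classicFullStack: {}             # deploy OneAgent as a DaemonSet on every node
      activeGate:
        capabilities:
          - kubernetes-monitoring        # enables the Kubernetes API monitor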

Dynatrace log analytics provides the following features:

  • Log Monitoring: Create custom log metrics for smarter and faster troubleshooting.
  • Log data analysis: Analyze significant log events across multiple logs, across parts of the environment (production), and potentially over a longer timeframe. Log content can be filtered based on keywords or timeframe. For immediate notification, set alerts for monitored log data. Dynatrace artificial intelligence automatically correlates log messages with problems it detects in your environment and factors the messages into problem root-cause analysis.
  • Log data alerting: Define patterns and custom log metrics to receive proactive notifications. Log Monitoring enables you to create a metric based on your monitored log data. With such a metric, you can have Dynatrace continuously scan your monitored log data and display a chart of that metric on your dashboard so that any pattern changes that occur in your custom metric will be clearly visible.

Source:

Log Monitoring | Dynatrace

Log Monitoring | Dynatrace Documentation

New Relic

New Relic's Kubernetes integration provides full observability into the health and performance of your OpenShift cluster, whether on-premises or in the cloud. New Relic offers a fast, scalable log management platform that lets you connect logs with the rest of your telemetry and infrastructure data.

Log management features include:

  • Instantly search through your logs.
  • Visualize log data directly from the Logs UI.
  • Use logging data to create custom charts, dashboards, and alerts.
  • Troubleshoot performance issues without switching between tools.

New Relic provides a Kubernetes plugin for log forwarding. The Kubernetes plugin is a Fluent Bit output plugin that forwards logs to New Relic, and it is installed as a DaemonSet in the Kubernetes cluster.
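
As a rough sketch, the plugin is commonly installed via Helm; assuming the newrelic-logging chart and its licenseKey value (verify the chart name and value keys against the New Relic documentation), a minimal values file might look like this:

    # values.yaml for the newrelic-logging Helm chart (chart name and value keys are assumptions)
    licenseKey: <NEW_RELIC_LICENSE_KEY>  # placeholder; identifies the New Relic account to send logs to

The chart then deploys Fluent Bit as a DaemonSet on each node with the New Relic output plugin configured.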

One key thing to note is that New Relic is a SaaS service. It does not support on-premises deployment in your infrastructure or data center.

Source: Get started with log management | New Relic Documentation

If you like this, please follow me and give a clap.
