At iovation, Elasticsearch and the ELK stack are a core part of the debugging, analytics and data presentation layers throughout the product for our internal and external stakeholders.

More so today than ever, Logstash empowers our engineers, developers, QA team, client management team, and management to make strong, informed decisions about our product and how we serve our customers. iovation's ELK stack provides real-time analytics about traffic to iovation's services, customer usage patterns, and many other in-house analytics for making informed business decisions that help improve the product and provide worthwhile value to our customers.

Note: This is not a typical deployment. The steps taken here were to make the system scale as far as possible (within the context of the hardware available at the time) using an availability model based on asynchronous replication.

How it all started

When I joined iovation, Logstash was a small implementation with a core set of features used across the Operations and QA teams. I was given an opportunity to improve upon this implementation, a task I took great interest in.

What We Needed from the ELK Stack

At the core, we didn’t know what we wanted from ELK. We definitely wanted all the logs. Not only did we want all the logs, we wanted the logs to be referenceable based on common elements. This gave me a road upon which to start. Some of the initial elements we absolutely knew we needed for this project were:

  • Log level (INFO, ERROR, etc)
  • Timestamp (always in ISO8601 format please!)
  • Host the logs were being sent from
  • Data center logs were coming from
  • Ad-hoc analytics
  • Debugging
  • Application behavioral trending
  • SOA application interactions, i.e. are things playing nicely together?
  • Resiliency against the loss of a data center
  • Full replication for archival

Getting Five Data Centers to Talk to Each Other

One challenge I really enjoyed working on was designing and building a cross data center replication strategy for the ELK stack. The final revision was an asynchronous PUBSUB model based on Redis.

Each datacenter emits its own events and all datacenters listen to those broadcast events. The benefits of this approach included (a minimal configuration sketch follows the list below):

  • Reduced bandwidth consumption
  • Reduced resource consumption
  • Increased Capacity within Redis itself by using PUBSUB over key/value
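
Here is a minimal sketch of what the pub/sub wiring looks like with the stock Logstash Redis input/output plugins. The hostname and channel name are hypothetical, and our actual 1.4.1 build was patched for PUB/SUB, so treat this as illustrative rather than our exact config:

 # Shipping side: publish events to a Redis channel (PUB)
 output {
   redis {
     host => "redis.dc1.example.internal"   # hypothetical broker host
     data_type => "channel"                 # PUBSUB channel instead of a key/value list
     key => "logstash-events"
   }
 }

 # Consuming side, in every datacenter: subscribe to the broadcast (SUB)
 input {
   redis {
     host => "redis.dc1.example.internal"
     data_type => "channel"
     key => "logstash-events"
   }
 }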
What does our ELK stack look like at the system level?

We have four servers total, two per primary datacenter. The hardware looks like this:

  • 2 Intel E5-2620 CPUs (6 cores each with hyperthreading)
  • 128GB of RAM
  • Bonded 1GB NICs
  • CentOS 6.5 with kernel 2.6.32-431.3.1.el6.x86_64
  • 7TB of fast storage (RAID 5, 10K drives, 600GB per drive)

In both Portland and Seattle, one node is:

  • A dedicated Logstash broker and indexer (1.4.1 patched for PUB/SUB)
  • Redis server (3.8.9)
  • Running Java JDK 1.8.0_05

The other node is:

  • A dedicated Elasticsearch server (version 1.4.0 beta1)
  • Apache 2.2 server
  • Kibana 3 instance (version 3.1.1)
  • Running Java JDK 1.8.0_25

Breaking the mold (a bit)

ELK is designed to use many smaller nodes to create capacity. In our use case, we were limited to one Elasticsearch node and one broker node in both Portland and Seattle. This limited our use case… or so I thought. With large, powerful hardware, the challenge was not IF the system could do more, but HOW to get it to do more.

Utilizing the available CPU

CPU time spent on individual Elasticsearch queries took time away from other queries. It became apparent, after end users interacted with the system, that log events were not being broken down into small enough components. Getting around this hurdle was simple: I broke the logs down into smaller logical components. This is critical for performance AND user experience. That made it easier for individuals to find what they wanted and reduced Elasticsearch's CPU footprint. Here is an example from a Logstash config.

The Event:

[2015-03-10 23:19:39,183][WARN ][http.netty] [foobar.baz.com] Caught exception while handling client http traffic, closing connection [id: 0x5a504523, /10.0.1.234:33516 => /10.0.1.5:9200]

The Filter in Logstash:

filter {
  # make the work being done focused only on the logs you want
  if [type] =~ /foo/ {
    grok {
      match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:event_level}\s*\]\[%{DATA:internal_resource}\] \[%{HOSTNAME:ES_node}\] %{DATA:event} \[%{GREEDYDATA:misc}\]" ]
    }
  }
}

This will reduce overall load on Elasticsearch's analyzers, since the regex has to run through less data per field.

RegEx is your friend… sometimes

Getting logs into smaller chunks meant RegEx parsing came into the limelight in Logstash. To keep Elasticsearch fed, we needed to optimize Logstash's RegEx and to offload work that was better suited elsewhere. The biggest bang for the buck came when I moved some of the more granular parsing to Elasticsearch's whitespace analyzer. To illustrate, take the above example with Logstash's filter. The raw RegEx that grok compiles to for that example is quite computationally expensive. JRuby RegEx, in my experience, was not nearly as fast as Elasticsearch for the smaller parsings (individual word parsing, etc.). Offloading to Elasticsearch freed up Logstash's CPU for more throughput.
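
As an illustration of what that offloading can look like (the template and field names here are hypothetical, not our production mapping), an Elasticsearch 1.x index template can point a free-form field at the built-in whitespace analyzer, so word-level splitting happens at index time instead of in grok:

 curl -XPUT 'localhost:9200/_template/logstash_whitespace' -d '{
   "template": "logstash-*",
   "mappings": {
     "_default_": {
       "properties": {
         "event": { "type": "string", "analyzer": "whitespace" }
       }
     }
   }
 }'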

Memory in the JVM

Elasticsearch was never designed for the monolithic use case (GC times, OOMs, old_gen never clearing, etc. are all problems that get optimized away across many smaller nodes).

  • Upping memory past 32GB meant longer data seek times
  • Upping memory past 32GB meant longer GC times
  • Leaving memory below 32GB meant old_gen would OOM the Java process
  • Leaving memory below 32GB meant various caching layers were constantly competing for resources in the heap

The answer to memory in the monolithic Elasticsearch

Run aggressive optimizations on older indexes (this reduces Lucene segment-based memory consumption), reduce fielddata_cache to 5%, and reduce filterdata_cache to 10%.
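
Concretely, on Elasticsearch 1.x this amounts to something like the following. The index name is hypothetical, and the two cache limits go in elasticsearch.yml under their actual setting names (indices.fielddata.cache.size and indices.cache.filter.size):

 # aggressively merge an older index down to one segment per shard
 curl -XPOST 'localhost:9200/logstash-2015.02.01/_optimize?max_num_segments=1'

 # elasticsearch.yml
 indices.fielddata.cache.size: 5%
 indices.cache.filter.size: 10%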

A note about Fielddata_cache and filterdata_cache

The 5% and 10%, respectively, were set to allow segments to run at a segment/shard ratio of ~50 with the heap available in our implementation. Higher ratios require more memory.
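
If you want to keep an eye on that ratio, the _cat API in the 1.x branch makes it easy to list segments per shard and count them; the index name below is hypothetical:

 curl -s 'localhost:9200/_cat/segments/logstash-2015.03.01?v'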

The caveat

This config will run your CPU and disk very HOT! You need the overhead in disk and CPU to cover this in order for it to work. SSDs or RAIDed 10K/15K drives are your friend here.

Disk

With the pressure moved off of memory, pressure was applied to disk, per the caveat; go figure. Time to leverage disk caching to the maximum! I set the kernel to allow higher levels of dirty cache for background writes (buffering out the highs and lows a bit):

vm.dirty_background_ratio = 15
vm.dirty_ratio = 75
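
These are standard kernel knobs, so persisting them follows the usual CentOS mechanics (nothing deployment-specific here):

 # add the two lines to /etc/sysctl.conf, then reload
 sysctl -p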

2.5 billion documents in a SINGLE node

This entire process took careful curation. Really it was, and still is, a work in progress. There were a few things I learned right off the bat though. Elasticsearch is optimized for several smaller nodes working together—not a single node. If you are reading this post and are in a similar situation, you will have to revisit ALL configurable options around indexing, allocation, concurrency, templates and caching to stretch a larger node to utilize its potential.

My short list:

Index_concurrency = 1.5 * <number of CPU cores>
Java heap: 30GB (no wasted memory here!). Java uses compressed object pointers to make the best use of any heap under 32GB. If you have *MOUNTAINS* of memory available, then in my experience you need to go past 64GB of heap for it to be worth it, but still stay below half of your total available memory.

Number of shards per index = 3
This choice is due to the fact that you're not moving load around a cluster: more shards = more everything. Maximum number of indexes: ~31 (a one-month sliding window). Any more than this will make Elasticsearch not have a great day; its memory will thrash trying to maintain all those shards.

Limit node_filter_cache and field_data_cache
I limit field_data to 10% and filter_data to 20% of total available memory. Given half a chance, these caches will take almost all of your available heap in a large deployment.

JDK >= 1.8.0_25
This has some big performance boosts over 1.7.x, mostly around memory in this context.

Elasticsearch >= 1.0
This branch has far more configurable options and performance enhancements than older versions, so take advantage of that.
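
Pulled together, the short list above looks roughly like this on an Elasticsearch 1.x node. This is a sketch assuming our 12-physical-core boxes and the 1.x setting names, not a drop-in config:

 # environment (e.g. /etc/sysconfig/elasticsearch)
 ES_HEAP_SIZE=30g                      # stay under the 32GB compressed-oops ceiling

 # elasticsearch.yml
 index.index_concurrency: 18           # ~1.5 * 12 physical cores
 index.number_of_shards: 3             # per index; no cluster to spread load across
 indices.fielddata.cache.size: 10%
 indices.cache.filter.size: 20%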

Elasticsearch’s query speed

Elasticsearch is still very fast at consuming information as well as serving it. To date we are processing 4,000+ logs/sec at peak times. We have more than 60 engineers who use it almost daily for a variety of use cases. The biggest bottleneck to date has been people's browsers. There is simply too much data for a browser to parse into graphs/tables/reports easily using Kibana 3.

Next Steps

Some next steps include:

  • Using Doc Values to more heavily leverage disk and use less RAM overall (see the sketch below)
  • Increasing compression where possible to leverage CPU more heavily
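
As a sketch of the doc values piece (the template and field names are hypothetical, and this is the Elasticsearch 1.x mapping syntax), a not_analyzed field opts in through its mapping so its values are read from disk instead of heap-resident fielddata:

 curl -XPUT 'localhost:9200/_template/logstash_docvalues' -d '{
   "template": "logstash-*",
   "mappings": {
     "_default_": {
       "properties": {
         "event_level": { "type": "string", "index": "not_analyzed", "doc_values": true }
       }
     }
   }
 }'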

Final Thoughts

This project was exciting, challenging, and eye opening. Elasticsearch is a powerful, versatile system that is further enhanced by Logstash and Kibana. I look forward to contributing to Elastic’s offering in the future by continuing to push the envelope as far as I can!