Marc Denning

Triangle Kubernetes and OpenShift Meetup: Log Aggregation & Graph Databases

Author's note: I attended this meetup in July, but have been delayed posting my recap. Better late than never!

In July, I attended the Triangle Kubernetes and OpenShift Meetup. The meetup featured two talks by different speakers: "Log aggregation for Kubernetes using Elasticsearch" and "Using a Graph Database to Understand Complex Infrastructure".

The first topic, on log aggregation, interested me because my current client uses the Elastic Stack as part of their monitoring approach, and I was curious how other companies use the same tools. As it turns out, Bandwidth uses Elastic in a similar way and has run into similar constraints.

The Elastic Stack is composed of several components, but the basic idea is to gather log and metric data from as many parts of your application and infrastructure as possible and bring it together in one tool for search, reporting, and visualization. Elasticsearch seems well-suited to this job, and Kibana (the visual app you use to view logs and visualizations) is pretty powerful as well.

The limits that Bandwidth and my client run into are storage related. The more you monitor, the more data you have to store, and noisy applications or infrastructure can become a problem. Both companies put Kafka between the components sending in data (like Logstash) and Elasticsearch (the data storage layer) to help buffer the incoming data, yet they still constantly run into storage management issues.

It seems like the problem here is both technical and policy-oriented. It's challenging to build and maintain infrastructure that lets the stack scale, but there is also an argument that a company should have a consistent approach to the data it sends to the cluster so that it doesn't overwhelm the system. I'm sure there are other patterns and approaches out there.
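
To make the policy side a little more concrete, here is a minimal sketch of what a "consistent approach" to shipping logs could look like. This is not something shown at the meetup, and it does not use any Elastic or Kafka API: the `should_ship` function, the `LEVELS` ranking, and the service names are all hypothetical, and the events are plain dictionaries. The idea is simply to decide, before anything reaches the cluster, which events are worth storing.

```python
# Hypothetical severity ranking; not tied to any particular logging library.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def should_ship(event, min_level="INFO", noisy_services=frozenset()):
    """Return True if a log event should be forwarded to the cluster."""
    # Unknown or missing levels are treated as INFO.
    level = LEVELS.get(event.get("level", "INFO"), LEVELS["INFO"])
    if event.get("service") in noisy_services:
        # Services flagged as noisy may only ship warnings and errors.
        return level >= LEVELS["WARN"]
    return level >= LEVELS[min_level]


# Made-up events for illustration only.
events = [
    {"service": "api", "level": "DEBUG", "msg": "cache miss"},
    {"service": "api", "level": "ERROR", "msg": "upstream timeout"},
    {"service": "chatty-worker", "level": "INFO", "msg": "heartbeat"},
    {"service": "chatty-worker", "level": "WARN", "msg": "retrying job"},
]

shipped = [e for e in events if should_ship(e, noisy_services={"chatty-worker"})]
for e in shipped:
    print(e["service"], e["level"], e["msg"])
```

With these made-up events, only the `api` error and the `chatty-worker` warning would be forwarded. In a real setup a rule like this would more likely live in the shipper's or pipeline's own filter configuration; the sketch only shows the kind of decision a shared policy would encode.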

I was less familiar with the second topic, graph databases. JupiterOne is a product developed by LifeOmic that, at least in part, provides a query language built for navigating graph structures. There are a few different graph database backends out there, but I have not heard much about them. The biggest takeaway from this presentation for me was that when architecting and designing systems, it's worthwhile to think hard about what questions you're going to ask of your data and to pick databases that can answer those questions easily and with good performance. If your application or system has any kind of focus on relationships (or vertices and edges, for the math whizzes out there), maybe a graph database is a good fit.
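
For anyone who, like me, has not worked with graphs much, here is a rough sketch of the vertices-and-edges idea in plain Python. It is not JupiterOne's query language or any graph database's API. The `EDGES` data, the relationship names, and the `build_adjacency` and `reachable_from` helpers are all made up for illustration. It answers a relationship-style question ("what can this user ultimately reach?") with a breadth-first search over an adjacency list.

```python
from collections import deque

# Hypothetical infrastructure graph: each edge is (source, relationship, target).
EDGES = [
    ("user:alice", "HAS_ROLE", "role:admin"),
    ("role:admin", "CAN_ACCESS", "host:web-1"),
    ("host:web-1", "CONNECTS_TO", "db:orders"),
    ("user:bob", "HAS_ROLE", "role:viewer"),
    ("role:viewer", "CAN_ACCESS", "dashboard:sales"),
]


def build_adjacency(edges):
    """Map each vertex to the list of vertices it points at."""
    adjacency = {}
    for source, _relationship, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])  # make sure leaf vertices exist too
    return adjacency


def reachable_from(adjacency, start):
    """Breadth-first search: every vertex reachable from start, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


graph = build_adjacency(EDGES)
print(reachable_from(graph, "user:alice"))
# prints ['role:admin', 'host:web-1', 'db:orders']
```

In a relational database, the same question would typically need a chain of joins or a recursive query. A graph database is built so that this kind of traversal is the natural thing to ask for.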