OpenTelemetry: Tips to navigate the sea of observability options
OpenTelemetry makes for a flexible and vendor-neutral observability pipeline, but there is more to the story.
Observability is a deep subject and easily turns into a rabbit hole. At Momento, we want to remain nimble and build an efficient yet durable observability system that allows us to monitor our customer traffic. In this blog, I’ll walk through Momento’s own journey with OpenTelemetry (OTel)—which defines vendor-neutral observability standards for tracing, metrics, and logging—and why it may or may not be a good fit for you.
My experience with OpenTelemetry started with OpenTracing back in 2017, when I was working at Lightstep with CEO Ben Sigelman, who co-created OpenTracing with the aim of providing an open-source, unifying tracing standard. Subsequently, OpenTracing was merged with OpenCensus (from Google) to form OpenTelemetry. It’s truly rewarding to see how much the community has grown over the last five years, and how many companies and individuals are now actively contributing to the project.
For the uninitiated, the OpenTelemetry ecosystem can be overwhelming: docs are plentiful, but developers just want to start coding! Here is a quick overview of the main components of OpenTelemetry. You may choose to use all of them or only some; the project is designed to be flexible, supporting mix-and-match or plug-and-play:
OpenTelemetry SDKs
OpenTelemetry SDKs are the libraries developers use to instrument their applications with metrics, traces, and logs. The OpenTelemetry specification defines a standardized API for this instrumentation, and each supported language (Java, Python, Go, etc.) implements that specification in its own SDK. Remember, this is an open-source, community-driven project, so SDK stability varies across languages. The good news is you can always contribute to the code base yourself if you need something in particular. You can also check the current stability of the SDK you are interested in on the official OpenTelemetry status page.
OpenTelemetry Protocol (OTLP)
Once applications are instrumented using the OTel SDK, they emit observability data in a standard wire format called the OpenTelemetry Protocol (OTLP). It defines details such as encoding, transport, and delivery mechanism. Currently, OTLP defines its payloads with a Protocol Buffers (protobuf) schema and supports two transports: gRPC and HTTP/1.1, where HTTP payloads may be encoded as either binary protobuf or JSON.
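To make the wire format a little more concrete, here is a minimal sketch of what a single span might look like in OTLP's JSON encoding, built with only the Python standard library. The service name, span name, scope name, and attribute values are made up for illustration, and the helper build_otlp_trace_request is hypothetical, not part of any OTel SDK. In practice the SDK's OTLP exporter produces this payload for you.

```python
import json
import secrets
import time


def build_otlp_trace_request(service_name: str, span_name: str, duration_ms: int) -> dict:
    """Build a dict shaped like an OTLP JSON trace export request (illustrative only)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        # In OTLP's JSON encoding, trace and span IDs are hex strings.
        "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex characters
        "name": span_name,
        "kind": 2,  # SPAN_KIND_SERVER in the protobuf enum
        # 64-bit integers are carried as decimal strings in JSON.
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": "http.method", "value": {"stringValue": "GET"}},
        ],
    }
    return {
        "resourceSpans": [
            {
                # The resource describes the entity that produced the spans.
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}},
                    ]
                },
                "scopeSpans": [
                    {"scope": {"name": "example.manual"}, "spans": [span]},
                ],
            }
        ]
    }


# Hypothetical service and span names, chosen only for this example.
payload = build_otlp_trace_request("cache-frontend", "GET /item", duration_ms=12)
body = json.dumps(payload).encode("utf-8")
print(json.dumps(payload, indent=2))
```

Assuming a Collector is listening on its default OTLP/HTTP port (4318), body could be sent as a POST to the /v1/traces path with a Content-Type of application/json. The sketch stops short of sending anything so that it runs without a Collector.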
OpenTelemetry Collector
The OpenTelemetry Collector is an optional intermediate agent you can run to receive, process, and export telemetry data. In the attached diagram, the example applications emit telemetry data via OTLP to the OTel Collector, which performs intermediate processing, such as batching or rate limiting, before exporting to various observability vendors. Keep in mind: while having this intermediate agent may be helpful, it does add another layer of infrastructure and complexity to your stack.
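The receive, batch, and export flow described above can be sketched as a toy Python model. This is not how the Collector is implemented (it is a separate program driven by configuration); the class BatchingPipeline and the exporter names vendor_a and vendor_b are hypothetical, and the sketch only flushes on batch size, not on a timer.

```python
from typing import Callable, Dict, List

# An exporter is modeled as any callable that accepts a batch of telemetry items.
Exporter = Callable[[List[dict]], None]


class BatchingPipeline:
    """Toy model of receive -> batch -> export; not the Collector's actual code."""

    def __init__(self, exporters: Dict[str, Exporter], max_batch_size: int = 3) -> None:
        self.exporters = exporters
        self.max_batch_size = max_batch_size
        self._buffer: List[dict] = []

    def receive(self, item: dict) -> None:
        """Buffer one item and flush once the batch is full."""
        self._buffer.append(item)
        if len(self._buffer) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Send the buffered batch to every exporter, then start a new batch."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for name, export in self.exporters.items():
            try:
                export(list(batch))  # each exporter gets its own copy
            except Exception as exc:
                # One failing backend should not block the others.
                print(f"export to {name} failed: {exc}")


# Two in-memory lists stand in for two observability vendors.
sent_a: List[List[dict]] = []
sent_b: List[List[dict]] = []
pipeline = BatchingPipeline(
    {"vendor_a": sent_a.append, "vendor_b": sent_b.append}, max_batch_size=3
)
for i in range(7):
    pipeline.receive({"span": i})
pipeline.flush()  # push out the final partial batch
print([len(batch) for batch in sent_a])  # [3, 3, 1]
```

Both stand-in vendors receive the same three batches. Sending identical data to several backends at once is the property that makes side-by-side vendor evaluation practical.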
So… OpenTelemetry, is it for me?
The classic answer: it depends.
The biggest advantage of OpenTelemetry is its flexibility and vendor neutrality. Given the rapid expansion of the OpenTelemetry community in the past few years, many observability vendors have announced native OTel support, including Splunk, Datadog, Dynatrace, and Lightstep. Adopting OpenTelemetry gives you the much-desired “no vendor lock-in”. The observability market moves quickly, with new players showing up with more features and lower prices, so it can pay to stay nimble and leave your options open. However, if you are committed to your existing observability vendors, there is little added value in adopting OpenTelemetry; you should stick with your vendor-specific library and use the official observability pipeline offered by that vendor.
At Momento, we adopted OpenTelemetry to keep our options open. We wanted to move fast and enable observability without locking ourselves in with a particular vendor. We wanted time to evaluate our options, and OpenTelemetry made our choices a two-way door. Once we had the OpenTelemetry SDKs and the OpenTelemetry Collector set up, it was easy to evaluate multiple vendors concurrently by broadcasting OTel data to all of them. Our developers were able to use multiple observability systems at the same time and compare the user experience side by side. This allowed us to pick the vendor that best suits our current budget and needs, which may evolve as we grow (at which point we can easily switch vendors).
All this said, the OpenTelemetry ecosystem is still young and constantly evolving. We have run into our fair share of issues upgrading the SDK and the Collector due to breaking changes. You should factor in the dynamic nature of the ecosystem when deciding whether or not to adopt OpenTelemetry in your stack. The community is very helpful and active, so be sure to check in with the OTel contributors on their upcoming plans!
Observability is a first-class citizen at Momento because it enables us to maintain a high bar for availability and performance. Building upon the flexibility OTel affords us, we provide extensive visibility to our customers to help them meet their own observability goals. If you need a powerful, performant cache without all the operational overhead, try Momento for free today.