Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLP support via HTTP and GRPC #2

Closed
wants to merge 23 commits into from
Closed

OTLP support via HTTP and GRPC #2

wants to merge 23 commits into from

Conversation

marcingrzejszczak
Copy link
Collaborator

@marcingrzejszczak marcingrzejszczak commented Jun 14, 2024

TODO

  • integration tests with Jaeger (confirmed to work with OTLP on the recepient side) - sender
  • integration tests with OTel (confirmed to work with OTLP on the sender side) - collector
  • basic usage test for zipkin with collector
  • docs
  • unit tests

@marcingrzejszczak
Copy link
Collaborator Author

cc @jonatan-ivanov @shakuzen

@codefromthecrypt
Copy link
Contributor

Thanks for offering to help! This will be a lot to review, so let's this split out into parts, starting with http either send or receive side. Arbitrarily, maybe start with send/translate, side.

Please use exact code style, naming conventions and test approach https://github.com/openzipkin/zipkin-gcp to keep review burden down. It will be easier on me and others maintaining this to look at how gcp is working and compare approach when future maintenance occurs. Also, we have years of 👍 on that repo.

Note: If this needs to support the old zipkin2.reporter.Span please be clear what demand is behind this, as this is volunteer and we don't have any resources to maintain stuff. the brave handler thing is easy to explain, and perhaps it is best to start with only that side unless you already know a clear demand for zipkin2.reporter.Span especially knowing otel-java already has code like this.

cc @anuraaga @reta I'll try to watch over this, but as I'm also doing this on a volunteer basis please keep an eye out in case I disappear (not volunteering you, just not sure if you pay attention to this formerly dead repo)

@codefromthecrypt
Copy link
Contributor

also for the PR that handles the http collector side, it would make sense to do comparisons with the otel exporter (cc @jack-berg) For example, send the same otel spans in this test, citing the source of course. Then, make sure the assertions are the same result.

https://github.com/open-telemetry/opentelemetry-java/blob/a686280aaf839f562ec33a4586ac6fd7018a9838/exporters/zipkin/src/test/java/io/opentelemetry/exporter/zipkin/ZipkinSpanExporterEndToEndHttpTest.java

Citing this test allows future maintainers to keep track in case certain mappings change or new spec versions vary things. Using the same data helps for the same reason.

While the project structure of this repo should be the same as zipkin-gcp for the server integration (module, docker, etc), the implementation of the http handlers will be best to follow zipkin itself. For example, here is zipkin's grpc related handlers and I would expect it easiest for this part of the implementation to be similar in code and tests (even though the otel collector additionally needs translation parts similar to gcp)

Notice that the grpc collector (which would be the pattern both for http and grpc here) exposes a configurator bean for armeria. So, it doesn't need to extend collector. Basically the handler part is the implementation. This also allows users to not need to track or manage a separate port for these parts as they don't conflict with any of the zipkin server paths.

https://github.com/openzipkin/zipkin/blob/master/zipkin-server/src/main/java/zipkin2/server/internal/ZipkinGrpcCollector.java
https://github.com/openzipkin/zipkin/blob/master/zipkin-server/src/main/resources/zipkin-server-shared.yml#L31-L32
https://github.com/openzipkin/zipkin/blob/master/zipkin-server/src/test/java/zipkin2/server/internal/ITZipkinGrpcCollector.java

hope these notes help

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Jun 15, 2024

finally (for now ;)), if there is a goal to support default ports and compat etc, I would go with otel not necessarily jaeger for comparisons. so, cite https://opentelemetry.io/docs/specs/otlp/#otlpgrpc-default-port and if doing integration tests that use docker, prefer parts in otel orgs vs jaeger for example. What's being supported here is OTLP x.x (vs jaeger's implementation of that) so we want to be as close to the source as possible for clarity.

p.s. I would first to OTLP on our standard ports and then a follow-up if we want to make a separate listener, as it keeps the amount of code and review concise.

import zipkin2.reporter.ClosedSenderException;
import zipkin2.reporter.Encoding;

public final class OtelGrpcSender extends BytesMessageSender.Base {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one high level comment, I wonder if we need a gRPC sender. FWIU, all OTel implementations support HTTP (including Zipkin collector here!) and it was made the default transport of OTel at some point post-launch (https://opentelemetry.io/docs/specs/otel/protocol/exporter/#specify-protocol), I would expect future backends to possibly have both or otherwise only support HTTP.

A okhttp-based HTTP sender (AFAICT it's not part of this PR) would feel a lot closer to zipkin's other senders than a gRPC one. It could also potentially be a base for a grpc-less gRPC sender if the need for that came but IMO it's probably not needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! if we can rule out any protocol for a while, we should. as well we should follow, not lead the norms of spec implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the gRPC stuff entirely then

@marcingrzejszczak
Copy link
Collaborator Author

Please use exact code style, naming conventions and test approach https://github.com/openzipkin/zipkin-gcp to keep review burden down. It will be easier on me and others maintaining this to look at how gcp is working and compare approach when future maintenance occurs. Also, we have years of 👍 on that repo.

I have been copying zipkin-gcp all the time to this codebase. Is there anything specific that you find offending the rules in this PR?

@marcingrzejszczak
Copy link
Collaborator Author

While the project structure of this repo should be the same as zipkin-gcp for the server integration (module, docker, etc), the implementation of the http handlers will be best to follow zipkin itself. For example, here is zipkin's grpc related handlers and I would expect it easiest for this part of the implementation to be similar in code and tests (even though the otel collector additionally needs translation parts similar to gcp)

I've been following whatever you have set up there. I can follow Zipkin's part, no problem

@marcingrzejszczak
Copy link
Collaborator Author

Notice that the grpc collector (which would be the pattern both for http and grpc here) exposes a configurator bean for armeria. So, it doesn't need to extend collector. Basically the handler part is the implementation. This also allows users to not need to track or manage a separate port for these parts as they don't conflict with any of the zipkin server paths.

It's already there in the module module. The OpenTelemetryHttpCollector is reconfiguring the Aremeria server. Are you looking for anything else?

@codefromthecrypt
Copy link
Contributor

@marcingrzejszczak sorry if I caused confusion. my notes above were about how to go about it, and I hadn't done a review, yet. Mainly, I wanted to give you guidance including test (jaeger vs otlp etc) advice before I do a review as I want to conserve time mutually at that phase.

There are probably parts of my comments you hadn't addressed, yet, so try to read carefully so we don't need to cover them in the actual PRs to be merged. This will save us both time and days, as I won't be able to allocate much time daily (as it is all free time). In other words focus less on the parts you feel you already did, and more in efforts to find things maybe you haven't.

When ready, please peel off the sender/brave side (dropping grpc transport as advised) and lets get that part in first! you can either open another PR or drop the server side from this one. Cheers.

@codefromthecrypt
Copy link
Contributor

fyi while tempting to simplify the ports by collocating the OTLP handler on the same port as zipkin's http and grpc stuff, I notice that the port seems important over there, for drop-in reasons I suspect. In other words, it may be best to keep the "standard" OTLP port vs wait until someone to ask for that. just a thought

dapr/docs#3771

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Jun 25, 2024

hey marcin. Can you pull the infrastructural/tidie changes into a different PR? This would include any modules that are renamed or deleted (e.g. removal of grpc). We can merge mechanical change quickly without a lot of focus.

Next, you could extract a PR about the brave send/translate with only that part in a PR? So, just the brave AsyncSpanHandler part (brave.Handler) without a dependency on zipkin2.Span.

Once we have that in, we can chunk something else into yet another PR

notes below as I quickly browsed through some of the changes


On translation, when I asked about using the same test data. Basically, make sure we reference where the data/scenarios of translation are coming from. Going beyond that to depend or copy otel classes may make this difficult to maintain. Some types of translation concerns I recall in other products are linked here, but note the below are zipkin2.Span and I think we shouldn't do that or start with it. Just the same approach for an encoder of mutableSpan

https://github.com/openzipkin/zipkin-aws/blob/master/storage/xray-udp/src/test/java/zipkin2/storage/xray_udp/UDPMessageEncoderTest.java
https://github.com/openzipkin/zipkin-gcp/blob/master/translation-stackdriver/src/test/java/zipkin2/translation/stackdriver/SpanTranslatorTest.java

on tests is possible we can focus on decoding with otel's proto or something, to assert the translation.

I mention this because I noticed copying abstraction classes from otel and I think that will be maintenance, and unlike the rest of the code in this org. Main thing is to capture and cite the scenarios.


We dont need to make otel specific behavior tests for senders, in fact I'm not sure we need a sender anymore. My thoughts were that once we remove grpc, basically all we are left with is an encoder, and can use the existing httpurlconnectionsender or okhttp one, pointed at the OTLP http port. If anything, we could have a convenience wrapper to configure an http sender for otel. If we need a separate sender we should keep it small and be loud about why.


Integration tests are ideally an existing http sender and the brave encoder and point it to some docker image which can read it back. We would throw a happy path mutableSpan and prove the validity of at least one of our translation scenarios. This is a toe-hold in case there is confusion later. I don't know what images are out there at the moment for this purpose.


In general let's be careful to keep the code and dependencies small, so that we can get to mergable PRs. Less is best.

@codefromthecrypt
Copy link
Contributor

ps I reworded my last comment hopefully to be more clear, so refresh and 🤞 it is indeed more clear. Looking forward to getting some of this landed soon!

@marcingrzejszczak
Copy link
Collaborator Author

Hey, I'm planning to planning to separate these soon. I'll mark this as ready for review once I'm done with iterating.

If it wasn't for me checking out the OTel test cases I wouldn't know that some special user agent is required (I had no knowledge of that really).

I will be moving things around still but I think that I start having an overall idea of how this should look like.

@codefromthecrypt
Copy link
Contributor

thanks, marcin. p.s. if we need to make a wrapper or instructions for setting user-agent very specifically, all for it. In fact we have an existing use case around user-agent that maybe is the right time to action openzipkin/zipkin-reporter-java#142 (e.g. default as mentioned but at the same time, this would allow a standard way for otel users to override it and/or this code to make a portable wrapper that does the same)

@marcingrzejszczak
Copy link
Collaborator Author

OK I think everything is somehow tested one way or the other. Tomorrow I'll split this into a couple of PRs

  • 1 PR to remove GRPC support
  • 1 PR for the Brave part
  • 1 PR for the Zipkin part

I've ported tests from OTel, from zipkin-gcp so it looks better now but let's wait for the splitting to invest more time in reviewing

@marcingrzejszczak
Copy link
Collaborator Author

Closing this in favour of

@marcingrzejszczak
Copy link
Collaborator Author

thanks, marcin. p.s. if we need to make a wrapper or instructions for setting user-agent very specifically, all for it. In fact we have an existing use case around user-agent that maybe is the right time to action openzipkin/zipkin-reporter-java#142 (e.g. default as mentioned but at the same time, this would allow a standard way for otel users to override it and/or this code to make a portable wrapper that does the same)

BTW when querying Jaeger WITHOUT the user agent stuff still works so I don't know if this is actually mandatory 🤷

@anuraaga
Copy link

If it wasn't for me checking out the OTel test cases I wouldn't know that some special user agent is required (I had no knowledge of that really).

Probably otel-java tests test some of the SDK's expected behavior unrelated to the spec. It's one reason to, maybe inspire some unit tests from their translation logic, but not to treat them as-is. If we have an IT using Jaeger, IMO it's plenty for verifying the sender.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants