-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(tracing): avoid assigning span's local root to self, so that the python GC doesn't kick in #10809
Conversation
…oesn't need to kick in.
|
Datadog ReportBranch report: ✅ 0 Failed, 1286 Passed, 0 Skipped, 37m 29.69s Total duration (3.77s time saved) |
BenchmarksBenchmark execution time: 2024-09-25 21:47:32 Comparing candidate commit c5d94a7 in PR branch Found 8 performance improvements and 2 performance regressions! Performance is the same for 361 metrics, 53 unstable metrics. scenario:iast_aspects-aspect_iast_do_lower
scenario:iast_aspects-aspect_iast_do_operator_add_params
scenario:iast_aspects-aspect_iast_do_replace
scenario:iast_aspects-aspect_iast_do_upper
scenario:otelspan-add-metrics
scenario:otelspan-add-tags
scenario:span-add-metrics
scenario:span-add-tags
scenario:span-start
scenario:span-start-traceid128
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erikayasuda and I also discussed another direction we could take.
Since each span holds a reference to it's parent, any time we need to access the "local root" we can just traverse this linked list back to the root span (span whose _parent
is None) to get the local root as well.
In extreme cases a "tall" "local" trace will only be 20-30 spans deep, so the overhead of traversing this should not be too expensive (citation/benchmarks needed).
With this approach, def _local_root() -> "Span":
could be a simple traversal, and then we can avoid ever needing to "set" local root, and passing along span._local_root = parent._local_root
, etc
This is not a required change, but an observation.
…d-trace-py into modal-labs/local_root_gc_issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From profiling, we need to propagate local_root.span_id
and local_root.span_type
and this PR will change the behavior by returning early.
dd-trace-py/ddtrace/profiling/event.py
Line 104 in 057bb26
if span._local_root is not None: if not span._local_root: if span._local_root == span and span.span_type == SpanTypes.WEB:
I hope profiling tests catch those, if not I'm happy to contribute tests.
Oh wait, I think you've written the code in a way to handle these, nvm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These also need to be updated too, right?
dd-trace-py/ddtrace/internal/core/__init__.py
Line 303 in 057bb26
return span._local_root._get_ctx_item(data_key) root = span._local_root ddtrace.tracer.sample(dbspan._local_root)
@taegyunkim All tests were passing in a prior commit (no functionality was changed, just some mypy nits). Is the concern here that |
@erikayasuda Re-read your code and our tests. We do have tests checking dd-trace-py/tests/profiling_v2/collector/test_stack.py Lines 55 to 56 in 057bb26
Thanks for confirming! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work
This is a bit off topic but with partial flushing the root span can be finished, encoded, and sent to the agent while a child span is still active. Since spans are only encoded once, any modifications made to a finished Span may not be reflected in the payload sent to the Datadog Agent. We can end up in a scenario where Should we solve this issue in this PR? Should we only keep a reference to local root spans that have not been encoded and sent to the Datadog Agent? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory usage benchmark scenarios look very promising. Can we also include screen captures of the results from the flaskrealworld app in the microbenchmarking platform?
Just to confirm, this is already the current behavior right? Or would we be introducing this new behavior of keeping references to a finished root span? If it's the latter, then I'd prefer to address it in this PR. Otherwise, I can own a separate PR to address that issue. |
It's a separate issue. I can address it. I would like to propose making |
This change does not change any existing behavior or problems. This scenario exists today, and will continue to exist (no worse than today). Especially since child spans will always hold reference to their parents, as long as child spans are still active (not GC'd at least) then all parents will not get GC'd as well. Changing that finished spans get "locked" is new unrelated behavior change. |
…python GC doesn't kick in (#10809) _<<Description copied from the [original PR](https://github.com/DataDog/dd-trace-py/pull/10624)>>_ We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting. The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) --------- Co-authored-by: Daniel Shaar <[email protected]> Co-authored-by: Daniel Shaar <[email protected]> (cherry picked from commit e160b36)
…python GC doesn't kick in (#10809) _<<Description copied from the [original PR](https://github.com/DataDog/dd-trace-py/pull/10624)>>_ We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting. The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) --------- Co-authored-by: Daniel Shaar <[email protected]> Co-authored-by: Daniel Shaar <[email protected]> (cherry picked from commit e160b36)
…python GC doesn't kick in (#10809) _<<Description copied from the [original PR](https://github.com/DataDog/dd-trace-py/pull/10624)>>_ We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting. The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) --------- Co-authored-by: Daniel Shaar <[email protected]> Co-authored-by: Daniel Shaar <[email protected]> (cherry picked from commit e160b36)
…python GC doesn't kick in [backport 2.14] (#10838) Backport e160b36 from #10809 to 2.14. _<<Description copied from the [original PR](https://github.com/DataDog/dd-trace-py/pull/10624)>>_ We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting. The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) Co-authored-by: erikayasuda <[email protected]>
…python GC doesn't kick in [backport 2.13] (#10837) Backport e160b36 from #10809 to 2.13. _<<Description copied from the [original PR](https://github.com/DataDog/dd-trace-py/pull/10624)>>_ We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting. The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) Co-authored-by: erikayasuda <[email protected]>
<<Description copied from the original PR>>
We (Modal Labs) noticed that enabling tracing in parts of our code caused the Python GC to kick in a lot and add large delays to our event loop. A simple repro (see test) showed us that there was a self reference cycle in span when a top-level span's local root was assigned to the span itself, preventing Python from cleaning up the objects by reference counting.
The change itself is pretty straightforward - we use None to represent when a span's local root is itself. The span still exposes _local_root with the same behavior via a property, and internally stores a _local_root_value that holds a non-self-referential span. This also means that the tracer no longer needs to explicitly set the local root for top-level spans. This should be fairly non-risky as it doesn't change any code behavior.
Checklist
Reviewer Checklist