gc_count runtime metrics are reported as gauges but contain process counters #3832

Open
SpamapS opened this issue Aug 7, 2024 · 1 comment
Labels
bug (Involves a bug), community (Was opened by a community member)

Comments


SpamapS commented Aug 7, 2024

Current behaviour

All GC.stat values are reported as gauges.

Expected behaviour
Counters produce graphs like this:

image

This isn't terribly useful as a gauge, and should be reported as a count.

This is a bit tricky, as some of the other stats have names ending in _count but do represent gauges.

Steps to reproduce

Just enable Ruby runtime metrics and graph runtime.ruby.gc.gc_count or runtime.ruby.gc.major_gc_count.
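
For context on why these two metrics behave like process counters, here is a minimal sketch that uses only Ruby's built-in `GC.stat` and `GC.start` (no Datadog code is involved, and the variable names are just for illustration). It reads the two underlying values, forces a collection, and reads them again:

```ruby
# Read the cumulative GC counters the runtime exposes for this process.
before = GC.stat.slice(:count, :major_gc_count)

# Force a full (major) collection; GC.start defaults to full_mark: true.
GC.start

after = GC.stat.slice(:count, :major_gc_count)

# Both values are totals since process start, so the difference is never negative.
gc_count_delta = after[:count] - before[:count]
major_gc_count_delta = after[:major_gc_count] - before[:major_gc_count]

puts "gc_count:       #{before[:count]} -> #{after[:count]}"
puts "major_gc_count: #{before[:major_gc_count]} -> #{after[:major_gc_count]}"
```

Within a single process the values only ever grow, which is what makes them look like counters rather than point-in-time readings.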

Environment

  • datadog version: 1.23.3

  • Configuration block (Datadog.configure ...):
  require 'ddtrace'
  ::Datadog.configure do |c|
    c.agent.host = ENV["DDTRACE_HOST"]

    c.profiling.enabled = Settings.datadog.profiling_enabled
    c.tracing.log_injection = false

    c.tracing.instrument :rails, service_name: "x-app"
    c.tracing.instrument :active_support, cache_service: "x-cache"
    c.tracing.instrument :action_pack, service_name: "x-controller"
    c.tracing.instrument :active_model_serializers
    c.tracing.instrument :active_record, service_name: "x-postgres"
    c.tracing.instrument :pg, service_name: "postgres", comment_propagation: 'full'

    c.tracing.instrument :redis # service_name defaults to "redis"

    c.tracing.instrument :http #net/http
    c.tracing.instrument :rest_client

    c.runtime_metrics.enabled = true
    c.runtime_metrics.statsd = TFE::Clients.rtstatsd
  end
  • Ruby version: 3.1.5

  • Operating system: Linux (various) & macOS

  • Relevant library versions:
SpamapS added the bug and community labels on Aug 7, 2024
Member

marcotc commented Aug 8, 2024

Hey @SpamapS, we can't use "count" for GC.count because a count metric would sum every GC.count value reported in a flush interval, instead of only recording the latest value.

From our metrics docs: https://docs.datadoghq.com/metrics/types/?tab=count#metric-types

Suppose you are submitting a COUNT metric, notifications.sent, from a single host running the Datadog Agent. This host emits the following values in a flush time interval: [1,1,1,2,2,2,3,3].

The Agent adds all of the values received in one time interval. Then, it submits the total number, in this case 15, as the COUNT metric’s value.

Because GC.count reports the cumulative total of GC cycles since the process started, we want it to be reported as a gauge.
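
To make the difference concrete, here is a small sketch in plain Ruby that mimics the two aggregation rules from the docs quoted above. The sample values are made up, and the last calculation (submitting deltas) is only an illustration of what a count metric would need to receive, not something the library is known to do:

```ruby
# Hypothetical values one process submits during a single flush interval
# when it reports the cumulative GC.count each time.
submitted = [100, 100, 101, 101, 102]

# COUNT semantics: every value received in the interval is added together.
as_count = submitted.sum

# GAUGE semantics: only the last value received in the interval is kept.
as_gauge = submitted.last

# Illustration only: a COUNT would be meaningful if each submission were
# the increase since the previous sample rather than the running total.
deltas = submitted.each_cons(2).map { |prev, curr| curr - prev }
as_count_of_deltas = deltas.sum

puts "COUNT of raw totals: #{as_count}"
puts "GAUGE of raw totals: #{as_gauge}"
puts "COUNT of deltas:     #{as_count_of_deltas}"
```

With these samples, summing the raw totals gives 504, which matches neither the current total (102) nor the number of GC cycles that ran in the interval (2).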

Now, regarding the graph jumps you are seeing, these are likely caused by multiple Ruby processes with the same service name. The Runtime Metrics aggregate on a service level, not per individual process, thus causing such metrics to report inconsistent values.
We are actively working on a solution as we speak.
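
A simplified model of that effect, in plain Ruby with made-up numbers: two processes share a service name, each reporting its own cumulative total. The process labels and the per-process grouping are assumptions for illustration, and real aggregation in the backend is more involved than this:

```ruby
# Hypothetical cumulative GC counts from two processes of the same service.
# "old" has been running for a long time; "new" has just started.
samples = [
  { pid: "old", gc_count: 5000 },
  { pid: "new", gc_count: 12 },
  { pid: "old", gc_count: 5001 },
  { pid: "new", gc_count: 13 },
]

# With nothing to tell the processes apart, every sample lands in one
# series, so the plotted value jumps between the two totals.
merged_series = samples.map { |s| s[:gc_count] }

# Separating the samples by process keeps each series steadily increasing.
per_process = samples.group_by { |s| s[:pid] }
                     .transform_values { |rows| rows.map { |r| r[:gc_count] } }

puts "merged:      #{merged_series.inspect}"
puts "per process: #{per_process.inspect}"
```

The merged series swings between roughly 5000 and 12 even though neither process ever reported a decreasing value.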
