Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL for Continuous Aggregates and Hierarchical rollups #64

Open
jonatas opened this issue May 13, 2024 · 1 comment
Open

DSL for Continuous Aggregates and Hierarchical rollups #64

jonatas opened this issue May 13, 2024 · 1 comment

Comments

@jonatas
Copy link
Owner

jonatas commented May 13, 2024

As an ORM user, several times we need to organize migrations and heavy sql repetitions.

Today, I was working on an example and I'd like to bring this to expose the use-case. Check the full example.

I was thinking about how to bring the migration creation for complex rolling up models to avoid such boilerplate on all examples.

Let's consider we want to track count of downloads in multiple timeframes, so we have the model:

class Download < ActiveRecord::Base
  acts_as_hypertable time_column: 'ts'
    
  scope :by_hour, -> (query) do
      select("time_bucket('1h', ts) as bucket, #{query}")
  end

  scope :all_by_hour, -> do
    by_hour("count(*) as downloads").group(1)
  end

  scope :all_gems_by_hour, -> do
    by_hour("gem_name, count(*) as downloads").group(1,2)
  end

  scope :versions_by_hour, -> do
    by_hour("gem_name, gem_version, count(*) as downloads").group(1,2,3)
  end
end

Then, we need to create the continuous aggregates and also rollup to another level. So, let's do the cagg anonymous class to allow us to rollup:

cagg = -> (view_name) do
  Class.new(ActiveRecord::Base) do
    self.table_name = "downloads_#{view_name}"
    SUM = "sum(downloads) as downloads"

    scope :rollup, -> (range='1d', query=SUM) do
      select("time_bucket('#{range}', ts) as ts, #{query}")
        .group(1)
    end

    scope :per_hour, -> (query=SUM) do
      rollup('1h', query)
    end

    scope :per_day, -> (query=SUM) do
      rollup('1d', query)
    end

    scope :per_week, -> (query=SUM) do
      rollup('1w', query)
    end

    scope :per_month, -> (query=SUM) do
      rollup('1mon', query)
    end

    scope :per_year, -> (query=SUM) do
      rollup('1y', query)
    end

    def readonly?
      true
    end

    def self.refresh!
      ActiveRecord::Base.connection.execute <<-SQL
      CALL refresh_continuous_aggregate('#{table_name}', null, null);  
      SQL
    end
  end
end

Then we have the models to bind the sql properly:

PerMinute = cagg['per_minute']
PerHour= cagg['per_hour']
PerDay = cagg['per_day']
PerMonth = cagg['per_month']
GemsPerMinute = cagg['gems_per_minute']
GemsPerHour= cagg['gems_per_hour']
GemsPerDay = cagg['gems_per_day']
GemsPerMonth= cagg['gems_per_month']
VersionsPerMinute= cagg['versions_per_minute']
VersionsPerHour = cagg['versions_per_hour']
VersionsPerDay = cagg['versions_per_day']
VersionsPerMonth = cagg['versions_per_month']

And then we can build a migration that creates the base continuous aggregate and rolls up the continuous aggregates with hierarchical references:

  hypertable_options = {
    time_column: 'ts',
    chunk_time_interval: '1 day',
    compress_segmentby: 'gem_name, gem_version',
    compress_orderby: 'ts DESC',
    compression_interval: '7 days'
  }

  create_table(:downloads, id: false, hypertable: hypertable_options) do |t|
    t.timestamptz :ts, null: false
    t.text :gem_name, :gem_version, null: false
    t.jsonb :payload
  end

  {
    "per_minute": Download.per_minute,
    "per_hour": PerMinute.per_hour,
    "per_day": PerHour.per_day,
    "per_month": PerDay.per_month,
    "gems_per_minute": Download.gems_per_minute,
    "gems_per_hour": GemsPerMinute.per_hour,
    "gems_per_day": GemsPerHour.per_day,
    "gems_per_month": GemsPerDay.per_month,
    "versions_per_minute": Download.versions_per_minute,
    "versions_per_hour": VersionsPerMinute.per_hour,
    "versions_per_day": VersionsPerHour.per_day,
    "versions_per_month": VersionsPerDay.per_month,
  }.each do |name, scope|
    create_continuous_aggregate(
      "downloads_#{name}",
      scope.to_sql,
      refresh_policies: {
        schedule_interval: "INTERVAL '1 hour'",
        start_offset: "INTERVAL '3 hour'",
        end_offset: "INTERVAL '1 minute'"
      })
  end

Ideas for DSL for cascading the hierarchical continuous aggregates

I was thinking about how to avoid the boiler place as we need a more dynamic settings for such case. To generate the migrations, the minimal information would be something like:

{[:minutely, :hourly, :daily, :monthly] => "count() as total"]

or

{%w[1m 1h 1d 1w] => 'count(*) as total'}

This can be done in several ways like:

  1. Embed the metadata into create_hypertable macro.
create_hypertable :name, ..., aggregates: ...

Pros: Very useful for POCs
Cons: hard to track on large projects.

  1. Embed in the arguments of the acts_as_hypertable macro
acts_as_hypertable ..., aggregates:  ...
  1. Make it a standalone macro info that can be used in the model
aggregates :hypertable_name, *timeframes => *measurements

Convention over configuration

How to make it part of the model, in a way that the query experience is very smooth and intuitive with the context.

Download.first # fetches data from hypertable
Download::PerMinute.first # fetches from downloads_all_per_minute continuous aggregates
Download::PerHour.first # fetches from downloads_all_per_hour hierarchical continuous aggregates

Also, for generating the continuous aggregates + hierarchical, the migration could be extracted from it.

Download::ContinuousAggregates.not_created.each do |cagg|
  create_continuous_aggregates cagg.name, cagg.sql
end

Any ideas or comments based on your usage and how would you like to use the library?

@jonatas
Copy link
Owner Author

jonatas commented May 13, 2024

I updated the gist with my latest ideas from this script: https://gist.github.com/jonatas/418f360d45c890e1d86c30547a0cf6a4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant