Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matcher: new database schema #1340

Merged
merged 4 commits into from
Aug 5, 2024

Conversation

hdonnay
Copy link
Member

@hdonnay hdonnay commented May 9, 2024

This is a new database schema for the matcher subsystem, focused on:

  1. enabling queries across ecosystem silos
  2. reducing write traffic during an update
  3. trusting identifiers in the data instead of using client-side hashing schemes
  4. bulk load via COPY

Explicit non-goals:

  1. maintaining backwards compatibility
  2. JOIN avoidance

TODO for this PR:

  • exercise GC down to attr and the "name" tables
  • benchmark a likely query, perhaps add a relevant VIEW

@hdonnay hdonnay force-pushed the hack/new-updater-db branch 2 times, most recently from 9390605 to 2a46f51 Compare May 15, 2024 22:20
@hdonnay hdonnay force-pushed the hack/new-updater-db branch 2 times, most recently from fa2fbf1 to 60a6c74 Compare May 31, 2024 16:31
@hdonnay

This comment was marked as resolved.

@hdonnay hdonnay force-pushed the hack/new-updater-db branch 5 times, most recently from b7073cd to bfa4413 Compare June 3, 2024 20:23
Copy link

codecov bot commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 12 lines in your changes missing coverage. Please review.

Please upload report for BASE (dev/matcher-v2@20f7be7). Learn more about missing BASE report.

Files Patch % Lines
datastore/postgres/migrations/migrations.go 60.00% 6 Missing and 6 partials ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##             dev/matcher-v2    #1340   +/-   ##
=================================================
  Coverage                  ?   56.25%           
=================================================
  Files                     ?      267           
  Lines                     ?    16876           
  Branches                  ?        0           
=================================================
  Hits                      ?     9493           
  Misses                    ?     6418           
  Partials                  ?      965           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hdonnay hdonnay force-pushed the hack/new-updater-db branch 2 times, most recently from 4737949 to 086ee3e Compare June 7, 2024 14:26
@hdonnay
Copy link
Member Author

hdonnay commented Jun 7, 2024

I cooked up a monster "show me everything" view that took ~38ms to return the first row.

CREATE TYPE display_reference AS (
	namespace TEXT,
	name TEXT,
	uri TEXT[]
);

CREATE TYPE display_package AS (
	name TEXT,
	kind PackageKind,
	arch Architecture[],
	vulnerable_range VersionMultiRange,
	version_upstream TEXT[],
	version_kind TEXT,
	purl TEXT,
	cpe TEXT
);

CREATE TYPE display_attr AS (
	mediatype TEXT,
	data JSONB
);

CREATE OR REPLACE VIEW latest_advisory_full AS
SELECT
	latest.id,
	latest.added,
	latest.generation,
	latest.updater,
	latest.name,
	meta.issued,
	meta.summary,
	meta.description,
	meta.uri,
	meta.severity,
	meta.normalized_severity,
	ref.refs,
	pkg.pkgs,
	attr.attrs
FROM
	matcher_v2.latest_advisory AS latest
	JOIN matcher_v2.advisory_meta AS meta ON meta.advisory = latest.id
	JOIN (
		SELECT
			ar.advisory,
			array_agg((
			r.namespace,
			r.name,
			r.uri)::matcher_v2.display_reference) AS refs
		FROM
			matcher_v2.advisory_reference AS ar
			JOIN matcher_v2.reference AS r ON r.id = ar.reference
		GROUP BY
			ar.advisory
	) AS ref ON ref.advisory = latest.id
	JOIN (
		SELECT array_agg((n.name, p.kind, p.arch, p.vulnerable_range, p.version_upstream, p.version_kind, p.purl, p.cpe)::matcher_v2.display_package) AS pkgs, p.advisory
		FROM matcher_v2.package AS p
		JOIN matcher_v2.package_name AS n ON n.id = p.name
		GROUP BY p.advisory
	) AS pkg ON pkg.advisory = latest.id
	JOIN (
		SELECT aa.advisory,
			array_agg((mt.mediatype, a.data)::matcher_v2.display_attr) AS attrs
		FROM matcher_v2.advisory_attr AS aa
		JOIN matcher_v2.attr AS a ON a.id = aa.attr
		JOIN matcher_v2.mediatype AS mt ON a.mediatype = mt.id
		GROUP BY aa.advisory
	) AS attr on attr.advisory = latest.id
;

I don't think this is a typical query, as the normalized parts (package, attr, ref) won't all be joined to the package at the same time, and the meta table definitely shouldn't be.

@hdonnay hdonnay changed the base branch from main to dev/matcher-v2 June 11, 2024 19:54
@hdonnay
Copy link
Member Author

hdonnay commented Jun 11, 2024

Decided to try merging this stuff into a feature branch as things are going.

@hdonnay hdonnay marked this pull request as ready for review June 11, 2024 19:56
@hdonnay hdonnay requested a review from a team as a code owner June 11, 2024 19:56
@hdonnay hdonnay requested review from crozzy, BradLugo and RTann and removed request for a team June 11, 2024 19:56
@RTann RTann requested a review from jvdm June 18, 2024 01:30
Copy link
Contributor

@crozzy crozzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments while trying to stand up the schema that I noticed (i.e. not a full review)

ADD UNIQUE (mediatype, data);

COMMENT ON TABLE attr IS $$
Attrs are generic features of advisories and references.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a table to connect attrs with references

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, attr and reference are associated with an advisory.


COMMENT ON FUNCTION version_check_array IS 'This function reports whether a text array is well-formed to be used as a VersionRange';

CREATE OR REPLACE FUNCTION version_check (mr matcher_v2.VersionMultiRange)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a VersionMultiRange definition here? Can't see to be able to run the migration without an error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's created automatically with a RANGE type: https://www.postgresql.org/docs/16/sql-createtype.html#SQL-CREATETYPE-RANGE

@RTann RTann requested review from daynewlee and removed request for BradLugo July 18, 2024 20:48
This meets the PostgreSQL version requirements and prevents needing to
install it.

Signed-off-by: Hank Donnay <[email protected]>
@hdonnay

This comment was marked as outdated.

This comment was marked as outdated.

@hdonnay hdonnay merged commit 6a82cf9 into quay:dev/matcher-v2 Aug 5, 2024
8 checks passed
@hdonnay hdonnay deleted the hack/new-updater-db branch August 5, 2024 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants