
Releases: kernelci/kcidb

v9

30 Nov 08:09

Another major release. The most visible changes are listed below.

After this release we'll be improving our CI/CD to shorten our development cycle, so we can make smaller and more frequent releases.

Schema

  • Switch to using v4 schema, released with kcidb-io v3. Changes from v3 schema include:
    • Rename revisions to checkouts to better represent what is actually submitted, improve correlation, and prevent data loss. The checkouts are identified purely by origin-generated IDs, similarly to builds and tests. The commit hash now only appears in the git_commit_hash field, and the patchset hash gets its own field.

      NOTE: submitting CI systems that test and send revisions more than once are urged to upgrade to the v4 schema, to avoid checkouts with IDs inherited from revision IDs overwriting each other.

    • Add patchset_hash field to checkouts to store the patchset hash, which was previously a part of revision ID.

      NOTE: you need to set patchset_hash to an empty string if you have no patches applied on top of the commit you checked out; otherwise your data might not appear in reports and dashboards. See the example after this list.

    • Rename the checkout's patch_mboxes field to patchset_files to better correspond to the new patchset_hash field.

    • Rename all description fields to comment. The description name implied describing each object overall. However, we have other, dedicated fields describing objects in detail, and we would rather use those to generate our own description consistently, regardless of the submitter, and use the comment field to augment that description.

    • Add log_url field to tests. It is meant to contain the URL pointing to a plain-text log file with the highest-level overview of the test's execution, similar to the log_url field in builds and checkouts. All the other log and output files should go into output_files.

    • Add log_excerpt field to all objects, meant to contain the part of the object's log (normally referenced by log_url) that was most relevant to its status. E.g. patch errors for a failed checkout, compiler errors for a failed build, or error messages for a failed test. It could also be git am output for a successful checkout, the last hundred lines of a successful build, or a test suite summary for a successful test.

    • Remove the publishing_time field from checkouts, as nobody is sending it, it is not really possible to know a commit's publishing time in git, and no mailing-list-posted patches, for which that could be possible, are being submitted yet.

  • Support validating I/O JSON against a specific schema version with kcidb-validate. Thank you, @pawiecz!
  • Support outputting a specific version of the schema with kcidb-schema. Thank you, @effulgentstar!
  • Support specifying the version of the schema to upgrade I/O data to, with kcidb-upgrade.
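
As an illustration, here is a minimal sketch of v4 I/O data for a checkout of a plain commit (no patches applied on top). The origin name, checkout ID, and commit hash are made-up placeholders:

    {
        "version": {"major": 4, "minor": 0},
        "checkouts": [
            {
                "id": "myorigin:checkout-1",
                "origin": "myorigin",
                "git_commit_hash": "254b5153f0a6d1858332002a1096e4c6bdc08f73",
                "patchset_hash": ""
            }
        ]
    }

Assuming it is saved as checkout.json, it could be checked with the new validator (which, like the other tools, reads I/O JSON on standard input):

    kcidb-validate < checkout.json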

Database

  • Separate the database client and database drivers. This allows implementing support for more databases, and pseudo-databases.

    Switch the library to accepting a single string specifying the driver and its parameters for opening a database, instead of the BigQuery-specific project ID and dataset name. Switch all the database-accessing command-line tools to accepting just one option, -d/--database, specifying the driver and its parameters, instead of the two BigQuery-specific options: -p/--project and -d/--dataset.

    E.g. instead of running: kcidb-query -p kernelci-production -d kernelci05 -c redhat:122398712, run: kcidb-query -d bigquery:kernelci-production.kernelci05 -c redhat:122398712.

    Use the --database-help option with any database-accessing tool to print documentation on all drivers and their parameters (thank you, @amfelso).

  • Add null driver, which simply discards loaded data and returns no data for queries. It is useful for testing and development.

  • Add SQLite database driver (sqlite), supporting all the operations we use on BigQuery. This simplifies development and testing of subscriptions and notifications by removing the need for BigQuery access. See the sketch after this list.

  • Add json database driver: an extension of the SQLite driver which always stores the database in memory and pre-loads it with JSON I/O data from stdin. This lets us implement command-line tools simulating notification generation directly from the JSON generated by a CI system, without the need to create or access a database explicitly.

  • Add object de-duplication when either loading into, or querying from, the database. As before, if two objects with the same ID are loaded into, or queried from, the database, and a field's value is present (not NULL) in both of them, then the value to use is picked from the two non-deterministically.

  • Replace BigQuery tables with views returning de-duplicated objects. Prefix the original table names with _. This makes querying the BigQuery database easier in code, manually, and in our Grafana dashboards.

  • Remove support for querying database objects using LIKE patterns matching their IDs, from both the library and the command-line tools, since nobody was using it and removing it simplifies the code.

  • Remove the kcidb-db-complement tool, since the "complement" operation is no longer required by the new ORM. Thank you, @mharyam!
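
As an example, a local database could be initialized, loaded, and queried like this (a sketch assuming the sqlite driver takes a database file name as its parameter; run any database-accessing tool with --database-help for the authoritative syntax):

    kcidb-db-init -d sqlite:test.db
    kcidb-db-load -d sqlite:test.db < report.json
    kcidb-query -d sqlite:test.db -c redhat:122398712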

ORM

  • Implement a new ORM layer to support representing results of any query as Python objects (e.g. revisions aggregated from checkouts), and summarizing results (e.g. giving a build/test PASS/FAIL for a revision). Use a custom "pattern" syntax inside the ORM and with command-line tools, to specify the objects to query or notify about.

    E.g. the >checkout[redhat:12398712]#>*# pattern matches the checkout with ID redhat:12398712 and all its child objects (builds and tests), and the >test[kernelci:8768ad33f]<*$ pattern matches the ultimate parent (the revision) of the test with ID kernelci:8768ad33f.

    Use the --pattern-help option with any ORM-using tool (e.g. kcidb-notify) to print the pattern's ABNF syntax and some examples.

  • Add kcidb-oo-query tool, which outputs the internal object-oriented representation of database objects matching the specified ORM "pattern", and is useful for debugging and developing the ORM layer.
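
For instance, the object-oriented representation of a checkout and all its children could be dumped like this (a sketch assuming patterns are passed as command-line arguments, quoted to protect them from the shell; see the tool's --help and --pattern-help output for the exact interface):

    kcidb-oo-query -d bigquery:kernelci-production.kernelci05 '>checkout[redhat:12398712]#>*#'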

Notifications

  • Rework our notifications to aggregate results coming from multiple CI systems for the same revision, and to summarize build and test results in a compact message. Support subscription-specific notification templates, which can share and reuse various pieces and macros with each other.
  • Add a minimal HTML version to notification messages, to force some clients (e.g. GMail and groups.io) to use fixed-width fonts, for correct formatting. Thank you, @effulgentstar!
  • Remove the kcidb-summarize and kcidb-describe tools, since the notion of "canonical" text rendering of database objects has been removed from the new ORM.
  • Add kcidb-ingest tool, which generates notifications for objects created or modified by loading the input data into a (temporary) database. This emulates the notification-generation process deployed to Google Cloud without requiring it, and helps with developing and testing subscriptions and notifications.
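
For example, notification generation could be tried out locally by feeding a CI system's JSON output to the new tool (a sketch assuming kcidb-ingest reads I/O JSON on standard input, as the other command-line tools do):

    kcidb-ingest < report.json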

Miscellaneous

  • Fold the kcidb-mq-publisher-* and kcidb-mq-subscriber-* tools into kcidb-mq-io-publisher and kcidb-mq-io-subscriber respectively. This reduces the number of KCIDB executables.
  • Add kcidb-mq-pattern-publisher and kcidb-mq-pattern-subscriber tools for managing ORM Pattern message queues used in our Google Cloud deployment.
  • Automate Google Cloud deployment and start doing test deployments in CI.

v8

13 Oct 11:44

Another major release. Changes include:

  • Support processing JSON streams in all command-line tools. It is now possible to feed multiple JSON report objects, one after another, into a single KCIDB command and have them processed appropriately; RFC 7464 streams are supported as well. This removes the overhead of starting the tool (and connecting to the cloud) for every submitted report. See the example after this list.
  • Make kcidb-merge accept the reports to merge on standard input, as a JSON stream, instead of expecting them as file arguments.
  • Make kcidb-notify accept the "new" reports on standard input, as a JSON stream, instead of expecting them as file arguments.
  • Support splitting the data retrieved from the database into multiple reports, limited by the number of objects, when using the library or the command-line tools. This allows retrieving large amounts of data without running out of memory. Support outputting either simple concatenated-JSON streams, one report per line, or the RFC 7464 format.
  • Extract the kcidb.io package into a separate distribution called kcidb-io, to minimize the number of dependencies required for validating report data. See its v1 and v2 release notes for changes since kcidb v7. One important change brought by this is enabling enforcement of format rules in the JSON schema for fields containing URLs, timestamps, and email addresses. If any of those were incorrect in your data before, they will now fail to validate.
  • Make sure the report is successfully sent to the message queue before returning from the submit()/publish() function in the kcidb library, to avoid data loss. Before this, the report could be handled later by a separate thread, for the purpose of batching multiple submissions. Provide the future_publish() function to still allow batching and delayed submission.
  • Make kcidb-submit and kcidb-mq-publisher-publish print "submission IDs" (message queue message IDs) of each sent report. Note that due to batching the IDs could be printed with a delay, even after multiple following reports were accepted, but they would still be printed in order.
  • Reduce the amount of internal consistency verification in KCIDB code by default. This improves performance when processing multiple or large datasets.
  • Ignore Syzbot test results in PoC subscriptions until we implement issues/incidents and can handle its frequent test failures.
  • Stop sorting JSON object keys in command-line tool output. The order will change, but will still remain mostly stable.
  • Add SUBMISSION_HOWTO.md explaining how to start submitting to KCIDB.
  • Add a minimal Dockerfile for a container with KCIDB installed.
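
For example, given a file reports.json containing a concatenated-JSON stream of two minimal, empty v3 reports:

    {"version": {"major": 3, "minor": 0}}
    {"version": {"major": 3, "minor": 0}}

they can be merged with a single invocation reading the stream from standard input:

    kcidb-merge < reports.json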

v7

11 Aug 10:20

A minor release, including the following changes:

  • Switch the notification templates to working with v3 schema introduced in the previous release. Before that they would produce nonsense when describing revisions.
  • Implement kcidb_load_queue: a Google Cloud Function optimizing submission loading to avoid exceeding the BigQuery load job quota and stalling. Pull submissions from the queue explicitly, providing more information on processing speed and outstanding data than the previously-used Google Cloud Function retry system would. Explicit pulling also allows holding submissions in the queue while upgrading or debugging, without losing data. The new implementation could still hit the quota, but the probability of that is low, and so is its complexity. Rename the previous implementation to kcidb_load_message.
  • Optimize I/O data merging to speed up 3+ dataset cases dramatically. This is particularly useful for bundling submissions before loading to BigQuery in kcidb_load_queue.
  • Add kcidb-count tool outputting the total number of objects (revisions/builds/tests) contained in I/O data. The underlying implementation is used to calculate the cut-off point when collecting submissions to load in kcidb_load_queue.
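
For example (a sketch assuming kcidb-count reads I/O data on standard input, like the other tools):

    kcidb-count < report.json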

v6

04 Aug 15:36

A major release of KCIDB. Changes include:

  • Implement schema v3.0, with the following changes. See the attached kcidb.v3.0.schema.json for details.
    • Re-introduce the separate origin field, but keep the origin in IDs as necessary, regardless.
    • Tighten the definition of the revision ID: now it must be the commit hash, optionally followed by a plus (+) character and a sha256 hash identifying the applied patches (see the examples after this list). This allows correlating received reports across submitters.
    • Add tree_name field to revisions, containing the widely-recognized name of the base code's sub-tree. E.g. "mainline", or "net-next".
    • Rename revision fields git_repository_commit_name and git_repository_commit_hash to git_commit_name and git_commit_hash respectively, making them easier to read and not linked to the containing repository.
    • Require Git repository URLs to start with either https:// (preferably) or git://.
  • Add kcidb-notify tool, taking new (and existing) I/O data and outputting NUL-terminated notification messages. Could be used to debug notifications, or as an alternative way of generating them in production.
  • Add support for merging I/O data, and the corresponding kcidb-merge tool. Could be useful for merging smaller submissions together into bigger ones.
  • Add support for specifying logging level to every command-line tool. Nothing much is logged yet, only queries executed by the database client. The default level is NONE, disabling any logging.
  • Add minimal logging to Google Cloud Functions, set INFO as the log level.
  • Log data coming to Google Cloud Functions with DEBUG level.
  • Add a dummy subscription for mainline tree failures.
  • Support sending notifications for selected subscriptions only, in Google Cloud Functions; select only the "mainline" subscription for now.
  • Support querying objects using exact IDs (both in the library and the command-line tools), in addition to LIKE patterns; exact-ID queries work much faster.
  • Switch to querying exact object IDs in notification generation, speeding it up dramatically.
  • Add X-KCIDB-Notification-ID header to notification messages, containing the (unique) notification ID.
  • Support and require specifying the Firestore collection path with the spooled notifications, both for Google Cloud Functions and the kcidb-spool-wipe tool.
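
For example, under the tightened definition a revision ID looks like either of the following (made-up hashes): a plain commit, or the same commit with a patchset applied on top:

    254b5153f0a6d1858332002a1096e4c6bdc08f73
    254b5153f0a6d1858332002a1096e4c6bdc08f73+903638c087335b10293663c682b9aa0076f9f7be478a0472b05d9e12d21af93d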

The IDs in the existing dataset were updated for the new schema using the attached update-revision-ids script.

v5

14 Apr 15:47

Another major release of kcidb. Changes include:

  • Switch to report submission via Google Cloud Pub/Sub message queue. This speeds up submission considerably and allows implementing report notifications.

    However, this also changes the parameters required for submission: instead of the BigQuery dataset name (e.g. kernelci03), these now should be the Google Cloud project ID (kernelci) and the Pub/Sub topic name (kernelci_new). On the other hand, these parameters won't need to be updated whenever we switch to a new dataset.

    The required query parameters stay the same.

    The Client interface in the library changes accordingly.

    See kcidb-submit --help and kcidb-query --help output for details, as well as the code documentation for kcidb.Client class.

  • Implement preliminary report notification system, with two dummy subscriptions and e-mails sent to [email protected]. Spool the generated notifications in Google Cloud Firestore database, to avoid sending the same notification twice. Implement subscriptions as Python modules matching the report objects (revisions/builds/tests) of interest and generating notifications.

  • Add kcidb-spool-wipe tool for removing (old) notifications from the notification spool.

  • Add two tools for producing a summary and a description of a report object: kcidb-summarize and kcidb-describe respectively. These take report data on the standard input, plus the name of the object list and optional IDs of the objects to process on the command line. They output a text summary or a text description of the object(s), the same way as they would appear in a notification e-mail. These could be used for testing both the data you submit and the report generation. See the example after this list.

  • Add support for querying particular objects from the database, using SQL LIKE patterns for IDs. Also allow querying the matching objects' parents and/or children. See kcidb-query --help output for details.

  • Add kcidb-db-dump tool for dumping the whole database unconditionally, taking over the previous job of kcidb-db-query, which now accepts the same object-selection parameters as kcidb-query does.

  • Fix kcidb-db-complement tool and kcidb.db.Client.complement() function to not produce a combinatorial explosion when fetching multiple copies of the same object from the database.
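
For example, a revision from a submission could be described like this (a sketch based on the interface described above; the object list name is "revisions" and the ID is an illustrative placeholder):

    kcidb-describe revisions myorigin:revision-1 < report.json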

v4

27 Mar 18:07

A minor bugfix release, fixing the inability to inherit v1 I/O data with any of the resource file lists missing (e.g. patch_mboxes in revisions or input_files in builds).

v3

26 Mar 17:51

A major release with lots of changes, including the following.

  • Add I/O schema v2.0, with the following changes:
    • Merge *origin and *origin_id fields into *id fields, for all objects.
    • Explicitly prohibit resource file names from containing directory names (i.e. the / character) to allow using them in Content-Disposition: filename= headers.
  • Implement automatic upgrading of data from older to newer schema versions. Add a tool for manual upgrading of I/O data, called kcidb-upgrade. See the sketch after this list.
  • Store the latest I/O schema version in the BigQuery dataset when initializing it. Prohibit loading data into, or querying data from, the dataset if its major version differs from the major version of KCIDB's latest I/O schema.
  • Add a test catalog file (tests.yaml) containing identifiers for tests submitted by CI systems (CKI mostly so far), and a tool for validating the catalog, called kcidb-tests-validate.
  • To make space for adding tools for other subsystems, rename kcidb-init to kcidb-db-init and kcidb-cleanup to kcidb-db-cleanup. Make kcidb-submit and kcidb-query tools implementation-agnostic, and add implementation-specific kcidb-db-load and kcidb-db-query.
  • Add an experimental tool called kcidb-db-complement, which takes I/O data and returns it with all missing, but referenced objects added. The mechanism is to be used for processing subscriptions and generating reports, but the tool may be deprecated later.
  • Implement prototype tools for managing message queues and communicating through them:
    • kcidb-mq-publisher-init
    • kcidb-mq-publisher-cleanup
    • kcidb-mq-publisher-publish
    • kcidb-mq-subscriber-init
    • kcidb-mq-subscriber-cleanup
    • kcidb-mq-subscriber-pull
  • Implement a placeholder for the future Google Cloud Functions module communicating with a message queue. To be used for intercepting submissions and generating notifications with reports.
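
For example, older data could be brought up to the latest schema version like this (a sketch assuming kcidb-upgrade reads I/O JSON on standard input and writes the upgraded data to standard output):

    kcidb-upgrade < old_report.json > new_report.json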

v2

26 Mar 17:24

A minor release, changes include:

  • Add config_name and config_url fields to build objects.
  • Bump I/O schema version to 1.1.
  • Switch to representing I/O schema versions as an object.

v1

22 Nov 13:54

Initial release, made to allow submitting CI systems to control their upgrades.

A set of kcidb-* executables is implemented, allowing basic submission and trivial retrieval of database contents, as well as database maintenance and data validation.