Refactor page/run tests that use the reference implementation #199

dcoutts · 2024-04-30T16:15:29Z

Many of the existing tests use the FormatPage.hs reference implementation of the page binary format, the type PageLogical and its QC generator.

Prior to PR #195 the PageLogical representation was always valid, for all possible values of the representation (at least for the purpose of serialisation, which doesn't care about key sort order). That's because the reference implementation used an unbounded approach to serialisation, rather than insisting that the binary format fit within a fixed disk page size (like 4k).

In PR #195 we introduce fixed disk page sizes, to make the reference implementation and real implementation closer and thus give us an opportunity to simplify tests using the reference impl. A consequence is that PageLogical now represents a superset of valid pages, rather than exactly all valid pages, and so the encodePage now has to be partial.

Furthermore, the tests for the real implementation need (or could be simpler) with a few variations on key/value sequences.

Existing tests

We review what existing tests do, or could do, to help us decide what set of types and QC generators would help us simplify and/or improve the tests. We go in rough order from low to higher level.

RawPage

The RawPage module is for decoding.

Existing tests:

a range of unit tests
a range of focused property tests for each aspect of the page format, or part of the size distribution, comparing against the reference impl

Missing tests:

A general test against the reference impl for the whole page content

The property tests want a type that represents a valid page (with total serialisation) with its logical content. This would make toRawPage total.

PageAcc

This covers a single page, with many key/ops in it, but not the single page case with overflow.

Existing tests:

general prototype test which compares the reference and real impls for a sequence of key/ops, including checking the sequences end at the same point.
several specific instances of prototype providing guaranteed coverage (could perhaps be replaced by labels & coverage conditions)

The existing prototype comparison test could be simplified if we could turn it into a bulk comparison test. Would need a reference impl page construction that returns the suffix that didn't fit.

These tests want a sequence of key/ops that may or may not fit within a page. The generator should cover:

single page, all operations
more than will fit in a page
probably wants a range of sizes: small, near page size

These tests need to exclude single value pages that overflow. This is covered by PageAcc1.

PageAcc1

This covers a single page, with a single key/op in it, including the single page case with overflow.

Existing tests:

two property test covering equal serialised pages between real and reference impl for single key with inserts or upsert
instances of the two above for specific cases (could be replaced by labels and coverage conditions)

These tests want a generator for single key/operation pairs, wants a range of (single value) sizes: small, near page size, overflow, multiple overflow pages.

IndexCompact

The tests here use their own page summary types for test case generation and it's unclear that they would benefit from improved types or generators for pages.

RunAcc

This now covers a whole run, including the bloom filter and compact index.

Existing tests:

now-redundant test about page size == 4k
general test about page matches prototype
- currently uses only nearly full pages but unclear why
- generates keys that are >= 6 bytes, for compact index
- can probably just use the full range of sizes: small, nealy full, overflow, multi-overflow
existing RunAcc tests are a bit of a mess after prior changes, they still talk about old PageAcc.

Missing tests:

Need to test multi-page RFP partitioning
Need general test for run consisting of multiple pages

These tests want a generator for key/op sequences that need not fit one page, and should cover multiple pages.

Need a RFP sequence partitioning, using reference impl within each partition. Reference impl does not need to be taught about RFP. But could provide a simple run construction using unfoldr + concat, in terms of a page construction returning the suffix.

Run

This is like RunAcc but now does file I/O. Much the same tests should still work at this level though.

Existing tests:

some unit tests
round trip write then read, for real impl with itself
property comparing write and open, for real impl with itself

Missing tests:

comparison of serialised run against reference impl
comparison of key/ops decoded from real run, vs original key/ops serialised by reference impl

The extra tests should mirror the RunAcc ones and so need the same types and generators.

Proposal

key/op pair sequences corresponding to a valid page (thus fitting)
key/op pair sequences that may or may not fit into a page
single key/op pair for the single-value page case
key/op pair sequences corresponding to runs of multiple pages (with keys >= 6 bytes)

What would each test use:

RawPage would use 1
PageAcc would use 2
PageAcc1 would use 3
RunAcc and Run would use 4

Extend reference impl to provide a construction function that returns the unconsumed suffix.

The text was updated successfully, but these errors were encountered:

jorisdral · 2024-05-03T06:46:56Z

IndexCompact

The tests here use their own page summary types for test case generation and it's unclear that they would benefit from improved types or generators for pages.

RunAcc

This now covers a whole run, including the bloom filter and compact index.

Existing tests:
* now-redundant test about page size == 4k

* general test about page matches prototype
  
  * currently uses only nearly full pages but unclear why

I did some digging in the git history, and I changed the generator at some point from using arbitrary :: Gen PageLogical to genFullPage, I think because the latter allows me to control how to generate keys (with >= 6 bytes), and the former doesn't. But, they don't have to be nearly full for the tests to work, I think, so that could be changed

  * generates keys that are >= 6 bytes, for compact index
  * can probably just use the full range of sizes: small, nealy full, overflow, multi-overflow

* existing RunAcc tests are a bit of a mess after prior changes, they still talk about old PageAcc.
Missing tests:
* Need to test multi-page RFP partitioning

* Need general test for run consisting of multiple pages
These tests want a generator for key/op sequences that need not fit one page, and should cover multiple pages.

Need a RFP sequence partitioning, using reference impl within each partition. Reference impl does not need to be taught about RFP. But could provide a simple run construction using unfoldr + concat, in terms of a page construction returning the suffix.

If we have this generator for key/op sequences that corresponding to runs of multiple pages, and if knows about RFP and contains sorted keys, then maybe we could use that instead of the page summary generator that we use for the compact index tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor page/run tests that use the reference implementation #199

Refactor page/run tests that use the reference implementation #199

dcoutts commented Apr 30, 2024

jorisdral commented May 3, 2024 •

edited

Loading

IndexCompact

RunAcc

Refactor page/run tests that use the reference implementation #199

Refactor page/run tests that use the reference implementation #199

Comments

dcoutts commented Apr 30, 2024

Existing tests

RawPage

PageAcc

PageAcc1

IndexCompact

RunAcc

Run

Proposal

jorisdral commented May 3, 2024 • edited Loading

IndexCompact

RunAcc

jorisdral commented May 3, 2024 •

edited

Loading