Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: add channel UUID #922

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

spec: add channel UUID #922

wants to merge 2 commits into from

Conversation

james-rms
Copy link
Collaborator

Public-Facing Changes

  • Resolves ambiguity around extending records with new fields. Records may be extended by changes to the specification only.
  • Defines zero values for all field types, to be used when a field is missing from a serialized record.
  • Adds a UUID field to the Channel record. This allows readers to safely treat channels in different files as the same logical channel.

@defunctzombie
Copy link
Contributor

Please split these up into separate PRs so we can consider each individually without holding up changes on the ones we want to spend more time on.

| 4 + N | topic | String | The channel topic. |
| 4 + N | message_encoding | String | Encoding for messages on this channel. The [well-known message encodings][message_encodings] are preferred. |
| 4 + N | metadata | `Map<string, string>` | Metadata about this channel |
| 16 | uuid | uuid | A globally unique identifier for this channel. A [nil UUID](https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.7) indicates no globally unique identifier is available. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this could be solved by using channel metadata. is there something that makes this special enough to be an explicitly named field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a first class citizen means that tooling can rely on this field. Metadata is better read as "user-data" that is unspecified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have information about such a use case? What if I use something other than UUID to identify channels?

I think we should have hard coded fields where there is a demonstrated need for cross-tool interoperability. If this is only to support needs of individual companies then using a named metadata field would be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that UUID is the appropriate term or requirement. I would describe it is an identifier that is stable across multiple separate mcap files (possibly via rolling recording).

I think this PR needs to be split up into separate proposals that have clear problem statements. Right now its not in a state that makes a case on what problems it is solving.

@@ -111,12 +111,14 @@ All MCAP records are serialized as follows:

Record type is a single byte opcode, and record content length is a uint64 value.

Records may be extended by adding new fields at the end of existing fields. Readers should ignore any unknown fields.
Future changes to this specification may extend records by adding new fields at the end of existing fields. Readers should ignore any unknown fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are introducing this language it would be good to do in a separate PR, and ensure we have conformance tests for readers well ahead of introducing any new fields.


> The Footer and Message records will not be extended, since their formats do not allow for backward-compatible size changes.

Each record definition below contains a `Type` column. See the [Serialization](#serialization) section on how to serialize each type.

Record content may end before all known fields have been serialized. Readers should treat a missing field as having that field type's [zero value](#zero-values).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to make this statement, we should probably be explicit that records must still be fully formed, even if they are missing fields (i.e. records must match their stated length). otherwise the file should be considered corrupt.

### String

Strings are serialized using a `uint32` byte length followed by the string data, which should be valid [UTF-8](https://en.wikipedia.org/wiki/UTF-8).

<byte length><utf-8 bytes>

The [zero value](#zero-values) of a string is the empty string.
Copy link
Contributor

@amacneil amacneil Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The [zero value](#zero-values) of a string is the empty string.
The [zero value](#zero-values) of a string is a string with length zero (i.e. `0x00000000`).

@@ -353,6 +370,8 @@ Example `Tuple<uint16, string>`:

<uint16><uint32><utf-8 bytes>

The [zero value](#zero-values) of a tuple is the first value type's zero value followed by the second value type's zero value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do tuples ever occur outside of an array? I don't think this case is needed.

@@ -365,6 +384,8 @@ An array of uint64 is specified as `Array<uint64>` and serialized as:

> Since arrays use a `uint32` byte length prefix, the maximum size of the serialized array elements cannot exceed 4,294,967,295 bytes.

The [zero value](#zero-values) of an array is the empty array.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The [zero value](#zero-values) of an array is the empty array.
The [zero value](#zero-values) of an array is an array with length zero (i.e. `0x00000000`).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants