
Protobufs

benliao1 edited this page Jan 10, 2021 · 20 revisions

Overview

Protocol Buffers (protobufs) are a data serialization format made by Google that allows for efficient encoding and decoding of complex data sent over a data stream (e.g. a network connection). We need a protocol because whenever we send data, both sides need to agree on its format. We also want to send the minimum amount of information required to convey our message (if you want to learn more about this field, called information theory, take EECS 126). Protobufs let us do both of these things easily.

It works by first defining a set of messages, similar to C structs, that can be sent. The definitions are written in the proto language, which is explained in depth at https://developers.google.com/protocol-buffers/docs/proto3.

Then, for each language, you need to generate bindings from the .proto files to objects in that language. You'll notice that Google's protobuf compiler has no option to choose C as a target language. Luckily, a group of people made a third-party extension of Google protobufs, protobuf-c, that takes .proto files and generates C source and header files that a C program can use to pack and unpack protobuf messages. A good place to learn how that works is the protobuf-c GitHub repository and its wiki.

There are a few reasons why we chose Google Protocol Buffers for our communication with Dawn and Shepherd:

  • Consistency with the old version of Runtime (old Runtime and Dawn used protobufs as well to communicate)
  • Speed. It was decided that the speed boost gained by using protobufs was worth the trouble of setting it up to work in C. Consider this: suppose we chose a protocol like JSON. To send a boolean, we might need to send the string {"switch0":false}. That's 17 characters (i.e. 17 bytes) sent over the network. Compare with protobufs, where a bool field set to true takes 2 bytes (one byte identifying the field and one for the value), and a field left at its default value of false is not sent at all. Integers and complicated nested message types offer similar reductions in message size. Combine it all together, and the network traffic reduction gained by using protobufs is substantial.
  • Consistency between Runtime communication with Dawn and Shepherd. Shepherd uses JSON internally to send data around, and originally the plan was to communicate with Dawn using protobufs and communicate with Shepherd using JSON. But that's not very smart, because then Runtime would have to convert our internal data into two different message formats, which would be extremely ugly. So the decision was made to use protobufs for all of Runtime's network communications.

Tutorial & Usage

Below is a tutorial on using the protobuf-c library and the protobufs that the proto compiler generates. The code has been tested and works; all of these files were in the repo at one point, but it was decided that Runtime members would be better served by having the files here as a tutorial, instead of sitting in the GitHub repo without serving any functional purpose. An incomplete tutorial can be found on the protobuf-c GitHub repo's wiki, but these examples are what the original authors of Runtime used to learn how to use the library.

Compiling to C

Before we can use protobufs, we have to use the protobuf-c third-party library to compile a .proto file (the definition of a proto) into a pair of .c and .h files that we can then use to work with that proto in C. After installing the protobuf-c library, run the following command on a <name>.proto file to generate a <name>.pb-c.c file and a corresponding <name>.pb-c.h file that contain the C representation of that proto:

protoc-c --c_out . <name>.proto

You can replace the . between the --c_out flag and <name>.proto with whichever folder you would like the generated files to appear in. Then include the generated .h file in any code that uses that protobuf, and compile the generated .c file together with that code.

Running the Tutorial Code

To run these examples, compile each pair into two separate executables, one for the *_in.c file and one for the *_out.c file, with a command like the following (using log_in.c as an example):

gcc runtime/net_handler/pbc_gen/*.c log_in.c -o log_in -L/usr/local/lib -lprotobuf-c

You need to have installed the protobuf-c library and already used it to generate the C implementations of the proto files; in the Runtime repo, those are currently stored in the folder runtime/net_handler/pbc_gen/.

To run the examples, run the *_out executable in the terminal and pipe its output to the input of the *_in executable (again using the log example):

./log_out | ./log_in

Each *_out program demonstrates how to pack data into the generated protobuf struct and writes the serialized bytes to stdout. Each *_in program demonstrates how to read the serialized protobuf (from stdin), unpack its contents back into a generated protobuf struct, and print them to stdout. Piping the output of the *_out program into the *_in program simulates the "communication" between the two endpoints.

Example #1

We start with the run mode proto, which has the following definition (runtime/net_handler/protos/run_mode.proto):

/*
 * Defines a message for communicating the run mode
 */

syntax = "proto3";

option optimize_for = LITE_RUNTIME;

enum Mode {
    IDLE = 0;
    AUTO = 1;
    TELEOP = 2;
    ESTOP = 3;
    CHALLENGE = 4;
}

message RunMode {
    Mode mode = 1;
}

The protobuf has only one field, named mode, whose value is one of the five enum values listed in the definition (IDLE, AUTO, TELEOP, ESTOP, CHALLENGE). The following code initializes a new protobuf struct in C, sets a value for its mode field, packs it, and writes it to stdout:

runmode_out.c

#include <stdio.h>
#include <stdlib.h>
#include "runtime/net_handler/pbc_gen/run_mode.pb-c.h" // will need to change this accordingly

int main() {
    RunMode run_mode = RUN_MODE__INIT;  // Initialize the struct; RunMode is the name of the generated protobuf struct
    void* buf;                          // Buffer to store serialized data
    size_t len;                         // Length of serialized data

    // This sets the value of the mode field, which is a member of the run_mode struct declared previously.
    // In MODE__AUTO, the first word (MODE) is the name of the enum type as defined in the proto file,
    // and the second word (AUTO) is the name of the enum value
    run_mode.mode = MODE__AUTO;

    // This function (named <proto name>__get_packed_size; here the proto is run_mode) calculates the size of the
    // packed protobuf struct, which is the amount of memory we need to allocate
    len = run_mode__get_packed_size(&run_mode);

    // This allocates the buffer for the packed protobuf
    buf = malloc(len);

    // This function packs the filled-in protobuf struct into the buffer. The contents of buf are now ready to send
    run_mode__pack(&run_mode, buf);

    // This prints the number of bytes that will be sent (to stderr, so it doesn't mix with the data on stdout)
    fprintf(stderr, "Writing %zu serialized bytes\n", len);  // See the length of message
    fwrite(buf, len, 1, stdout);                             // This writes the contents of buf to stdout

    free(buf);  // Free the allocated serialized buffer
    return 0;
}

The following code reads the serialized message from stdin, unpacks it, and prints its contents to stdout:

runmode_in.c

#include <stdio.h>
#include <stdlib.h>
#include "runtime/net_handler/pbc_gen/run_mode.pb-c.h" // will need to change this accordingly
#define MAX_MSG_SIZE 1024 // You don't know how big the incoming message will be, so define a size for the buffer larger than any possible message

// This function reads data from `stdin` into the provided buffer until there is no more data to read
static size_t read_buffer(unsigned max_length, uint8_t* buf) {
    size_t cur_len = 0;
    size_t nread;
    while ((nread = fread(buf + cur_len, 1, max_length - cur_len, stdin)) != 0) {
        cur_len += nread;
        if (cur_len == max_length) {
            fprintf(stderr, "max message length exceeded\n");
            exit(1);
        }
    }
    return cur_len;
}

int main() {
    RunMode* run_mode; // Pointer to the unpacked message; run_mode__unpack allocates the struct for us

    // Read packed message from stdin
    uint8_t buf[MAX_MSG_SIZE];
    size_t msg_len = read_buffer(MAX_MSG_SIZE, buf); // This reads in the raw serialized bytes and puts them into buf

    // This unpacks the message in buf and populates the protobuf struct declared previously with the information contained in the serialized message
    run_mode = run_mode__unpack(NULL, msg_len, buf);
    if (run_mode == NULL) {
        fprintf(stderr, "error unpacking incoming message\n");
        exit(1);
    }

    // Here we display the message's fields by printing to stdout
    printf("Received: mode = %d\n", (int) run_mode->mode);  // mode is a C enum (Mode), so cast it to int for printing

    // Free the unpacked message by calling the generated function
    run_mode__free_unpacked(run_mode, NULL);
    return 0;
}

Example #2

In this example, we look at a more complicated proto, the Text proto, which is used to send logs from Runtime to Dawn. It has the following definition (runtime/net_handler/protos/text.proto):

/*
 * Defines a message for communicating text data
 */

syntax = "proto3";

option optimize_for = LITE_RUNTIME;

message Text {
    repeated string payload = 1;  //CHALLENGE_DATA: initial values or results of challenges
                                  //LOG: list of log lines
}

This message also has one field, named payload, which is a repeated string. In C, it is represented as an array of strings (char**) plus a count, n_payload. We must allocate memory for the array and for each string to store our data, and free all of it before exiting the program. The following code packs the data into the corresponding protobuf struct:

log_out.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "runtime/net_handler/pbc_gen/text.pb-c.h" // will need to change this accordingly

const char* strs[4] = {"hello", "beautiful", "precious", "world"}; // These are the strings we want to send

int main() {
    Text log_msg = TEXT__INIT;  // Initializes a new protobuf struct of type Text to hold the data
    void* buf;
    size_t len;

    // Insert strings into the protobuf struct

    // When you have a field that is "repeated", protobuf-c automatically generates another member of the struct named n_<field_name>, which is
    // to be set to how many data elements are in that array. In this case, since the payload has 4 strings, we set n_payload to 4
    log_msg.n_payload = 4;

    // Now we allocate the array of char* (log_msg.payload is of type char**, an array of strings in C)
    log_msg.payload = (char**) malloc(sizeof(char*) * log_msg.n_payload);

    // For each string we want to send, we allocate a buffer of that size plus 1 byte for the terminating '\0',
    // and set it to the next element of log_msg.payload. Then we copy the data into the newly allocated buffer.
    for (size_t i = 0; i < log_msg.n_payload; i++) {
        log_msg.payload[i] = (char*) malloc(strlen(strs[i]) + 1);
        strcpy(log_msg.payload[i], strs[i]);
    }

    // The procedure for sending is exactly the same for any protobuf
    len = text__get_packed_size(&log_msg);
    buf = malloc(len);
    text__pack(&log_msg, buf);

    fprintf(stderr, "Writing %zu serialized bytes\n", len);  // See the length of message
    fwrite(buf, len, 1, stdout);                             // Write to stdout to allow direct command line piping

    free(buf);  // Free the allocated serialized buffer
    for (size_t i = 0; i < log_msg.n_payload; i++) {
        free(log_msg.payload[i]);
    }
    free(log_msg.payload);
    return 0;
}

The following code shows how to extract the data from a serialized Text message:

log_in.c

#include <stdio.h>
#include <stdlib.h>
#include "runtime/net_handler/pbc_gen/text.pb-c.h" // will need to change this accordingly
#define MAX_MSG_SIZE 1024

static size_t read_buffer(unsigned max_length, uint8_t* buf) {
    size_t cur_len = 0;
    size_t nread;
    while ((nread = fread(buf + cur_len, 1, max_length - cur_len, stdin)) != 0) {
        cur_len += nread;
        if (cur_len == max_length) {
            fprintf(stderr, "max message length exceeded\n");
            exit(1);
        }
    }
    return cur_len;
}

int main() {
    Text* log_msg; // Pointer to the unpacked message; text__unpack allocates the struct for us

    // Read packed message from standard-input.
    uint8_t buf[MAX_MSG_SIZE];
    size_t msg_len = read_buffer(MAX_MSG_SIZE, buf); // This puts the serialized message into buf, as before

    // We unpack the data in buf into the previously declared struct
    log_msg = text__unpack(NULL, msg_len, buf);
    if (log_msg == NULL) {
        fprintf(stderr, "error unpacking incoming message\n");
        exit(1);
    }

    // This displays the message's fields
    for (size_t i = 0; i < log_msg->n_payload; i++) {
        printf("\t%s\n", log_msg->payload[i]);
    }

    // Free the unpacked message by calling the generated function!
    text__free_unpacked(log_msg, NULL);
    return 0;
}

Example #3

Finally, we take a look at the most complicated protobuf used in Runtime: the Device Data protobuf, which has the following definition (runtime/net_handler/protos/device.proto):

/*
 * Defines a message for communicating Device Data
 */

syntax = "proto3";

option optimize_for = LITE_RUNTIME;

//message for describing a single device parameter
message Param {
    string name = 1;
    oneof val {
        float fval = 2;
        int32 ival = 3;
        bool bval = 4;
    }
}

//message for describing a single device
message Device {
    string name = 1;
    uint64 uid = 2;
    uint32 type = 3;
    repeated Param params = 4;  //each device has some number of params
}

message DevData {
    repeated Device devices = 1;  //this single field has information about all requested params of devices
}

You can see here that the device data message, called DevData, is composed of one field named devices, which is an array of Device messages (repeated Device). In turn, each Device message is composed of four fields: a string name, a uint64 uid, a uint32 type (the device type), and an array of Param messages (repeated Param) named params. Each Param message contains a name field and a val field, which holds exactly one of a float (for floating-point device parameters), an int32 (for integer device parameters), or a bool (for boolean device parameters). This makes filling out and allocating memory for DevData messages a bit cumbersome, but it makes it easy to transmit device information between Dawn and Runtime. The following is an example of code that prepares a DevData message for sending:

devdata_out.c

#include <stdio.h>
#include <stdlib.h>
#include "runtime/net_handler/pbc_gen/device.pb-c.h" // will need to be changed accordingly

int main() {
    void* buf;   // Buffer to store serialized data
    size_t len;  // Length of serialized data

    // Initialize all the messages and submessages (let's send two devices, the first with 1 param and the second with 2 params)
    DevData dev_data = DEV_DATA__INIT;
    Device dev1 = DEVICE__INIT;
    Device dev2 = DEVICE__INIT;
    Param d1p1 = PARAM__INIT;
    Param d2p1 = PARAM__INIT;
    Param d2p2 = PARAM__INIT;

    // Set all the fields
    d1p1.name = "switch0";
    // "val" is declared as a oneof in the proto file, so only one of fval/ival/bval holds a value at a time.
    // The val_case field, generated automatically by the protobuf-c compiler, records which one is set
    d1p1.val_case = PARAM__VAL_FVAL;
    d1p1.fval = 0.3f; // This sets the actual value of the "val" field

    d2p1.name = "sensor0";
    d2p1.val_case = PARAM__VAL_IVAL;
    d2p1.ival = 42;

    d2p2.name = "bogus";
    d2p2.val_case = PARAM__VAL_BVAL;
    d2p2.bval = 1;

    dev1.name = "LimitSwitch";
    dev1.uid = 984789478297ULL;
    dev1.type = 12;
    dev1.n_params = 1;
    dev1.params = (Param**) malloc(dev1.n_params * sizeof(Param*));
    dev1.params[0] = &d1p1;

    dev2.name = "LineFollower";
    dev2.uid = 47834674267ULL;
    dev2.type = 13;
    dev2.n_params = 2;
    dev2.params = (Param**) malloc(dev2.n_params * sizeof(Param*));
    dev2.params[0] = &d2p1;
    dev2.params[1] = &d2p2;

    dev_data.n_devices = 2;
    dev_data.devices = (Device**) malloc(dev_data.n_devices * sizeof(Device*));
    dev_data.devices[0] = &dev1;
    dev_data.devices[1] = &dev2;
    // Now we are done setting all fields!

    // We get the length of the packed data and follow the rest of the procedure to send the message
    len = dev_data__get_packed_size(&dev_data);
    buf = malloc(len);
    dev_data__pack(&dev_data, buf);

    fprintf(stderr, "Writing %zu serialized bytes\n", len);  // See the length of message
    fwrite(buf, len, 1, stdout);                             // Write to stdout to allow direct command line piping

    // Free all allocated memory (the Device and Param structs live on the stack; only the arrays were malloc'd)
    free(buf);
    free(dev1.params);
    free(dev2.params);
    free(dev_data.devices);
    return 0;
}

And below is the code that reads in the serialized message, unpacks it, and prints it to the screen:

devdata_in.c

#include <stdio.h>
#include <stdlib.h>
#include "runtime/net_handler/pbc_gen/device.pb-c.h" // will need to be changed accordingly
#define MAX_MSG_SIZE 1024

static size_t read_buffer(unsigned max_length, uint8_t* out) {
    size_t cur_len = 0;
    size_t nread;
    while ((nread = fread(out + cur_len, 1, max_length - cur_len, stdin)) != 0) {
        cur_len += nread;
        if (cur_len == max_length) {
            fprintf(stderr, "max message length exceeded\n");
            exit(1);
        }
    }
    return cur_len;
}

int main() {
    DevData* dev_data; // Pointer to the unpacked message; dev_data__unpack allocates it and all submessages

    // Read packed message from standard-input.
    uint8_t buf[MAX_MSG_SIZE];
    size_t msg_len = read_buffer(MAX_MSG_SIZE, buf);

    // Unpack the message using protobuf-c.
    dev_data = dev_data__unpack(NULL, msg_len, buf);
    if (dev_data == NULL) {
        fprintf(stderr, "error unpacking incoming message\n");
        exit(1);
    }

    // Display the message's fields.
    printf("Received:\n");
    for (size_t i = 0; i < dev_data->n_devices; i++) {
        Device* dev = dev_data->devices[i];
        printf("Device No. %zu:\n", i);
        // uid is a uint64_t and type is a uint32_t, so cast them to match the printf format specifiers
        printf("\tname = %s, uid = %llu, type = %u\n", dev->name, (unsigned long long) dev->uid, (unsigned) dev->type);
        printf("\tParams:\n");
        for (size_t j = 0; j < dev->n_params; j++) {
            Param* param = dev->params[j];
            printf("\t\tparam \"%s\" has type ", param->name);
            switch (param->val_case) {
                case PARAM__VAL_FVAL:
                    printf("FLOAT with value %f\n", param->fval);
                    break;
                case PARAM__VAL_IVAL:
                    printf("INT with value %d\n", param->ival);
                    break;
                case PARAM__VAL_BVAL:
                    printf("BOOL with value %d\n", param->bval);
                    break;
                default:
                    printf("UNKNOWN\n");
                    break;
            }
        }
    }

    // Free the unpacked message
    dev_data__free_unpacked(dev_data, NULL);
    return 0;
}