Shared Memory Wrapper

Vincent Le edited this page Oct 2, 2021 · 4 revisions

This page attempts to discuss the shared memory wrapper, the design decisions that were made regarding the wrapper, and how some of the details work / how those details enable certain optimizations.

Please read the wiki page on shared memory before proceeding.

Motivation

Interprocess communication (IPC) is orders of magnitude slower than communication between threads in the same process, which in turn is slower than communication between functions in the same thread. In principle, the fastest way to ferry data between Dawn/Shepherd, student code, and the devices attached to the robot would be to contain the entirety of Runtime within a single thread of a single process. This, however, is obviously impractical for code readability, complexity, and maintainability, so Runtime is broken down into several independent processes (which themselves have separate threads) to divide up the tasks. As a result, we now need to deal with IPC.

There are various methods of IPC; in roughly increasing order of speed: file sharing, sockets, pipes and message queues, and shared memory. Since so much data is transferred over IPC, and previous iterations of Runtime that used pipes still had problems with IPC speed, the decision was made to use shared memory for Runtime's IPC. Shared memory is also well-suited to Runtime because the data passed between the processes is very structured and consistent (device data, gamepad data, robot state, robot log), so we don't have to worry about constantly expanding or shrinking the shared memory blocks.

Expected Behavior

The shared memory wrapper is an interface for the various Runtime processes to access useful data about the current state of the system: whether Dawn is connected, which devices are connected, and the device data itself.

Since nearly all internal communication on Runtime occurs through the shared memory blocks, and we want the effects of any updates to robot state to propagate through Runtime as quickly as possible, the wrapper should be optimized to some extent. To this end, we use several large bitmaps to try to reduce the amount of computation needed to retrieve updated information from the shared memory wrapper.

Though the shared memory wrapper should be fast, it must also be extremely robust and reliable, i.e. the data should not become corrupt under any circumstances. We also need the shared memory blocks to be created and removed reliably, since blocks left behind at Runtime shutdown will cause errors on the next Runtime boot. To this end, we use semaphores to control access to the various shared memory blocks, and have two programs, shm_start and shm_stop, which create and destroy all of the shared memory blocks, respectively. shm_start must be run before any of the Runtime processes initialize; shm_stop must be run after all Runtime processes terminate in the Runtime reboot sequence.

Relevant Files

  • shm_start.c: the source code for the program that creates and initializes all of the shared memory blocks and semaphores
  • shm_stop.c: the source code for the program that destroys and unlinks all of the shared memory blocks and semaphores
  • shm_wrapper.h: the interface presented by the shared memory wrapper to all of the processes in Runtime
  • shm_wrapper.c: the source code that implements the shared memory wrapper

Explanations

This section discusses some of the design decisions and the less obvious parts of the code.

Semaphores

Please read the wiki page on semaphores before proceeding.

In the shared memory wrapper, each registered device has two associated semaphores: one for its DATA stream (where dev_handler writes device data for executor to read when executing student code), and another for its COMMAND stream (where executor writes commands from student code for dev_handler to pass to the devices). When any process calls the shared memory wrapper to access the data on a particular stream, the wrapper automatically acquires the associated semaphore before the access and releases it afterward. There are also five additional semaphores. The first two, cmd_map_sem and catalog_sem, facilitate the fast transfer of commands from executor to dev_handler, and the connecting and disconnecting of devices, respectively. The next three, rd_sem, gp_sem, and log_data_sem, access-guard the robot description shared memory block, the gamepad state shared memory block, and the robot log data shared memory block, respectively. There is also an additional check that a gamepad is connected before any attempt to access the gamepad state.

We want each individual piece of data to have its own semaphore: if a single semaphore controlled access to too many different pieces of data, a process wanting one piece of data might block on a semaphore held by a process that isn't even accessing the same data. For example, suppose one semaphore controls access to data blocks 1 and 2, and two processes A and B both access both blocks. Process A locks the semaphore in order to modify data block 1; while A is modifying it, process B tries to acquire the semaphore in order to modify data block 2, and is refused, since A currently holds the semaphore. But this doesn't make sense: A is modifying block 1 and B wants block 2, so they could be modifying the blocks at the same time with no consequences! It would be better to have two semaphores, one access-guarding data block 1 and another on data block 2. In the shared memory wrapper, this is exactly what we do, which is why we use 70 semaphores!

The Command Bitmap

The fast transfer of commands from executor to dev_handler works as follows. There is a command bitmap, which consists of MAX_DEVICES + 1 elements that are each MAX_PARAMS bits long. If every single bit in the entire bitmap is 0, then executor has not requested a change to any parameter on any device in the system. The bits in the first element of the command bitmap indicate which devices have had their parameters changed; the bits in each of the other elements indicate which params of a particular device have changed. This is best illustrated with an example. Suppose that both MAX_DEVICES and MAX_PARAMS were 32. Then the command bitmap would be made up of 33 elements, each element being a uint32_t. At the beginning, suppose a device connects to the shared memory wrapper and is assigned the index of 0. This device also has 3 parameters, at indices 0, 1, and 2. When no changes are requested on this device, we have the following state (with the least significant bit (the 0th bit) on the right):

cmd_map[0]               cmd_map[1]               cmd_map[2]               .......    cmd_map[32]
<32 bits> ... 0000000    <32 bits> ... 0000000    <32 bits> ... 0000000    .......    <32 bits> ... 0000000

Now suppose executor is running some student code, and the student code has requested to change the param at index 1 of the connected device. When executor calls shared memory to write this new value to that device's COMMAND block, the wrapper takes note of this and sets the bit corresponding to that device to 1 in cmd_map[0], and the bit corresponding to that param to 1 in that device's element (in this case, bit 1 of cmd_map[1]). So after the executor writes to the COMMAND block and returns, we now have this state (again, with the least significant bit (the 0th bit) on the right):

cmd_map[0]               cmd_map[1]               cmd_map[2]               .......    cmd_map[32]
<32 bits> ... 0000001    <32 bits> ... 0000010    <32 bits> ... 0000000    .......    <32 bits> ... 0000000

The 0th bit of cmd_map[0] being set to 1 says that the device at index 0 has one or more changed parameters. The 1st bit of cmd_map[1] being set to 1 says that the 1st parameter of the device at index 0 has had its value changed. This makes several optimizations possible for dev_handler. dev_handler no longer needs to continuously read from each connected device and compare against the command most recently sent to that device to determine whether a new command was issued. Rather, it can just continuously read cmd_map[0] and check whether it is equal to 0. If it is not, at least one device has new commands to send, and dev_handler can retrieve those new values by calling device_read on exactly the devices specified in cmd_map[0]. Furthermore, dev_handler can pass the changed-parameter bitmap for that device directly to device_read as the params_to_read argument, so it wastes no time retrieving, processing, packaging, or sending commands that have not changed. In our example, dev_handler would read cmd_map[0], notice that the 0th bit is set to 1, and call device_read with dev_ix = 0 and params_to_read = cmd_map[1] to retrieve the new param values. Of course, after dev_handler retrieves the data, the shared memory wrapper clears all of the read bits (sets them back to 0).