Skip to content

Latest commit

 

History

History
292 lines (206 loc) · 10.4 KB

README.md

File metadata and controls

292 lines (206 loc) · 10.4 KB

binmat

A C library for efficient handling of 2D binary matrices.

by Kevin Pulo, [email protected]
https://github.com/devkev/binmat
Copyright (c) 2013
Licensed under the GNU GPL v3 or higher, as per the COPYING file.

Overview and motivation

Some applications need to store and manipulate a 2D matrix of binary (ie. boolean) values. For example, an unweighted graph can be stored as an adjacency matrix - the (i,j)-th element is a boolean indicating whether or not the i-th node is connected to the j-th. Given such a data structure, k-connectivity (for each node, the other nodes that can be reached in at most k edges or "hops") can easily be computed by taking the k-th power of the adjacency matrix.

However, for large matrices, storing each element as a byte (or worse, a 32 or 64 bit integer) is incredibly inefficient and wasteful, both in time and space. For example, a 10k square matrix requires 400Mb of RAM if stored as 32 bit int values.

Binmat is a library that bit-packs these matrices, so that the above 10k square matrix requires just 12.5Mb (the minimum space possible to store such a dense matrix).

More than that, binmat takes advantage of extremely fast bit-operations when multiplying matrices. The usual series of multiplications and additions required to compute each element are replaced by bitwise AND and OR operations. Furthermore, on 64-bit hardware each bitwise operation can replace up to 64 multiplications or additions, reducing operations that can take hundreds of clock cycles down to just a single cycle. This gives rise to some very considerable performance increases, especially when taking the power of a matrix. Binary exponentiation is implemented to further improve the performance of higher matrix powers.

Building

binmat currently uses a straightforward GNU make Makefile, and so a simple make (or gmake) should work on all modern Unixy operating systems.

There are no dependencies beyond a C compiler and associated toolchain.

Installing

After building, simply copy libbinmat.so and/or libbinmat.a to a location that the linker can find (eg. /usr/local/lib), and libbinmat.h to a location that the C preprocessor/compiler can find (eg. /usr/local/include).

Alternatively, you can specify -I/path/to/libbinmat when compiling and -L/path/to/libbinmat when linking.

To use binmat, simply do

#include "libbinmat.h"

in your code, and then use -lbinmat when linking.

API

If the HAVE_UNSIGNED_LONG_LONG_INT preprocessor symbol is defined, then binmat will use unsigned long long for its underlying chunk storage, otherwise unsigned long will be used. Alternatively, the BINMAT_DATA_TYPE macro can be defined, and that will be used instead, eg. -DBINMAT_DATA_TYPE='unsigned char'. This type must be unsigned.

Typedefs

  • binmat_data_t
    The underlying bit storage, aka the "chunk" (unsigned long long or unsigned long).

  • binmat_index_t
    The type of indices into arrays of binmat_data_t (unsigned int).

  • binmat_bool_t
    The native C type used to indicate true/false (unsigned char).

Consts

  • const binmat_data_t one
    The constant 1U as a binmat_data_t type.

  • const binmat_data_t zero
    The constant 0U as a binmat_data_t type.

Macros

  • binmat_chunkbytes
    The number of bytes in a chunk.

  • binmat_chunkbits
    The number of bits in a chunk.

  • binmat_numchunks(n)
    The number of chunks required to store n bits.

  • binmat_numbytes(n)
    The number of bytes required to store n bits.

  • binmat_numbits(n)
    The number of bits required to store n bits (will be greater than n when n is not a multiple of binmat_chunkbits).

  • binmat_numbytes_matrix(n)
    The number of bytes required to store an n by n matrix.

  • binmat_finalchunkbits(n)
    The number of bits used in the final chunk (for n bits total). If n is a multiple of binmat_chunkbits, then this will be binmat_chunkbits (not 0).

  • binmat_finalchunkmask(n)
    A mask of the bits used in the final chunk.

  • binmat_binary_bufn_t(x,n)
    Declare a char array called x of sufficient size to render n chunks.

  • binmat_binary_buf_t(x)
    Declare a char array called x of sufficient size to render a single chunk.

Functions

  • binmat_data_t *binmat_alloc(binmat_index_t n)
    Allocate an n by n matrix (initialised to all zeros) and return a reference to it.

  • void binmat_free(binmat_data_t *m)
    Free an allocated matrix m.

  • binmat_bool_t binmat_getbit(const binmat_data_t *m, binmat_index_t n, binmat_index_t row, binmat_index_t col)
    Returns the (row,col) bit of the n by n matrix m.

  • void binmat_setbit(binmat_data_t *m, binmat_index_t n, binmat_index_t row, binmat_index_t col, binmat_bool_t value)
    Sets the (row,col) bit of the n by n matrix m to be value.

  • char *binmat_format_chunk(binmat_data_t x, char *buf)
    Helper function that renders the chunk x into buf (and returns a reference to it). The buffer must be of sufficient length (see the binmat_binary_buf_t() macro).

  • void binmat_print_matrix_slow(FILE *f, const binmat_data_t *m, binmat_index_t n)
    Uses a slow, obvious algorithm (ie. loop over the bits and call binmat_getbit()) to print the n by n matrix m to the stdio FILE f.

  • void binmat_print_matrix_fast(FILE *f, const binmat_data_t *m, binmat_index_t n)
    Use a faster algorithm to print the n by n matrix m to the stdio FILE f. Currently broken

  • void binmat_print_matrix_hex(FILE *f, const binmat_data_t *m, binmat_index_t n)
    Print the n by n matrix m to the stdio FILE f in a more compact hex notation. Currently broken

  • void binmat_transpose(binmat_data_t *output, const binmat_data_t *input, binmat_index_t n)
    Transpose the n by n matrix input and store the output in output (which must already have been allocated). output can be the same as input, in which case a temporary matrix will be allocated and used.

  • int binmat_are_identical(const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
    Compare the two n by n matrices a and b, and return true if they are identical and false otherwise.

  • void binmat_copy(binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
    Copy the n by n matrix b into a (which must already have been allocated).

  • void binmat_multiply(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
    Uses a fast algorithm to multiply together the n by n matrices a and b. Note that the b matrix must have been transposed before being passed to this function.

  • void binmat_multiply_slow(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
    Uses the simple, naive matrix multiplication algorithm (ie. using binmat_getbit() and binmat_setbit()) to multiply the n by n matrices a and b together, storing the result in output. Mostly useful to check the result of binmat_multiply().

  • void binmat_power(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *trans, binmat_index_t n, unsigned int pow)
    Takes the pow-th power of the n by n matrix a (with trans its transpose), storing the result in output.

  • void binmat_power_slow(binmat_data_t *output, const binmat_data_t *a, binmat_index_t n, unsigned int pow)
    Uses the slow, naive algorithm (ie. repeated application of binmat_multiply_slow()) to take the pow-th power of the n by n matrix a, storing the result in output.

Testing

The binmat-test program is used to test binmat's operations. In addition to confirming that fast binmat multiplication and powering matches that of the naive implementations, it also implements naive multiplication and power using straightforward "traditional" int-based matrices, and compares the results against them. This also means that it can be used for benchmarking, to test the performance improvements possible with binmat (although the limiting factor quickly becomes the time and space needed for the traditional matrices).

It takes the following command line options. n indicates a positive integer, and d indicates a (double) floating point value.

  • -n, --size n
    Specify the size of the (square) matrix (default: 151).

  • -p, --power n
    Specify the power to which the matrix should be raised (default: 3).

  • -w, --warmups n
    Specify the number of times to power the matrix as a warmup (ie. untimed) (default: 3).

  • -r, --reps n
    Specify the number of times to power the matrix (timed) (default: 10).

  • -t, --trad
    Include traditional matrix operations and comparisons (default).

  • -T, --no-trad
    Do not include traditional matrix operations and comparisons.

  • -tw, --trad-warmups n
    Specify the number of times to power the traditional matrix as a warmup (default: 1).

  • -tr, --trad-reps n
    Specify the number of times to power the traditional matrix (default: 1).

  • -d, --density d
    Specify the density of ones in the random matrix (between 0.0 and 1.0) (default: 0.01).

  • -s, --seed n
    Specify the PRNG seed value (default: time(NULL)).

  • -v, --verbose, --print
    Include verbose output (ie. print the matrices).

  • -V, --no-verbose, --silent
    Do not print the actual matrix contents (default).

  • -l, --limit d
    The maximum amount of memory (in Mb) that will be allocated (default: 1024).

License

binmat is copyright (c) 2013 Kevin Pulo, [email protected].
Licensed under the GNU GPL v3 or higher, as per the COPYING file.

However, if you're interested in using binmat in a way that is not compatible with the GPLv3+ (whether Free/Open Source or not), please feel free to contact me at the above email address to discuss the possibility of dual licensing or alternative arrangements.

Feedback

Comments, feature suggestions, bug reports, patches, etc are most welcome. Feel free to submit issues, pull requests, etc on github.