Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peculiar decode matrices being constructed by build_decode_matrix_into_space() #72

Open
chrisstjohn opened this issue Aug 22, 2022 · 8 comments

Comments

@chrisstjohn
Copy link

I'm interested in pre-calculating some decode matrices so I used some of the "under the hood" functions of fec.c

With a trivial example of k=2, n=3 I get the following:

static const uint8_t encoder_2_3[6] =
{
	0x01, 0x00, 
	0x00, 0x01, 
	0x03, 0x02, 
};

static const uint8_t decoder_2_3_0x3[4] =
{
	0x01, 0x00, 
	0x00, 0x01, 
};

static const uint8_t decoder_2_3_0x5[4] =
{
	0x01, 0x00, 
	0x8f, 0x8e, 
};

static const uint8_t decoder_2_3_0x6[4] =
{
	0x01, 0x00, 
	0x8f, 0x8e, 
};

there are three erase cases identified by a bitmap

  • 0x3 = parity loss
  • 0x5 = second data loss
  • 0x6 = first data loss

but note that the 0x5 and 0x6 decode matrices are identical - which is wrong.

I think I traced the problem to here build_decode_matrix_into_space() which does some weird stuff whilst trying to decimate the encoding matrix:

void
build_decode_matrix_into_space(const fec_t*restrict const code, const unsigned*const restrict index, const unsigned k, gf*restrict const matrix) {
    unsigned char i;
    gf* p;
    for (i=0, p=matrix; i < k; i++, p += k) {
        if (index[i] < k) {
            memset(p, 0, k);
            p[i] = 1;
        } else {
            memcpy(p, &(code->enc_matrix[index[i] * code->k]), k);
        }
    }
    _invert_mat (matrix, k);
}

I removed the apparent "optimisation" and it all springs into life:

void
build_decode_matrix_into_space(const fec_t*restrict const code, const unsigned*const restrict index, const unsigned k, gf*restrict const matrix) {
    unsigned char i;
    gf* p;
    for (i=0, p=matrix; i < k; i++, p += k) {
        memcpy(p, &(code->enc_matrix[index[i] * code->k]), k);
    }
    _invert_mat (matrix, k);
}

Now this is really really really old code and pretty central the the whole decoder; am I missing something or is it broken?

@WojciechMigda
Copy link

What values did you pass with index array?

@chrisstjohn
Copy link
Author

What values did you pass with index array?

0x3 -> { 0, 1 }
0x5 -> { 0, 2 }
0x6 -> { 1, 2 }

an alternative fix for me is to change p[i] = 1 to p[index[i]] = 1 since I think it's trying to make a row of identity rather than copy it from the encoding matrix

@WojciechMigda
Copy link

That is a good find and fix!
I cannot explain where did 0x8f, 0x8e rows come from in decoder_2_3_0x5 and decoder_2_3_0x6 matrices. These rows should be copied from the encoder matrix, but instead the look like some garbage values.

@chrisstjohn
Copy link
Author

chrisstjohn commented Aug 25, 2022

Those rows originate from the encoder matrix which is then inverted so they're not garbage - decoder_2_3_0x5 at least really is the inverse of {{ 1, 0 } { 3, 2 }}

I'm still puzzled how on earth this could work for anybody else but since I'm not using zfec for its intended purpose - just borrowing some of its math - I can't comment whether or not it does work for other people.

@WojciechMigda do you want me to submit a PR? Looks like there are a ton of unit tests that would check this for us?

@chrisstjohn
Copy link
Author

One possibility is that the code that normally uses this function re-orders the shards to replace data with parity.

e.g.
Encode A, B -> X, Y, Z
lose X
Decode Z, Y -> A, B

whereas I'm trying to decode Y, Z -> A, B

@WojciechMigda
Copy link

Those rows originate from the encoder matrix which is then inverted so they're not garbage - decoder_2_3_0x5 at least really is the inverse of {{ 1, 0 } { 3, 2 }}

Oh, I thought these are values right after the loop.

I'm still puzzled how on earth this could work for anybody else but since I'm not using zfec for its intended purpose - just borrowing some of its math - I can't comment whether or not it does work for other people.

There are some tests and they pass. In particular the Haskell ones use QuickCheck to construct input data which is then subject to roundtrip encode/decode, and they pass too.

do you want me to submit a PR? Looks like there are a ton of unit tests that would check this for us?

I am not a maintainer of this project, I just happen to be on the subject for the past few weeks. As for the existing tests it would be worth checking if they adequately cover this scenario.

One possibility is that the code that normally uses this function re-orders the shards to replace data with parity.

I don't see anything like that in zfec --- fec_decode calls build_decode_matrix_into_space right away. On the other hand, original Luigi Rizzo's code does have an additional step: function called shuffle is called before build_decode_matrix. We'd need to confirm that it does the reordering you've mentioned.

@WojciechMigda
Copy link

WojciechMigda commented Aug 28, 2022

@chrisstjohn maybe you already figured this out by yourself, but anyway I looked closely into this and these are my conclusions:

  1. The code should stay as it is. fec_decode requires that block indices passed with index array are ordered to satisfy this assertion:
(index[i] >= k) || (index[i] == i)

(https://github.com/tahoe-lafs/zfec/blob/master/zfec/fec.c#L524)

If the above does not hold then fec_decode will abort execution. [1]

  1. Original Rizzo's fec did reorder indices within fec_decode (index contents was mutable), but in zfec the same reordering was relocated to python API wrapper: https://github.com/tahoe-lafs/zfec/blob/master/zfec/_fecmodule.c#L454 .
  2. With blocks that fec_decode works on being in order which satisfies that index[i] == i it becomes obvious that p[i] = 1 is equivalent to p[index[i]] = 1.

[1] Unless assertions are disabled, but this is not a default setup.

@chrisstjohn
Copy link
Author

Thanks @WojciechMigda for your analysis and thoughts.

My feeling is that an ordering requirement buried deep in a 'C' library that is implemented way outside in the Python wrapper is a rather brittle design. As you say, right now there is an assertion in fec_decode but I can't see the harm of

  • changing the assertion to assert(index[i] < code->n)
  • changing p[i] to p[index[i]]

As I agree that right now, in this case, with this Python wrapper, index[i] == i

Even in terms of performance there probably wouldn't be an extra indirection as index[i] is already fetched and available to the compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants