Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aviad/blake2s #576

Open
wants to merge 15 commits into
base: V2
Choose a base branch
from
Open

Aviad/blake2s #576

wants to merge 15 commits into from

Conversation

aviadingo
Copy link

Describe the changes

adds Blake2s cuda capability.

Copy link
Contributor

@ChickenLover ChickenLover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job with the PR. I left a bunch of style related comments. In theory they can be ignored as we are merging this to V2

*/

#pragma once
typedef unsigned char BYTE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move these inside blake2s namespace

THROW_ICICLE_ERR(
IcicleError_t::InvalidArgument,
"Hash max preimage length does not match merkle tree arity multiplied by digest elements");
// if (compression.preimage_max_length < tree_config.arity * tree_config.digest_elements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just delete those at this point

__device__ __forceinline__ void cuda_blake2s_init_state(cuda_blake2s_ctx_t* ctx)
{
memcpy(ctx->state, ctx->chain, BLAKE2S_CHAIN_LENGTH);
// ctx->state[8] = ctx->t0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these commented?

return a;
}

__device__ uint32_t cuda_blake2s_ROTR32(uint32_t a, uint8_t b) { return (a >> b) | (a << (32 - b)); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth to add __inline__

cudaMalloc(&cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch);
assert(keylen <= 32);

// CUDA_BLAKE2S_CTX ctx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be removed

WORD block = (n_batch + thread - 1) / thread;
kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE);
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost);
cudaDeviceSynchronize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation does not support async. All of our other primitives do. So maybe worth adding HashConfig as an input and changing all the functions to their async alternatives

kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE);
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost);
cudaDeviceSynchronize();
cudaError_t error = cudaGetLastError();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use our error-management functions (you can find an example in any of our primitives)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants