-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aviad/blake2s #576
base: V2
Are you sure you want to change the base?
Aviad/blake2s #576
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job with the PR. I left a bunch of style related comments. In theory they can be ignored as we are merging this to V2
*/ | ||
|
||
#pragma once | ||
typedef unsigned char BYTE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move these inside blake2s namespace
THROW_ICICLE_ERR( | ||
IcicleError_t::InvalidArgument, | ||
"Hash max preimage length does not match merkle tree arity multiplied by digest elements"); | ||
// if (compression.preimage_max_length < tree_config.arity * tree_config.digest_elements) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just delete those at this point
__device__ __forceinline__ void cuda_blake2s_init_state(cuda_blake2s_ctx_t* ctx) | ||
{ | ||
memcpy(ctx->state, ctx->chain, BLAKE2S_CHAIN_LENGTH); | ||
// ctx->state[8] = ctx->t0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these commented?
return a; | ||
} | ||
|
||
__device__ uint32_t cuda_blake2s_ROTR32(uint32_t a, uint8_t b) { return (a >> b) | (a << (32 - b)); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth to add __inline__
cudaMalloc(&cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch); | ||
assert(keylen <= 32); | ||
|
||
// CUDA_BLAKE2S_CTX ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be removed
WORD block = (n_batch + thread - 1) / thread; | ||
kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE); | ||
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost); | ||
cudaDeviceSynchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation does not support async. All of our other primitives do. So maybe worth adding HashConfig as an input and changing all the functions to their async alternatives
kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE); | ||
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost); | ||
cudaDeviceSynchronize(); | ||
cudaError_t error = cudaGetLastError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use our error-management functions (you can find an example in any of our primitives)
Describe the changes
adds Blake2s cuda capability.