Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Memory Leak in SVM #6057

Open
harrism opened this issue Sep 4, 2024 · 3 comments
Open

[BUG] Memory Leak in SVM #6057

harrism opened this issue Sep 4, 2024 · 3 comments
Assignees
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

@harrism
Copy link
Member

harrism commented Sep 4, 2024

Describe the bug
There are allocations with no matching deallocation in svm/sparse_util.cuh.

*indptr_out = (int*)rmm_alloc->allocate((num_indices + 1) * sizeof(int), stream);
*nnz = computeIndptrForSubset(indptr_in, *indptr_out, row_indices, num_indices, stream);
// allocate indices, data
*indices_out = (int*)rmm_alloc->allocate(*nnz * sizeof(int), stream);
*data_out = (math_t*)rmm_alloc->allocate(*nnz * sizeof(math_t), stream);
// copy with 1 warp per row for now, blocksize 256
const dim3 bs(32, 8, 1);
const dim3 gs(raft::ceildiv(num_indices, (int)bs.y), 1, 1);
extractCSRRowsFromCSR<math_t><<<gs, bs, 0, stream>>>(
*indptr_out, *indices_out, *data_out, indptr_in, indices_in, data_in, row_indices, num_indices);

Steps/Code to reproduce bug

Inspect the linked code.

Expected behavior

See #5852. There should be no raw calls to resource::allocate(). Instead, use device containers like device_buffer.

@harrism harrism added ? - Needs Triage Need team to review and classify bug Something isn't working labels Sep 4, 2024
@mfoerste4
Copy link
Contributor

@harrism , this utility function is used to extract the support vectors and pass them back to the python world. The reason for the plain allocation is that here - just like for the dense counterpart - we don't know the dimensions in advance and therefore cannot allocate the stores from python in advance. The allocations are not really leaking as long as the new owner takes care of destruction.

AFAIK there are no data-owning containers exposed to python that would allow to be resized later on (please correct me if I am wrong or that changed). We would need something like a device_buffer representation on python side for this. If the allocation needs to be removed I would appreciate guidance on how this should be accomplished.

CC @tfeher , @divyegala

@divyegala
Copy link
Member

@mfoerste4 RMM has a python-side device_buffer object https://github.com/rapidsai/rmm/blob/branch-24.10/python/rmm/rmm/_lib/device_buffer.pyx

Would this suffice? It has resizing available.

@mfoerste4
Copy link
Contributor

Thanks @divyegala , I was not aware that we can use these right away. I will see whether it is much effort to replace it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants