Unable to allocate memory #84

Open
aidinrs opened this issue Jun 17, 2024 · 3 comments

aidinrs commented Jun 17, 2024

There seems to be a problem with memory allocation when processing longer prompts. I used a prompt of around 3500 tokens in LLMEval, and while the prompt was being processed the process climbed to 12.5 GB of memory. Around 5 GB of that is the model weights, which is fine, but the extra 7 GB doesn't seem normal: with a 3500-token prompt that works out to about 2 MB per token. Memory usage drops back to about 6 GB once the prompt-processing phase is done. The issue gets worse when the full context is used (it climbs to ~25 GB).

I don't have this issue with llama.cpp, since it allocates only the memory required for the weights plus a little extra for the calculations.

Configuring the memory and cache limits doesn't help either; the process throws before prompt processing starts.

This issue hinders running and developing applications on devices with less than 32 GB of RAM.

davidkoski (Collaborator) commented

Check out the details here: #17

You might want to set the cache limit to a few megabytes and see how that behaves.
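
For reference, a minimal sketch of what that looks like, assuming the `MLX.GPU` API from mlx-swift (check the exact signature against your version):

```swift
import MLX

// Cap the buffer cache at a few megabytes so freed buffers are
// returned to the system instead of being held for reuse.
MLX.GPU.set(cacheLimit: 4 * 1024 * 1024)
```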

aidinrs (Author) commented Jun 17, 2024

@davidkoski I tried that already and it doesn't help. The initial jump in memory appears only while the prompt is being processed; once tokens are generated one by one, memory usage is back to normal.

awni (Member) commented Aug 19, 2024

The memory needed for long prompts scales with the square of the prompt length. So in your case, with a prompt length of 3500, the attention scores use roughly 3500 * 3500 * num_heads * 2 bytes.
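
As a rough worked example (a minimal sketch; the 32-head count is an assumed value, not necessarily the model you ran):

```swift
// Estimate of attention-score memory for a 3500-token prompt.
let promptLength = 3500
let numHeads = 32       // assumed example value
let bytesPerScore = 2   // float16
let bytes = promptLength * promptLength * numHeads * bytesPerScore
print(Double(bytes) / 1e9)  // ≈ 0.78 GB for the score matrices
```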

What were you running when it jumped to 12 GB?

Also, #93 should bring LLMEval up to parity with our Python counterpart, which can handle much longer prompts with lower memory use.
