Memory usage: Daphne loading all the file in memory (POST request) #483

Open
cpina opened this issue Sep 26, 2023 · 7 comments

Comments

@cpina

cpina commented Sep 26, 2023

I am using the Debian-packaged versions (Debian 12 Bookworm) of:

  • daphne 4.0.0
  • django 3.2.19
  • python 3.11.2

The problem happens when using Django with daphne (debugging with runserver or in production with Nginx in front of it).

For testing purposes I have a form in the application accepting a 4 GB file (with Nginx accepting this file size). This is to make it more visible.

The request reaches daphne at the line linked below. As far as I can see, Twisted and Python's cgi.py are not reading it into memory: they use a temporary file or pass it to daphne without reading it fully.
https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L188

Daphne keeps reading the 4 GB and adding it to the queue in 8 KB chunks (the queue was created without any maxsize, so it is unbounded).
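
To illustrate the behavior described above, here is a minimal sketch, not Daphne's actual code. It assumes an unbounded asyncio.Queue filled with put_nowait() in 8 KB chunks; the names enqueue_body and demo are hypothetical. It shows that every chunk sits in the queue before any consumer has read one.

```python
import asyncio

CHUNK = 8 * 1024  # 8 KB, the chunk size mentioned in the report


def enqueue_body(queue, body):
    """Hypothetical helper: push a whole body as ASGI-style http.request messages."""
    for start in range(0, len(body), CHUNK):
        queue.put_nowait({
            "type": "http.request",
            "body": body[start:start + CHUNK],
            # more_body stays True until the last chunk.
            "more_body": start + CHUNK < len(body),
        })


async def demo(body_size):
    queue = asyncio.Queue()  # default maxsize=0 means the queue is unbounded
    enqueue_body(queue, bytes(body_size))
    # Nothing has consumed from the queue yet, so all chunks are held in memory.
    return queue.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo(1024 * 1024)))  # number of 8 KB chunks in 1 MiB
```

Because put_nowait() never blocks on an unbounded queue, the producer has no reason to slow down, and memory use grows with the size of the upload.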

If I use uvicorn (or gunicorn with uvicorn workers) the problem does not happen: no memory change POSTing a 4 GB file. If I use runserver without Daphne's application it does not happen either.

@carltongibson
Member

This has been this way since day 1. Can you find the corresponding bit in uvicorn? (That the protocol server needs to pass the more_body flag is the key here...)

@cpina
Author

cpina commented Sep 26, 2023

Sure!

I'll add links to the relevant places. I think I see the problem (but not the solution in Daphne). Hopefully this will at least help to add breakpoints in some places and POST a file :-)

If using daphne:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls asyncio.Queue.get(). In my original comment I linked to a few lines before the self.application_queue.put_nowait() call (see https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L196). So daphne seems to read the whole body and add all of it to the queue in chunks. Then, if using daphne, read_body() dequeues it from memory.
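
The application side of this can be sketched as follows. This is a simplified illustration in the spirit of the linked read_body(), not Django's exact code; the ConnectionError and the demo helper are assumptions made for the example. The application awaits receive() repeatedly and spools each chunk to a temporary file until more_body is false.

```python
import asyncio
import tempfile


async def read_body(receive, max_memory=2621440):
    """Simplified sketch: spool http.request chunks into a temporary file."""
    # SpooledTemporaryFile keeps data in memory up to max_size, then uses disk.
    body_file = tempfile.SpooledTemporaryFile(max_size=max_memory, mode="w+b")
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            body_file.close()
            # Assumption for this sketch: signal the abort with ConnectionError.
            raise ConnectionError("client disconnected before the body ended")
        if "body" in message:
            body_file.write(message["body"])
        if not message.get("more_body", False):
            break
    body_file.seek(0)
    return body_file


async def demo():
    # Stand-in for the server: a queue that already holds every chunk.
    queue = asyncio.Queue()
    for chunk, more in ((b"abc", True), (b"def", True), (b"", False)):
        queue.put_nowait({"type": "http.request", "body": chunk, "more_body": more})
    body_file = await read_body(queue.get)
    try:
        return body_file.read()
    finally:
        body_file.close()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the application never holds more than one chunk plus the spooled file, so any large memory use would have to come from whatever sits behind receive().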

If using uvicorn:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls RequestResponseCycle.receive() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L492).

I haven't properly understood this yet, but on each call of RequestResponseCycle.receive() there is also a call to H11Protocol.data_received() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L127; note that H11Protocol instantiates RequestResponseCycle). This is driven by run_forever and actually comes from _SelectorSocketTransport._read_ready() in asyncio/selector_events. So it seems that data is read on demand by Django as it keeps arriving.
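
As a rough illustration of on-demand delivery, here is a sketch that uses a bounded asyncio.Queue so that the producer is suspended until the consumer asks for more. This is an assumption chosen to show the general idea of backpressure; it is not how uvicorn is implemented, and the names producer, consumer and run are hypothetical.

```python
import asyncio

CHUNK = 8 * 1024  # 8 KB chunks, as in the original report


async def producer(queue, total_chunks):
    """Stand-in for the server: waits whenever the bounded queue is full."""
    for i in range(total_chunks):
        await queue.put({
            "type": "http.request",
            "body": bytes(CHUNK),
            "more_body": i < total_chunks - 1,
        })


async def consumer(queue, stats):
    """Stand-in for the application: pulls chunks until more_body is False."""
    received = 0
    while True:
        # Record how many chunks are buffered right before each read.
        stats["peak"] = max(stats["peak"], queue.qsize())
        message = await queue.get()
        received += len(message["body"])
        if not message["more_body"]:
            return received


async def run(total_chunks, maxsize):
    queue = asyncio.Queue(maxsize=maxsize)  # bounded: put() applies backpressure
    stats = {"peak": 0}
    task = asyncio.create_task(producer(queue, total_chunks))
    received = await consumer(queue, stats)
    await task
    return received, stats["peak"]


if __name__ == "__main__":
    print(asyncio.run(run(total_chunks=128, maxsize=4)))
```

With a bound in place, the number of buffered chunks never exceeds maxsize, regardless of how large the body is.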

I don't think I have enough bandwidth at the moment to get familiar enough with the Daphne code to fix it properly. Unless I'm wrong, this looks like it might need quite a lot of changes to the Daphne code. Do you think so? Am I missing something obvious in Daphne that could provide a fix?

What I could maybe do in the next few days or over the weekend is write a very simple Django app (perhaps inspired by https://adamj.eu/tech/2020/10/15/a-single-file-rest-api-in-django/) that helps reproduce the problem, if this would help.

@carltongibson
Member

What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)

Happy to look at what you discover.
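
For reference, the spool-to-disk behavior mentioned above is controlled by Django's file upload settings. The fragment below is a sketch that simply restates Django's documented defaults; it is not the reporter's configuration, which is not shown in this thread.

```python
# settings.py fragment (values shown are Django's defaults)

# Uploads larger than this many bytes are streamed to a temporary file
# instead of being held in memory (2621440 bytes = 2.5 MB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440

# Handlers are tried in order: small uploads stay in memory,
# larger ones go to a temporary file on disk.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]

# None means the operating system's default temporary directory is used.
FILE_UPLOAD_TEMP_DIR = None
```

These settings only affect how Django stores the upload once it receives the body; they do not change how the protocol server buffers the request before handing it over.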

@carltongibson
Member

Also I'd update Django.

@cpina
Author

cpina commented Sep 27, 2023

> What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)

I haven't understood the code well yet (I will try in the next days/weeks; I need to do some other things first). I think that last night I saw that uvicorn uses some async methods so that it reads only what is passed to the application instead of reading everything.

When I have more (or better) findings I'll write them here.

> Also I'd update Django.

From what I saw, I'm pretty sure that the daphne code reads everything (and holds it in memory) before Django is involved. Then Django processes it.

Same Django settings in both cases (just launching Django differently).

@BenjaminXT

@carltongibson I have run into the same problem. My English is not good, but I hope I can describe the problem clearly.

In http_protocol.py, the WebRequest class in daphne initializes http.Request, and http.Request calls the cgi.parse_multipart method, which loads the whole file into memory. The resulting args do not seem to be used: ASGIHandler reads the content again and parses the body itself. So it may be worth overriding this behavior of http.Request in the WebRequest class.

@carltongibson
Member

As per this comment on the asgiref repo, I think the behaviour here is just required by the spec.

django/asgiref#66 (comment)

3 participants