Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve child handling #140

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

rokroskar
Copy link

This pull request is an extension to #118 and #134 and tries to improve the ability to analyze child process memory consumption.

Changes/additions:

  • track child processes by pid instead of sequentially -- this is to ensure that we can properly track children of processes where the parent might continuously spawn many short-lived children
  • add a convert_mem_usage_to_df function which produces a pandas.DataFrame from a list returned by memory_usage for easier plotting and slicing
  • add the ability of mprof to monitor an existing process by providing a pid

Side-effects:

  • move the read_mprofile_file function to memory_profiler so that it can be used programatically
  • add a read_mprofile_file_multiprocess function that reads a mprofile file and returns a list of timings identical to what you expect from memory_usage
  • fix bug in memory_usage #139
  • fix bug in plot_file where plotting would fail when number of children > 6
  • add a timeout flag to mprof

@fabianp
Copy link
Collaborator

fabianp commented Apr 7, 2017

Thanks for the pull request!. This looks good, but give me a couple of days to look into it as I'm currently travelling

@fabianp
Copy link
Collaborator

fabianp commented Apr 11, 2017

Why is pandas needed? Does this change the quality of the plots? I would like to maintain dependencies to the strict minimum ...

@fabianp
Copy link
Collaborator

fabianp commented Apr 11, 2017

(numpy is fine since its required by matplotlib anyway)

@rokroskar
Copy link
Author

Right, pandas is not strictly needed and I also hesitate making another dependency. IMO the usability of the results is improved by putting them into a DataFrame. The user can then select only the PIDs they want, they can get averages, etc. It's a simple way to allow for a bit more customization.

On the other hand, it's not really a dependency, since the only function that needs pandas is convert_mem_usage_to_df and one could add a try/except there and alert the user that pandas is needed for that function. Another option would be to just return the numpy matrix that I generate along with the mapping of x and y coordinates to times and pids, but that's not as user-friendly.

@rokroskar
Copy link
Author

oh oops, I guess I already added the try/except in my last commit...

@fabianp
Copy link
Collaborator

fabianp commented Apr 12, 2017 via email

@eamars
Copy link

eamars commented Feb 14, 2019

Hello Guys
Thanks for your contribution! I find categorizing child process is extremely helpful. However the last change were committed in 2017. Is this issue going to be closed or should have been merged? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in memory_usage
3 participants