Hanging while computing principal_components #3331

Open
jonahpearl opened this issue Aug 23, 2024 · 6 comments
Labels
concurrency Related to parallel processing

Comments

@jonahpearl
Contributor

jonahpearl commented Aug 23, 2024

Similar to #2689, I'm having an issue where computing the principal_components extension hangs at 0% on Linux when run as part of a script. As in that issue, the hang seems to require multiple parallel computations in a row: if I quit the hung process and re-run it, the PCA gets computed no problem and everything runs smoothly from there.

Unlike that issue, prepending MKL_THREADING_LAYER=TBB to the call to my script didn't help (at least, not when passed through SLURM).

Attached is my conda env export; you can see that BLAS / MKL is there, but when I followed ChatGPT's advice to check whether it was actually being used by numpy / scipy, no mention of MKL came up:

import numpy as np
import scipy
print("NumPy configuration:")
np.__config__.show()
print("\nSciPy configuration:")
scipy.__config__.show()

The output was:

  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas64
    openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.23.dev

I'm going to start debugging by cloning my conda env and trying to force the clone to not use MKL with conda install nomkl numpy scipy scikit-learn numexpr (again, ht ChatGPT). If that doesn't work, I guess I could follow #2689 and try switching to joblib in certain parts of the code... other suggestions and ideas welcome :) thanks!
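Edit: as another sanity check, here's a small runtime probe (a sketch using the threadpoolctl package, which isn't in my env export, so it would need to be installed) that lists which BLAS / OpenMP libraries are actually loaded and how many threads each is configured to use:

# sketch: list the threadpools actually loaded at runtime (needs `pip install threadpoolctl`)
import numpy as np  # importing numpy loads its BLAS backend
from threadpoolctl import threadpool_info

for pool in threadpool_info():
    # each entry describes one loaded library (OpenBLAS, MKL, OpenMP, ...)
    print(pool["user_api"], pool["internal_api"], pool["num_threads"], pool["filepath"])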

@alejoe91
Member

@jonahpearl

Can you try with this?

os.environ["OPENBLAS_NUM_THREADS"] = "1"

@alejoe91 alejoe91 added the concurrency Related to parallel processing label Aug 27, 2024
@jonahpearl
Contributor Author

jonahpearl commented Aug 27, 2024

Thanks for the suggestion; unfortunately, it still freezes. Here are the results of some work I've done to narrow down the cause. It seems that something about run_sorter is causing the subsequent parallelized PCA function to hang, but I have no clue why.

My test script consisted of three steps:

  • preprocessing + writing to a binary file
  • sorting with ks4 + saving
  • running the sorting analyzer in memory, with n_jobs=4, with the final step being principal_components, which would just get stuck at 0% completion.
| Condition | Outcome |
| --- | --- |
| 1. Base test script | ❌ hangs (at PCA) |
| 2. Skip pre-processing and just load the extractor | ❌ hangs (at PCA) |
| 3. 2 + switch to TDC2 with n_jobs=4 | ❌ hangs, but at the split_clusters with local_feature_clustering part of TDC2 |
| 3. 2 + switch to TDC2 with n_jobs=1 | ❌ hangs at the PCA step again |
| 4. 2 + switch to MT5 / Scheme 1 | ❌ hangs (at PCA, regardless of n_jobs into MT5) |
| 5. 2 + skip ks4 and just load the sorting output | ✅ runs |
| 6. 2 + skip TDC2 and just load the sorting output | ✅ runs |
| 7. 2 + skip MT5 and just load the sorting output | ✅ runs |
| 8. 2 + just load the sorting output AND run the sorting analyzer twice in a row | ✅ runs |
| 9. 2 + force n_jobs=1 everywhere | ✅ runs, but with a memory leak warning |

I confirmed that this behavior is still the case with both these env vars set:

os.environ['NUMEXPR_MAX_THREADS'] = '1'
os.environ["OPENBLAS_NUM_THREADS"] = "1"

Here is the memory leak warning (I only get this with n_jobs=1 everywhere):

Exception ignored in: <function SharedMemoryRecording.__del__ at 0x7fb43b6cbdc0>
Traceback (most recent call last):
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/core/numpyextractors.py", line 207, in __del__
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/multiprocessing/shared_memory.py", line 240, in unlink
ImportError: sys.meta_path is None, Python is likely shutting down
/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I poked around in the run_sorter code, but I really don't see anything suggestive of setting up a multiprocessing pool, so I'm confused as to how something like ks4 or mt5/scheme1, which ostensibly don't use multiprocessing, could have this effect. Based on test 8, I don't think it's simply a matter of multiple calls to a multiprocessing context; that test seems to indicate that the problem isn't the PCA function itself, but rather something happening upstream of it.
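One more diagnostic I plan to add (just a stdlib sketch) is to print the multiprocessing start method and any still-alive child processes right before the PCA step, to see whether run_sorter leaves anything behind:

import multiprocessing

# which start method the ProcessPoolExecutor will inherit (fork by default on Linux)
print("start method:", multiprocessing.get_start_method())
# any worker processes still alive after run_sorter would show up here
print("active children:", multiprocessing.active_children())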

Let me know if you have other debugging ideas! Going to try the nomkl bit now but not optimistic.

@jonahpearl
Contributor Author

jonahpearl commented Aug 27, 2024

If it helps, this is what the traceback looks like when I keyboard interrupt the hung PCA:

Traceback
Traceback (most recent call last):
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/postprocessing/principal_component.py", line 446, in _fit_by_channel_local
    for chan_ind, pca_model_updated in results:
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/concurrent/futures/_base.py", line 441, in result
    self._condition.wait(timeout)
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jop9552/Jonah/20231003_vlPAG_npx/20240826_bug_testing_np_pca.py", line 216, in <module>
    main()
  File "/home/jop9552/Jonah/20231003_vlPAG_npx/20240826_bug_testing_np_pca.py", line 169, in main
    analyzer.compute(ext, **extensions_to_compute[ext], **job_kwargs)
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py", line 1150, in compute
    return self.compute_one_extension(extension_name=input, save=save, verbose=verbose, **kwargs)
    extension_instance.run(save=save, verbose=verbose, **job_kwargs)
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py", line 1936, in run
    self._run(**kwargs)
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/postprocessing/principal_component.py", line 321, in _run
    pca_models = self._fit_by_channel_local(n_jobs, progress_bar)
  File "/home/jop9552/datta-lab/spikeinterface/src/spikeinterface/postprocessing/principal_component.py", line 447, in _fit_by_channel_local
    pca_models[chan_ind] = pca_model_updated
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/concurrent/futures/_base.py", line 637, in __exit__
    self.shutdown(wait=True)
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/concurrent/futures/process.py", line 767, in shutdown
    self._executor_manager_thread.join()
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/threading.py", line 1060, in join
    self._wait_for_tstate_lock()
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/threading.py", line 1080, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/jop9552/miniconda3/envs/spikeinterface/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

@jonahpearl
Contributor Author

Ah, and: if I force multiprocessing to use spawn instead of fork, it seems to avoid the hang. However, everything runs painfully slowly: the PCA moves at ~3.5 s/iter instead of 72 iter/s with 1 core or 8 iter/s with 4 cores (n.b. this is a tiny recording, hence the faster PCA with 1 core).
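For reference, this is roughly how I forced spawn (a sketch; I believe recent SpikeInterface versions also accept an mp_context entry in job_kwargs, but I haven't verified that):

import multiprocessing

# force the default start method process-wide; must run before any pool is created
multiprocessing.set_start_method("spawn", force=True)

# alternatively, if the installed SpikeInterface version supports it, pass it per job:
job_kwargs = dict(n_jobs=4, mp_context="spawn", progress_bar=True)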

@samuelgarcia
Member

@alejoe91: we should use threadpool_limits in _fit_by_channel_local, no?
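Something along these lines; a minimal, self-contained sketch of the idea (using threadpoolctl and a toy per-channel IncrementalPCA fit, not the actual _fit_by_channel_local code):

import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.decomposition import IncrementalPCA
from threadpoolctl import threadpool_limits

def fit_one_channel(waveforms, n_components=5):
    # keep BLAS single-threaded inside each worker so OpenBLAS/MKL threads
    # do not oversubscribe or deadlock against the process pool
    with threadpool_limits(limits=1, user_api="blas"):
        pca = IncrementalPCA(n_components=n_components)
        pca.partial_fit(waveforms)
    return pca

if __name__ == "__main__":
    # toy data: 8 channels, 200 waveforms of 60 samples each
    waveforms_per_channel = [np.random.randn(200, 60) for _ in range(8)]
    with ProcessPoolExecutor(max_workers=4) as executor:
        pca_models = list(executor.map(fit_one_channel, waveforms_per_channel))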

@alejoe91
Member

yes but that is very tricky given the current implementation!
