Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stray .nfs file preventing deletion of waveforms extension folder #3348

Open
jonahpearl opened this issue Aug 28, 2024 · 11 comments
Open

Stray .nfs file preventing deletion of waveforms extension folder #3348

jonahpearl opened this issue Aug 28, 2024 · 11 comments
Labels
core Changes to core module

Comments

@jonahpearl
Copy link
Contributor

jonahpearl commented Aug 28, 2024

Hi all — I'm on a Linux / NFS system (CentOS 7 specifically, on my school's HPCC), and I encounter an OSError every time I try to re-compute the waveforms extension of a sorting analyzer. The core problem is that there's a stray .nfs temporary file hanging out in the waveforms folder. If I do ls -a [that folder] I can see it clearly, it's always something like .nfs9eb83351caf602ec00005734.

I can see that it's python using the file with lsof:

COMMAND   PID    USER   FD   TYPE DEVICE  SIZE/OFF                 NODE NAME
python  30135 jop9552  mem    REG   0,45 128896688 11436947680097862380 [...]/analyzer_folder/extensions/waveforms/.nfs9eb83351caf602ec00005734
python  30135 jop9552   93u   REG   0,45 128896688 11436947680097862380 [...]/analyzer_folder/extensions/waveforms/.nfs9eb83351caf602ec00005734

and if I call kill on that process, my notebook's kernel crashes, so I'm pretty sure it's coming from spikeinterface.

The offending function is at /spikeinterface/core/sortinganalyzer.py:2090: shutil.rmtree(extension_folder).

Traceback
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[6], line 1
----> 1 analyzer.compute("waveforms")

File ~/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py:1155, in SortingAnalyzer.compute(self, input, save, extension_params, verbose, **kwargs)
 1108 """
 1109 Compute one extension or several extensiosn.
 1110 Internally calls compute_one_extension() or compute_several_extensions() depending on the input type.
 (...)
 1152 
 1153 """
 1154 if isinstance(input, str):
-> 1155     return self.compute_one_extension(extension_name=input, save=save, verbose=verbose, **kwargs)
 1156 elif isinstance(input, dict):
 1157     params_, job_kwargs = split_job_kwargs(kwargs)

File ~/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py:1232, in SortingAnalyzer.compute_one_extension(self, extension_name, save, verbose, **kwargs)
 1229     assert ok, f"Extension {extension_name} requires {dependency_name} to be computed first"
 1231 extension_instance = extension_class(self)
-> 1232 extension_instance.set_params(save=save, **params)
 1233 if extension_class.need_job_kwargs:
 1234     extension_instance.run(save=save, verbose=verbose, **job_kwargs)

File ~/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py:2053, in AnalyzerExtension.set_params(self, save, **params)
 2047 """
 2048 Set parameters for the extension and
 2049 make it persistent in json.
 2050 """
 2051 # this ensure data is also deleted and corresponf to params
 2052 # this also ensure the group is created
-> 2053 self._reset_extension_folder()
 2055 params = self._set_params(**params)
 2056 self.params = params

File ~/datta-lab/spikeinterface/src/spikeinterface/core/sortinganalyzer.py:2028, in AnalyzerExtension._reset_extension_folder(self)
 2026     extension_folder = self._get_binary_extension_folder()
 2027     if extension_folder.is_dir():
-> 2028         shutil.rmtree(extension_folder)
 2029     extension_folder.mkdir(exist_ok=False, parents=True)
 2031 elif self.format == "zarr":

File ~/miniconda3/envs/spikeinterface/lib/python3.9/shutil.py:740, in rmtree(path, ignore_errors, onerror)
  738         os.rmdir(path)
  739     except OSError:
--> 740         onerror(os.rmdir, path, sys.exc_info())
  741 else:
  742     try:
  743         # symlinks to directories are forbidden, see bug #1669

File ~/miniconda3/envs/spikeinterface/lib/python3.9/shutil.py:738, in rmtree(path, ignore_errors, onerror)
  736     os.close(fd)
  737     fd_closed = True
--> 738     os.rmdir(path)
  739 except OSError:
  740     onerror(os.rmdir, path, sys.exc_info())

OSError: [Errno 39] Directory not empty: '[...]/analyzer_folder/extensions/waveforms'

MRE:

recording = si.load_extractor([any recording])
sorting = si.read_sorter_folder([any sorting])
analyzer = si.create_sorting_analyzer(
    sorting=sorting, 
    recording=recording, 
    format='binary_folder', 
    folder="./tmp_analyzer",
)
to_compute = ["random_spikes", "waveforms"]
analyzer.compute(to_compute)
analyzer.compute("waveforms")  # tries to reset folder before recomputing but fails at deletion

Not sure how to go about debugging this but any advice is welcome. This is v0.101.0. Thanks!

@zm711
Copy link
Collaborator

zm711 commented Aug 28, 2024

What is the OSError that is actually given :) I don't see it in the issue.

@h-mayorquin
Copy link
Collaborator

It is probably in the recording and sorting that keeps a reference file handle. Probably related to:

#3295

@jonahpearl
Copy link
Contributor Author

What is the OSError that is actually given :) I don't see it in the issue.

Added the traceback :)

@jonahpearl
Copy link
Contributor Author

Also the same issue happens if I load sorting output from mountainsort5, so I don't think it's related to KS4 as being discussed in #3295.

@zm711
Copy link
Collaborator

zm711 commented Aug 28, 2024

What is the actually context of this use (ie I could imagine recomputing waveforms later, but why do compute(waveforms) compute(waveforms) twice in a row?)? Does the error still happen if you do:

analyzer.compute(['random_spikes', 'waveforms', 'templates'])

The waveforms file could be left open I guess during writing, so I'm wondering if computing another extensions right after waveforms ensures the file is closed.

@zm711
Copy link
Collaborator

zm711 commented Aug 28, 2024

Looking here
https://github.com/SpikeInterface/spikeinterface/blob/007b6efd0e675e60658199112d0541fba2b432cd/src/spikeinterface/core/waveform_tools.py#L194C45-L194C53

it seems that the memmap for writing the waveforms is open but I haven't found an explicit close yet? But I think @samuelgarcia or @alejoe91 know this part of the code base the best to know if the close is located somewhere else. But maybe this is causing your specific problem.

@jonahpearl
Copy link
Contributor Author

The context is, I'm developing code about the analyzer extensions and sometimes just end up running the same code twice :p The MRE just reproduces it for me, it isn't necessarily how one would come across it in the wild. And no, I've encountered this error with a large list of extensions being computed after waveforms.

If I compute the extensions once, then restart the kernel and reload the analyzer, I can get away with an initial re-compute of waveforms with no errors. But then on the second try of a given kernel there's usually an error, although sometimes it also takes until the third run (??).

@zm711
Copy link
Collaborator

zm711 commented Aug 28, 2024

Would you be willing to try the same with zarr. the binary_folder backend uses the memmap whereas zarr is using the shared_memory. It might be worth testing. If zarr works with no problems then at least from my perspective this would point more strongly to the waveforms.npy file not being closed.

@jonahpearl
Copy link
Contributor Author

Yup, you're exactly right. It works as expected with zarr, no OSError.

@zm711
Copy link
Collaborator

zm711 commented Aug 28, 2024

Okay, then I think we would need to close the memmap to fix that, but I think the general idea would be we create the memmap and leave it open so you can use the waveforms for other extensions without needing to reopen it./reload the data So I'm not sure if/when we could do that. We will just have to wait to see what Alessio and Sam think for this.

@samuelgarcia
Copy link
Member

Thanls for reporting and investigating.
I will have a look into this.

@zm711 zm711 added the core Changes to core module label Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

No branches or pull requests

4 participants