Extract to scalar OOM error #28

Open
hanslovsky opened this issue Aug 8, 2019 · 4 comments
Labels
bug Something isn't working

Comments

@hanslovsky
Contributor

When running extract-to-scalar to save into an HDF5 file, this error may occur (I have not been able to reproduce it yet):

19/08/08 16:49:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
	at ch.systemsx.cisd.hdf5.HDF5BaseWriter.setupSyncThread(HDF5BaseWriter.java:170)
	at ch.systemsx.cisd.hdf5.HDF5BaseWriter.<init>(HDF5BaseWriter.java:165)
	at ch.systemsx.cisd.hdf5.HDF5WriterConfigurator.writer(HDF5WriterConfigurator.java:133)
	at ch.systemsx.cisd.hdf5.HDF5FactoryProvider$HDF5Factory.open(HDF5FactoryProvider.java:48)
	at ch.systemsx.cisd.hdf5.HDF5Factory.open(HDF5Factory.java:47)
	at org.janelia.saalfeldlab.n5.hdf5.N5HDF5Writer.<init>(N5HDF5Writer.java:92)
	at org.janelia.saalfeldlab.label.spark.N5Helpers.n5Writer(N5Helpers.java:38)
	at org.janelia.saalfeldlab.conversion.ExtractHighestResolutionLabelDataset$Args.lambda$call$49828ecb$1(ExtractHighestResolutionLabelDataset.java:101)
	at org.janelia.saalfeldlab.conversion.ExtractHighestResolutionLabelDataset.lambda$extract$a90bce4d$1(ExtractHighestResolutionLabelDataset.java:248)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

It looks very suspicious that so many threads are being created.
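
For context, here is a minimal, purely illustrative JHDF5 sketch (not the converter's actual code) of how this class of failure can arise: `HDF5BaseWriter` sets up a dedicated sync thread when a writer is opened (see `setupSyncThread` in the trace above), so writers that are opened repeatedly and never closed accumulate live threads until `Thread.start0` can no longer allocate a native thread:

```java
import ch.systemsx.cisd.hdf5.HDF5Factory;
import ch.systemsx.cisd.hdf5.IHDF5Writer;

public class Hdf5SyncThreadLeak {
	public static void main(final String... args) {
		// Illustrative only: each open() goes through
		// HDF5BaseWriter.setupSyncThread, so every writer that is opened
		// and never closed leaves one more live JVM thread behind.
		for (int i = 0; i < 100; ++i) {
			final IHDF5Writer writer = HDF5Factory.open("/tmp/sync-thread-demo-" + i + ".h5");
			System.out.println("writers opened: " + (i + 1) + ", live threads: " + Thread.activeCount());
			// writer.close() would stop the sync thread; omitting it mimics the leak
		}
	}
}
```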

@hanslovsky
Contributor Author

Thanks to @wangyuhan01 for raising saalfeldlab/paintera#293 initially.

@hanslovsky
Contributor Author

The process creates tons of HDF5 Sync threads.

For example, in one test case I observed as many as 5403(!) threads created by the HDF5 library. This is probably an issue with the underlying hdf5 library and how readers are created; maybe each reader creates a sync thread. @igorpisarev have you ever observed anything like this in any of your spark jobs when converting to hdf5?

[Screenshot: Screenshot_20190809_132004]
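
For reference, the thread count can also be checked from within the JVM without a profiler; a small sketch, assuming the sync threads can be identified by name (adjust the filter to match what the thread dump actually shows):

```java
// Name-based heuristic for counting live JHDF5 sync threads.
final long syncThreads = Thread.getAllStackTraces().keySet().stream()
		.filter(t -> t.getName().contains("Sync"))
		.count();
System.out.println("HDF5 sync threads: " + syncThreads);
```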

@igorpisarev
Contributor

That indeed looks very strange. I haven't really used hdf5 for conversion apart from a couple of experiments, but I don't recall seeing this, or at least it wasn't a problem.

Does the converter here write into HDF5? I thought that it always writes out N5, but can take HDF5 as input.

@hanslovsky
Contributor Author

@igorpisarev This does not affect the paintera-conversion-helper command, which, as you say, always converts into N5. Instead, this is about the extract-to-scalar command that extracts the highest-resolution label mipmap dataset into a scalar dataset in an N5-like container.

One of the use cases is extraction into an h5 container for easier processing in some scenarios, e.g. Python.

I will try the n5-spark utilities to see if similar issues occur there. I think this is an issue with the upstream hdf5 library, which creates a new thread for every single hdf5 reader/writer. A solution would probably be to write multiple blocks at the same time so as not to create as many writers, along the lines of the sketch below.
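
A minimal sketch of that idea, assuming a Spark-style job (`writeBlock` and the block type are placeholders, not the converter's actual API): open one writer per partition via `foreachPartition` instead of one per block, and close it when the partition is done, so each task holds at most one HDF5 sync thread at a time.

```java
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;

import ch.systemsx.cisd.hdf5.HDF5Factory;
import ch.systemsx.cisd.hdf5.IHDF5Writer;

class PerPartitionHdf5Write {

	static void writeAll(final JavaRDD<long[]> blockPositions, final String hdf5Path) {
		blockPositions.foreachPartition((Iterator<long[]> blocks) -> {
			// one writer (and therefore one sync thread) per task, not per block
			final IHDF5Writer writer = HDF5Factory.open(hdf5Path);
			try {
				while (blocks.hasNext())
					writeBlock(writer, blocks.next());
			} finally {
				writer.close(); // closing the writer also terminates its sync thread
			}
		});
	}

	static void writeBlock(final IHDF5Writer writer, final long[] blockPosition) {
		// placeholder for the actual block-writing logic
	}
}
```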
