
Flatfield Correction memory requirements #42

Open
carshadi opened this issue Jun 2, 2022 · 0 comments
carshadi commented Jun 2, 2022

Hello,

My dataset consists of 1200 tiles of shape [2000, 1600, 105], stored as unsigned 16-bit integers. Each tile is ~640 MB, and the total dataset is ~768 GB.
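For reference, the sizes quoted above can be sanity-checked with a quick calculation (a standalone sketch, not part of the stitching-spark codebase; note the figures come out in MB/GB vs. MiB/GiB depending on the divisor used):

```python
# Verify the per-tile and total dataset sizes from the numbers above.
shape = (2000, 1600, 105)   # voxels per tile
bytes_per_voxel = 2         # unsigned 16-bit integers
n_tiles = 1200

tile_bytes = shape[0] * shape[1] * shape[2] * bytes_per_voxel
tile_mib = tile_bytes / 2**20          # ~641 MiB, i.e. the "~640MB" quoted
total_gib = tile_bytes * n_tiles / 2**30  # ~751 GiB, roughly the "768GB" quoted
print(tile_bytes, round(tile_mib), round(total_gib))
```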

I tried several Dataproc cluster configurations, but the job always ran out of memory before finishing.
Here is the log from one such failed run:

Working interval is at [0, 0, 0] of size [2000, 1600, 105]
Working with stack of size 1120
Output directory: gs://xxxx/spark-stitching-test/tile_config-flatfield/fullsize/solution
Running flatfield correction script in 3D mode
Histogram intensity range: min=0.0, max=596.0
Background intensity value: 2.0
Binning the input stack and saving as N5 blocks...
22/05/31 09:36:11 ERROR org.apache.spark.scheduler.AsyncEventQueue: Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
22/05/31 09:36:11 WARN org.apache.spark.scheduler.AsyncEventQueue: Dropped 1 events from eventLog since the application started.
Collected reference histogram of size 258 (first and last bins are tail bins):
[0.0, 437.7783683452381, 81.77367467857142, 46.8757134047619, 47.477869, 24.27659158333333, 20.217159428571428, 24.67996761904762, 13.484565666666667, 11.611294119047619, 14.610392464285715, 8.293473321428571, 7.379666666666667, 9.752576595238095, 5.83438805952381, 5.138283369047619, 6.672545178571428, 3.938613880952381, 3.604872380952381, 4.9050658333333335, 2.9963505, 2.8183993095238096, 3.935544, 2.454618130952381, 2.342203142857143, 3.3578584166666667, 2.1719595833333334, 2.1521068928571427, 3.1912209285714286, 2.0808182619047617, 2.057839535714286, 3.0716244761904763, 2.0486201904761905, 2.0560531785714287, 3.1066629285714287, 2.091556119047619, 2.1118542023809526, 3.2133971071428573, 2.176664880952381, 2.2079175714285713, 3.376959154761905, 2.293583488095238, 2.3201689880952383, 3.5095552261904763, 2.3501815, 2.354911095238095, 3.5369314166666665, 2.358748130952381, 2.3579966904761904, 3.5322006071428573, 2.350168214285714, 2.3450787976190477, 3.5074072857142857, 2.330146857142857, 2.3218056666666667, 3.4597665714285712, 2.2850581785714286, 2.2618299404761903, 3.343573261904762, 2.1940888095238096, 2.165262333333333, 3.1936283452380954, 2.092879726190476, 2.0641206666666667, 2.0355645, 3.0004284285714284, 1.9654259047619047, 1.9381174285714287, 2.8543376904761906, 1.866112130952381, 1.835421869047619, 2.6926035119047618, 1.7538462976190476, 1.7213971547619047, 2.52253375, 1.6434050476190476, 1.6141225714285714, 2.367681869047619, 1.544810738095238, 1.5184377976190475, 2.2299729642857145, 1.4552315119047619, 1.4303141071428571, 2.099422119047619, 1.3690689404761904, 1.3446135714285714, 1.971852369047619, 1.285029369047619, 1.2631100238095239, 1.854925011904762, 1.2108501428571428, 1.1915200595238096, 1.752440119047619, 1.1456031309523809, 1.127990380952381, 1.658482880952381, 1.0839164285714287, 1.0666084642857143, 1.567604988095238, 1.0240427142857143, 1.0078844166666667, 1.4823280357142856, 0.969960880952381, 0.9559713928571428, 1.4111897142857144, 
0.9262692261904761, 0.91561625, 1.3544762023809525, 0.890556880952381, 0.8807474166666667, 1.3031114047619048, 0.8570969404761904, 0.8480894404761905, 1.2557332261904761, 0.8268718095238096, 0.8195214285714286, 1.2149739404761906, 0.8017395833333333, 0.7954338333333333, 1.1819282976190477, 0.7809344761904762, 0.7757366904761904, 1.154225892857143, 0.7637096428571428, 0.7590340238095238, 1.1298775833333334, 0.7474632142857143, 0.7429755833333334, 0.7380995952380952, 1.0983775119047618, 0.7262675357142857, 0.7218658214285715, 1.074690619047619, 0.7112021190476191, 0.7072245, 1.0541304523809525, 0.6980834047619048, 0.6944824642857143, 1.0356507261904762, 0.6859677023809524, 0.6826705119047619, 1.0172985833333332, 0.6741086428571429, 0.6706799523809523, 0.9999159642857143, 0.6627480238095238, 0.6596391547619047, 0.9839666190476191, 0.6525576785714285, 0.6499785, 0.9700708690476191, 0.6436101428571429, 0.6411394047619048, 0.9570107976190476, 0.6348474285714286, 0.6324135357142857, 0.9440853333333333, 0.6262217380952381, 0.6238707380952381, 0.9315843095238096, 0.6182572619047619, 0.6162029523809524, 0.9202849523809524, 0.6108643452380952, 0.6087792261904762, 0.909237630952381, 0.6035397023809523, 0.6010273333333334, 0.8971875238095238, 0.5952812619047619, 0.5925989523809524, 0.8841085595238095, 0.5863143928571428, 0.5834745238095238, 0.8707290238095238, 0.5772933214285715, 0.5747278333333333, 0.8573223571428571, 0.5682029047619047, 0.5656684404761905, 0.84311375, 0.5587734285714285, 0.5557833214285715, 0.8280252023809523, 0.5484250476190476, 0.5450879761904762, 0.81207725, 0.5374864880952381, 0.5344990833333333, 0.7961647142857143, 0.5270869523809524, 0.5241008214285714, 0.5211760833333333, 0.7760217142857143, 0.5136853928571429, 0.5109437380952381, 0.7608697261904762, 0.5039475833333333, 0.5011205595238095, 0.7468406666666667, 0.49464491666666666, 0.49217063095238095, 0.7338129523809523, 0.4861535, 0.48381580952380954, 0.7214703452380953, 0.47814615476190475, 
0.476042619047619, 0.7102251071428571, 0.47076435714285714, 0.4687870238095238, 0.6995748452380952, 0.4640189880952381, 0.46221634523809524, 0.6898104880952382, 0.45763016666666667, 0.4558778095238095, 0.6806325357142857, 0.4518012619047619, 0.4501145357142857, 0.6721814761904762, 0.4463145, 0.44457219047619045, 0.6640260595238096, 0.440697380952381, 0.43915423809523807, 0.6558015952380952, 0.43531659523809524, 0.4337336785714286, 0.6476901428571429, 0.4297576904761905, 0.4282166785714286, 0.6391359642857143, 0.4241249047619048, 0.42246446428571427, 0.6305123809523809, 0.4181372857142857, 0.41641934523809526, 0.6212703928571428, 0.4119727976190476, 0.4100477261904762, 0.6116464404761904, 0.40528240476190475, 0.40338982142857144, 0.6014080714285714, 0.3984663333333333, 0.3963685476190476, 0.5907914642857143, 0.39122344047619045, 0.38909815476190474, 0.5797881190476191, 0.3837952261904762, 0.38173467857142857, 0.5684315595238095, 0.37630688095238096, 0.37413361904761905, 0.5570520952380953, 56.52632755952381]

Solving for scale 6:  size=[31, 25, 2],  model=AffineModel, regularizer=IdentityModel
Solving for scale 5:  size=[63, 50, 3],  model=AffineModel, regularizer=AffineModel
Solving for scale 4:  size=[125, 100, 7],  model=AffineModel, regularizer=AffineModel
Solving for scale 3:  size=[250, 200, 13],  model=AffineModel, regularizer=AffineModel
Solving for scale 2:  size=[500, 400, 26],  model=FixedScalingAffineModel, regularizer=AffineModel
Solving for scale 1:  size=[1000, 800, 53],  model=FixedScalingAffineModel, regularizer=AffineModel
Solving for scale 0:  size=[2000, 1600, 105],  model=FixedScalingAffineModel, regularizer=AffineModel
22/05/31 09:58:34 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@16073fa8{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at net.imglib2.img.basictypeaccess.array.AbstractDoubleArray.<init>(AbstractDoubleArray.java:50)
	at net.imglib2.img.basictypeaccess.array.DoubleArray.<init>(DoubleArray.java:47)
	at net.imglib2.img.basictypeaccess.array.DoubleArray.createArray(DoubleArray.java:58)
	at net.imglib2.img.basictypeaccess.array.DoubleArray.createArray(DoubleArray.java:43)
	at net.imglib2.img.array.ArrayImgFactory.create(ArrayImgFactory.java:91)
	at net.imglib2.img.array.ArrayImgFactory.create(ArrayImgFactory.java:68)
	at net.imglib2.img.array.ArrayImgs.doubles(ArrayImgs.java:558)
	at org.janelia.flatfield.FlatfieldCorrectionSolver.unpivotSolution(FlatfieldCorrectionSolver.java:414)
	at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:391)
	at org.janelia.flatfield.FlatfieldCorrection.run(FlatfieldCorrection.java:195)
	at org.janelia.flatfield.FlatfieldCorrection.main(FlatfieldCorrection.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
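Judging from the trace, the OOM happens in `unpivotSolution` while allocating an `ArrayImgs.doubles` image, which (assuming it is allocated at the full scale-0 resolution, one double per voxel) is a multi-gigabyte allocation on the driver heap:

```python
# Rough estimate of a single full-resolution double image, as the
# ArrayImgs.doubles call in the stack trace suggests is being allocated.
# The 2-4x multiplier for multiple solution fields is an assumption.
shape = (2000, 1600, 105)   # scale-0 size from the "Solving for scale 0" log line
voxels = shape[0] * shape[1] * shape[2]        # 336,000,000 voxels
bytes_per_double = 8
per_image_gb = voxels * bytes_per_double / 1e9  # ~2.7 GB per double image
# With separate scaling and translation fields plus intermediates,
# the driver may need several times that in one allocation burst.
print(voxels, per_image_gb)
```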

The configuration that finally worked:

  • 1 m1-megamem-96 node (96 vCPUs, 1.4 TB RAM) acting as both master and worker
  • 500 GB SCSI standard persistent disk
  • 1 local 375 GB NVMe disk

I did not change any YARN or Spark cluster/job properties.

The job took 25.1 hours to run.

The executors page shows peak JVM on-heap memory of up to ~60 GB per executor (full disclosure: I don't have a great grasp of what these metrics mean).

[screenshot: Spark executors page]

With 8 cores per executor, that works out to a minimum of ~8 GB per core.
Across all 96 cores, that gives 8 GB * 96 = 768 GB of required memory, which happens to equal the size of my full dataset.
Is this expected in the general case? Does it depend on the number of cores used?
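Written out, the back-of-the-envelope estimate is (the extrapolation from one executor to the whole cluster is my assumption, not a measured value):

```python
# Per-core memory estimate from the executors page, extrapolated to the cluster.
peak_heap_per_executor_gb = 60   # observed peak JVM on-heap memory
cores_per_executor = 8
total_cores = 96

per_core_gb = peak_heap_per_executor_gb / cores_per_executor  # 7.5, round up to ~8
cluster_estimate_gb = 8 * total_cores                         # 768 GB total
print(per_core_gb, cluster_estimate_gb)
```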

Thank you,
Cameron

P.S. Is this step mandatory, or can I skip straight to stitching after converting the input tiles to N5?
