Run ConvertTIFFTilesToN5Spark on a Dataproc cluster #40

Open
carshadi opened this issue May 13, 2022 · 1 comment

@carshadi

carshadi commented May 13, 2022

Hi there,

I am trying to run the ConvertTIFFTilesToN5Spark step on a Dataproc cluster, where the TIFF tiles and the JSON configuration file are both stored in a Google Cloud Storage bucket.

The job fails to load the TIFF tiles from the bucket, with the following error:

```
22/05/13 19:56:10 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) (convert-tiff-tiles-to-n5-spark-cluster-m.us-west1-a.c.neural-dynamics-338018.internal executor 1): java.lang.NullPointerException
	at net.imglib2.img.ImagePlusAdapter.wrapLocal(ImagePlusAdapter.java:97)
	at net.imglib2.img.ImagePlusAdapter.wrap(ImagePlusAdapter.java:74)
	at net.imglib2.img.imageplus.ImagePlusImgs.from(ImagePlusImgs.java:210)
	at org.janelia.stitching.ConvertTIFFTilesToN5Spark.convertTileToN5(ConvertTIFFTilesToN5Spark.java:207)
	at org.janelia.stitching.ConvertTIFFTilesToN5Spark.lambda$convertTilesToN5$cbf5f68e$1(ConvertTIFFTilesToN5Spark.java:161)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
```

My tile_config.json looks like this (bucket name redacted):

```json
[{"index": 0, "file": "gs://xxxx/spark-stitching-test/Ex_488_em_525_merged_tiffs/111590/111590_088050/000000.tiff", "position": [0.0, 0.0, 0.0], "size": [2000, 1600, 420], "pixel_resolution": [1.8, 1.8, 2.0], "type": "GRAY16"},
{"index": 1, "file": "gs://xxxx/spark-stitching-test/Ex_488_em_525_merged_tiffs/111590/111590_088050/008400.tiff", "position": [0.0, 0.0, 420.0], "size": [2000, 1600, 420], "pixel_resolution": [1.8, 1.8, 2.0], "type": "GRAY16"},
{"index": 2, "file": "gs://xxxx/spark-stitching-test/Ex_488_em_525_merged_tiffs/111590/111590_088050/016800.tiff", "position": [0.0, 0.0, 840.0], "size": [2000, 1600, 420], "pixel_resolution": [1.8, 1.8, 2.0], "type": "GRAY16"}]
```

My job looks like this:

[Screenshot of the Dataproc job configuration, not included]

It appears that `GoogleCloudDataProvider` simply passes the `gs://` path to `IJ.openImage()` (via `ImageImporter.openImage`, below) without first downloading the blob to a temporary location. Am I correct in assuming that running it this way isn't supported, or do I just need to format the paths differently?

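```java
public static ImagePlus openImage( final String path )
{
	if ( path.endsWith( ".tif" ) || path.endsWith( ".tiff" ) )
	{
		// IJ.openImage returns null if the file is not found or cannot be opened,
		// and that null is returned unchanged to the caller
		final ImagePlus imp = IJ.openImage( path );
		if ( imp != null )
			Utils.workaroundImagePlusNSlices( imp );
		return imp;
	}
	// ... (rest of the method not shown)
}
```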
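In the excerpt above, the only check before `IJ.openImage` is the file extension, and a `gs://` link ending in `.tiff` passes it. ImageJ has no handler for the `gs://` scheme, so the call presumably returns `null`, which would then surface as the `NullPointerException` in `ImagePlusAdapter.wrapLocal`. Telling a bucket link apart from a local path, and splitting it into bucket and object key, needs only string handling. The following is a stdlib-only sketch; `GsLinks` and `GsObject` are hypothetical names, not part of stitching-spark (the `GoogleCloudStorageURI` class with `getBucket()`/`getKey()` plays this role in the update further down).

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of stitching-spark): classifies a tile link
 * and splits gs://bucket/key links into bucket name and object key.
 */
final class GsLinks
{
	private static final String SCHEME = "gs://";

	/** Bucket name and object key of a gs:// link. */
	static final class GsObject
	{
		final String bucket;
		final String key;

		GsObject( final String bucket, final String key )
		{
			this.bucket = bucket;
			this.key = key;
		}
	}

	private GsLinks() {}

	/** Same extension check as openImage: true for local paths and gs:// links alike. */
	static boolean isTiff( final String link )
	{
		return link.endsWith( ".tif" ) || link.endsWith( ".tiff" );
	}

	/** Returns bucket and key for gs://bucket/key, or empty for local paths and malformed links. */
	static Optional< GsObject > parse( final String link )
	{
		if ( !link.startsWith( SCHEME ) )
			return Optional.empty();
		final String rest = link.substring( SCHEME.length() );
		final int slash = rest.indexOf( '/' );
		// Require a non-empty bucket name and a non-empty object key
		if ( slash <= 0 || slash == rest.length() - 1 )
			return Optional.empty();
		return Optional.of( new GsObject( rest.substring( 0, slash ), rest.substring( slash + 1 ) ) );
	}
}
```

For the first tile in the config, `isTiff` is true and `parse` yields bucket `xxxx` and key `spark-stitching-test/Ex_488_em_525_merged_tiffs/111590/111590_088050/000000.tiff`, so a loader could branch on `parse(...).isPresent()` before ever reaching `IJ.openImage`.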

Thank you!

@carshadi

carshadi commented May 13, 2022

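Update: I got it to work by replacing the existing `loadImage` method:

```java
@Override
public ImagePlus loadImage( final String link ) throws IOException
{
	if ( link.endsWith( ".tif" ) || link.endsWith( ".tiff" ) )
		return ImageImporter.openImage( link );
	throw new NotImplementedException( "Only TIFF images are supported at the moment" );
}
```

with the following:

```java
// Additional imports at the top of the file
// (GoogleCloudStorageURI and NotImplementedException are assumed to be imported already)
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;

@Override
public ImagePlus loadImage( final String link ) throws IOException
{
	if ( link.endsWith( ".tif" ) || link.endsWith( ".tiff" ) )
	{
		if ( link.startsWith( "gs:" ) )
		{
			// IJ.openImage cannot read gs:// links, so copy the object to a local temp file first
			final GoogleCloudStorageURI googleCloudUri = new GoogleCloudStorageURI( link );
			final Blob blob = storage.get( BlobId.of( googleCloudUri.getBucket(), googleCloudUri.getKey() ) );
			if ( blob == null ) // Storage.get returns null when the object does not exist
				throw new FileNotFoundException( "Object not found: " + link );

			// Keep the .tif suffix so ImageImporter.openImage still treats the file as TIFF
			final Path tempPath = Files.createTempFile( null, ".tif" );
			try
			{
				blob.downloadTo( tempPath );
				return ImageImporter.openImage( tempPath.toString() );
			}
			finally
			{
				// Always remove the local copy, even if the download or the import fails
				Files.deleteIfExists( tempPath );
			}
		}
		return ImageImporter.openImage( link );
	}
	throw new NotImplementedException( "Only TIFF images are supported at the moment" );
}
```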
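The fix above follows a download, open, delete pattern. Below is a stdlib-only sketch of just that pattern, with the Cloud Storage download and the ImageJ import replaced by two hypothetical functional interfaces (`Downloader`, `Opener`; not stitching-spark APIs) so the clean-up logic can be exercised without a bucket. Unlike the fix, it also turns a `null` result from the opener into an `IOException` instead of returning it. It assumes the opener reads the whole file into memory; if an image were opened lazily (for example as a file-backed virtual stack), deleting the temporary file right after opening would break later reads, so that may be worth checking.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: download to a temp file, open it, and always delete the temp file. */
final class TempFileLoader
{
	/** Writes the remote object to a local file (Blob.downloadTo in the fix). */
	interface Downloader
	{
		void downloadTo( Path target ) throws IOException;
	}

	/** Opens a local file (ImageImporter.openImage in the fix); may return null on failure. */
	interface Opener< T >
	{
		T open( String localPath ) throws IOException;
	}

	private TempFileLoader() {}

	static < T > T load( final Downloader downloader, final Opener< T > opener ) throws IOException
	{
		// Keep the .tif suffix so extension-based dispatch still recognizes the file
		final Path tempPath = Files.createTempFile( null, ".tif" );
		try
		{
			downloader.downloadTo( tempPath );
			final T result = opener.open( tempPath.toString() );
			if ( result == null )
				throw new IOException( "Could not open downloaded file " + tempPath );
			return result;
		}
		finally
		{
			// Runs on success and on failure, so temp copies do not pile up on the worker's disk
			Files.deleteIfExists( tempPath );
		}
	}
}
```

Creating the temp file outside the `try` means there is nothing to clean up if creation itself fails, and `Files.deleteIfExists` avoids a second exception if the file is already gone.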

Please let me know if you anticipate any issues with this approach.

Thanks,
Cameron
