
Spark TensorFlow distributor workers not working #161

Open
rayjinghaolei opened this issue Jun 14, 2020 · 3 comments
@rayjinghaolei

RayKo-MBP:spark RAY$ ./tests/integration/run.sh
Stopping spark_worker_2 ... done
Stopping spark_worker_1 ... done
Stopping spark_master_1 ... done
Removing spark_worker_2 ... done
Removing spark_worker_1 ... done
Removing spark_master_1 ... done
Removing network spark_default
Creating network "spark_default" with the default driver
Creating spark_master_1 ... done
WARNING: The "worker" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
Creating spark_worker_1 ... done
Creating spark_worker_2 ... done
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.8.1, pluggy-0.13.1
rootdir: /mnt/spark-tensorflow-distributor/tests/integration, inifile: pytest.ini
collected 17 items

tests/integration/test_mirrored_strategy_runner.py No container found for worker_1
No container found for worker_2
no org.apache.spark.deploy.master.Master to stop
starting org.apache.spark.deploy.master.Master, logging to /usr/local/lib/python3.7/dist-packages/pyspark/logs/spark--org.apache.spark.deploy.master.Master-1-master.out
No container found for worker_1
No container found for worker_2
Starting worker 1
Starting worker 2
20/06/13 18:52:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/06/13 18:56:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 40 more times
20/06/13 18:56:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 39 more times
20/06/13 18:56:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 38 more times
20/06/13 18:56:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 37 more times
20/06/13 18:57:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 36 more times
20/06/13 18:57:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 35 more times
20/06/13 18:57:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 34 more times
20/06/13 18:57:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 33 more times
20/06/13 18:58:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 32 more times
20/06/13 18:58:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 31 more times
20/06/13 18:58:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 30 more times
20/06/13 18:58:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 29 more times
20/06/13 18:59:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 28 more times
20/06/13 18:59:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 27 more times
20/06/13 18:59:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 26 more times
20/06/13 18:59:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 25 more times
20/06/13 19:00:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 24 more times
20/06/13 19:00:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 23 more times
20/06/13 19:00:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 22 more times
20/06/13 19:00:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 21 more times
20/06/13 19:01:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 20 more times
20/06/13 19:01:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 19 more times
20/06/13 19:01:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 18 more times
20/06/13 19:01:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 17 more times
20/06/13 19:02:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 16 more times
20/06/13 19:02:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 15 more times
20/06/13 19:02:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 14 more times
20/06/13 19:02:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 13 more times
20/06/13 19:03:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 12 more times
20/06/13 19:03:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 11 more times
20/06/13 19:03:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 10 more times
20/06/13 19:03:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 9 more times
20/06/13 19:04:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 8 more times
20/06/13 19:04:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 7 more times
20/06/13 19:04:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 6 more times
20/06/13 19:04:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 5 more times
20/06/13 19:05:04 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 4 more times
20/06/13 19:05:19 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 3 more times
20/06/13 19:05:34 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 2 more times
20/06/13 19:05:49 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 1 more times
20/06/13 19:06:03 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 0 more times
F.2020-06-13 19:06:06.589740: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-06-13 19:06:06.589812: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-13 19:06:06.589854: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (master): /proc/driver/nvidia/version does not exist
2020-06-13 19:06:06.591429: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-13 19:06:06.603496: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2916350000 Hz
2020-06-13 19:06:06.604624: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f0128000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-13 19:06:06.604804: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
............No container found for worker_1
No container found for worker_2
stopping org.apache.spark.deploy.master.Master
starting org.apache.spark.deploy.master.Master, logging to /usr/local/lib/python3.7/dist-packages/pyspark/logs/spark--org.apache.spark.deploy.master.Master-1-master.out
No container found for worker_1
No container found for worker_2
Starting worker 1
Starting worker 2
.No container found for worker_1
No container found for worker_2
stopping org.apache.spark.deploy.master.Master
starting org.apache.spark.deploy.master.Master, logging to /usr/local/lib/python3.7/dist-packages/pyspark/logs/spark--org.apache.spark.deploy.master.Master-1-master.out
No container found for worker_1
No container found for worker_2
Starting worker 1
Starting worker 2
20/06/13 19:09:21 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:303)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to master/172.20.0.2:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master/172.20.0.2:7077
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
20/06/13 19:12:21 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 40 more times
20/06/13 19:12:36 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 39 more times
20/06/13 19:12:51 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 38 more times
20/06/13 19:13:06 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 37 more times
20/06/13 19:13:21 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 36 more times
20/06/13 19:13:36 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 35 more times
20/06/13 19:13:51 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 34 more times
20/06/13 19:14:06 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 33 more times
20/06/13 19:14:21 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 32 more times
20/06/13 19:14:36 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 31 more times
20/06/13 19:14:51 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 30 more times
20/06/13 19:15:06 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 29 more times
20/06/13 19:15:21 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 28 more times
20/06/13 19:15:36 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 27 more times
20/06/13 19:15:51 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 26 more times
20/06/13 19:16:06 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 25 more times
20/06/13 19:16:21 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 24 more times
20/06/13 19:16:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 23 more times
20/06/13 19:16:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 22 more times
20/06/13 19:17:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 21 more times
20/06/13 19:17:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 20 more times
20/06/13 19:17:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 19 more times
20/06/13 19:17:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 18 more times
20/06/13 19:18:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 17 more times
20/06/13 19:18:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 16 more times
20/06/13 19:18:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 15 more times
20/06/13 19:18:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 14 more times
20/06/13 19:19:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 13 more times
20/06/13 19:19:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 12 more times
20/06/13 19:19:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 11 more times
20/06/13 19:19:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 10 more times
20/06/13 19:20:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 9 more times
20/06/13 19:20:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 8 more times
20/06/13 19:20:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 7 more times
20/06/13 19:20:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 6 more times
20/06/13 19:21:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 5 more times
20/06/13 19:21:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 4 more times
20/06/13 19:21:35 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 3 more times
20/06/13 19:21:50 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 2 more times
20/06/13 19:22:05 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 1 more times
20/06/13 19:22:20 WARN DAGScheduler: Barrier stage in job 0 requires 1 slots, but only 0 are available. Will retry up to 0 more times
FNo container found for worker_1
No container found for worker_2
stopping org.apache.spark.deploy.master.Master
starting org.apache.spark.deploy.master.Master, logging to /usr/local/lib/python3.7/dist-packages/pyspark/logs/spark--org.apache.spark.deploy.master.Master-1-master.out
No container found for worker_1
No container found for worker_2
Starting worker 1
Starting worker 2
20/06/13 19:22:27 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:303)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to master/172.20.0.2:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master/172.20.0.2:7077
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
20/06/13 19:25:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 40 more times
20/06/13 19:25:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 39 more times
20/06/13 19:25:57 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 38 more times
20/06/13 19:26:12 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 37 more times
20/06/13 19:26:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 36 more times
20/06/13 19:26:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 35 more times
20/06/13 19:26:57 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 34 more times
20/06/13 19:27:12 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 33 more times
20/06/13 19:27:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 32 more times
20/06/13 19:27:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 31 more times
20/06/13 19:27:57 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 30 more times
20/06/13 19:28:12 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 29 more times
20/06/13 19:28:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 28 more times
20/06/13 19:28:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 27 more times
20/06/13 19:28:57 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 26 more times
20/06/13 19:29:12 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 25 more times
20/06/13 19:29:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 24 more times
20/06/13 19:29:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 23 more times
20/06/13 19:29:57 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 22 more times
20/06/13 19:30:12 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 21 more times
20/06/13 19:30:27 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 20 more times
20/06/13 19:30:42 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 19 more times
20/06/13 19:30:56 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 18 more times
20/06/13 19:31:11 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 17 more times
20/06/13 19:31:26 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 16 more times
20/06/13 19:31:41 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 15 more times
20/06/13 19:31:56 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 14 more times
20/06/13 19:32:11 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 13 more times
20/06/13 19:32:26 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 12 more times
20/06/13 19:32:41 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 11 more times
20/06/13 19:32:56 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 10 more times
20/06/13 19:33:11 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 9 more times
20/06/13 19:33:26 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 8 more times
20/06/13 19:33:41 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 7 more times
20/06/13 19:33:56 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 6 more times
20/06/13 19:34:11 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 5 more times
20/06/13 19:34:26 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 4 more times
20/06/13 19:34:41 WARN DAGScheduler: Barrier stage in job 0 requires 2 slots, but only 0 are available. Will retry up to 3 more times
^CERROR: Aborting.

@mengxr
Contributor

mengxr commented Jun 22, 2020

cc: @WeichenXu123 @sarthfrey

@sarthfrey
Contributor

Confirmed with @rayjinghaolei offline: his Docker engine doesn't have enough memory allocated to run the tests. Perhaps we should add a warning about this to the test-running instructions.
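
As a quick way to see how much memory the Docker engine actually has, one could query `docker info` from Python. This is a minimal sketch, not part of the test suite; the 4 GB threshold is an assumed working minimum, not a documented requirement.

```python
import subprocess

# Ask the Docker engine how much memory it is allowed to use, in bytes.
# On Docker Desktop this is the amount configured under Preferences > Resources.
mem_total = int(
    subprocess.check_output(["docker", "info", "--format", "{{.MemTotal}}"]).strip()
)

# Hypothetical threshold: the integration setup starts a master and two
# worker containers, so assume roughly 4 GB as a working minimum.
REQUIRED_BYTES = 4 * 1024**3

if mem_total < REQUIRED_BYTES:
    print(
        f"Docker engine reports only {mem_total / 1024**3:.1f} GB of memory; "
        "increase the allocation before running ./tests/integration/run.sh"
    )
```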

@sarthfrey
Contributor

I will try to reproduce this and see if we can catch the failure early and surface a clear error message to whoever runs the tests.
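
One possible shape for that check, sketched under assumptions: a pre-flight step run inside the containers before pytest starts, failing fast with a readable message instead of letting the barrier stage retry for half an hour. The 2 GB floor and the idea of hooking this into run.sh are hypothetical, not something the current suite does.

```python
import os
import sys


def check_available_memory(min_gb: float = 2.0) -> None:
    """Exit with a readable message if the container sees too little RAM.

    Reads total physical memory via sysconf (Linux-only), which reflects
    what the Docker engine makes available to the container. The 2 GB
    default is a hypothetical floor, not a measured requirement.
    """
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    total_gb = total_bytes / 1024**3
    if total_gb < min_gb:
        sys.exit(
            f"Only {total_gb:.1f} GB of memory is visible to this container; "
            f"the integration tests need roughly {min_gb:.0f} GB or more. "
            "Increase the memory allocated to your Docker engine and rerun "
            "./tests/integration/run.sh"
        )


if __name__ == "__main__":
    check_available_memory()
```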
