When launching multiple VMs and one fails, only kill that one. #49

SolomonShorser-OICR · 2015-10-23T18:49:44Z

When the Provisioner launches several VMs in a single batch and Ansible fails to provision only one of them (for example, because of an SSH timeout when connecting), the entire batch is killed at the end of the playbook, because the playbook returns a non-zero exit code even when only one VM fails. This is less than ideal when provisioning takes a while and large batches are being provisioned at once.
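To make the difference concrete, here is a minimal sketch of the two reaping policies. It assumes the Provisioner can obtain a per-host success/failure status for the batch; how it gets that status is left open. The function names `reap_on_exit_code` and `reap_failed_only` are hypothetical and are not part of the Provisioner's actual code.

```python
from typing import Dict, List


def reap_on_exit_code(batch: List[str], playbook_rc: int) -> List[str]:
    """Current behaviour (hypothetical helper): any non-zero exit code reaps the whole batch."""
    # A single failed host is enough to make the playbook exit non-zero,
    # so every VM in the batch is selected for termination.
    return list(batch) if playbook_rc != 0 else []


def reap_failed_only(batch: List[str], host_ok: Dict[str, bool]) -> List[str]:
    """Desired behaviour (hypothetical helper): reap only VMs that did not provision cleanly."""
    # A host missing from host_ok is treated as failed, since there is
    # no evidence that it was provisioned.
    return [host for host in batch if not host_ok.get(host, False)]


if __name__ == "__main__":
    batch = ["worker-1", "worker-2", "worker-3"]
    status = {"worker-1": True, "worker-2": False, "worker-3": True}
    print(reap_on_exit_code(batch, playbook_rc=2))  # all three VMs
    print(reap_failed_only(batch, status))          # only worker-2
```

Treating unknown hosts as failed is a conservative choice: a VM the Provisioner cannot vouch for is reaped rather than left running without a verified setup.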

(This was originally filed on Consonance, but that was the wrong place: Consonance/consonance#97.)

SolomonShorser-OICR · 2015-10-27T20:59:36Z

Another issue I've discovered is that the workers that provisioned successfully may have enough time to pull a job from the queue before they are reaped. So you could end up with your job queue draining while no work gets done, because the entire fleet is killed whenever one or two nodes fail to provision. Instead of killing the fleet at the end, would it be possible to do it at the beginning, as soon as a failure on one node is detected? Ideally, only the failed node should be reaped, but I realize that might be difficult to do (it would probably involve parsing the text output of Ansible).
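Parsing may not need to cover all of Ansible's output: the `PLAY RECAP` block printed at the end of an `ansible-playbook` run lists per-host counters such as `unreachable=` and `failed=`. The sketch below assumes the playbook's stdout has been captured as plain text (no colour codes) and that each recap line has the `host : key=value ...` shape; the set of counters varies between Ansible versions, so the parser keeps whatever keys are present. The names `parse_recap` and `failed_hosts_from_recap` are hypothetical.

```python
import re
from typing import Dict, List

# Matches one host line of a PLAY RECAP, e.g.
#   "10.0.0.12 : ok=1 changed=0 unreachable=1 failed=0"
HOST_LINE = re.compile(r"^\s*(?P<host>\S+)\s+:\s+(?P<counters>(?:\w+=\d+\s*)+)$")
COUNTER = re.compile(r"(\w+)=(\d+)")


def parse_recap(output: str) -> Dict[str, Dict[str, int]]:
    """Return {host: {counter: value}} for each host line after 'PLAY RECAP'."""
    results: Dict[str, Dict[str, int]] = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue  # task output before the recap is ignored
        match = HOST_LINE.match(line)
        if match:
            counters = {k: int(v) for k, v in COUNTER.findall(match.group("counters"))}
            results[match.group("host")] = counters
    return results


def failed_hosts_from_recap(output: str) -> List[str]:
    """Hosts that were unreachable or had at least one failed task."""
    return [
        host
        for host, counters in parse_recap(output).items()
        if counters.get("unreachable", 0) > 0 or counters.get("failed", 0) > 0
    ]


if __name__ == "__main__":
    sample = (
        "PLAY RECAP *********************\n"
        "10.0.0.11 : ok=5 changed=3 unreachable=0 failed=0\n"
        "10.0.0.12 : ok=1 changed=0 unreachable=1 failed=0\n"
        "10.0.0.13 : ok=4 changed=2 unreachable=0 failed=1\n"
    )
    print(failed_hosts_from_recap(sample))  # ['10.0.0.12', '10.0.0.13']
```

The resulting list could then be handed to the Reaper so that only those VMs are terminated, while the healthy workers keep pulling jobs.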

SolomonShorser-OICR added the bug label Oct 23, 2015

SolomonShorser-OICR mentioned this issue Oct 23, 2015 in Consonance/consonance#97, "When launching multiple VMs and one fails, only kill that one." (Closed)

SolomonShorser-OICR added the Reaper label Oct 23, 2015

SolomonShorser-OICR added the Deployer label Oct 27, 2015
