This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Cgroups not mounting #52

Open
rboarman-sc opened this issue Aug 7, 2019 · 6 comments

@rboarman-sc

Using your example, I was able to launch a Consul cluster (working fine) and a Nomad cluster that successfully connects to Consul.

However, two of the drivers, java and exec, fail to load with the error "Cgroup mount point unavailable."
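
In the log below, `cgroup` is listed among the built-in fingerprinters but is missing from the detected `node_attrs`, which is consistent with the driver error. To check by hand whether the host has any cgroup filesystems mounted at all, you can read `/proc/self/mountinfo`. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and this is a diagnostic aid only, not Nomad's own detection logic (which may look for specific controllers rather than any cgroup mount).

```python
def find_cgroup_mounts(mountinfo_text):
    """Return {mount_point: fstype} for every cgroup/cgroup2 entry in mountinfo text.

    Each mountinfo line looks like:
      ID PARENT MAJ:MIN ROOT MOUNT_POINT OPTIONS [OPTIONAL...] - FSTYPE SOURCE SUPEROPTS
    """
    mounts = {}
    for line in mountinfo_text.splitlines():
        if " - " not in line:
            continue  # skip blank or malformed lines
        left, right = line.split(" - ", 1)
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or not right_fields:
            continue
        mount_point = left_fields[4]  # 5th field before the separator
        fstype = right_fields[0]      # 1st field after the separator
        if fstype in ("cgroup", "cgroup2"):
            mounts[mount_point] = fstype
    return mounts


def read_host_cgroup_mounts(path="/proc/self/mountinfo"):
    """Read the given mountinfo file; return {} if it cannot be read (e.g. not Linux)."""
    try:
        with open(path) as f:
            return find_cgroup_mounts(f.read())
    except OSError:
        return {}


if __name__ == "__main__":
    found = read_host_cgroup_mounts()
    if not found:
        print("no cgroup mounts found")
    for mount_point, fstype in sorted(found.items()):
        print(fstype, mount_point)
```

An empty result on the Nomad client host would suggest cgroups are not mounted at all. A non-empty result means the problem lies elsewhere, for example in which controllers are mounted or in how Nomad is run. Mount points containing spaces appear octal-escaped (`\040`) in mountinfo and are returned as-is.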

Nomad client log file:

==> Loaded configuration from /opt/nomad/config/default.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 172.31.21.117:4646
            Bind Addrs: HTTP: 0.0.0.0:4646
                Client: true
             Log Level: DEBUG
                Region: us-west-2 (DC: us-west-2b)
                Server: false
               Version: 0.9.4

==> Nomad agent started! Log data will stream in below:

    2019-08-07T17:51:10.239Z [WARN ] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/opt/nomad/data/plugins
    2019-08-07T17:51:10.305Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/data/plugins
    2019-08-07T17:51:10.305Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/data/plugins
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=rkt type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2019-08-07T17:51:10.305Z [INFO ] agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
    2019-08-07T17:51:10.307Z [INFO ] client: using state directory: state_dir=/opt/nomad/data/client
    2019-08-07T17:51:10.327Z [INFO ] client: using alloc directory: alloc_dir=/opt/nomad/data/alloc
    2019-08-07T17:51:10.331Z [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters="[arch cgroup consul cpu host memory network nomad signal storage vault env_gce env_aws]"
    2019-08-07T17:51:10.333Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=cgroup period=15s
    2019-08-07T17:51:10.335Z [DEBUG] client.fingerprint_mgr.cpu: detected cpu frequency: MHz=2400
    2019-08-07T17:51:10.335Z [DEBUG] client.fingerprint_mgr.cpu: detected core count: cores=1
    2019-08-07T17:51:10.337Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul period=15s
    2019-08-07T17:51:10.348Z [WARN ] client.fingerprint_mgr.network: unable to parse speed: path=/sbin/ethtool device=eth0
    2019-08-07T17:51:10.348Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/eth0/speed
    2019-08-07T17:51:10.348Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: mbits=1000
    2019-08-07T17:51:10.348Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=eth0 IP=172.31.21.117
    2019-08-07T17:51:10.355Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault period=15s
    2019-08-07T17:51:10.373Z [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type resp_code=404
    2019-08-07T17:51:10.373Z [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs="[arch cpu host network nomad signal storage env_aws]"
    2019-08-07T17:51:10.373Z [INFO ] client.plugin: starting plugin manager: plugin-type=driver
    2019-08-07T17:51:10.373Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
    2019-08-07T17:51:10.400Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused"
    2019-08-07T17:51:10.400Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
    2019-08-07T17:51:10.400Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
    2019-08-07T17:51:10.400Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
    2019-08-07T17:51:10.400Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=healthy description=Healthy
    2019-08-07T17:51:10.407Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=unhealthy description="Cgroup mount point unavailable"
    2019-08-07T17:51:10.407Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=
    2019-08-07T17:51:10.407Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=unhealthy description="Cgroup mount point unavailable"
    2019-08-07T17:51:10.411Z [DEBUG] client.driver_mgr.docker: could not connect to docker daemon: driver=docker endpoint=unix:///var/run/docker.sock error="Get http://unix.sock/version: dial unix /var/run/docker.sock: connect: no such file or directory"
    2019-08-07T17:51:10.411Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=undetected description="Failed to connect to docker daemon"
    2019-08-07T17:51:10.411Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=rkt health=undetected description="Failed to execute rkt version: exec: "rkt": executable file not found in $PATH"
    2019-08-07T17:51:10.411Z [DEBUG] client.driver_mgr: detected drivers: drivers="map[undetected:[qemu docker rkt] healthy:[raw_exec] unhealthy:[exec java]]"
    2019-08-07T17:51:10.411Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
    2019-08-07T17:51:10.411Z [INFO ] client: started client: node_id=7b3d2591-71fa-9d92-d949-2a748099420b
    2019-08-07T17:51:10.414Z [WARN ] client.server_mgr: no servers available
    2019-08-07T17:51:10.414Z [DEBUG] client: registration waiting on servers
    2019-08-07T17:51:10.414Z [WARN ] client.server_mgr: no servers available
    2019-08-07T17:51:10.415Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused"
    2019-08-07T17:51:13.468Z [ERROR] http: request failed: method=GET path=/v1/agent/health?type=client error="{"client":{"ok":false,"message":"no known servers"}}" code=500
    2019-08-07T17:51:13.468Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=568.093µs
    2019-08-07T17:51:23.755Z [ERROR] http: request failed: method=GET path=/v1/agent/health?type=client error="{"client":{"ok":false,"message":"no known servers"}}" code=500
    2019-08-07T17:51:23.755Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=140.767µs
    2019-08-07T17:51:25.625Z [INFO ] client.fingerprint_mgr.consul: consul agent is available
    2019-08-07T17:51:30.745Z [WARN ] client.server_mgr: no servers available
    2019-08-07T17:51:30.745Z [DEBUG] client: registration waiting on servers
    2019-08-07T17:51:30.747Z [DEBUG] client.consul: bootstrap contacting Consul DCs: consul_dcs=[us-west-2]
    2019-08-07T17:51:30.765Z [INFO ] client.consul: discovered following servers: servers=172.31.13.97:4647
    2019-08-07T17:51:30.765Z [DEBUG] client.server_mgr: new server list: new_servers=172.31.13.97:4647 old_servers=
    2019-08-07T17:51:30.777Z [DEBUG] client: updated allocations: index=1 total=0 pulled=0 filtered=0
    2019-08-07T17:51:30.778Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
    2019-08-07T17:51:30.778Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
    2019-08-07T17:51:30.781Z [INFO ] client: node registration complete
    2019-08-07T17:51:33.756Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=270.264µs
    2019-08-07T17:51:35.753Z [DEBUG] client: state updated: node_status=ready
    2019-08-07T17:51:38.116Z [DEBUG] client: state changed, updating node and re-registering
    2019-08-07T17:51:38.121Z [INFO ] client: node registration complete
    2019-08-07T17:51:43.757Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=159.506µs

The configuration is taken directly from your example code, except that I set the number of servers and the number of clients to one.

Please advise.

@brikis98
Collaborator

brikis98 commented Aug 8, 2019

What OS? What version of Nomad?

@rboarman-sc
Author

Sorry, I should have included that.

Amazon Linux 2: Linux 4.14.133-88.112.amzn1.x86_64 x86_64
Nomad: 0.9.4

@brikis98
Collaborator

brikis98 commented Aug 9, 2019

@Etiene Any chance you could look into this one?

@rboarman-sc
Author

@Etiene @brikis98 Any word on this? Thanks!

@Etiene
Contributor

Etiene commented Aug 19, 2019

I'll have a look at that now! Sorry for the delay :)

@Etiene
Contributor

Etiene commented Aug 19, 2019

Just to confirm, and so that it's easier for me to help you debug this: which example did you follow? The root example, where the Consul servers and the Nomad servers are co-located, or the one with three separate clusters?

2019-08-07T17:51:10.400Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused"

This line is interesting. It looks like the Nomad client is failing to reach the local Consul agent on localhost port 8500, which it queries to find out where the Nomad servers are located.
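
To test that same lookup by hand from the client machine, here is a minimal Python sketch that requests the Consul endpoint named in the error, `/v1/catalog/datacenters`, at the default address `127.0.0.1:8500` taken from the log. The function name is hypothetical, and the sketch only checks reachability. It does not replicate how Nomad performs server discovery.

```python
import json
import urllib.error
import urllib.request


def consul_datacenters(base_url="http://127.0.0.1:8500", timeout=2.0):
    """Return the datacenter list from the Consul agent at base_url.

    Returns None if the agent is unreachable (e.g. connection refused, as in
    the log above), times out, or answers with an HTTP error.
    """
    url = base_url.rstrip("/") + "/v1/catalog/datacenters"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    dcs = consul_datacenters()
    if dcs is None:
        print("Consul agent not reachable on 127.0.0.1:8500")
    else:
        print("Consul datacenters:", dcs)
```

Running this repeatedly while the client boots would show whether the local Consul agent simply had not started yet when Nomad first tried to reach it. The later log lines (`consul agent is available` at 17:51:25, followed by server discovery at 17:51:30) hint at that, but this sketch does not confirm it.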
