Add checks for openvswitch and helper function to enable the checks #601

Open
wants to merge 3 commits into
base: master
32 changes: 32 additions & 0 deletions charmhelpers/contrib/charmsupport/nrpe.py
@@ -29,6 +29,7 @@
import yaml

from charmhelpers.core.hookenv import (
    DEBUG,
    config,
    hook_name,
    local_unit,
@@ -509,6 +510,37 @@ def add_haproxy_checks(nrpe, unit_name):
        check_cmd='check_haproxy_queue_depth.sh')


def add_openvswitch_checks(nrpe, unit_name):
    """
    Add checks for openvswitch

    :param NRPE nrpe: NRPE object to add check to
    :param str unit_name: Unit name to use in check description
    """
    enable_sudo_for_openvswitch_checks()
    nrpe.add_check(
        shortname='openvswitch',
        description='Check Open vSwitch {%s}' % unit_name,
        check_cmd='check_openvswitch.py')
Contributor

How about making the name more explicit, e.g. check_ovs_interfaces.py or check_ovs_ifaces.py?

Contributor Author

I was definitely feeling this would be the start of something that could be expanded to more checks as we identify them. The interface errors are just MVP for the current need.



def enable_sudo_for_openvswitch_checks():
    sudoers_dir = "/etc/sudoers.d"
    sudoers_mode = 0o100440
    ovs_sudoers_file = "99-check_openvswitch"
Contributor

As above, suggest check_ovs_interfaces; unless the plan is to expand in the future?

    ovs_sudoers_entry = "nagios ALL=(root) NOPASSWD: /usr/bin/ovs-vsctl show"
    dest = os.path.join(sudoers_dir, ovs_sudoers_file)
    try:
        with open(dest, "w") as sudoer_file:
            sudoer_file.write(ovs_sudoers_entry)
        os.chmod(dest, sudoers_mode)
        os.chown(dest, uid=0, gid=0)
    except (OSError, IOError) as e:
        log("Failed to setup sudoers file for check_openvswitch: {}".format(e))
    else:
        log("Sudoers file for check_openvswitch installed: {}".format(dest), DEBUG)


def remove_deprecated_check(nrpe, deprecated_services):
    """
    Remove checks for deprecated services in list
62 changes: 62 additions & 0 deletions charmhelpers/contrib/openstack/files/check_openvswitch.py
@@ -0,0 +1,62 @@
#!/usr/bin/env python3
# -*- coding: us-ascii -*-
"""Check for issues with NVME hardware devices."""

import argparse
import re
import subprocess
import sys

from nagios_plugin3 import CriticalError, UnknownError, try_check


def parse_ovs_status():
    """Check for errors in 'ovs-vsctl show' output."""
    try:
        cmd = ["/usr/bin/sudo", "/usr/bin/ovs-vsctl", "show"]
        ovs_output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        raise UnknownError(
            "UNKNOWN: Failed to query ovs: {}".format(
                e.output.decode(errors="ignore").rstrip()
            )
        )

    ovs_vsctl_show_errors = []
    ovs_error_re = re.compile(r"^.*error: (?P<message>.+)$", re.I)
    for line in ovs_output.decode(errors="ignore").splitlines():
        m = ovs_error_re.match(line)
        if m:
            ovs_vsctl_show_errors.append(m.group("message"))
Contributor

Rather than using a regex, the ovs-vsctl command does support outputting in JSON (--format=json). Is there a reason for not doing that (perhaps that the errors would appear in different nodes in different versions)? Just wondering how to make it less magic.

Contributor Author

@afreiberger, Apr 22, 2021

ovs-vsctl show does not support the --format option (the args are parsed, but the output is not differentiated).

root@ruling-manta:/home/ubuntu# ovs-vsctl -f json show
fcaf57e2-8667-4972-b687-169789d1d15d
    Bridge "br0"
        Port "dpdk-p1"
            Interface "dpdk-p1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.1"}
                error: "could not open network device dpdk-p1 (Address family not supported by protocol)"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk-p0"
            Interface "dpdk-p0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
                error: "could not open network device dpdk-p0 (Address family not supported by protocol)"
    ovs_version: "2.9.8"

I've also investigated using the json output for ovs-vsctl list Interfaces:

{"data":[[["uuid","9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p1 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p1",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.1"]]],["map",[]],["map",[]],["map",[]],"dpdk"],[["uuid","2d8758ca-510f-4399-ab84-8c1f3a33061e"],"down",["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],4,0,0,["set",[]],0,["set",[]],"down",["map",[]],["set",[]],"c2:bd:86:7f:fa:4d",1500,["set",[]],"br0",65534,["set",[]],["map",[]],["map",[]],["map",[["collisions",0],["rx_bytes",0],["rx_crc_err",0],["rx_dropped",2],["rx_errors",0],["rx_frame_err",0],["rx_over_err",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_errors",0],["tx_packets",0]]],["map",[["driver_name","openvswitch"]]],"internal"],[["uuid","d67191aa-bdae-4ed5-95c8-86778276052e"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p0 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p0",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.0"]]],["map",[]],["map",[]],["map",[]],"dpdk"]],"headings":["_uuid","admin_state","bfd","bfd_status","cfm_fault","cfm_fault_status","cfm_flap_count","cfm_health","cfm_mpid","cfm_remote_mpids","cfm_remote_opstate","duplex","error","external_ids","ifindex","ingress_policing_burst","ingress_policing_rate","lacp_current","link_resets","link_speed","link_state","lldp","mac","mac_in_use","mtu","mtu_request","name","ofport","ofport_request","options","other_config","statistics","status","type"]}

It's very odd that the project uses "headings" to index the data rather than emitting key-value output, as one would expect of JSON. Trying to use this instead of parsing with a regex, I get the following, which needs some additional processing for empty values: every Interface has an error column, and if it's blank, the value at that list index is a list containing the data type, "set", and the empty list, [].

>>> error_index = data["headings"].index("error")
>>> for interface in data["data"]:
...     print(interface[error_index])
... 
could not open network device dpdk-p1 (Address family not supported by protocol)
['set', []]
could not open network device dpdk-p0 (Address family not supported by protocol)

This could certainly be used instead of regex with an if interface[error_index] != list(['set', []]): but I chose the simpler to read regex, and am also hoping to catch errors from 'ovs-vsctl show' that may not be Interface related (though for the current requirement, limiting to checking for Interface errors would suffice).

Do you have advice regarding readability of code vs using something other than regex in a situation like this? The regex, to me, seemed more elegant and readable vs the additional handling of missing "error" index as well as the handling of the ['set', []].
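
For comparison, here is a minimal sketch of the JSON-based alternative discussed in this comment. It is not the code from this PR: the function name interface_errors and the five-column sample are assumptions made for illustration, with the sample hand-trimmed from the output quoted above. Pairing each row with the headings list avoids hard-coding the column index, and treating only non-empty strings as errors covers the ['set', []] placeholder.

```python
import json

# Hand-trimmed sample in the same shape as the JSON quoted above, reduced to
# five columns for readability (the real Interface table has many more).
SAMPLE_JSON = json.dumps({
    "headings": ["_uuid", "admin_state", "error", "name", "type"],
    "data": [
        [["uuid", "9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"], ["set", []],
         "could not open network device dpdk-p1 "
         "(Address family not supported by protocol)", "dpdk-p1", "dpdk"],
        [["uuid", "2d8758ca-510f-4399-ab84-8c1f3a33061e"], "down",
         ["set", []], "br0", "internal"],
        [["uuid", "d67191aa-bdae-4ed5-95c8-86778276052e"], ["set", []],
         "could not open network device dpdk-p0 "
         "(Address family not supported by protocol)", "dpdk-p0", "dpdk"],
    ],
})


def interface_errors(raw_json):
    """Map interface name to error string for rows with a non-empty error.

    Hypothetical helper for illustration; not part of charmhelpers.
    """
    table = json.loads(raw_json)
    headings = table["headings"]
    errors = {}
    for row in table["data"]:
        # Pair every value with its column name instead of using an index.
        record = dict(zip(headings, row))
        error = record.get("error")
        # An unset column arrives as ["set", []]; only plain, non-empty
        # strings are treated as errors.
        if isinstance(error, str) and error:
            errors[record.get("name", "<unnamed>")] = error
    return errors


if __name__ == "__main__":
    for name, message in sorted(interface_errors(SAMPLE_JSON).items()):
        print("{}: {}".format(name, message))
```

Keying the result by interface name also makes it easy to include the affected interface in the alert text.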

Contributor Author

After investigating all of the other tables, Interfaces is the only table that has the error column, so I'll write this more deterministically and be able to include potentially vital interface information in the notification.

Contributor

I didn't realize it wasn't a complete implementation (the --format option being missing). I always worry about using regexes on human-readable output, as it is prone to change (on a whim sometimes!), which can make the code brittle.

I think from your explanations, it's fine to go with regex as a pragmatic solution as long as all error conditions are handled. I'll go back and look again.
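
To make the brittleness concern concrete, here is a small sketch that runs the pattern from this PR against two lines copied from the 'ovs-vsctl show' output quoted earlier in this thread, plus one made-up line. The third line is hypothetical (it is not real OVS output) and exists only to show that the pattern accepts any line containing "error: ", not just an Interface error field. The helper name find_errors is also an assumption.

```python
import re

# Same pattern as the check in this PR.
OVS_ERROR_RE = re.compile(r"^.*error: (?P<message>.+)$", re.I)

SAMPLE_LINES = [
    # Copied from the 'ovs-vsctl show' output quoted in this thread.
    '                type: dpdk',
    '                error: "could not open network device dpdk-p1 '
    '(Address family not supported by protocol)"',
    # Hypothetical line, invented to probe the pattern.
    '                external_ids: {note="not an error: just a label"}',
]


def find_errors(lines):
    """Return the captured message for every line the pattern matches."""
    messages = []
    for line in lines:
        m = OVS_ERROR_RE.match(line)
        if m:
            messages.append(m.group("message"))
    return messages


if __name__ == "__main__":
    for message in find_errors(SAMPLE_LINES):
        print(message)
```

Two things stand out: the captured message keeps the surrounding double quotes from the human-readable output, and the made-up external_ids line is reported as an error even though it is only a label.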

Contributor Author

I've updated with latest commit to use the Interface table json.


    if ovs_vsctl_show_errors:
        numerrs = len(ovs_vsctl_show_errors)
        raise CriticalError(
            "CRITICAL: Found {} error(s) in ovs-vsctl show: "
            "{}".format(numerrs, ", ".join(ovs_vsctl_show_errors))
        )

    print("OK: no errors found in openvswitch")


def parse_args(argv=None):
    """Process CLI arguments."""
    parser = argparse.ArgumentParser(
        prog="check_openvswitch",
Contributor

As previous

        description=(
            "this program checks openvswitch status and outputs an "
            "appropriate Nagios status line"
        ),
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    return parser.parse_args(argv)


def main(argv):
    """Define main subroutine."""
    parse_args(argv)
    try_check(parse_ovs_status)


if __name__ == "__main__":
    main(sys.argv[1:])
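
For readers unfamiliar with the pattern, the sketch below illustrates how a wrapper in the style of try_check can turn the exceptions raised by a check into Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not the nagios_plugin3 implementation: the exception classes are local stand-ins, and run_check is a hypothetical name that returns the status rather than exiting so that it can be exercised in a test.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class CriticalError(Exception):
    """Local stand-in for the CriticalError imported by the check."""


class UnknownError(Exception):
    """Local stand-in for the UnknownError imported by the check."""


def run_check(check, *args, **kwargs):
    """Run a check callable and return (exit_code, message).

    Hypothetical helper: a real plugin wrapper would typically print the
    message and exit with the code instead of returning them.
    """
    try:
        check(*args, **kwargs)
    except CriticalError as e:
        return CRITICAL, str(e)
    except UnknownError as e:
        return UNKNOWN, str(e)
    except Exception as e:  # anything unexpected is reported as UNKNOWN
        return UNKNOWN, "UNKNOWN: {}".format(e)
    return OK, ""


def failing_check():
    """Example check that always reports one error."""
    raise CriticalError("CRITICAL: Found 1 error(s) in ovs-vsctl show")


if __name__ == "__main__":
    code, message = run_check(failing_check)
    print(code, message)
```

Under this pattern the check function only has to raise the right exception or return normally; the mapping to exit codes lives in one place.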