[Bug]: Every restart/redeploy of the config node advertises a new DNS-SD record without removing the old #477

Open
lilyball opened this issue Feb 28, 2022 · 3 comments
Assignees
Labels
bug 🐛 There is at least high chance that it is a bug!

Comments

@lilyball

NRCHKB Plugin Version

1.4.3

Node JS Version

v14.19.0

NPM Version

6.14.16

Node-RED Version

2.2.0

Operating System

Raspberry Pi Buster

What happened?

Every time I either Deploy Full or Restart Flows, once the plugin has finished reinitializing (i.e. after a few seconds), a brand new DNS-SD record is advertised for the bridge without the old record being removed. The records are identical except for the port, and the DNS-SD service name ends up with an auto-incrementing number appended.

Checking right now, I have 20 different DNS-SD service records for the exact same config node. 18 of them declare ports that node-red is not listening on; node-red is listening on the other two. One is for the most recent service, and the other is for the original service, the first one to exist for the bridge. I don't know why it's still listening on that original port, but there are a couple of active connections to that instance right now (though there are more connections to the latest bridge).
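
One way to separate live records from stale ones is to probe each advertised port. The following is a minimal sketch, assuming the HAP server accepts TCP connections on the loopback address and that the ports have already been collected by hand (for example by resolving each record). The helper names `is_listening` and `split_live_and_stale` are made up for illustration and are not part of the plugin.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def split_live_and_stale(ports):
    """Partition advertised ports into (live, stale) lists, preserving order."""
    live = [p for p in ports if is_listening(p)]
    stale = [p for p in ports if p not in live]
    return live, stale
```

Calling `split_live_and_stale` with the ports from all 20 records would be expected to return two live ports and 18 stale ones in the situation described above.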

If I use one of the devices connected to the original bridge, it doesn't work. The Home app on that device reports the wrong state, and toggling the accessory from that device looks like it works in Home and actually produces a status update on the service node, but it produces no events (I am very confused as to how the service node actually gets a status update with no event). If I use a different device, one connected to the latest bridge, then things work fine there. I also suspect this might explain why my 3AM automation last night that controls this accessory did not work, as it may have sent the command to the phantom original bridge instance.

Also potentially related: I renamed my config node, but the bridge name in HomeKit hasn't changed. I know the _hap._tcp. service name needs to match the name in the Accessory Information Service, and renaming the config node did cause the DNS-SD service name to change. What I don't know is whether it's reporting the new name in the Accessory Information Service (surely it is?), whether HomeKit is simply confused by the existence of the original DNS-SD service, or whether bridges simply can't be renamed without re-pairing them (which would be frustrating and surprising). I can of course change the name in HomeKit myself, but that would just be a workaround.

The one thing I haven't done yet is restart node-red entirely, which would clear all of the DNS-SD records and start again (though they would just start accumulating again on each restart anyway).

How to reproduce?

  1. Have a config node. Mine looks like:

    {
        "id": "6ec32a21d1cf0af0",
        "type": "homekit-bridge",
        "bridgeName": "HomeKit Bridge",
        "pinCode": "redacted",
        "port": "",
        "advertiser": "ciao",
        "allowInsecureRequest": false,
        "manufacturer": "NRCHKB",
        "model": "1.4.3",
        "serialNo": "Default Serial Number",
        "firmwareRev": "1.4.3",
        "hardwareRev": "1.4.3",
        "softwareRev": "1.4.3",
        "customMdnsConfig": false,
        "mdnsMulticast": true,
        "mdnsInterface": "",
        "mdnsPort": "",
        "mdnsIp": "",
        "mdnsTtl": "",
        "mdnsLoopback": true,
        "mdnsReuseAddr": true,
        "allowMessagePassthrough": true
    }
  2. Hook up services maybe. I don't know if that matters.

  3. Deploy

  4. Either make a change and Deploy Full, or Restart Flows

Expected behavior:

The old DNS-SD service should be removed and a new one created. Alternatively, the plugin could keep the existing service, either by reusing the port or by keeping the listening socket active across the restart. I'm not sure about that, though: it might run contrary to the notion of doing a restart, and it might leak the service/socket if I delete the config node, so it should probably just remove and recreate the service.
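
To make the difference concrete, here is a toy model of the two behaviors. It is only a sketch: `ToyMdnsRegistry` and `restart` are hypothetical names, the ports are invented, and the " (N)" renaming rule is modeled on the service names seen in the log output of this report rather than on the advertiser's actual code.

```python
class ToyMdnsRegistry:
    """Toy stand-in for the set of service instance names visible on the network."""

    def __init__(self):
        self.records = {}  # instance name -> advertised port

    def advertise(self, name: str, port: int) -> str:
        """Register a service; on a name conflict, append ' (2)', ' (3)', ..."""
        candidate, suffix = name, 1
        while candidate in self.records:
            suffix += 1
            candidate = f"{name} ({suffix})"
        self.records[candidate] = port
        return candidate

    def withdraw(self, name: str) -> None:
        """Remove a record (the removal that never shows up in this report)."""
        self.records.pop(name, None)


def restart(registry, base_name, old_name, new_port, clean_up):
    """Simulate one 'Restart Flows': optionally withdraw, then re-advertise."""
    if clean_up and old_name is not None:
        registry.withdraw(old_name)
    return registry.advertise(base_name, new_port)


leaky, clean = ToyMdnsRegistry(), ToyMdnsRegistry()
leaky_name = clean_name = None
for port in (50001, 50002, 50003):  # hypothetical ports, one per restart
    leaky_name = restart(leaky, "HomeKit Bridge E0EF", leaky_name, port, clean_up=False)
    clean_name = restart(clean, "HomeKit Bridge E0EF", clean_name, port, clean_up=True)

print(sorted(leaky.records))  # three names, the last two with numeric suffixes
print(clean.records)          # a single record pointing at the newest port
```

In the leaky case every restart adds a record under a new suffixed name, which matches what is reported here. In the clean case the name stays stable and only the port changes.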

Additional comments?

I'm currently using the ciao advertiser, as that is the newer, more modern advertiser and what I'm using in homebridge. bonjour-hap is very problematic, and I'm actually really surprised this plugin defaults to it for new config nodes; it should be defaulting to ciao at this point, as ciao is no longer experimental.

The "Relevant log output" section is what I get from dns-sd -B _hap._tcp. The last two entries were the results of restarting while I was in the process of writing this ticket. Notice how there are no remove events, just add events. The Node-RED Timers entries use the original name of my config node, and HomeKit Bridge is what I renamed it to (though I need to rename it again now that I know that the "Name" of the config is what HomeKit sees).

Relevant log output

11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (2)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (3)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (4)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (8)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (9)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (10)
11:34:14.439  Add        3  13 local.               _hap._tcp.           Node-RED Timers E0EF
11:34:14.439  Add        3  13 local.               _hap._tcp.           Node-RED Timers E0EF (2)
11:34:14.439  Add        3  13 local.               _hap._tcp.           Node-RED Timers E0EF (3)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (14)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (15)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (11)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (12)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (13)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (5)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (6)
11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (7)
11:34:55.108  Add        2  13 local.               _hap._tcp.           HomeKit Bridge E0EF (16)
11:36:02.696  Add        2  13 local.               _hap._tcp.           HomeKit Bridge E0EF (17)
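
The add/remove imbalance can be checked mechanically by tallying the browse events per instance name. This is a rough sketch that assumes the column layout shown above (timestamp, Add/Rmv, flags, interface, domain, service type, instance name). The function name `outstanding_instances` is invented for illustration, and the `Rmv` line in the sample is hypothetical, since no such line appears in the real output.

```python
import re
from collections import Counter

# One browse event per line: time, Add/Rmv, flags, interface, domain, type, name.
BROWSE_LINE = re.compile(
    r"^\s*(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3})\s+(?P<event>Add|Rmv)\s+"
    r"(?P<flags>\d+)\s+(?P<iface>\d+)\s+(?P<domain>\S+)\s+"
    r"(?P<type>\S+)\s+(?P<name>.+?)\s*$"
)


def outstanding_instances(lines):
    """Return instance names that were added more often than they were removed."""
    balance = Counter()
    for line in lines:
        match = BROWSE_LINE.match(line)
        if match is None:
            continue  # skip headers and anything that is not a browse event
        balance[match["name"]] += 1 if match["event"] == "Add" else -1
    return sorted(name for name, count in balance.items() if count > 0)


sample = [
    "11:34:14.439  Add        3  13 local.               _hap._tcp.           HomeKit Bridge E0EF (2)",
    "11:34:14.439  Add        3  13 local.               _hap._tcp.           Node-RED Timers E0EF",
    "11:36:02.696  Add        2  13 local.               _hap._tcp.           HomeKit Bridge E0EF (17)",
    # Hypothetical removal event, included only to show how a removal is counted.
    "11:37:00.000  Rmv        0  13 local.               _hap._tcp.           Node-RED Timers E0EF",
]
print(outstanding_instances(sample))
```

Fed the real output above, which contains only Add events, every one of the 20 instance names would be reported as outstanding.
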
@lilyball
Author

lilyball commented Mar 1, 2022

After rebooting the entire nodered service, this problem isn't reproducing. I expect it will occur again later though. I also wonder if there's any connection between this and my sometimes-recurring issue in the past of service nodes getting duplicate messages on a partial deploy, e.g. maybe something is leaking there and that's holding onto the dns-sd service somehow. Although when this problem was reproducing it was occurring on every reload, not just reloads after getting the service duplicate message bug, so I'm just grasping at straws here.

@Shaquu
Member

Shaquu commented Mar 1, 2022

First of all, as far as I know, none of the currently existing advertisers can be called the best or perfect.

Regarding zombie records, it is possible to set a custom mDNS configuration in the bridge.
Maybe try fiddling with that?

@lilyball
Author

lilyball commented Mar 2, 2022

> First of all, as far as I know, none of the currently existing advertisers can be called the best or perfect.

I did not say perfect, but it is the best (of the two options). The bonjour-hap advertiser does not conform to the RFCs and does not pass Apple's HomeKit conformance tests. The ciao advertiser was written specifically to replace bonjour-hap; it conforms to the RFCs and claims to pass Apple's HomeKit conformance tests. Even the bonjour-hap GitHub repo says not to use it anymore.

> Regarding zombie records, it is possible to set a custom mDNS configuration in the bridge.

I could try fiddling with that, but I'd rather not do so without a good reason. In NorthernMan54/Hap-Node-Client#56 it was suggested that I try explicitly assigning a port in the hopes that this would prevent the zombies. Since I can't currently reproduce the issue (not since rebooting node-red), I'm not inclined to do that, especially as I'm a bit skeptical that it wouldn't just create zombies with the same port anyway (given that service conflicts produce new service names).

I'm not sure what the conditions are that cause this to happen, beyond the fact that once it starts, every "Restart Flows" causes a new zombie even without changes. I'm not inclined to fiddle too much right now since I don't want to break my node-red service (and I don't have the time right now to figure out how to run a test node-red setup and fiddle with that, especially since I don't want to confuse HomeKit and I don't know if the reproduction steps require HomeKit to actually be involved).
