My MQTT Broker disconnects often. #33

Open
jnherm opened this issue Mar 1, 2018 · 33 comments

Comments

@jnherm

jnherm commented Mar 1, 2018

Has anybody else seen the broker disconnect all of its clients? Every so often, all of my clients are disconnected at once.

@martin-ger
Owner

Are your clients connected via the AP interface or via the STA interface? The broker has to disconnect all clients when the uplink connection of the STA is (temporarily) lost.

@jnherm
Author

jnherm commented Mar 1, 2018

Hello Martin! My clients and broker are both connected to my home router AP.

@martin-ger
Owner

Then I am pretty sure that you have a temporary disconnect of the ESP from the router AP - you should be able to see that on the serial console.

To verify that, you might also run a script on the broker that performs some action "on wifidisconnect", e.g. increments a counter variable. You can then log into the broker and use "show vars" to see the value of the counter.

How to fix that?

  • Maybe the distance between the broker and the AP is too large and the signal is somewhat weak?
  • Most clients reconnect automatically, so the problems should only be temporary. You might shorten the keep-alive timeout on the clients to detect these problems faster.
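To put a number on the keep-alive advice: MQTT 3.1.1 requires a broker to drop a client from which it has received no packet for one and a half times the keep-alive interval. The sketch below only does that arithmetic. The function name `broker_timeout_seconds` and the sample values are illustrative assumptions, and how quickly a given client library notices a dead link on its own side is implementation-specific and not modeled here.

```python
def broker_timeout_seconds(keepalive_s: int) -> float:
    """Longest client silence an MQTT 3.1.1 broker tolerates before disconnecting.

    The keep-alive field is a 16-bit number of seconds; 0 disables the mechanism.
    """
    if not 0 <= keepalive_s <= 65535:
        raise ValueError("keep-alive must fit in 16 bits")
    if keepalive_s == 0:
        return float("inf")  # keep-alive disabled: the broker never times out
    return 1.5 * keepalive_s


# Hypothetical keep-alive settings, in seconds.
for keepalive in (60, 15, 5):
    print(keepalive, "->", broker_timeout_seconds(keepalive))
```

A shorter keep-alive shrinks this window at the cost of more ping traffic on an already weak link.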

@jnherm
Author

jnherm commented Mar 1, 2018

Ok Martin, thank you for the advice. I will look into it and update you when something comes up.

@jnherm
Author

jnherm commented Mar 1, 2018

Off topic, Martin: I am using netcat for Windows to upload scripts to the ESP. Why does the upload take about 5 minutes regardless of the size of the script?

@jnherm
Author

jnherm commented Mar 1, 2018

Can you help me understand this log:

Waiting for script upload on port 2000
CMD>Fatal exception 0(IllegalInstructionCause):
epc1=0x40217b7b, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x0
þ000000
ets Jan 8 2013,rst cause:1, boot mode:(3,6)

load 0x40100000, len 31244, room 16
tail 12
chksum 0x7d
ho 0 tail 12 room 4
load 0x3ffe8000, len 2124, room 12
tail 0
chksum 0xf5
load 0x3ffe8850, len 11972, room 8
tail 12
chksum 0x6a
csum 0x6a
çI8‚Œ�ò±¾C¡C¡U-T-$ª«�W–®©*���$(®k«–�$•& �Ò.W,.]Z·VH¨HhÔËËVë�’Ö«—‹�R,‹��ë+�V�VH¨”
•ªJ�R-
Ò×,®Z\º� Error ('init', 'mqtStarting Console TCP Server on port 7777
Max number of TCP clients: 15
mode : sta(ec:fa:bc:07:50:8e)
add if0
mode: 0 -> 3
�state: 2 ->@3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt

connected with Balay24, channel 1
dhcp client start...
connect to ssid Balay24, channel 1
ip:192.168.10.101,mask:255.255.255.0,gw:192.168.10.251
ip:192.168.10.101,mask:255.255.255.0,gw:192.168.10.251,dns:192.168.10.251
pm open,type:2 0
Got NTP server: 129.250.35.251
NTP synced

@jnherm
Author

jnherm commented Mar 1, 2018

Martin, what is wrong with this snippet?


% When wifi disconnects
on wifidisconnect
do
println "Wifi Disconnected"
setvar $wifiDisc = @3 + 1
setvar @3 = $wifiDisc
println "Wifi Disconnected : " | $wifiDisc | " times"
% When wifi connected
on wificonnect
do
println "Wifi Connected"
setvar $wifiCon = @4 + 1
setvar @4 = $wifiCon
println "Wifi Connected : " | $wifiCon | " times"


I got this error:

Script upload completed (1387 Bytes)
Error ('init', 'mqttconnect', 'topic', 'gpio_interrupt', 'serial', 'alarm', 'htt
p_response', or 'timer' expected) at >>do =

@martin-ger
Owner

martin-ger commented Mar 1, 2018 via email

@martin-ger
Owner

Just tested it: the script is fine with the latest build.

No, it is not normal that it takes 5 min; it should work immediately. However, I have never tested it with Windows. Which netcat do you use? Could there be some sort of connection problem?

Your trace is obviously a reboot. Did this happen while loading a script? If so, are you able to reproduce it with the same script? If so, I would be interested in this script!

@jnherm
Author

jnherm commented Mar 1, 2018

Hello Martin, thanks for your help. I just upgraded my firmware and it works. I can now monitor my wifi connection. One thing I observe, though, is that "on wifidisconnect" executes every time the ESP tries to connect to my router. That is, if my router is off, the counter keeps increasing even though the ESP was already disconnected.

Regarding my netcat version for Windows, I just downloaded it from GitHub; it is the one by Diego Casorran.

This is the final script that I used:

% Config params, overwrite any previous settings from the commandline
config ap_ssid MQTTBROKER2
config ap_password
config ntp_server 1.de.pool.ntp.org
config broker_user
config broker_password
config speed 80
% Now the initialization, this is done once after booting
on init
do

% @<num> vars are stored in flash and are persistent even after reboot 
setvar $run = @2 + 1
setvar @2 = $run
println "This is reboot no "|$run
setvar $relay_status = 0
gpio_out 12 $relay_status
setvar $command_topicX = "cmnd/sonoff/03/POWER1"
setvar $command_topicY = "cmnd/sonoff/03/POWER2"

% The local pushbutton
on gpio_interrupt 0 pullup
do
println "New state GPIO 0: " | $this_gpio
if $this_gpio = 0 then

	gpio_out 13 not ($relay_status)
	publish local $command_topicX $relay_status retained
	publish local $command_topicY $relay_status retained
	if $relay_status = 0 then
		setvar $relay_status = 1
	else
		setvar $relay_status = 0
	endif
	
endif

% When wifi disconnects
on wifidisconnect
do
println "Wifi Disconnected on " | $timestamp
setvar $wifiDisc = @3 + 1
setvar @3 = $wifiDisc
println "Wifi Disconnected : " | $wifiDisc | " times"

% When wifi connected
on wificonnect
do
println "Wifi Connected on " | $timestamp
setvar $wifiCon = @4 + 1
setvar @4 = $wifiCon
println "Wifi Connected : " | $wifiCon | " times"
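On the observation in this comment that the disconnect counter keeps climbing while the router is off: one common way to count real link losses is to count only transitions from connected to disconnected. The sketch below models that idea in Python. It is not the broker's scripting language, the class `WifiEventCounter` and its method names are hypothetical, and whether the same guard can be written in a broker script is not verified here.

```python
class WifiEventCounter:
    """Counts state transitions instead of raw events."""

    def __init__(self) -> None:
        self.connected = False
        self.connects = 0
        self.disconnects = 0

    def on_wificonnect(self) -> None:
        # Count only a change from "down" to "up".
        if not self.connected:
            self.connects += 1
        self.connected = True

    def on_wifidisconnect(self) -> None:
        # Repeated events while already down (failed retries) are ignored.
        if self.connected:
            self.disconnects += 1
        self.connected = False


counter = WifiEventCounter()
counter.on_wificonnect()          # link comes up
for _ in range(5):                # router is off: one loss, then four failed retries
    counter.on_wifidisconnect()
counter.on_wificonnect()          # router is back
print(counter.disconnects, counter.connects)
```

In this model the five disconnect events are counted as a single loss of the link.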

@jnherm
Author

jnherm commented Mar 1, 2018

Can you suggest a netcat version for Windows? Or any method that would make it easier for me to upload a script?

@martin-ger
Owner

You could try this local web server ( http://fenixwebserver.com/ ) on your Windows machine and use the "pull" mode.

@jnherm
Author

jnherm commented Mar 2, 2018

Hello Martin! It doesn't seem to work on my end. I tried the fenix webserver, but when I "pull" the script from it to my sonoff device via serial, this is what happens:

HTTP request to http://127.0.0.1:81/sonoffScript2.txt started

Then nothing happens (I waited for 5 to 10 minutes), but when I press Enter, this message appears:

HTTP script upload failed (error code -1).

I also tried a "pull" request via the internet. I uploaded my script to my Google Drive and got the link. But again this shows:

client handshake start.
client handshake ok!

After that, I waited but nothing happened. Then, when I press the Enter key, this message appears:

HTTP script upload failed (error code 302)

@jnherm
Author

jnherm commented Mar 2, 2018

Hello Martin, I downloaded an Android webserver app and it works great. Right after the request, my script was downloaded to the ESP device.
I also tried this webserver, which works on Windows (https://sourceforge.net/projects/miniweb/files/), and the result was great.

@martin-ger
Owner

Good to read!

BTW: I guess 127.0.0.1 was the wrong address for the fenix server. This is "localhost"; you will need the actual IP of your PC in the local net.
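The point about 127.0.0.1 can be checked mechanically: a loopback address always refers to the machine that uses it, so an ESP asked to pull from 127.0.0.1 tries to reach itself, not the PC. A minimal sketch using only the Python standard library follows; the helper name `reachable_from_other_hosts` and the LAN address 192.168.10.50 are assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def reachable_from_other_hosts(url: str) -> bool:
    """Return False when the URL's host can only mean 'this machine'."""
    host = urlsplit(url).hostname
    if host is None or host.lower() == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (a DNS name); it cannot be judged without resolving it.
        return True


print(reachable_from_other_hosts("http://127.0.0.1:81/sonoffScript2.txt"))
print(reachable_from_other_hosts("http://192.168.10.50:81/sonoffScript2.txt"))
```

This also explains why the browser on the PC could display the script at 127.0.0.1 while the ESP could not fetch it.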

@jnherm
Author

jnherm commented Mar 2, 2018

127.0.0.1 was the host IP given by fenix... I also tried localhost, but to no avail. When I used 127.0.0.1 in my browser, the text was displayed.

@martin-ger
Owner

I am currently running a setup where the MQTT connection to the uMQTTBroker is also interrupted and then immediately re-established by the clients. At least the Wireshark trace tells me that the clients (using tuanpm's original lib) actively disconnect, not the broker. Up to now I don't know why...

@jnherm
Author

jnherm commented Mar 5, 2018

I have also been watching the number of disconnections in my setup, and I noticed that even when the ESP MQTT broker did not disconnect from my router, the clients tried to reconnect, at least according to the client logs. By the way, I am using ITEAD sonoff devices with Tasmota firmware. I also posted an issue on Tasmota, but unfortunately no one has responded positively to it.
My issue with the Tasmota firmware is that whenever it reconnects to the MQTT broker, it randomly changes my relay status. Sometimes it toggles the relay, sometimes it turns it OFF or ON.

So I think your broker is stable. Even if it sometimes disconnects from the router, it always reconnects.

@jnherm
Author

jnherm commented Mar 6, 2018

Martin, sorry to bother you again. I observe that when I reconnect my sonoff to your MQTT broker, the broker publishes (maybe as a test publish) to the topic right after it has been subscribed. Is my observation correct? The other MQTT broker I use does not do that. Could you add a setting so that users can choose whether or not the subscribed topic is tested this way?

@martin-ger
Owner

martin-ger commented Mar 6, 2018 via email

@jnherm
Author

jnherm commented Mar 6, 2018

Yes, there are subscriptions that are 'retained', but the same subscriptions did not cause the other MQTT broker that I am using to send a published MQTT command to my device. Your broker, on the other hand, sends those subscribed commands to my devices after reconnection.

@martin-ger
Owner

I think this behavior conforms exactly to the specs:
https://www.hivemq.com/blog/mqtt-essentials-part-8-retained-messages

A retained message is a normal MQTT message with the retained flag set to true. The broker will store the last retained message and the corresponding QoS for that topic. Each client that subscribes to a topic pattern, which matches the topic of the retained message, will receive the message immediately after subscribing. For each topic only one retained message will be stored by the broker.

What is your reference broker? Mosquitto?
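The retained-message rule quoted in this comment can be illustrated with a toy in-memory broker. This is a simplified model and not the uMQTTBroker implementation: the class `TinyBroker` is hypothetical, topics are matched by exact string only (no wildcards), and QoS is ignored.

```python
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]


class TinyBroker:
    """Toy broker: stores one retained payload per topic."""

    def __init__(self) -> None:
        self.retained: Dict[str, str] = {}
        self.subscribers: Dict[str, List[Callback]] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            self.retained[topic] = payload  # overwrites the previous retained message
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.subscribers.setdefault(topic, []).append(callback)
        if topic in self.retained:
            # A new subscriber receives the retained message immediately.
            callback(topic, self.retained[topic])


received: List[Tuple[str, str]] = []
broker = TinyBroker()
broker.publish("cmnd/sonoff/03/POWER1", "ON", retain=True)
broker.publish("cmnd/sonoff/03/POWER1", "OFF", retain=True)
# The client subscribes only now, after both messages were published.
broker.subscribe("cmnd/sonoff/03/POWER1", lambda t, p: received.append((t, p)))
print(received)
```

The late subscriber gets exactly one message, the last retained one, which is the behavior described for a device that resubscribes after a reconnect.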

@jnherm
Author

jnherm commented Mar 6, 2018

Furthermore, if this is due to 'retained' messages, your broker should send the payload last published on the topic, so my device would receive a payload that matches its previous relay state. But in my case, my devices always receive the same payload.
This is my scenario:

Client A loses the MQTT connection to Broker ESP (Client A's relay is ON).
Client A tries to reconnect to Broker ESP every 10 sec.
If Broker ESP is connected to wifi again, Client A connects successfully at the next 10 sec attempt.
Client A then receives an MQTT command from Broker ESP with payload OFF.
Client A turns OFF the relay.

Note that if Client A's relay was OFF before the reconnection, the relay stays OFF, since the payload is OFF.

@martin-ger
Owner

I don't understand why the last state that is sent after resubscription should be OFF. If the client lost the connection while switched ON, the retained state should be ON, shouldn't it?

@jnherm
Author

jnherm commented Mar 6, 2018

I am using this Android App as a broker. I don't know if it is Mosquitto. https://play.google.com/store/apps/details?id=server.com.mqtt

@martin-ger
Owner

I have a guess: is it possible that you have a restart of the broker? Then this could make sense: if the retained state is saved in flash, it will be constant after restart. This would also explain the connection loss.

What kind of ESP are you using? What about power supply?

@jnherm
Author

jnherm commented Mar 6, 2018

I really prefer your ESP broker because it is the most economical for a small/lightweight application like controlling a room.

@jnherm
Author

jnherm commented Mar 6, 2018

I am using ITEAD's sonoff basic as the ESP broker. The power supply is the SMPS inside the sonoff basic, connected directly to mains.

@jnherm
Author

jnherm commented Mar 6, 2018

"Dont understand why the last state that is send after resubscription should be OFF? When it lost connection when switched ON, the retained state should be ON? It's not?"

Yes, you are right, it should be ON. I have other scenarios in which my relay toggles no matter what its previous state was. Maybe the ESP broker saved a "toggle" payload.

@jnherm
Author

jnherm commented Mar 6, 2018

I have a guess: is it possible that you have a restart of the broker?

I simulate a lost connection by turning the ESP broker OFF and then ON again.

@martin-ger
Owner

Okay, then the behavior is clear: you have a saved state in flash: OFF. If you reset the broker (on/off), it will restart with this state from flash.
Two options:

  • set broker_autoretain 1
    This will immediately store any state change in flash (in the long run this may cause the flash memory to fail); okay for tests or for state that rarely changes.
  • delete_retained
    This will delete all retained state from flash. If you reboot the ESP it won't remember the previous state. However, this reboot does not simulate a temporary connection loss correctly.
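The explanation and the two options above can be modeled as a RAM copy and a flash copy of the retained state. The sketch is a guess at the behavior as described in this thread, not the firmware's code: the class `RetainedStore` is hypothetical, and `save()` merely stands in for whatever wrote the OFF state to flash earlier, which the thread does not specify.

```python
from typing import Dict, Optional

TOPIC = "cmnd/sonoff/03/POWER1"


class RetainedStore:
    """Toy model: retained state lives in RAM and is optionally mirrored to flash."""

    def __init__(self, autoretain: bool = False) -> None:
        self.autoretain = autoretain
        self.ram: Dict[str, str] = {}
        self.flash: Dict[str, str] = {}

    def publish_retained(self, topic: str, payload: str) -> None:
        self.ram[topic] = payload
        if self.autoretain:
            self.flash = dict(self.ram)  # models "set broker_autoretain 1"

    def save(self) -> None:
        self.flash = dict(self.ram)      # hypothetical one-off snapshot to flash

    def delete_retained(self) -> None:
        self.flash = {}                  # models "delete_retained" (flash only)

    def power_cycle(self) -> None:
        self.ram = dict(self.flash)      # RAM is lost; flash content is restored


def state_after_power_cycle(autoretain: bool, delete: bool = False) -> Optional[str]:
    store = RetainedStore(autoretain)
    store.publish_retained(TOPIC, "OFF")
    store.save()                         # an old OFF state ends up in flash
    store.publish_retained(TOPIC, "ON")  # the relay is later switched ON
    if delete:
        store.delete_retained()
    store.power_cycle()
    return store.ram.get(TOPIC)


print(state_after_power_cycle(autoretain=False))
print(state_after_power_cycle(autoretain=True))
print(state_after_power_cycle(autoretain=False, delete=True))
```

Without autoretain the model restarts with the stale OFF, with autoretain it restarts with ON, and after deleting the flash copy it restarts with no retained state at all.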

@jnherm
Author

jnherm commented Mar 6, 2018

Thank you Martin.

@jnherm
Author

jnherm commented Mar 7, 2018

Good news, Martin. With autoretain set to 1, my problem is solved! Thank you again!
