Announcement

Collapse
No announcement yet.

Errors in latest vers

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    Errors in latest vers

    Sorry, your PM's are full

    Getting these randomly since installing 3.5.5.0

    Thanks,
    Z
    Feb-27 8:05:30 PM mcsMQTT MQTTClient is Connected
    Feb-27 8:05:30 PM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 8:02:29 PM mcsMQTT MQTTClient is Connected
    Feb-27 8:02:29 PM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 6:13:25 PM mcsMQTT MQTTClient is Connected
    Feb-27 6:13:25 PM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 6:10:24 PM mcsMQTT MQTTClient is Connected
    Feb-27 6:10:24 PM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 7:27:19 AM mcsMQTT MQTTClient is Connected
    Feb-27 7:27:19 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 7:23:17 AM mcsMQTT MQTTClient is Connected
    Feb-27 7:23:17 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 7:19:13 AM mcsMQTT MQTTClient is Connected
    Feb-27 7:19:13 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 6:31:13 AM mcsMQTT MQTTClient is Connected
    Feb-27 6:31:13 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-27 3:14:08 AM mcsMQTT MQTTClient is Connected
    Feb-27 3:14:08 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-26 10:32:10 PM mcsMQTT MQTTClient is Connected
    Feb-26 10:32:10 PM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-26 5:23:58 AM mcsMQTT MQTTClient is Connected
    Feb-26 5:23:58 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-26 12:02:01 AM mcsMQTT MQTTClient is Connected
    Feb-26 12:02:01 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-25 6:36:34 AM mcsMQTT MQTTClient is Connected
    Feb-25 6:36:34 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-25 4:11:36 AM mcsMQTT MQTTClient is Connected
    Feb-25 4:11:36 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-25 1:16:35 AM mcsMQTT MQTTClient is Connected
    Feb-25 1:16:35 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-24 10:21:14 AM mcsMQTT MQTTClient is Connected
    Feb-24 10:21:14 AM mcsMQTT Found a Self-Signed Cert (OK)
    Feb-24 10:21:13 AM mcsMQTT GetLastHistoryRecord Line 0 Conversion from type 'DBNull' to type 'Long' is not valid.
    Feb-24 10:21:12 AM mcsMQTT Version 3.5.5.0 Registered with Homeseer

    #2
    These are an indication that mcsMQTT observed on its 10 second polling cycle that its connection to the broker was lost so it reconnected. It is an area of the code that has not changed in awhile, but secondary effects may exist. These seem somewhat random. Is there anything else that you may be correlated?

    Comment


      #3
      Originally posted by Michael McSharry View Post
      These are an indication that mcsMQTT observed on its 10 second polling cycle that its connection to the broker was lost so it reconnected. It is an area of the code that has not changed in awhile, but secondary effects may exist. These seem somewhat random. Is there anything else that you may be correlated?
      Nothing has changed on this end other than the vers update as far as I know. My broker does show the discconnect, but says it's a socket error on the client:
      1551315915: Socket error on client mcsMQTT on SHELTERMONITOR2, disconnecting.
      1551315925: New connection from 192.168.1.79 on port 8883.

      It reconnects 10 sec after it fails, so the PI watchdog is working as designed. Maybe an errant device value causes it to fault? That would explain the randomness.

      HS3 is the only device the broker is showing problems with. My HS3 server event log shows nothing during those times (or for that day) as well.

      Will the PI debug log be too large to run for a day?

      Thanks,
      Z

      Comment


        #4
        Debug for a day will not be a problem. I searched the code for intentional disconnects and they occur only in three places. One is on the General tab when changing the broker setup and then in the 10 second monitor when an connection attempt fails. The third is during shutdown.

        csMQTT's interface with M2Mqtt is with subscribe/unsubscribe, publish, connect/isconnected and the MqttMsgPublishReceived callback for received messages. The callback is serviced in a separate thread and coupled with a queue. Logic changes occurred in this separate thread to improve efficiency. This thread now runs faster so the queue empties sooner, but I don't see how it would affect the socket.

        I put source at http://mcsSprinkler.com/mcsMQTT_source_3550.zip in case it may help.

        Comment


          #5
          Originally posted by Michael McSharry View Post
          Debug for a day will not be a problem. I searched the code for intentional disconnects and they occur only in three places. One is on the General tab when changing the broker setup and then in the 10 second monitor when an connection attempt fails. The third is during shutdown.

          csMQTT's interface with M2Mqtt is with subscribe/unsubscribe, publish, connect/isconnected and the MqttMsgPublishReceived callback for received messages. The callback is serviced in a separate thread and coupled with a queue. Logic changes occurred in this separate thread to improve efficiency. This thread now runs faster so the queue empties sooner, but I don't see how it would affect the socket.

          I put source at http://mcsSprinkler.com/mcsMQTT_source_3550.zip in case it may help.
          Will check it out. I suspect others might be having the same issue, but I don't think many use the SSL connect so they don't see it??

          BTW
          Playing around looking at the error I came across a couple other items you might take under consideration.

          1. When the Debug checkbox is unchecked, the Debug_log.txt file is still locked so you can't delete it without stopping the PI.

          2. # of backup DB files doesn't seem to be pruned? Is that a manual task for the user?

          3. I get a GetLastHistoryRecord error on startup because I don't have anything in the History DB. The Exception is giving a "Conversion to Long", which while accurate is a bit confusing. Perhaps get the data back and check for DBnull before doing the Long conversion and either ignore the error, or say "History DB is empty" ???

          4. When you're on the Assoc page of the PI, changing any of the checkboxes at the top require a manual refresh of the page rather than automatically refreshing on change of state?

          5. Provide a way to remove the unused (usually marked in the R column) from the DB rather than just hiding them. I had over 200 junk records in there. Really speeds up the startup of the PI when they are gone. I removed them from both of the Tables they were in.

          Thanks,
          Z

          Comment


            #6
            Thank you for the feedback

            1. There is some unconditional writes that go to the file. I could split it and in general enhance the debug for multiple levels of information
            2. The plugin does prune the backup databases, but I think it is 6 months.
            3. Saw it in your post and source updated already
            4. This is intentional and why the Show Association button is present. There may be multiple clicks to setup the filters and rebuilding the list is slow and the user would not want to wait after each click
            5. This was added in 3.5. It is on the General tab Inbound Obsolete row.

            Comment


              #7
              Originally posted by Michael McSharry View Post
              Thank you for the feedback

              1. There is some unconditional writes that go to the file. I could split it and in general enhance the debug for multiple levels of information
              2. The plugin does prune the backup databases, but I think it is 6 months.
              3. Saw it in your post and source updated already
              4. This is intentional and why the Show Association button is present. There may be multiple clicks to setup the filters and rebuilding the list is slow and the user would not want to wait after each click
              5. This was added in 3.5. It is on the General tab Inbound Obsolete row.
              RTFM
              Understood. Splitting debug probably isn't worth your time.
              Can't get the link to work for the source.

              Thanks,
              Z

              Comment


                #8
                http://mcsSprinklers.com/mcsMQTT_Source_3550.zip

                I just uploaded again and confirmed I could download.

                Comment


                  #9


                  Clue:
                  3/1/2019 9:07:41 AM 10560372 | HSEvent VALUE_CHANGE| 1024| | 40.7| 40.8| 2354
                  3/1/2019 9:07:42 AM 10560624 | HSEvent VALUE_CHANGE| 1024| | 40.7| 40.8| 2355
                  3/1/2019 9:07:42 AM 10560825 | HSEvent VALUE_CHANGE| 1024| | -0.04| -0.01| 2356
                  3/1/2019 9:07:42 AM 10561086 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q108| 117.08| 117.06| 2310
                  3/1/2019 9:07:42 AM 10561090 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q105| 15.757| 15.719| 2307
                  3/1/2019 9:07:47 AM 10565880 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q108| 117.2| 117.08| 2310
                  3/1/2019 9:07:47 AM 10565885 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q105| 15.717| 15.757| 2307
                  3/1/2019 9:07:48 AM 10567150 | Installing CA Cert - C:\Misc\IOT\certs\IOTCA.crt
                  3/1/2019 9:07:48 AM 10567158 | CACert Results = [Version]

                  "Duplicate" devices in HSEvent may be the issue? I see all of these devices handled normally otherwise, it's only when there are dups it restarts.
                  Took a quick look at the code and it looks like it's the HSEventCollection or the DoHSevent proceessing? It's pretty random, so if you don't see anything obvious I can stick some more debugs here and see if I can narrow it down. I still suspect it's occurring with non-SSL as well, there's just an HS3 log entry for SSL use that's allowing it to be seen there.

                  Z

                  Here's a good log:
                  3/1/2019 9:07:36 AM 10555313 | HSEvent VALUE_CHANGE| 1024| | 40.8| 40.8| 2354
                  3/1/2019 9:07:37 AM 10555714 | HSEvent VALUE_CHANGE| 1024| | 40.8| 40.8| 2355
                  3/1/2019 9:07:37 AM 10556014 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q108| 117.06| 117.2| 2310
                  3/1/2019 9:07:37 AM 10556186 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q105| 15.719| 15.742| 2307
                  3/1/2019 9:07:39 AM 10558311 | HSEvent VALUE_CHANGE| 1024| 003D2E88-038-Q80| 118.65| 119.02| 2126
                  3/1/2019 9:07:39 AM 10558491 | HSEvent VALUE_CHANGE| 1024| 003D2E88-038-Q76| 118.653| 119.019| 2122
                  3/1/2019 9:07:41 AM 10559942 | ActoOnMessageForTrigger Topic pressmonitor/heartbeat,Payload=85
                  3/1/2019 9:07:41 AM 10560137 | ActoOnMessageForTrigger Topic pressmonitor/rssi,Payload=-65
                  3/1/2019 9:07:41 AM 10560137 | Update Accepted 2354 to 40.68 StatusType=2
                  3/1/2019 9:07:41 AM 10560139 | Updating Device from 40.76 PayloadNumeric=True
                  3/1/2019 9:07:41 AM 10560140 | ActoOnMessageForTrigger Topic pressmonitor/pressure/IN,Payload=40.68
                  3/1/2019 9:07:41 AM 10560140 | Update Accepted 2355 to 40.71 StatusType=2
                  3/1/2019 9:07:41 AM 10560142 | Updating Device from 40.76 PayloadNumeric=True
                  3/1/2019 9:07:41 AM 10560143 | ActoOnMessageForTrigger Topic pressmonitor/pressure/OUT,Payload=40.71
                  3/1/2019 9:07:41 AM 10560143 | Update Accepted 2356 to -0.04 StatusType=2
                  3/1/2019 9:07:41 AM 10560144 | Updating Device from -0.01 PayloadNumeric=True
                  3/1/2019 9:07:41 AM 10560145 | ActoOnMessageForTrigger Topic pressmonitor/pressure/filter,Payload=-0.04

                  Comment


                    #10

                    3/1/2019 5:40:43 PM 41341142 | HSEvent VALUE_CHANGE| 1024| | 44.8| 44.6| 2354
                    3/1/2019 5:40:43 PM 41341303 | HSEvent VALUE_CHANGE| 1024| | 44.5| 44.3| 2355
                    3/1/2019 5:40:44 PM 41342336 | HSEvent VALUE_CHANGE| 1024| Arduino, Board:1, Rom:38C6E71541FD60A2| 80.3| 80.2| 1895
                    3/1/2019 5:40:47 PM 41345303 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q108| 116.35| 116.33| 2310
                    3/1/2019 5:40:48 PM 41345577 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q105| 6.076| 6.089| 2307
                    3/1/2019 5:40:48 PM 41345837 | ActoOnMessageForTrigger Topic pressmonitor/heartbeat,Payload=75
                    3/1/2019 5:40:48 PM 41345976 | HSEvent VALUE_CHANGE| 1024| 003D30D7-032-Q102| 8270.568| 8270.545| 2304
                    3/1/2019 5:40:48 PM 41346042 | ActoOnMessageForTrigger Topic pressmonitor/rssi,Payload=-65
                    3/1/2019 5:40:48 PM 41346042 | Update Accepted 2354 to 44.49 StatusType=2
                    3/1/2019 5:40:48 PM 41346044 | Updating Device from 44.77 PayloadNumeric=True
                    3/1/2019 5:40:48 PM 41346046 | ActoOnMessageForTrigger Topic pressmonitor/pressure/IN,Payload=44.49
                    3/1/2019 5:40:48 PM 41346046 | Update Accepted 2355 to 44.32 StatusType=2
                    3/1/2019 5:40:48 PM 41346047 | Updating Device from 44.51 PayloadNumeric=True
                    3/1/2019 5:40:48 PM 41346049 | ActoOnMessageForTrigger Topic pressmonitor/pressure/OUT,Payload=44.32
                    3/1/2019 5:40:48 PM 41346049 | Update Accepted 2356 to 0.17 StatusType=2
                    3/1/2019 5:40:48 PM 41346050 | Updating Device from 0.26 PayloadNumeric=True
                    3/1/2019 5:40:48 PM 41346051 | ActoOnMessageForTrigger Topic pressmonitor/pressure/filter,Payload= 0.17

                    Well it was a good theory. Too bad it wasn't correct

                    3/1/2019 5:40:48 PM 41346153 | HSEvent VALUE_CHANGE| 1024| | 44.5| 44.8| 2354
                    3/1/2019 5:40:48 PM 41346314 | HSEvent VALUE_CHANGE| 1024| | 44.3| 44.5| 2355
                    3/1/2019 5:40:49 PM 41346476 | HSEvent VALUE_CHANGE| 1024| | 0.17| 0.26| 2356
                    3/1/2019 5:40:50 PM 41347790 | HSEvent VALUE_CHANGE| 1024| 003D2E88-038-Q80| 117.42| 117.35| 2126
                    3/1/2019 5:40:50 PM 41347972 | HSEvent VALUE_CHANGE| 1024| 003D2E88-038-Q76| 117.417| 117.349| 2122
                    3/1/2019 5:40:52 PM 41349780 | Installing CA Cert - C:\Misc\IOT\certs\IOTCA.crt
                    3/1/2019 5:40:52 PM 41349789 | CACert Results = [Version]
                    V3

                    Comment


                      #11
                      Perhaps running from Visual Studio. If you have M2Mqtt loaded then may be able to set a breakpoint if there is an event with socket disconnected. I know when I first tried to do the monitoring for lost connection I tried to use events callbacks but had no luck and that is why I went to polling approach. I never looked into M2Mqtt why the event callback was not working for me.

                      I just got back home and will be updating my install and I can assess if the problem occurs for me. I don't use cert, but still can detect a re-connection. I run production on Odroid C1/Debian Stretch. I have a W7 test machine too.

                      Comment


                        #12
                        Another thing I noticed when I was doing the stress/performance testing with 3.5 is that the QOS at the highest level caused something to choke. I think it was my test program, but dont remember for certain. It may be another consideration when trying to isolate.

                        Comment


                          #13
                          Originally posted by Michael McSharry View Post
                          Perhaps running from Visual Studio. If you have M2Mqtt loaded then may be able to set a breakpoint if there is an event with socket disconnected. I know when I first tried to do the monitoring for lost connection I tried to use events callbacks but had no luck and that is why I went to polling approach. I never looked into M2Mqtt why the event callback was not working for me.

                          I just got back home and will be updating my install and I can assess if the problem occurs for me. I don't use cert, but still can detect a re-connection. I run production on Odroid C1/Debian Stretch. I have a W7 test machine too.
                          It's restarting within StartMQTTthread with the Client object intact, but disconnected. So somewhere the Client is being disconnected. I lowered the KeepAlive to 30 to see if that changes anything but it's talking to the Broker much quicker than the timeout already. Worse case I'll see if I can get the ConnectionLost Callback installed.
                          MQTT really doesn't divulge much (if anything) in the logs when it disconnects, so it's tough to find why.

                          Z

                          Comment


                            #14
                            I have been running with latest version for about a day now and the Statistics tab shows that mcsMQTT has been online/connected for the full duration.

                            Another idea is to use wireshark to monitor the traffic with mcsMQTT or the broker to see what is actually happening with the network traffic at the time the disconnect occurs.

                            Comment


                              #15
                              Originally posted by Michael McSharry View Post
                              I have been running with latest version for about a day now and the Statistics tab shows that mcsMQTT has been online/connected for the full duration.

                              Another idea is to use wireshark to monitor the traffic with mcsMQTT or the broker to see what is actually happening with the network traffic at the time the disconnect occurs.
                              I'm starting to think this might be a red herring caused by my Broker or my network. I've not seen a drop since I restarted the Broker last night, but it's also not unheard of to not have a drop for over 24 hrs. On the otherhand, I never saw the disconnects with the previous versions so no telling..

                              Since it's very difficult to determine where the MQTT link disconnects, there's a 50/50 chance it could on either side. Curiously none of my other devices shows a disconnect on the Broker log (now running in debug mode), so I'm just not sure at this point.

                              With both sides now running debug logs, hopefully something will show up. Worse case, it's not the end of the world for the PI to restart on loss of connection, as that's by design already.

                              Thanks,
                              Z

                              Comment

                              Working...
                              X