Announcement

Collapse
No announcement yet.

Excessive CPU and Excessive MQTT Topics

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    #91
    Michael,
    I was just going to stay at 3.2.5.2 and deal with the QOS Exactly issue but all associations stopped displaying in the associations tab, it was completely blank. Restarted the browser, cleared cache, tried a different browser.

    Interestingly, I do see them listed in the filter drop downs and it is received as an unaccepted device message on the statistics page.

    I noticed this started with 3.2.5.2 and is also in 3.2.5.3. I did also try completely blowing out all of the mcsMQTT databases, didn't help.

    Update: Upon restarting HomeSeer all statistics are lost and the abovementioned filters are gone. No longer can I see the devices in the filter DDLs. Not sure if this is related, thought I would mention it.
    Attached Files

    Comment


      #92
      What I see in the debug is normal startup and over the period of about 8 minutes there were 8 browser windows that were open and closed. The debug flag does not appear to be set because no very much in the debug file.

      The database looks normal with 300 Devices and 6 Topics. Nothing has been Accepted or manually defined.

      What I have learned today in my exploration is that the problems with the Association table come from two sources. One is that if nothing is every Accepted then the Association table will display no rows. This is fixed in 3.2.6.0.

      The other is that the Chrome (only one I tried) Browser is the holdup on display of large number of rows in the Association table. I was testing with about 5000 rows and the plugin builds the string and returns for an AJAX update of the page. Within 2 or 3 seconds the plugin was done, but it was minutes later before the browser actually showed the table. It often occurred that the browser just gave up and provided a something wrong message.

      I did the History table as new browser pages because I was concerned with the AJAX time vs just building a new page and I also liked the idea of having multiple pages showing different slices of the history data. I could go the same way with the Association table, but I think prudent use with the filters should be fine.

      To address the user feedback I removed the Show button and replaced it with a message as to how big the table will be. This lets the user know the button press was accepted and gives some idea as to how long it will take. On my Odroid C1 plugin and Chrome running on Intel I7 I get about 300 rows in 10 seconds.

      I did some other things with 3.2.6.0 to better handle concurrent threads. When I was processing thousands of received topics and showing the browser page the filter updates would sometimes happen in the middle of adding a new topic to a dictionary. It meant that that iteration of the filter construction would error-out and manual use of the checkboxes could be needed to get the filters correct.

      Comment


        #93
        Updated to 3.2.6.0 and now receiving the following message just below the Show Selected Associations on the Associations tab. I'll delete the DB files and test again with an update this post shortly. Nothing was received in the HS logs.

        Error Building Table Object reference not set to an instance of an object

        Update: Deleting the DB did not resolve the issue. Attached logs.
        Attached Files

        Comment


          #94
          I liked the old days before Mono where line numbers could be used to identify the specific problem statement. I will put in some progress markers to help isolate. The DB was only an issue in the transition from 3.1.x.x. Your prior one was clean and can continue to be used.

          Comment


            #95
            I put the additional error isolation in 3.2.6.1 in the updater.

            Comment


              #96
              This time the PI actually crashed.


              Apr-01 23:48:03 Starting Plug-In Plugin mcsMQTT started successfully in 520 milliseconds
              Apr-01 23:48:03 mcsMQTT Version 3.2.6.1 Registered with Homeseer
              Apr-01 23:48:03 Starting Plug-In Initializing plugin mcsMQTT ...
              Apr-01 23:48:03 Info Plugin mcsMQTT has connected. IP:127.0.0.1:35916
              Apr-01 23:48:02 Warning I/O interface mcsMQTT is down, executable is not running, restarting ...
              Apr-01 23:47:59 Info Plugin mcsMQTT with instance: has disconnected
              Apr-01 23:47:02 Starting Plug-In Plugin mcsMQTT started successfully in 426 milliseconds
              Apr-01 23:47:02 mcsMQTT Version 3.2.6.1 Registered with Homeseer
              Apr-01 23:47:02 Starting Plug-In Initializing plugin mcsMQTT ...
              Apr-01 23:47:02 Info Plugin mcsMQTT has connected. IP:127.0.0.1:35840
              Apr-01 23:47:01 Plug-In Finished initializing plug-in mcsMQTT
              Attached Files

              Comment


                #97
                The main loop in the plugin is like the one provided in the SDK with the snipit shown below. The plugin establishes a socket connection to HS and then sits in a tight wait loop monitoring the connection. If the connection is lost the plugin will command the ShutdownIO call to do a clean shutdown and then end. HS would then see the plugin disconnected at its next API call which in these two cases were Name and InterfaceStatus and recognize the lost connection and try to start it up again.
                Code:
                client = ScsServiceClientBuilder.CreateClient(Of IHSApplication)(New ScsTcpEndPoint(sIp, 10400), oPlugin)
                client.Connect()
                Do
                     Threading.Thread.Sleep(10)
                Loop While client.CommunicationState = HSCF.Communication.Scs.Communication.CommunicationStates.Connected And Not gShutdown
                So the question becomes why does the ScsServiceClientBuilder class think the socket connection has disconnected?

                An observation is that the ShutdownIO is always 4 or 5 milliseconds after the HS Name/InterfaceStatus call. It is as if the HS call is the event that caused the connection status to go to disconnected. No particular implications, but just an observation.

                What is the environment you are running HS and the plugin?

                As you can see from the snipit there is no tolerance for a disconnect status. There is no attempt to see if it persists for more than 10ms. There is no attempt by the plugin to reconnect once lost. We can try some of this, but have no idea what the implications may be as this is the template that was provided by HST.

                Code:
                4/1/2018 11:48:03 PM	2	| mcsMQTT Running at /usr/local/HomeSeer, HS is at /usr/local/HomeSeer 
                4/1/2018 11:48:03 PM	19	| mcsMQTT InitHW ComputerName= hometrollerSEL, IOEnabled=False 
                4/1/2018 11:48:03 PM	214	| MakeTable MQTT_MESSAGE Exists=True  
                4/1/2018 11:48:03 PM	215	| MakeTable MQTT_HISTORY Exists=True  
                4/1/2018 11:48:03 PM	220	| mcsMQTT Debug InitHW Database Ready 
                4/1/2018 11:48:03 PM	224	| mcsMQTT Debug Receive Ready 
                4/1/2018 11:48:03 PM	228	| mcsMQTT Debug Trigger Ready 
                4/1/2018 11:48:03 PM	322	| Spawning MQTT Threads  
                4/1/2018 11:48:03 PM	332	| mcsMQTT Debug MQTT Ready 
                4/1/2018 11:48:03 PM	332	| HW Init Complete  
                4/1/2018 11:48:03 PM	333	| MQTT Thread Started with broker 192.168.8.2, Shutdown=False, Disconnect=False, Client=False, Connected=False  
                4/1/2018 11:48:03 PM	333	| MQTT Thread Not Connected Yet  
                4/1/2018 11:48:03 PM	335	| Background Init Started  
                4/1/2018 11:48:03 PM	335	| Background Init Received  
                4/1/2018 11:48:03 PM	336	| Background Send  
                4/1/2018 11:48:03 PM	340	| MQTT Thread Client Created  
                4/1/2018 11:48:03 PM	341	| MQTT Thread Client ID=f909b4bb-db5c-4763-869f-579068473d22  
                4/1/2018 11:48:03 PM	347	| Background Init Filters - Background Complete  
                4/1/2018 11:48:03 PM	353	| HS Request Name  
                4/1/2018 11:48:03 PM	362	| MQTT Thread Broker 192.168.8.2 Connect Response=0  
                4/1/2018 11:48:03 PM	362	| MQTT Broker Connection Accepted, Connected=True  
                4/1/2018 11:48:03 PM	365	| MQTT Thread Subscribing   
                4/1/2018 11:48:03 PM	368	| MQTT Subscription Start  
                4/1/2018 11:48:03 PM	368	| MQTT Subscription Topics being selected  
                4/1/2018 11:48:03 PM	368	| MQTT Subscription Topics selected  
                4/1/2018 11:48:03 PM	370	| MQTTSubscribe List /  NoDiscovery=False to   
                4/1/2018 11:48:03 PM	370	| #  
                4/1/2018 11:48:16 PM	13545	| HS Request InterfaceStatus  
                4/1/2018 11:48:18 PM	15547	| HS Request InterfaceStatus  
                4/1/2018 11:48:19 PM	16697	| HS Request Name  
                4/1/2018 11:48:19 PM	16701	| Shutdown IO

                Comment


                  #98
                  The attached has no monitoring of the connection. What this means is that multiple instances of mcsMQTT may be spawned by HS. It is primarily to assess the behavior without monitoring the socket connection status.
                  Attached Files

                  Comment


                    #99
                    Replaced the exe with the file provided and tested. Clicked the show selected associations button once, button was repalaced with "please wait" and I immediately pulled the debug logs. The exact same outcome as last night and again, HS restarted the plugin. Appears to still be something incorrect in the function associated with the show selected associations button.

                    Attached the debug, I'm guessing this is all that's needed?


                    Apr-02 07:36:58 Starting Plug-In Plugin mcsMQTT started successfully in 519 milliseconds
                    Apr-02 07:36:58 mcsMQTT Version 3.2.6.2 Registered with Homeseer
                    Apr-02 07:36:58 Starting Plug-In Initializing plugin mcsMQTT ...
                    Apr-02 07:36:58 Info Plugin mcsMQTT has connected. IP:127.0.0.1:42100
                    Apr-02 07:36:57 Warning I/O interface mcsMQTT is down, executable is not running, restarting ...
                    Apr-02 07:36:34 Info Plugin mcsMQTT with instance: has disconnected
                    Apr-02 07:35:57 Starting Plug-In Plugin mcsMQTT started successfully in 541 milliseconds
                    Apr-02 07:35:57 mcsMQTT Version 3.2.6.2 Registered with Homeseer
                    Apr-02 07:35:57 Starting Plug-In Initializing plugin mcsMQTT ...
                    Apr-02 07:35:57 Info Plugin mcsMQTT has connected. IP:127.0.0.1:42022
                    Apr-02 07:35:56 Plug-In Finished initializing plug-in mcsMQTT
                    Attached Files

                    Comment


                      This log confirms that when connection status is reported as disconnected it is actually no longer connected to HS. When HS makes its next API call to the plugin it detects the failure and later restarts it. Not certain if HS is detecting the lost connection or detecting that the mcsMQTT.exe process is no longer running after the plugin terminates itself.

                      I tried the opposite approach of inducing an exit from the connection monitoring loop but, instead of shutting down the plugin, looped back to try to reestablish a connection. While there was success in getting a new connection, HS recognized the disconnect and removed mcsMQTT as a running plugin so when the URL request to the mcsMQTT%20Setup was done the request was not forwarded by HS to mcsMQTT. To get HS to recognize the plugin again I had to kill the mcsMQTT.exe process. HS then restarted it.

                      I normally do my evaluations with a browser that is running on a non-HS computer. During this evaluation I used Chrome on the same computer. I had one case of the feedback message "Please Wait" that did not get restored back to the button. What this means is that the AJAX connection which is providing a 2 second timer callback from the browser to refresh screen fields had stopped. This is the same behavior I saw when a large table was being rendered where the timer stopped until the browser had completely rendered the table. The plugin had not disconnected from HS as I could request a new browser page with mcsMQTT URL. It just showed that there is something happening in the AJAX Browser-HS connection that is not as robust as it could be.

                      I really do not know what the next step should be for your environment. The data is pretty conclusive that the socket connection established using HSCF.Communication.ScsServices.Client.IScsServiceClient(Of IHSApplication) is disconnecting. Your prior use of the Disconnect flag shows that the connection lost is associated with other socket activity established by mcsMQTT using the M2MMqtt.Net.dll. The problem seems to be pretty deep in the bowels of the socket handling within the OS and how these two packages interact with it.

                      I do not have a setup where Mosquitto and HS are on the same computer so all my testing involves at least two and usually three computers plus the other MQTT clients on the network.

                      Comment


                        Not sure I see how having the broker on the same machine as HS could be the issue or how my setup will be different from the vast majority of HS users that have their system running on a single raspberry pi. I do have a pi that is used for homebridge and node red. Mosquitto can be moved there but I don't see how this could resolve the issue. it worked before requiring the changes related to the Show Associations button

                        My setup consists of a HomeTroller SEL (2 core Celeron) running HS and mosquitto on ethernet and my MacBook Pro running safari on 5ghz wifi. Like I mentioned I also tested with Crome on this laptop. I also tried the show associated devices button on a windows machine using IE and Edge, same result.

                        At this point, I don't have any devices in your database so it shouldn't take long to return the values, maybe a few ms. Not sure how a delay could cause this, CPU usage is 8% on average and there are 12 other plugins working flawlessly. Plus the HS machine is proactively rebooted weekly.

                        Current Date/Time: 4/2/2018 3:13:55 PM
                        HomeSeer Version: HS3 Pro Edition 3.0.0.423
                        Linux version: Linux hometrollerSEL 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux System Uptime: 0 Days 15 Hours 18 Minutes 30 Seconds
                        IP Address: 192.168.8.2
                        Number of Devices: 307
                        Number of Events: 145
                        Available Threads: 199
                        HSTouch Enabled: True
                        Event Threads: 1
                        Event Trigger Eval Queue: 0
                        Event Trigger Priority Eval Queue: 0
                        Device Exec Queue: 0
                        HSTouch Event Queue: 0
                        Email Send Queue: 0
                        Anti Virus Installed:

                        Enabled Plug-Ins
                        3.3.0.0: APCUPSD
                        3.1.1.24385: Blue-Iris
                        1.2.2.0: Device History
                        0.0.0.21: DoorBird
                        3.0.0.43: EasyTrigger
                        3.0.0.22: Ecobee
                        3.2.6.1: mcsMQTT
                        3.0.2.0: MeiHarmonyHub
                        0.0.0.38: Pushover 3P
                        30.0.0.36: RFXCOM
                        3.0.5.6: SDJ-Health
                        3.0.6542.35207: UltraWeatherWU3
                        3.0.1.199: Z-Wave

                        Have your dlls changed? I notice that they are not overwritten on an upgrade. Anything else that changed?

                        At this point, I'm probably going to go to a previous version that I know worked. I appreciate you taking a look at this but I'm not sure what changed and everything else on my machine is working fine.

                        PS Would it make sense to have the databases and debug file in the /Data/mcsMQTT folder?

                        Update: disabled all other plugins, deleted all db's and config's related to the PI and tested, same thing.

                        Update 2: Deleted the PI and reinstalled, same thing. Also noticed the debug file is missing even though the option is enabled.
                        Last edited by mwolter; April 2, 2018, 05:46 PM.

                        Comment


                          Oh no, I have the broker set up on the same machine as HS also.
                          RJ_Make On YouTube

                          Comment


                            Have your dlls changed? I notice that they are not overwritten on an upgrade. Anything else that changed?

                            At this point, I'm probably going to go to a previous version that I know worked. I appreciate you taking a look at this but I'm not sure what changed and everything else on my machine is working fine.

                            PS Would it make sense to have the databases and debug file in the /Data/mcsMQTT folder?

                            Update: disabled all other plugins, deleted all db's and config's related to the PI and tested, same thing.

                            Update 2: Deleted the PI and reinstalled, same thing. Also noticed the debug file is missing even though the option is enabled.
                            Dlls have not changed

                            I put debug output earlier around the initial connection logic. It turns out it was before the slash (/ vs. \) was defined so when the debug was created the slash between Data the mcsMQTT was missing. I broke it very recently with the additional debug code and fixed this in 6.2.7.1. I also created the \Data\csMQTT folder and move the data files to there.

                            Initially the problem was with the plugin disconnection immediately after startup. You were able to work around it by changing the QOS. Is the problem now only when you press the Show button for Association table or do you disconnect even without using this button?

                            FYI, the HS devices are enumerated so they exist when you try to render the Association table. This will be 300 or so rows. Unchecking the Include non-Plugin items from the Association tab would eliminate them, or using a filter to reduce the set.

                            Comment


                              I took a look at the Device Management page that HST produces and see that the same technique of a DIV around the entire table is used and when a new device is created the entire table is rebuild within the DIV. It seems, however, to be snappier than rendering an Association table of similar size. It would be nice to have visibility at the server level where the actual AJAX communications are occurring.

                              From my vantage point with VS debugger I see a large string being returned via the HS API DivToUpdate.Add and then the two second timer (which I think is implemented in javascript in the browser) that normally returns control back to the plugin every two seconds stops. With large tables Chrome will give up indicating something is wrong. On more reasonable size ones the two second heartbeat will resume when the table is finally rendered.

                              These tables are not that large that it should be such an issue. I had originally thought the holdup was how long it was taking the plugin to construct the HTML to provide to the HS server, but found it to be rather short after I implemented the feedback approach with the button.

                              There could be a correlation with HS server being busing trying to serve the HTML for the table update and does not maintain the plugin connection as needed and it disconnects which then results in the plugin shutting down.

                              It would be nice to have a dialog with HST, but no response on the ticket I submitted.

                              Comment


                                Last edited by mwolter; April 3, 2018, 02:06 AM.

                                Comment

                                Working...
                                X