Announcement

Collapse
No announcement yet.

What version of .NET are you building with

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • What version of .NET are you building with

    Spud, I tried to send you a private message but never heard back and now can't find it, so wondering if the portal ate it. Can you confirm which version of the .NET framework you are building the Autelis plugin against? I am running into constant errors with it, and we've learned that .NET 4.6 (or later) is not fully interop compatible with .NET 4 on Mono (Windows works fine). Plugins built with 4.5.2 or lower seem fine, so I wanted to see if you are indeed on an higher build, and if it would be possible to do a 4.5.2 build

    I had built my plugins on 4.7 but luckily was not using any libs that required anything more than 4.5.2 so recompiling fixed the issues for me, and a number of other authors did the same. Wanted to see on yours since I definitely see errors with it.
    Aug-29 9:48:51 AM Autelis ERROR FindDeviceByID: Object reference not set to an instance of an object
    Aug-29 9:46:45 AM Autelis ERROR FindDeviceByID: Object reference not set to an instance of an object
    Aug-29 9:44:47 AM Autelis ERROR FindDeviceByID: Object reference not set to an instance of an object
    Aug-29 9:43:46 AM Autelis INFO CMD SENT: http://10.1.0.249:80/status.xml?
    Aug-29 9:43:46 AM Autelis INFO CMD SENT: http://10.1.0.249:80/lights.xml?
    Aug-29 9:43:44 AM Autelis INFO CMD SENT: http://10.1.0.249:80/names.xml?
    Aug-29 9:43:43 AM Autelis INFO Autelis version 3.0.0.28

  • #2
    The Autelis plugin targets .NET framework 4

    I did answer your visitor message
    https://forums.homeseer.com/member/7...ge-from-bsobel

    Comment


    • #3
      Spud, sorry somehow I completely missed that, and couldn't figure out how to get back to where I sent it.

      Could I ask you to consider a change then, in your code I believe internally that FindDeviceById is calling GetDeviceEnumerator() or another HS API to do a lookup. In that code would you consider doing the below where your current call is? The second call, if it fails would be caught by your existing catch block that logs, but I believe the second call will work so the issue will be masked.

      try
      {
      HS.Call()
      }
      catch (NullPointerException npe)
      {
      HS.Call()
      }

      What we are seeing on some systems on Mono is the first call to a HS API on a secondary thread, the first call fails. The remoting sets up the TCP channel but the first request fails, I still don't have root cause. It is MUCH more pronounced during startup and Rich has been doing some startup changes with me to test this (I suspect that since his STA thread wasn't pumping messages during startup, a bunch of async calls are getting queued and timing out, leading to the issue). Usually the same thread works after the first success (so guessing/hoping in the above example you fire of a new thread every minute or so to do work, and that's why we see the same error every minute). On 4.6 and above it seems to get the NPE on more than just the first call, but with the version you are on, it seems to be just the first call on a new thread.

      It is ALWAYS a null pointer exception and always occurs in the depths of mono remoting, I've modified my plugin to retry and a few others (basically as I see the issue on my system I've been reaching out). Ideally Rich and I will get to root cause, but this so far has taken a year to get as close/far as we turn out to be at this stage. Im confident any really fix will be in the HS code not the plugins, but since you are likely to have to support back builds for some time the above try/catch/recall at least once should mask the issue for myself and other users who may or may not even know they are running into it (its not consistent, its been a FUN bug to track down...)

      Thank you for considering, and happy to beta if you are willing to trust me and make the change.

      Best,
      Bill

      Comment


      • #4
        Spud,

        More details on what is going on. These details are likely to change as I am working on code changes for HS to address this, however we all (plugin developers) have a backwards compatibility issue we need to deal with:

        Every plugin talks to HomeSeer over a RPC framework. That RPC framework has one queue and one thread per plugin on the server side. So what this means is if say you start a background thread and call GetDeviceEnumerator() it gets sent to the server and dispatched. Now say you receive an HSEvent notification you care about, or your thread monitoring equipment notices a change and you call SetDeviceString() (really any HS.* api). That call even though its on its own thread in your plugin winds up in the shared queue on the server and won't be processed until its the top of the queue. So in this example it doesn't execute until GetDeviceEnumerator() completes. I have the HSCF code and am changing this to a thread pool to resolve this, but this is how HS is working today and I think we all coded assuming that our individual plugin threads were indeed not being serialized in this way.

        I desperately need you to change the Autelis plugin, as here is what is occurring: The server sets an 8 minute timeout on API's, but the plugin defaults to 60 seconds. On my system GetDeviceEnumerator() often takes 45-75 seconds. In the >60 second case the call is aborted and winds up throwing a null pointer exception. The plugin needs to retry when you see that exception (as the call time is variable), perhaps even simply shutting down if it can never get one after a reasonable number of tries (my plugins do 20 tries, with a few second sleep between). Today when you receive the NullPointerException you move on and recreate your devices (you treat the NPE exception the same as an empty enumerator).

        My system currently has about 650 extra Autelis plugin devices due to this (accounting for 19% of my total devices). It's a circle of death as each new device caused GetDeviceEnumerator() to take a bit longer. So as you see the exception occasionally before over time its becoming more and more often, resulting in more devices, resulting in it being more often (rinse/repeat...)

        This is nothing you did wrong in your plugin, the contract with the HS API isn't working as we thought. But until everyone is on a fixed version, we will all need to be more defensive in our coding.

        Yo may recall in the past I had the similar issue, its taken until now to really get HS to dig in to the level that we can fully explain what is occurring.

        Can you kindly get me a build with those changes to try?

        Best
        Bill

        Comment


        • #5
          thanks for the detailed explanation. It's weird that it only happens on Linux though.

          you can test beta version 3.0.0.29

          here is the code I used:

          Code:
                  public clsDeviceEnumeration SafeGetDeviceEnumerator()
                  {
                      clsDeviceEnumeration de = null;
                      int retry = 0;
          
                      while(de == null && retry < 5)
                      {
                          try
                          {
                              de = (clsDeviceEnumeration)m_hs.GetDeviceEnumerator();
                          }
                          catch (NullReferenceException)
                          {
                              Thread.Sleep(500);
                              if (retry >= 4)
                                  throw new HomeSeerRemoteCallException("GetDeviceEnumerator failed after 5 retries");
                          }
                          retry++;
                      }
          
                      return de;
                  }
          and when HomeSeerRemoteCallException is catched the devices won't be recreated.

          Comment


          • #6
            Originally posted by spud View Post
            thanks for the detailed explanation. It's weird that it only happens on Linux though.

            you can test beta version 3.0.0.29

            here is the code I used:

            Code:
             public clsDeviceEnumeration SafeGetDeviceEnumerator()
            {
            clsDeviceEnumeration de = null;
            int retry = 0;
            
            while(de == null && retry < 5)
            {
            try
            {
            de = (clsDeviceEnumeration)m_hs.GetDeviceEnumerator();
            }
            catch (NullReferenceException)
            {
            Thread.Sleep(500);
            if (retry >= 4)
            throw new HomeSeerRemoteCallException("GetDeviceEnumerator failed after 5 retries");
            }
            retry++;
            }
            
            return de;
            }
            and when HomeSeerRemoteCallException is catched the devices won't be recreated.
            Hi, and thank you! After further digging, the reason I am seeing it on Linux is simply the SEL box is a 1.5ghz 2 core box and a single plugin takes about 40 seconds to get a device enumerator. When multiple plugins are running the serialization and deserialization threads starve and the time going through the roof and well past 60 seconds. Im sure the same issue exists on Windows, those just have traditionally been beefier boxes.

            I am about done with the HCSF changes and will send them to Rich, it will extend the life of the SEL and similar boxes. I just think at about now 2700 devices I'm a bigger customer than its designed for. I think around 1000 devices it would be fine, and the system feels MUCH more responsive after the changes I did. Realized today the return path was also effectively being serialized. With this new version I've moved the deserialization out of the communication thread, so no more random 60 second delays wondering why a plugin seems unresponsive when its simply blocked on another of its own threads thats in either serialization or deserialization.

            In the above code, I would recommend raising the count of retries and delaying about a minute between them. When it times out the old call still winds up running, so retiring immediately adds even more load... And thanks for fixing the device creation case, much much appreciated!

            Comment


            • #7
              Spud, an update. I changed my overall recommendation to:

              Your GetDeviceEnumerator call should loop until it completes or ShutdownIO is called. If it fails, wait randomly 30s to 8 minutes to try again. On lower power machines trying again right away actually increases load making the problem worse. For example right now on my SEL box with a few plugins running its taking 85 seconds to deserialize the data. I also moved the deserialization into the callers thread, which is after the timeout. So now the timeout is JUST the server communications (and the server doing the serialization). This should resolve the issue, but you should ideally cache results and greatly limit the number of calls to GDE you make overall (you being plugin developers in this context).

              Comment


              • #8
                Spud, one last request (low priority bug). In GetSafeDeviceEnumerator, can you check for a flag and exit your loop if ShutdownIO has been called? Not a huge one but it causes you to throw exceptions if the system is shutdown in the first few minutes after startup and its not completed yet.

                Comment


                • #9
                  Please test version 3.0.0.30. I added a few optimizations as per the timings you sent.

                  Comment


                  • #10
                    Originally posted by spud View Post
                    Please test version 3.0.0.30. I added a few optimizations as per the timings you sent.
                    Installed, initial reaction is tremendous improvement. I will watch it over the next few days, but so far CPU usage is night and day better. Thank you!

                    Comment


                    • #11
                      Originally posted by bsobel View Post

                      Installed, initial reaction is tremendous improvement. I will watch it over the next few days, but so far CPU usage is night and day better. Thank you!
                      Ok after letting it run over night I am hoping you can consider one more change. Its to cache the device #s you get back after your call to GetDeviceEnumerator() since startup 15 hours ago its been called 242 times and spend 108 cpu minutes in that call. With the fixed HSCF Im running GetDeviceEnumerator() no longer fails, so now that its always running to completion I have better data on just how much CPU that entails. It is by far the most expensive thing you do when interacting with HS. The only other high number is SetDeviceByRef() and you kind of need to do that one when you have data ,)

                      Comment

                      Working...
                      X