Announcement

Collapse
No announcement yet.

Plugin won't initialize at startup

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • w.vuyk
    replied
    Ok, had finished the check on the enumarator as you said earlier, but will also put the repeat in the HS init part, which adds the webages.

    Wim

    Leave a comment:


  • bsobel
    replied
    Originally posted by w.vuyk View Post
    Ok, was hoping the energy database would hold up process as well for you, would have been a lucky shot ....

    I will update the plugin with a 10 time retry on the device enumurators, but I do hope there are no other points throwing an issue as I also see a error registering webpages.
    If the plugin has to loop, it will log how may it had to loop. If it exceeded 10, I guess the plugin better quits, so it will not create more issues ok?
    Really any call that sets up remoting on a secondary thread will initially see this issue, so if the webpage registration happens on a different thread than the device enumerator you may need to do that there as well. One option (I haven't tested this, but from what we know this should work) would be to put a call to something like HS.ApiVersion or another side effect free call as the entry point of any background threads (along with a try/catch to eat the potential NPE). After that first failed call the rest should work.

    I went the retry route, and like you suggested, I log and exit if I can't get the enumerator in 10 attempts. On 4.5.2 I always get it by the second attempt, on when earlier built with 4.6+ it sometimes failed up to 10 times (but sometimes didn't!). Moving back to 4.5.2 fixed that issue, leaving the single retry remaining with 100% success.

    Leave a comment:


  • w.vuyk
    replied
    Ok, was hoping the energy database would hold up process as well for you, would have been a lucky shot ....

    I will update the plugin with a 10 time retry on the device enumurators, but I do hope there are no other points throwing an issue as I also see a error registering webpages.
    If the plugin has to loop, it will log how may it had to loop. If it exceeded 10, I guess the plugin better quits, so it will not create more issues ok?

    Leave a comment:


  • bsobel
    replied
    Thanks for the hint but its at 35mb, I had reset it once before trying to narrow this down, but that version was 36.4mb. So looks like I haven't seen that unconstrained growth. Right now I suspect one startup culprit is the parent child relationship checking, I have about 2500 devices so it takes awhile. I get the events dropped log entry immediately after it runs and plugins starting around that time often run into the error Im chasing...

    Leave a comment:


  • w.vuyk
    replied
    Originally posted by bsobel View Post
    I think this is another case of one of your uses (also named Bill) running into the bug; https://forums.homeseer.com/forum/li...lugin-restarts Notice with config changes, it went away, its part of the 'fun' transient nature of this one (like you, I much prefer 100% reproducible bugs!)

    Edit; I am more and more convinced that HS is doing some long running operation on its primary thread and is not pumping windows messages during those operations. Async callbacks rely on the message queue to get dispatched. I just sent Rich a long email on this and asked him to audit the startup code with this in mind, Ill let you know what he comes back with. If thats indeed the root cause, we can fix this by ensuring HS itself is pumping messages correctly, but will remain a legacy bug for anyone before whatever version includes that fix (meaning the retry, if you do agree to do it, won't be completely wasted work)
    Bill,

    Your edit makes me have a ohhhhhh!!!!! wave here!
    On Windows I had a growing irriration myself for a few years already. The startup of HomeSeer was more or less random in duration. At certain times I had to wait for 15 minutes before HS3 would finish initialising plugins. Sometimes plugins would timeout even before it would start. At other times it would take a minute or two, but HS would start at reasonable speed.

    So almost three weeks ago I got hit by the name change of Philips and my lights started to be not controllable(how conveniant at 10:15 PM of course.....). The issue that is now solved in the beta....
    But I was analysing what really happend and wanted to have a clean start of HS. But after half an hour, HS did not start, rebooted the machine, kicked it and shook it, but HS would just stick at initialising and time outing plugins. Pulled my hair there...

    So I moved all files to another computer and again tried to restart it, with the same results - it just would not start. On this second pc I checked for large files and found one above 1 GB... a log file, removed it and as expected without result. The second large file was my energy database. It holds 5 years of energy usage.I had to clean the database myself as HST leaves all 0 value rows in it, even with compacting. Did add some other compaction actions in it to make the database even smaller.
    Anyway, I removed the database and restarted HS. Within 10 seconds HS started!! This was on the second much more powerfull PC

    Of course did the same action on the production PC - removed the energy database and started Homeseer - fast, within 30 seconds all is buzzing again!
    I restored the energy database after creating a history table that is holding compacted data to one line per day/device from two months back and longer ago. The database is now reduced from 200Mb to 33 Mb and HS still starts fast. AND I still have my history

    I had contact with rjh in the past on this issue, but Rich could not copy the issue and to be honest I did let it rest as I saw not many complaints on the startup time, hoping that in a future release the issue would dissapear. Was planning to mail him, but waited as I wanted to be sure adjusting the Energy db was lasting.

    For you it might be very interesting to test removing the energy database? How large is your energy database?

    Wim

    Leave a comment:


  • Warcorp
    replied
    Disregard, the issue seems resolved in the most recent Beta.

    Leave a comment:


  • Warcorp
    replied
    I too am having this same issue. The bridge is pingable, I can see it's notification webpage yet HS reports it as unreachable. All of my devices then show unreachable yet I can control them via Alexa or the Hue app so I know the bridge is alive and functioning.

    Leave a comment:


  • bsobel
    replied
    I think this is another case of one of your uses (also named Bill) running into the bug; https://forums.homeseer.com/forum/li...lugin-restarts Notice with config changes, it went away, its part of the 'fun' transient nature of this one (like you, I much prefer 100% reproducible bugs!)

    Edit; I am more and more convinced that HS is doing some long running operation on its primary thread and is not pumping windows messages during those operations. Async callbacks rely on the message queue to get dispatched. I just sent Rich a long email on this and asked him to audit the startup code with this in mind, Ill let you know what he comes back with. If thats indeed the root cause, we can fix this by ensuring HS itself is pumping messages correctly, but will remain a legacy bug for anyone before whatever version includes that fix (meaning the retry, if you do agree to do it, won't be completely wasted work)

    Leave a comment:


  • bsobel
    replied
    If you search the forum for: "__icall_wrapper_mono_remoting_wrapper(intptr,intptr)" many of those cases are manifestations of this bug, its just taken a long time to get all the pieces together to figure it out...

    Leave a comment:


  • bsobel
    replied
    Originally posted by w.vuyk View Post
    So far I know you are now the only one reporting this bug on the plugin. There are several other users of the plugin using Mono that did not report (yet) this issue. For where I am standing it could be a specific issue in your installation?
    I am a bit unsure on what you mean by 'remoting'? Is the plugin used on a remote machine for you or are you referring to the plugin being an external ap for HS3? If the plugin is used on a remote machine I might better understand the slowness - although then synchronous starting woult not be needed I guess?
    Wim
    What plugins are effected by the bug varies as since it seems to manifest the most during startup, right now it seems to be hitting you. Im sure you have other users who have hit this without knowing, if you search the forum for that specific mono line at the top of the stack trace you'll see threads on it.

    As for remoting, HomeSeer uses .NET remoting to talk to the plugins and for the plugins to talk back to HS. They used an open source package which is just a wrapper over the dot net framework called SCS (https://github.com/hikalkan/scs). The plugin is running on the same box as HS, but it still uses remoting since its a separate process. its this remoting setup that fails, and like I mentioned when it occurs its the first HS call on an secondary threads (the main thread already has remoting setup, so thats why calls on the main thread don't fail).

    I've been dealing with this darn bug for quite some time, as have a few others users (you'll notice when you search for that mono line). Its frustrating because its nothing in your code, so I realize I'm asking you to work around a bug. My code catches the NPE and retries after a 1 second delay up to 10 times, and then if that fails it throws the original exception. This way if there is a real bug its not masked, but these transient ones are dealt with.

    The 10 times came from when I was building on 4.7 not realizing it made the problem much worse, on 4.5.2 Im consistently seeing the one exception for each new thread (if the thread dies and has to connect again, it can happen again, but seems to depend on if HS is busy). I suspect there is timeout somewhere or a full message queue that needs to be dumped while HS is doing some startup tasks.

    I am continuing to try and find the final root cause on this, as honestly HS doesn't fully work on Mono today IMHO until this is addressed. Even if you made retires optional, I'd really really appreciate a build with it, until root cause (and a fix from mono or HS) its kinda whack-a-mole at startup seeing which plugins failed and which didn't (and since its timing related, that changes as new plugins are added, etc)...

    Leave a comment:


  • w.vuyk
    replied
    So far I know you are now the only one reporting this bug on the plugin. There are several other users of the plugin using Mono that did not report (yet) this issue. For where I am standing it could be a specific issue in your installation?
    I am a bit unsure on what you mean by 'remoting'? Is the plugin used on a remote machine for you or are you referring to the plugin being an external ap for HS3? If the plugin is used on a remote machine I might better understand the slowness - although then synchronous starting woult not be needed I guess?

    Wim

    Leave a comment:


  • bsobel
    replied
    "Not sure here if it is wise to just 'ignore' errors and just bump it again. Does not sound as correct coding to me....."

    I agree, but it's not really ignoring the issue, its a known bug causing the first attempt on a secondary thread to fail. Since we know what it is, at least there is a known way to resolve it. Its always a Null pointer exception in a call via remoting to HS and always from from the first call to remoting on a non-primary thread.

    On the beta, I have not, would you like me to? The .net level change was effecting plugins >= .NET 4.6. This issue occurs regardless of the .NET level do to this remoting bug (the issues on later .NET happen much more often, the symptom is similar. In those versions its like the remoting is torn down more and has to reestablish or something else is apparently retriggering the exact or similar issue, but on 4.5.2 and above its always 'secondary thread, first call'...

    "At the same time Windows is just buzzing around. I am getting more doubts if Linux is the road to go with Mono around as it keeps throwing strange bugs."

    I have the same concerns, part of that is the HS itself is only doing development on Windows. They really need a few folks on Mono full time, IMHO if they are serious about non Windows platforms.

    Leave a comment:


  • w.vuyk
    replied
    Well, that is one of those challences with Mono. Earlier I had a discussion about the plugin raising CPU and memory, a bit later irt was not the plugin anymore, but a bug in a unsupported (by HS) Mono version.... later it was the plugin taking more cpu usage, now it is HS3 not being ready to take calls from plugins. At the same time Windows is just buzzing around. I am getting more doubts if Linux is the road to go with Mono around as it keeps throwing strange bugs.

    Not sure here if it is wise to just 'ignore' errors and just bump it again. Does not sound as correct coding to me.....

    But even if I want to try this test, did you test the latest beta? I compiled it on a different .NET level for you to test, maybe the issue changes too on the beta?

    Thanks,

    Wim

    Leave a comment:


  • bsobel
    replied
    Getting the logs now, on the initial startup you are getting an exception calling GetDeviceEnumerator. We've found on Mono when running on a secondary thread that if the plugin is built on .NET 4.5.2 or less, it often gets an exception on the first attempt (if HS is busy) and works fine on the second. It seems to have to do with the connection to HS for remoting trying to setup while HS doesn't respond promptly. On .NET >= 4.6 I have seen this error much more randomly (fails sometimes, not others, sometimes multiple retries are required). But for your case, I really really suspect if you retry those HS calls at least once after any null pointer error, you will see it start working consistently on Mono.

    I'd probably be much happier if I moved to Windows, but really am working towards root cause on this (I think its an HS/mono issue that plugin authors are running into, as having to retry the HS calls should in an ideal world not be required)...

    Here is the problematic call:

    10:28:31.2259190 -- (HueTools.InitPlugin) Error: :Object reference not set to an instance of an object
    10:28:33.1662960 -- (HueTools.InitPlugin) System.NullReferenceException(0x80004003): Object reference not set to an instance of an object
    at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke (System.Runtime.Remoting.Proxies.RealProxy rp, System.Runtime.Remoting.Messaging.IMessage msg, System.Exception& exc, System.Object[]& out_args) [0x00108] in <15c986724bdc480293909469513cfdb3>:0
    at (wrapper managed-to-native) System.Object.__icall_wrapper_mono_remoting_wrapper(intptr,i ntptr)
    at (wrapper remoting-invoke) HomeSeerAPI.IHSApplication.GetDeviceEnumerator()
    at HSPI_JowiHue.MyBridges.FindBridgesSelf () [0x000be] in <49591be7829f4c96b8b5a8559a6b5db2>:0
    at HSPI_JowiHue.MyBridges.Init () [0x0001b] in <49591be7829f4c96b8b5a8559a6b5db2>:0
    at HSPI_JowiHue.HueTools.InitPlugin () [0x000b3] in <49591be7829f4c96b8b5a8559a6b5db2>:0

    10:28:33.1665650 -- (HueTools.InitPlugin) Debug: Config Done InitOK=False

    I will send the two logs to you right now.

    Leave a comment:


  • bsobel
    replied
    Will do. As to the error, i have been fighting startup errors on Mono the entire time I've been on HS. Rich knows, but hasn't been able to identify a root cause. I see random exceptions on HS calls (which often work when retried, when HS is busy during startup). Likely you are hitting one that if retried would work but isn't retried, at the second startup the plugin doesn't hit that (because HS isnt busy) and life is good.

    Off to do the detailed logging now, thank you.

    Leave a comment:

Working...
X