Jon00 DataScraper Script

  • Thanks Jon. I did post the wrong grab. But I was able to go back to the original grab and get everything working for pollen. I will post that shortly as a new message here in case it can help others.

    Comment


    • Using Jon00's DataScraper script I was able to get pollen values for my zip code. I'm posting it here in case it helps others. Many pollen sites are not set up to allow data scraping, but I found one that is. Setup is very quick.

      First go to pollen.com and create an account with your email and zip code. Then sign up for the daily email with your local pollen information. At the top of that email there will be a link: "Having problems viewing this email? Click here to view the online version."
      Use the link embedded in "Click here" as your Path. That site allows scraping; regular pollen.com does not.

      Here is my setup, which creates 5 devices (today's pollen level, today's high-med-low, today's pollen type, tomorrow's pollen level, tomorrow's high-med-low):

      [Grab2]
      Path=http://www.pollenapps.com/email/aa/default.asp?e=[NOTE: customized for your unique URL...see text above]
      TextFile=1
      Encoding=
      Username=
      Password=
      Options=
      UserAgent=
      Devicemode=2
      StripHTML=0
      UseIE=0
      Delay=10

      Pattern1=(?s)TODAY.*?"http://www.pollenapps.com/email/aa/images/levels/gauge-(.*?).png"
      Pattern2=(?s)TODAY.*?style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">(.*?)<
      Pattern3=(?s)Today's Top Allergens.*?icons/type-Tree-sm.png" width="40" height="40" alt="(.*?)"
      Pattern4=(?s)Today's Top Allergens.*?icons/type-Tree-sm.png" width="40" height="40" alt=".*?icons/type-Tree-sm.png" width="40" height="40" alt="(.*?)"
      Pattern5=(?s)Today's Top Allergens.*?icons/type-Tree-sm.png" width="40" height="40" alt=".*?icons/type-Tree-sm.png" width="40" height="40" alt=".*?icons/type-Tree-sm.png" width="40" height="40" alt="(.*?)"
      Pattern6=(?s)tomorrow.*?src="http://www.pollenapps.com/email/aa/images/levels/gauge-(.*?).png
      Pattern7=(?s)tomorrow.*?style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">(.*?)<

      DeviceName1=Today's Pollen Level
      DeviceText1=[0]
      DeviceValue1=[0]
      DeviceImage1=pollen.png
      Speakbutton1=0
      TriggerString1=
      SearchMode1=1
      TriggerEvent1=

      DeviceName2=Today's Pollen High-Med-Low
      DeviceText2=[100]
      DeviceValue2=[100]
      DeviceImage2=pollen.png
      Speakbutton2=0
      TriggerString2=
      SearchMode2=1
      TriggerEvent2=

      DeviceName3=Today's Pollen Type
      DeviceText3=[200], [300], [400]
      DeviceValue3=[200]
      DeviceImage3=pollen.png
      Speakbutton3=0
      TriggerString3=
      SearchMode3=1
      TriggerEvent3=

      DeviceName4=Tomorrow's Pollen Level
      DeviceText4=[500]
      DeviceValue4=[500]
      DeviceImage4=pollen.png
      Speakbutton4=0
      TriggerString4=
      SearchMode4=1
      TriggerEvent4=

      DeviceName5=Tomorrow's Pollen High-Med-Low
      DeviceText5=[600]
      DeviceValue5=[600]
      DeviceImage5=pollen.png
      Speakbutton5=0
      TriggerString5=
      SearchMode5=1
      TriggerEvent5=



      I run the script (Main, 2) once each day at 5:30am. It goes to the site and pulls the pollen data into HS3. Devices image below. Thank you Jon00 for enabling this!!
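For anyone adapting the config above: the bracketed numbers in the Device lines look like they follow a simple rule, where the first capture of PatternN lands at index (N-1)*100 (Pattern2 feeds [100], Pattern6 feeds [500]). That rule is only inferred from this config, not taken from the script's documentation. Below is a rough Python sketch for trying the patterns outside HS3 before pasting them into the ini file. The HTML sample is invented for illustration and is far simpler than the real page, and the helper name `placeholder_values` is hypothetical. Patterns 3 to 5 are left out to keep it short.

```python
import re

# Invented stand-in for the online version of the email; the real markup
# is much longer and the gauge values here are made up.
SAMPLE_HTML = '''
<h2>TODAY</h2>
<img src="http://www.pollenapps.com/email/aa/images/levels/gauge-7.2.png">
<p style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">Medium-High</p>
<h2>tomorrow</h2>
<img src="http://www.pollenapps.com/email/aa/images/levels/gauge-8.1.png">
<p style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">High</p>
'''

# Patterns 1, 2, 6 and 7 copied from the config, keyed by pattern number.
PATTERNS = {
    1: r'(?s)TODAY.*?"http://www.pollenapps.com/email/aa/images/levels/gauge-(.*?).png"',
    2: r'(?s)TODAY.*?style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">(.*?)<',
    6: r'(?s)tomorrow.*?src="http://www.pollenapps.com/email/aa/images/levels/gauge-(.*?).png',
    7: r'(?s)tomorrow.*?style="font-size:18px;font-family:Georgia !important;font-style:italic;margin:0;padding-top:10px;">(.*?)<',
}


def placeholder_values(html, patterns):
    """Map the assumed placeholder index, (N-1)*100, to PatternN's first capture."""
    values = {}
    for number, pattern in patterns.items():
        match = re.search(pattern, html)
        if match:
            values[(number - 1) * 100] = match.group(1)
    return values


values = placeholder_values(SAMPLE_HTML, PATTERNS)
for index, text in sorted(values.items()):
    print(f"[{index}] = {text}")
```

Note that the patterns are case sensitive ("TODAY" versus "tomorrow"), so the sample has to use the same casing as the real page for a test like this to mean anything.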

      Comment


      • Hi! I'm trying to scrape a site, but I can't even get the grab text file to be created. Note that I have several other sites that I scrape without trouble, so I can't for the life of me understand why this one is a problem. A couple of other users have no problem with it, but I know they're using HS3 on Linux and I'm on Windows.

        Using this in the config:

        Path=https://pollenkontroll.no/api/pollen-count?country=no&location=126
        TextFile=1

        Opening the link in a browser yields the correct result. Any tips?

        Comment


        • Originally posted by mk1 black limited
          Hi! I'm trying to scrape a site, but I can't even get the grab text file to be created. Note that I have several other sites that I scrape without trouble, so I can't for the life of me understand why this one is a problem. A couple of other users have no problem with it, but I know they're using HS3 on Linux and I'm on Windows.

          Using this in the config:

          Path=https://pollenkontroll.no/api/pollen-count?country=no&location=126
          TextFile=1

          Opening the link in a browser yields the correct result. Any tips?
          I've just tried it here and it works fine (see attached grab).

          grab1.txt
          Jon

          Comment


          • Right. I just don’t get what I could be doing wrong.

            Comment


            • I don't have an answer. I just copied your settings and it works with my Windows setup. If you are on the latest version of the script, you can try the new UseIE setting to see if that works.
              Jon

              Comment


              • I don't blame you. I've already tried the new UseIE setting, no go. I can see the request going through my firewall, so the script does do something; I just never get the result. Very strange indeed.

                Comment


                • Hi Jon,
                  I'm trying to add a new scrape to my system but I'm running into a wall on the latest one.
                  I've got access to the Realtime Trains API, which uses HTTP auth and is served over HTTPS. The JSON output I receive is below (a portion of it, anyway).

                  Code:
                           "locationDetail":{  
                              "realtimeActivated":true,
                              "tiploc":"BCKNHMJ",
                              "crs":"BKJ",
                              "description":"Beckenham Junction",
                              "gbttBookedArrival":"1210",
                              "gbttBookedDeparture":"1210",
                              "origin":[  
                                 {  
                                    "tiploc":"ORPNGTN",
                                    "description":"Orpington",
                                    "workingTime":"115400",
                                    "publicTime":"1154"
                                 }
                  My config looks like this:

                  Code:
                  Path=https://api.rtt.io/api/v1/json/search/bkj/to/brx
                  TextFile=0
                  Encoding=
                  Username=****
                  Password=****
                  Options=
                  UserAgent=
                  Devicemode=0
                  StripHTML=1
                  
                  Pattern1="gbttBookedArrival":"(.*?)","
                  Pattern2=
                  Pattern3=
                  Pattern4=
                  Pattern5=
                  
                  DeviceName1=RTT - BKJ to VIC: 1
                  DeviceText1=[0]
                  DeviceValue1=[0]
                  DeviceImage1=
                  Speakbutton1=1
                  TriggerString1=
                  SearchMode1=1
                  TriggerEvent1=
                  These settings had been working fine when I was scraping the webpage rather than the API, and I have used almost identical settings to scrape a JSON API previously.
                  I have no errors in my HS log, so I'm not sure where I've gone wrong, but there is no grab data for grab3 in my data file after running my scrape event with just ID 3.

                  Any ideas where I've messed up?

                  EDIT: I've just realised that every time I run the script, a login prompt pops up on my HS box (this may be because I tried the UseIE flag), but even entering credentials there does not get it working.
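A side note on Pattern1 in the config above, separate from the login problem: the pattern ends in `","`, so it only matches when the next key follows the value with nothing in between. The post does not say whether the API sends compact or indented JSON, so this is only a sketch of how the two cases would behave, written in Python rather than in the script itself. Both sample strings are cut down from the JSON quoted above, and `LOOSER_PATTERN` and `first_capture` are hypothetical names for illustration.

```python
import re

# Pattern1 exactly as in the config, plus a looser variant that stops at
# the closing quote of the value.
PATTERN_FROM_CONFIG = r'"gbttBookedArrival":"(.*?)","'
LOOSER_PATTERN = r'"gbttBookedArrival":"(.*?)"'

# The same two fields, once compact and once indented.
compact = '{"gbttBookedArrival":"1210","gbttBookedDeparture":"1210"}'
pretty = '{\n  "gbttBookedArrival":"1210",\n  "gbttBookedDeparture":"1210"\n}'


def first_capture(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(first_capture(PATTERN_FROM_CONFIG, compact))  # matches
print(first_capture(PATTERN_FROM_CONFIG, pretty))   # no match: newline after the comma
print(first_capture(LOOSER_PATTERN, pretty))        # matches
```

If the grab file turns out to contain indented JSON once authentication works, dropping the trailing `,"` from the pattern would be one way to make it match in both cases.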

                  Comment


                  • Unfortunately you won't get any response (grab data) if it cannot authenticate. Matters are getting worse now that sites are changing over to SSL. Check this thread regarding SSL when running scripts under .NET4: https://forums.homeseer.com/forum/de...turl-ssl-issue
                    Jon
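One way to separate an authentication problem from a script problem is to send the same request from outside HS3. The sketch below assumes that "HTTP auth" here means HTTP Basic authentication, which the thread does not confirm, and uses placeholder credentials. `build_request` is a hypothetical helper; it only prepares the request, so nothing goes over the network until `urlopen` is called.

```python
import base64
import urllib.request


def build_request(url, username, password):
    """Prepare a GET request carrying an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder credentials; substitute the real API username and password.
req = build_request("https://api.rtt.io/api/v1/json/search/bkj/to/brx", "user", "pass")

# To actually send it (this needs network access and valid credentials):
#   with urllib.request.urlopen(req, timeout=10) as response:
#       print(response.status, response.read()[:200])
```

A 401 response from a test like this would point at the credentials, while a handshake error raised before any status code arrives would point at SSL/TLS instead.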

                    Comment


                    • I did wonder if this was somehow SSL related.
                      I will look at removing SSL through an internal proxy.

                      Thanks as always

                      Comment


                      • I've now released V1.0.16. This provides better error/logging capabilities to see what's going wrong and adds the capability to set the SSL security protocol (i.e. SSL3, TLS 1.0, 1.1, 1.2 or 1.3)
                        Jon
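To decide which protocol to pick, it can help to see what a given site actually negotiates. The following is a small Python sketch for that, independent of the script. The post does not name the script's new setting, so nothing here refers to it. The helper names `make_context` and `negotiated_tls_version` are hypothetical, and the TLS 1.2 floor is just an example value.

```python
import socket
import ssl


def make_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Build a client context that refuses anything older than min_version."""
    context = ssl.create_default_context()
    context.minimum_version = min_version
    return context


def negotiated_tls_version(host, port=443, timeout=10):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.3'."""
    context = make_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Example call (needs network access):
#   print(negotiated_tls_version("api.rtt.io"))
```

If the site only accepts TLS 1.2 or newer, an older default protocol on the HS3 machine would be a plausible reason for an empty grab.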

                        Comment


                        • Just a quick question before I try to update.

                          Right now it still closes iexplore.exe even when UseIE= is not present in the ini file.
                          I also tried leaving out scrapes 6 and 7, but it still closes iexplore.

                          regards
                          Preferred -> Jon's Plugins, Pushover, Phlocation, Easy-trigger,
                          Rfxcom, Blade Plugins, Pushbullet, homekit, Malosa Scripts




                          HS3Pro 3.0.0.531 on windows 7 ultimate X64 on hp quadcore laptop 8 GB.

                          Comment


                          • It should now be fine.....
                            Jon

                            Comment


                            • Hi Jon,

                              I see it works now. I got an error at the beginning, but no errors since:

                              May-17 13:07:28 Error Authenticating SSL stream inner exception: An unknown error occurred while processing the certificate
                              May-17 13:07:28 Error Authenticating SSL stream: A call to SSPI failed, see inner exception.
                              May-17 13:07:28 Error Authenticating SSL stream inner exception: An unknown error occurred while processing the certificate
                              May-17 13:07:28 Error Authenticating SSL stream: A call to SSPI failed, see inner exception.
                              And thanks for the new update.
                              regards

                              Comment


                              • Originally posted by Malosa
                                Hi Jon,

                                I see it works now. I got an error at the beginning, but no errors since:

                                May-17 13:07:28 Error Authenticating SSL stream inner exception: An unknown error occurred while processing the certificate
                                May-17 13:07:28 Error Authenticating SSL stream: A call to SSPI failed, see inner exception.
                                May-17 13:07:28 Error Authenticating SSL stream inner exception: An unknown error occurred while processing the certificate
                                May-17 13:07:28 Error Authenticating SSL stream: A call to SSPI failed, see inner exception.
                                And thanks for the new update.
                                regards
                                Not sure what that is. It is not coming from the script, so it must be generated by HS3.
                                Jon

                                Comment
