Web Scraper Plug-in for HS3


    The Web Scraper Plug-in for HomeSeer allows the user to create virtual devices containing information that is scraped from a web site.

    This plugin is written in C# on .NET 4.0 and requires HomeSeer version 3 or later.



    Features:

    • User can use regular expressions to scrape content from any web site
    • Content can be refreshed using a user-specified update frequency
    • Configuration is done through a configuration file (HSPI_Web Scraper.ini) in the HS3 config directory
    • Device value is incremented every time the device is updated. This makes it easy to fire events when the status text changes.

    Download from the following location:
    http://www.rhusoft.com/downloads/hs3...er_1-0-1-1.zip

    Extract the contents to your HS3 root folder, except for the install.txt file.

    After the plugin starts, it creates an HSPI_Web Scraper.ini file in the HS3 config directory along with 3 new devices.

    The file can be updated in real time. If the debug flag is set, the plugin creates 2 files in the log directory showing the results of the web site scraping, with the ini file section heading as the filename prefix.

    I have requested that it be placed in the downloader, so hopefully it will show up soon.

    Use this thread for now for comments or feature requests. I plan on adding many new features.

    Example regular expressions below:

    Code:
    'Homeseer announcements
    [homeseer]
    URL=https://twitter.com/HomeSeer
    RegExSearch=(?s)js-tweet-text.*?dir="ltr">(.*?)-* *<a.*?</p>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    'ESPN feed - should work for any twitter feed though - just replace the URL
    [espn]
    URL=https://twitter.com/espn
    RegExSearch=(?s)js-tweet-text.*?dir="ltr">(.*?)<a.*?</p>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    'CNN Twitter feed
    [cnntwitterfeed]
    URL=https://twitter.com/cnnbrk
    RegExSearch=(?s)js-tweet-text.*?dir="ltr">(.*?)<a.*?</p>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    [cnnbreakingnews]
    URL=http://www.cnn.com/.element/ssi/www/breaking_news/3.0/banner.html
    RegExSearch="content":\s*"(?<cnnbreakingnews>.*)",
    RegExReplace=cnnbreakingnews
    RefreshInterval=60
    Enabled=true
    Debug=false
    
    [forecast]
    URL=http://forecast.weather.gov/MapClick.php?lat=32.9657005&lon=-117.1147095&site=all&smap=1&searchresult=San%20Diego%2C%20CA%2092129%2C%20USA
    RegExSearch=(?s)<li class="row-odd"><span class="label">.*?</span>(.*?)</li>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=false
    
    'All earthquakes in the world 
    [earthquake-world]
    URL=http://earthquake.usgs.gov/earthquakes/feed/v0.1/summary/all_hour.atom
    RegExSearch=<title>M (.*?)</title>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    'Magnitude 5 or higher earthquakes in the world
    [earthquake-world-M5+]
    URL=http://earthquake.usgs.gov/earthquakes/feed/v0.1/summary/all_hour.atom
    RegExSearch=<title>M ([5-9].*?)</title>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    'All earthquakes in California
    [earthquake-ca]
    URL=http://earthquake.usgs.gov/earthquakes/feed/v0.1/summary/all_hour.atom
    RegExSearch=<title>M (.*?), California</title>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    'Magnitude 2 or higher earthquakes in California
    [earthquake-ca-M2+]
    URL=http://earthquake.usgs.gov/earthquakes/feed/v0.1/summary/all_hour.atom
    RegExSearch=<title>M ([2-9].*?), California</title>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
    
    [School Status]
    URL=http://www.calvertnet.k12.md.us/info/status/schoolstatus.asp
    RegExSearch=(?s)<div style='padding-left:5px'>(.*?)</div>
    RegExReplace=1
    RefreshInterval=60
    Enabled=true
    Debug=true
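
To try a pattern before putting it in the ini file, a small Python sketch like the one below can help. It is not part of the plugin (which is C#), the sample feed text is made up for illustration, and it assumes the first match is the one that ends up in the device status, which the thread does not confirm. The two patterns are copied from the [earthquake-world] and [earthquake-world-M5+] sections above.

```python
import re

# Hypothetical Atom snippet for illustration; real feed content will differ.
SAMPLE_FEED = """
<entry><title>M 2.4, Northern California</title></entry>
<entry><title>M 5.1, Fiji region</title></entry>
<entry><title>M 1.3, Central Alaska</title></entry>
"""

# Patterns copied from the [earthquake-world] and [earthquake-world-M5+] sections.
ALL_QUAKES = r"<title>M (.*?)</title>"
M5_PLUS = r"<title>M ([5-9].*?)</title>"


def first_group(pattern, text, group=1):
    """Return the requested capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


if __name__ == "__main__":
    print(first_group(ALL_QUAKES, SAMPLE_FEED))  # first title of any magnitude
    print(first_group(M5_PLUS, SAMPLE_FEED))     # first title starting with 5-9
```

Because `[5-9]` sits directly after `M `, the second pattern skips every title whose magnitude starts with a lower digit.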
    Jason
    Last edited by jrhubott; August 10, 2014, 12:03 AM. Reason: New Version

    #2
    Updated to V1.0.0.6 - Now has speech action


      #3
      Hi! This looks interesting. Can you post some examples/screenshots of how it works and what it does?


        #4
        Instructions

        When first started, the plugin creates a config file in the HS3 configuration directory.

        HSPI_Web Scraper.ini
        Code:
        [cnnbreakingnews]
        URL=http://www.cnn.com/.element/ssi/www/breaking_news/3.0/banner.html
        RegExSearch="content":\s*"(?<cnnbreakingnews>.*)",
        RegExReplace=cnnbreakingnews
        RefreshInterval=60
        Enabled=true
        Debug=true
        
        [forecast]
        URL=http://forecast.weather.gov/MapClick.php?lat=32.9657005&lon=-117.1147095&site=all&smap=1&searchresult=San%20Diego%2C%20CA%2092129%2C%20USA
        RegExSearch=<li class="row-odd"><span class="label">.*?</span>(.*)</li>
        RegExReplace=1
        RefreshInterval=60
        Enabled=true
        Debug=true
        
        [earthquake]
        URL=http://earthquake.usgs.gov/earthquakes/feed/v0.1/summary/all_hour.atom
        RegExSearch=<title>M ([3-9].*)California</title>
        RegExReplace=1
        RefreshInterval=60
        Enabled=true
        Debug=false
        You can edit and add new sections to this file and HS3 devices will be created in real-time as the file is updated.

        URL=The URL that you want to retrieve and scrape for information
        RegExSearch=The regular expression that you want to perform against the returned data
        RegExReplace=The regular expression group that you want to be placed into the status of the device
        RefreshInterval=The time in seconds before attempting a refresh
        Enabled=true/false
        Debug=true/false
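
As a rough illustration of how URL, RegExSearch and RegExReplace fit together, here is a minimal Python sketch. It is an assumption about the behaviour described above, not the plugin's actual C# code, and the names `fetch` and `extract_status` are made up for this example. Note that .NET writes named groups as `(?<name>...)` while Python needs `(?P<name>...)`, so the sketch converts them first.

```python
import re
import urllib.request


def fetch(url, timeout=10):
    """Download the page at URL and return it as text (illustrative helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_status(page_text, regex_search, regex_replace):
    """Apply RegExSearch and return the group named by RegExReplace, or None."""
    # Convert .NET-style named groups (?<name>...) to Python's (?P<name>...),
    # leaving lookbehinds such as (?<=...) and (?<!...) untouched.
    python_pattern = re.sub(r"\(\?<(?![=!])", "(?P<", regex_search)
    match = re.search(python_pattern, page_text)
    if match is None:
        return None
    # RegExReplace is either a group number ("1") or a group name.
    group = int(regex_replace) if regex_replace.isdigit() else regex_replace
    return match.group(group)


if __name__ == "__main__":
    # Made-up response text standing in for fetch(url), so no network is needed.
    sample = '"content": "Storm warning issued",'
    pattern = r'"content":\s*"(?<cnnbreakingnews>.*)",'
    print(extract_status(sample, pattern, "cnnbreakingnews"))
```

A real run would pass `fetch(url)` as the first argument and repeat the call every RefreshInterval seconds.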


        Last edited by jrhubott; August 7, 2014, 10:36 AM.


          #5
          Hello,

          I am getting an error when I start the plugin.

          Any ideas?


          **FATAL**: Failed getting InterfaceStatus from Web Scraper - the interface was not found in the list of active in.

          Ed


            #6
            Can you turn on developer mode on the plugins and tell me what appears in the console window when you activate the interface?

            I will add logging to the next version.


              #7
              I just posted version 1.0.1.1, which has changes that should address these issues. If not, it now has additional logging that is written to the HS logs folder.


                #8
                Hello Jason,

                Thanks for the new version.
                This one works well with no errors,
                and the examples are working.


                Ed


                  #9
                  Does anyone know how to get a plugin added to the updater? I sent an email to updater@homeseer.com several days ago and have yet to receive a response. I stopped working on HS2 plugins in the past because of HS3. Now I have decided to convert my house to HS3 and plan to make at least 3 additional plugins, but I'm concerned about the lack of support from the HS team.

                  Anyone have any suggestions?

                  Thanks,

                  Jason


                    #10
                    What are the other plugins? Did you call them?


                      #11
                      Pool Control - EasyTouch RS485 interface - I have this working on HS2
                      Yamaha receiver controller
                      Log to MSSQL


                        #12
                        Having a little trouble as I'm not a programmer. How would I get the school status off of this site?
                        http://www.calvertnet.k12.md.us/info...hoolstatus.asp


                          #13
                          Originally posted by happnatious1
                          Having a little trouble as I'm not a programmer. How would I get the school status off of this site?
                          http://www.calvertnet.k12.md.us/info...hoolstatus.asp

                          Good idea!
                          Originally posted by rprade
                          There is no rhyme or reason to the anarchy a defective Z-Wave device can cause


                            #14
                            Well, this is my best guess, but it doesn't work.

                            [School Status]
                            URL=http://www.calvertnet.k12.md.us/info/status/schoolstatus.asp
                            RegExSearch=<div style='padding-left:5px'>
                            RegExReplace=1
                            RefreshInterval=60
                            Enabled=true
                            Debug=false
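
One likely reason this attempt fails is that RegExSearch has no capture group, so there is nothing for RegExReplace=1 to pick up. The [School Status] section in the first post adds `(?s)` and a `(.*?)</div>` group. The Python sketch below compares the two patterns. The page fragment is made up for illustration, and Python's `re` stands in for the plugin's .NET regex engine, so treat it as an approximation.

```python
import re

# Hypothetical page fragment for illustration; the real page markup may differ.
SAMPLE_PAGE = "<div style='padding-left:5px'>\nAll schools open on time\n</div>"

# Pattern from the attempt above: matches the tag but captures nothing.
NO_GROUP = r"<div style='padding-left:5px'>"
# Pattern from the first post: (?s) lets "." cross line breaks, (.*?) captures.
WITH_GROUP = r"(?s)<div style='padding-left:5px'>(.*?)</div>"


def group_one(pattern, text):
    """Return capture group 1 of the first match, or None if unavailable."""
    match = re.search(pattern, text)
    if match is None or match.re.groups < 1:
        return None
    return match.group(1)


if __name__ == "__main__":
    print(group_one(NO_GROUP, SAMPLE_PAGE))    # no group 1 to return
    print(group_one(WITH_GROUP, SAMPLE_PAGE))  # text between the div tags
```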


                              #15
                              Scraping a password protected web site

                              I have installed the plug-in, and it works well when scraping web pages that are not password protected. I have a home alarm system that offers neither an API nor a NO relay contact on the controller. The only way I can get the status of the alarm system is by accessing a web page. This web page requires a user name and password. How can I use the plug-in when I have to log in to get access to the web page?
