Scraping Weather Images with Big5


  • Scraping Weather Images with Big5

    Check this out: I figured out how to scrape live weather images from the National Weather Service and rebroadcast them in HS using Big5. What makes this unique is that I'm not storing the images locally, I'm just rebroadcasting them, so you don't have to worry about cleaning up an image folder or any complicated scripting. A single regex statement in a Big5 HTTP profile is all you need.

    I'll post a how-to a little later over the weekend. BTW, the radar image is actually a live radar loop that plays in Homeseer, but it doesn't show up that way here on the forum.

    [Image: hazards.png]
    [Image: radar.png]
    [Image: forecast.png]
    [Image: 7dayforecast.png]

    --Barry

  • #2
    Interesting. Looking forward to seeing how you did this.
    Don

    • #3
      Looks great,

      Not using this weather service, but looking forward to the instructions.

      ---
      John

      • #4
        +1

        • #5
          To scrape the weather images you'll need to set up a Big5 HTTP profile pointing to https://www.weather.gov/XXX/, replacing XXX with the code for your nearest NWS forecast office. I'm using MEG, which is Memphis. You can go to https://www.weather.gov and click the map to find your local forecast office.

          [Image: bgprofile.png]

          Note that each local forecast office provides its own maps, so the layout of the web pages will vary somewhat from location to location. However, you should be able to modify the code below to work for your specific location. Use a Chrome browser and hit F12 to open a side window that shows the underlying HTML for the page. For whatever picture you are trying to scrape, begin your scrape at the opening div tag for that picture and end it after the full image URI.

          Here's the Regex I'm using to scrape the images from https://www.weather.gov/meg/

          Code:
          regex(input, "(?si)<div class=\"graphicast\">.+<div class=\"more\"")[0] && regex(input, "(?si)<img alt=\"Local Radar\" src=\"//radar.weather.gov/lite/N0R/NQA_loop.gif\"")[0] && regex(input, "(?si)<img alt=\"Weather Map\" src=\"//www.wpc.ncep.noaa.gov/noaa/noaad1.gif\"")[0] && regex(input, "(?si)<img alt=\"Graphical Forecast\" src=\"//graphical.weather.gov/images/conus/MaxT1_conus.png\"")[0]
          That will bring back the 5 weather images I posted above.
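
          For anyone who wants to try the patterns outside HS first, here is a minimal sketch in Python. Python's re module is only a stand-in for Big5's own regex handling (it is an assumption that the two behave alike for these patterns), and SAMPLE_PAGE is a made-up, heavily trimmed page, not the real NWS markup. The sketch applies each pattern to the page text and keeps the first match, which is roughly what each regex(input, "...")[0] term does:

```python
import re

# Made-up, heavily trimmed page source for illustration only.
SAMPLE_PAGE = """
<div class="graphicast">
  <img src="/images/example/outlook1.png">
</div>
<div class="more"><a href="#">More</a></div>
<img alt="Local Radar" src="//radar.weather.gov/lite/N0R/NQA_loop.gif">
"""

# Two of the patterns from the post, written as raw single-quoted strings
# so the inner double quotes need no backslashes in Python.
PATTERNS = [
    r'(?si)<div class="graphicast">.+<div class="more"',
    r'(?si)<img alt="Local Radar" src="//radar.weather.gov/lite/N0R/NQA_loop.gif"',
]


def first_matches(page, patterns):
    """Return the first match of each pattern, or None where nothing matched."""
    results = []
    for pattern in patterns:
        found = re.findall(pattern, page)
        results.append(found[0] if found else None)
    return results


fragments = first_matches(SAMPLE_PAGE, PATTERNS)
for fragment in fragments:
    print(fragment)
```

          Returning None for a missing match is a choice made for this sketch; how Big5 itself reacts to a pattern that finds nothing is not covered here.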

          Note that in your regex all quotes within the specific HTML you are naming will need to be "escaped" by preceding them with a backslash. So if "Local Radar" is within your regex statement, you'd escape the quotes to make it look like this:
          Code:
          \"Local Radar\"
          That basically tells the regex statement "hey, don't pay any attention to this quote right here".
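
          The same escaping rule can be checked in Python (used here only as an illustration; it is not what Big5 runs). The backslashes belong to the quoted string, not to the pattern itself, so the escaped form and a single-quoted form with no backslashes hold the same text. The sample tag and its radar.gif file name are made up:

```python
import re

# Inside a double-quoted literal, each inner quote needs a backslash.
escaped = "<img alt=\"Local Radar\""

# Inside a single-quoted literal, the same text needs no backslashes.
unescaped = '<img alt="Local Radar"'

# Made-up tag to search in.
sample_tag = '<img alt="Local Radar" src="radar.gif">'

same_text = escaped == unescaped
found = re.search(escaped, sample_tag) is not None
print(same_text, found)
```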

          I use the following to name the devices that Big5 auto-creates:
          HTML Code:
          "Hazardous Weather Outlook" && "Radar Loop" && "Weather Map" && "Temperature Map"
          You may also need to modify your hs.css file to allow the larger images to display properly. To do so, look in the Homeseer\html\css folder for the hs.css file. Open it in Notepad and search for .device_status_box. It needs to be modified to look like this:

          Code:
          .device_status_box
          {
              max-height: 600px;
              max-width: 550px;
              min-width: 100px;
              overflow:auto;
          }
          This will allow the larger images to be shown and "overflow:auto" will allow scroll bars to appear in case the images are still too large.

          Next you'll need to set up an event to scrape the website on a set interval. My event looks like this:

          [Image: bg5event.png]

          That's it! Hope you find it useful.

          --Barry

          • #6
            Hi Barry,

            Thanks for sharing. I wanted to repeat your experience before venturing into my own area, and I found some discrepancies. First, you originally show 5 images, but the instructions name only 4 devices:
            "Hazardous Weather Outlook" && "Radar Loop" && "Weather Map" && "Temperature Map"
            Furthermore, when I copy your setup blindly and use your area "meg", I receive images for only 3. I did not receive any image for "Hazardous...".

            • #7
              Correction. All 4 images for "meg" do show up per instructions. It was a mess up on my side.

              • #8
                Originally posted by risquare
                Correction. All 4 images for "meg" do show up per instructions. It was a mess up on my side.
                The "Hazardous Weather Outlook" sometimes returns 2 images and sometimes 1 image, depending on what the weather service puts out for the day. If there are 2 images available, they will return under the same device number. Rather than create 2 separate devices, I decided to lump those 2 images under 1 device so that it wouldn't return an error on the days when only 1 "Hazardous Weather Outlook" image was posted by NWS.
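
                A minimal Python sketch of why one device can cover both cases (Python and both sample fragments are illustrative assumptions, not the real NWS markup): the anchored pattern returns a single block of HTML, and that block simply contains one img tag on some days and two on others.

```python
import re

PATTERN = r'(?si)<div class="graphicast">.+<div class="more"'

# Made-up page fragments: a one-image day and a two-image day.
ONE_IMAGE_DAY = (
    '<div class="graphicast"><img src="outlook1.png"></div>'
    '<div class="more">'
)
TWO_IMAGE_DAY = (
    '<div class="graphicast"><img src="outlook1.png">'
    '<img src="outlook2.png"></div><div class="more">'
)


def images_in_block(page):
    """Count the img tags inside the single block the pattern captures."""
    block = re.search(PATTERN, page)
    if block is None:
        return 0
    return len(re.findall(r"(?i)<img\b", block.group(0)))


print(images_in_block(ONE_IMAGE_DAY), images_in_block(TWO_IMAGE_DAY))
```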

                Each local forecast office is different, and some provide several images (such as forecast rain amounts) that I haven't provided for. For those the method of scraping should be basically the same.

                --Barry

                • #9
                  logman
                  Can we open a little RegEx workshop here if you don't mind?
                  What does (?si) do, and where do the strings following it come from?
                  What is the input, and what is the Big5 output after applying the RegEx? (I assume it is the address of the picture, but I'm not sure.)
                  Also, can I show the pictures in HSTouch designer and client?
                  Thanks,

                  Ivan

                  • #10
                    Originally posted by risquare
                    logman
                    Can we open a little RegEx workshop here if you don't mind?
                    What does (?si) do, and where do the strings following it come from?
                    What is the input, and what is the Big5 output after applying the RegEx? (I assume it is the address of the picture, but I'm not sure.)
                    Also, can I show the pictures in HSTouch designer and client?
                    Thanks,

                    Ivan
                    So what we are basically doing is capturing the image markup (the img tag and its src address) from div tag to div tag. This is what tells the NWS webpage how to assemble and display the pictures. We are grabbing those instructions individually and using them to tell HS3 how to do the same.

                    The easiest way to figure out what code you need to grab is to use Chrome and hit the F12 key. This opens the developer tools, which show the HTML behind the page. Hover over the image you want to scrape and its code will be highlighted. Keep expanding the subsections of code until you find the smallest portion that still highlights the picture you want to scrape.

                    See the example below. Note that the picture I want to scrape is highlighted, and I have circled the arrows I clicked to keep drilling down to the smallest amount of code that would leave the picture and caption highlighted:

                    [Image: nws.png]

                    Now, let's look at the RegEx to scrape the above image:

                    Code:
                    regex(input, "(?si)<div class=\"graphicast\">.+<div class=\"more\"")[0]
                    Here "input" is the text being searched, which is the page content the HTTP profile received, and the quoted string that follows it is the pattern to look for in that text.

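                    The regex tokens (?si) are inline modifiers: "i" makes the match case insensitive, and "s" (single-line mode) lets the dot match line breaks as well. Case insensitivity is useful for scraping content subject to change, where you don't know if the website author is going to use capital letters or not; single-line mode lets one pattern span several lines of HTML.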
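
                    The effect of each modifier can be seen in a short Python check. Python's re module is used here purely for illustration (it accepts the same (?i) and (?s) inline syntax), and the HTML snippet is made up. Only the combination of both modifiers matches, because the snippet uses upper-case tag names and has line breaks between the two anchors:

```python
import re

# Made-up snippet: upper-case tag names and line breaks between the tags.
html = '<DIV CLASS="graphicast">\n<img src="a.png">\n<div class="more">'
body = r'<div class="graphicast">.+<div class="more"'

# Try the same pattern body with no modifiers, each one alone, and both.
results = {
    flags: re.search(flags + body, html) is not None
    for flags in ("", "(?i)", "(?s)", "(?si)")
}
print(results)
```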

                    As mentioned above the backslash before each quote is an escape character and tells the regex "the quotation mark immediately following me is not a command nor a closure and should be matched literally".

                    The next regex tokens are ".+" (dot plus), which match one or more of any character. Placed between two anchor patterns, they capture everything in between the first specified pattern and the next. This is useful when the content in the middle of the 2 anchor patterns may change but the beginning and ending anchors always remain the same. Look again at the image above and you can see the beginning and ending anchors I picked within the red square.
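
                    One property of ".+" worth knowing: it is greedy, so when the ending anchor appears more than once on the page, the match runs to the last occurrence. Adding a question mark (".+?") makes it stop at the first one instead. A small Python illustration with a made-up snippet (Python is used only for demonstration):

```python
import re

# Made-up snippet in which the ending anchor appears twice.
html = (
    '<div class="graphicast"><img src="a.png"></div>'
    '<div class="more">first</div>'
    '<p>unrelated</p>'
    '<div class="more">second</div>'
)

# Greedy: runs to the LAST '<div class="more"' on the page.
greedy = re.search(r'(?si)<div class="graphicast">.+<div class="more"', html).group(0)

# Lazy: stops at the FIRST '<div class="more"' after the opening anchor.
lazy = re.search(r'(?si)<div class="graphicast">.+?<div class="more"', html).group(0)

print(len(greedy), len(lazy))
```

                    If a scrape brings back more of the page than expected, a repeated ending anchor is one thing to check.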

                    Finally, the zero in brackets [0] tells the regex to use the first match of the pattern it finds and return it as the input.

                    I'm not a user of HSTouch designer, so can't comment on that. However as long as the html client has access to the internet to fetch the pictures then the images should display.

                    --Barry

                    • #11
                      Thanks Barry,

                      As I've said before, full RegEx theory is not for the average HS3 user (myself included); however, the short one-page list of examples in the Big5 documentation is enough for 90%+ of the use cases. Here you went one step beyond, and I do appreciate that.

                      I think that the "input" in the command regex(input, "(?si)..... refers to the input stream that Big5 receives from the remote web server after sending out the GET command.
                      After Big5 processes it following the regex rules, it then becomes an attribute of the HS3 device created by Big5. For this you also use the word "input", which is correct as it is fed into HS3. However, it should be noted that this is a different input than the input from the remote site.
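
                      That flow (page text in, regex applied, value handed to the device) can be sketched in Python. This is only an illustration under assumptions: scrape_value and fake_get are hypothetical names, the returned page is made up, and nothing here is Big5's actual code.

```python
import re


def scrape_value(get, url, pattern):
    """Fetch a page, apply the pattern, and return the first match (or None)."""
    page = get(url)                      # the "input" the regex sees
    matches = re.findall(pattern, page)  # every non-overlapping match
    return matches[0] if matches else None  # the value passed on to the device


def fake_get(url):
    """Stand-in for the HTTP GET so the sketch runs without a network."""
    return '<img alt="Weather Map" src="//www.wpc.ncep.noaa.gov/noaa/noaad1.gif">'


value = scrape_value(
    fake_get,
    "https://www.weather.gov/meg/",
    r'(?si)<img alt="Weather Map" src="//www.wpc.ncep.noaa.gov/noaa/noaad1.gif"',
)
print(value)
```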

                      Thanks again for the good example of advanced RegEx use with Big5.

                      Ivan
