GetURL text string compare - help needed

    I'm using hs.GetURL to fetch a web page and VBScript to parse the result and compare a text string. StrComp keeps returning 1 (no match) even though the strings look identical to me, and the extracted string is unusually long. The text I'm scraping is in a table, but the HTML tags seem to get stripped off by GetURL; I assume that is because the page is being rendered. Here's my code:
    Code:
    sub main()
    
    dim page
    
    page = hs.GetURL("http://www.pscleanair.org/airq/status.aspx","/",TRUE,80)
    PageLen = Len(Page)
    BurnStatusEnd = (InStr(page,"The Puget Sound Clean Air Agency issues air quality burn bans when air"))-2
    PageBurnStatus = Left(page,BurnStatusEnd)
    SnoPos = InStrRev(PageBurnStatus,"Snohomish")
    SnohomishText = Right(PageBurnStatus,BurnStatusEnd-SnoPos+1)
    hs.SetDeviceString "V8", SnohomishText
    
    SnohomishStatus = StrComp((Left(SnohomishText,70)),"Snohomish STAGE 1",1)
    'if SnohomishStatus does not equal Snohomish Stage 1 then set V9 to off
    If SnohomishStatus = 1 then
    hs.SetDeviceStatus "V9", 3
    End If
    'if SnohomishStatus does equal Snohomish Stage 1 then set V9 to on
    If SnohomishStatus = 0 then
    hs.SetDeviceStatus "V9", 2
    End If
    hs.writelog "Info", "Page Length = " & PageLen & " BurnStatusEnd = " & BurnStatusEnd &  " PageBurnStatus Length = " & Len(PageBurnStatus) & " Snohomish pos = " & SnoPos & " SnoStatusPos = " & SnoStatusPos
    hs.writelog "Info", "PageBurnStatus = " & PageBurnStatus
    hs.writelog "Info", "SnohomishText = " & SnohomishText
    hs.writelog "Info", "SnohomishText Length = " & Len(SnohomishText)
    hs.writelog "Info", "Snohomish Burn Text =" & Left(SnohomishText,70)
    hs.writelog "Info", "Snohomish Burn Ban Status = " & SnohomishStatus
    end sub
    This is what I'm getting in my log file:

    1/16/2013 6:54:56 PM Info PageBurnStatus = Air Quality Burn Ban Status | Puget Sound Clean Air Agency Home | About Us | Contact Us | Site Map Air Quality Forecast Current Air Quality Current Ozone --> Burn Ban Visibility Camera Air Quality Basics Air Quality Data & Reports Today's Forecast King Kitsap Pierce Snohomish Tomorrow'sForecast King Kitsap Pierce Snohomish Forecast Discussion Current Air Quality Data and Reports One old, uncertified wood stove can release as much fine-particle pollution as more than 1,000 natural gas furnaces for the same heat output. Learn more. Air Quality Burn Ban Status Updated Wednesday, January 16, 2013 County Status King NO BAN IN EFFECT Kitsap NO BAN IN EFFECT Pierce STAGE 1 IN EFFECT AS OF 1 PM 1/15/13 Snohomish STAGE 1 IN EFFECT AS OF 1 PM 1/15/13
    1/16/2013 6:54:56 PM Info SnohomishText = Snohomish STAGE 1 IN EFFECT AS OF 1 PM 1/15/13
    1/16/2013 6:54:56 PM Info SnohomishText Length = 251
    1/16/2013 6:54:56 PM Info Snohomish Burn Text =Snohomish STAGE 1
    1/16/2013 6:54:56 PM Info Snohomish Burn Ban Status = 1

    Any help?

    Thanks,
    Jabran

    #2
    I'm not sure exactly why your script does not work, but here is an alternative VB.NET script you could try. Sometimes the text looks fine in the HS web log, but if you look in the GUI log you might find characters that the web log hides (like &nbsp; or something), which could make your comparison fail. The other problem with web page scraping is that there are 1000 ways to do it.

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
    Dim Page As String = hs.GetURL("http://www.pscleanair.org/airq/status.aspx","/", False, 80)
    Dim Start As Integer = InStr(Page, "<td>Snohomish</td>")
    Dim Finish As Integer = InStr(Start, Page, "</p>")
    
    Dim TagLess As String = Regex.Replace(Page.SubString(Start + 3, (Finish - Start)), "<(.|\n)+?>", "")
    
    TagLess = Tagless.Replace("Snohomish", "")
    TagLess = TagLess.Trim
    
    If TagLess.Contains("STAGE 1 IN EFFECT") Then
    hs.writelog("Info", "Stage 1 Is In Effect")
    Else
    hs.writelog("Info", "Stage 1 Is Not In Effect")
    End If
    
    End Sub
    Last edited by mrhappy; January 17, 2013, 12:55 PM. Reason: think I broke the board again


      #3
      Mr Happy,
      Thanks for the pointers. You are correct: the HS log console was displaying a different result than the web interface log, and there were a lot of spaces on the end of the text. I removed all the spaces and got the text down to the exact number of characters I'm expecting, but I still can't get a match. I think I'll try your script suggestion and see how I get along with that.

      Jabran
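One possible explanation for the result of 1, offered as a guess rather than a confirmed diagnosis: `Left(SnohomishText,70)` is a 70-character slice, while "Snohomish STAGE 1" is 17 characters, so StrComp cannot return 0 for that pair. The web log can still show the two as identical because a browser collapses runs of whitespace. The sketch below is VB.NET rather than VBScript, uses Console instead of the hs object, and runs on made-up text; `NormalizeSpaces` is a hypothetical helper, not part of HomeSeer.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CompareSketch
    ' Hypothetical helper: collapse every run of whitespace
    ' (including non-breaking spaces) into a single space.
    Function NormalizeSpaces(ByVal text As String) As String
        Dim noNbsp As String = text.Replace(ChrW(160), " "c)
        Return Regex.Replace(noNbsp, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Made-up scraped text: the padding and line break are assumptions,
        ' not copied from the real page.
        Dim scraped As String = "Snohomish" & vbCrLf & "   STAGE 1" & ChrW(160) & " IN EFFECT AS OF 1 PM 1/15/13     "
        Dim cleaned As String = NormalizeSpaces(scraped)

        ' A prefix test avoids comparing a long slice with a short literal.
        Dim isStage1 As Boolean = cleaned.StartsWith("Snohomish STAGE 1", StringComparison.OrdinalIgnoreCase)

        Console.WriteLine("Cleaned = [" & cleaned & "]")
        Console.WriteLine("Stage 1 = " & isStage1)
    End Sub
End Module
```

Writing the cleaned text between square brackets in the log makes leading or trailing padding visible even when the viewer collapses spaces.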


        #4
        Hi jabrans,

        I too had this problem, as the HTML in the string won't show in the log, so I run this function to strip out the HTML.

        Code:
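            ' Requires "Imports System.Text.RegularExpressions" at the top of the script.
            Function StripTags(ByVal html As String) As String
                ' Remove HTML tags (lazy match from each "<" to the next ">").
                Return Regex.Replace(html, "<.*?>", "")
            End Function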
        Greig.
        Zwave = Z-Stick, 3xHSM100, 7xACT ZDM230, 1xEverspring SM103, 2xACT HomePro ZRP210.
        X10 = CM12U, 2xAM12, 1xAW10, 1 x TM13U, 1xMS13, 2xHR10, 2xSS13
        Other Hardware = ADI Ocelot + secu16, Global Cache GC100, RFXtrx433, 3 x Foscams.
        Plugins = RFXcom, ActiveBackup, Applied Digital Ocelot, BLDeviceMatrix, BLGarbage, BLLAN, Current Cost, Global Cache GC100, HSTouch Android, HSTouch Server, HSTouch Server Unlimited, NetCAM, PowerTrigger, SageWebcamXP, SqueezeBox, X10 CM11A/CM12U.
        Scripts =
        Various
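A small usage sketch for the StripTags function, under stated assumptions: the table row is invented, and Console stands in for the hs log. It shows two things to watch for. Because tags are replaced with an empty string, neighbouring cells run together, and entities such as `&nbsp;` survive as literal text unless they are replaced separately.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripTagsSketch
    Function StripTags(ByVal html As String) As String
        ' Remove anything that looks like an HTML tag.
        Return Regex.Replace(html, "<.*?>", "")
    End Function

    Sub Main()
        ' Invented table row; the real page markup may differ.
        Dim row As String = "<tr><td>Snohomish</td><td>STAGE&nbsp;1 IN EFFECT</td></tr>"

        ' Tags vanish, so the two cells end up joined with no space between them.
        Dim text As String = StripTags(row)
        Console.WriteLine("After StripTags: [" & text & "]")

        ' The entity is still literal text at this point; swap it for a plain space.
        text = text.Replace("&nbsp;", " ")
        Console.WriteLine("After entity fix: [" & text & "]")

        ' A Contains test tolerates the joined cells; an exact comparison would not.
        Console.WriteLine("Stage 1 = " & text.Contains("STAGE 1 IN EFFECT"))
    End Sub
End Module
```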


          #5
          Well, I'm now playing the game of chasing a changing web page to scrape, since they don't offer an RSS feed or anything reliable. When I use Firefox with the inspector I can see what I appear to need to scrape, but hs.GetURL doesn't appear to get the same thing. When I view source in IE I also can't find the same text string.

          Here's the url:
          http://www.pscleanair.org/priorities.../burnbans.aspx

          I want to scrape the text in the box to the right of the county name "Snohomish", which is the current stage for Snohomish. It's currently "stage 2", but in a few days it will probably be back to "No Ban". Can someone with a little more HTML experience point me in the right direction?

          Here's my script:
          Code:
          Imports System.Text.RegularExpressions
          
          Sub Main(ByVal Parms As Object)
          
          Dim Page As String = hs.GetURL("http://www.pscleanair.org/priorities/woodheating/Pages/burnbans.aspx","/", False, 80)
          Dim Start As Integer = InStr(Page, "ctl00$MainContent$Repeater1$ctl00$TextBox")
          hs.writelog("Info", "Start = " & Start)
          Dim Finish As Integer = InStr(Start, Page, "text")
          
          Dim TagLess As String = Regex.Replace(Page.SubString(Start + 3, (Finish - Start)), "<(.|\n)+?>", "")
          
          TagLess = TagLess.Trim
          Dim Status As String = Left(TagLess, 64)
          hs.writelog("Info", "TagLess = " & TagLess)
          hs.writelog("Info", "Status = " & Status)
          
          hs.SetDeviceString ("V8", Status)
          hs.SetDeviceLastChange ("V8", now)
          
          If TagLess.Contains("STAGE 2") Then
          hs.writelog("Info", "Stage 2 Is In Effect")
          'if TagLess does equal Snohomish Stage 2 then set V9 to on
          hs.SetDeviceStatus ("V9", 2)
          Else
          hs.writelog("Info", "Stage 2 Not In Effect")
          'if TagLess does not equal Snohomish Stage 2 then set V9 to off
          hs.SetDeviceStatus ("V9", 3)
          End If
          
          End Sub
          This is what I get in my HS log when I run the script:

          Scripting runtime error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. --- System.ArgumentException: Argument 'Start' must be greater than zero. at Microsoft.VisualBasic.Strings.InStr(Int32 Start, String String1, String String2, CompareMethod Compare) at scriptcode66.scriptcode66.Main(Object Parms) --- End of inner exception stack trace --- at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner) at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters) at Scheduler.VsaScriptHost.Invoke(String ModuleName, String MethodName, Object[] Arguments)


          Thanks,
          Jabran
          Last edited by jabrans; November 29, 2015, 10:36 PM.
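The ArgumentException in that log is consistent with the first InStr returning 0, meaning the marker was not in the text that GetURL returned, and that 0 then being passed as the start position of the second InStr. A minimal sketch of guarding against that, assuming a made-up page string and a hypothetical helper named `ExtractBetween` (neither comes from HomeSeer):

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module GuardSketch
    ' Hypothetical helper: returns the text between two markers,
    ' or Nothing when either marker is missing.
    Function ExtractBetween(ByVal page As String, ByVal startMarker As String, ByVal endMarker As String) As String
        Dim start As Integer = InStr(page, startMarker)
        If start = 0 Then Return Nothing        ' InStr returns 0 when the marker is absent

        ' 1-based position of the first character after the start marker.
        Dim valueStart As Integer = start + Len(startMarker)
        Dim finish As Integer = InStr(valueStart, page, endMarker)
        If finish = 0 Then Return Nothing

        Return Mid(page, valueStart, finish - valueStart)
    End Function

    Sub Main()
        ' Made-up page that does not contain the control name being searched for.
        Dim page As String = "<html><body>No matching control here</body></html>"
        Dim status As String = ExtractBetween(page, "ctl00$MainContent$Repeater1$ctl00$TextBox", "text")

        If status Is Nothing Then
            Console.WriteLine("Marker not found; the page layout may have changed")
        Else
            Console.WriteLine("Status = " & status)
        End If
    End Sub
End Module
```

Logging the first few hundred characters of the fetched page when the marker is missing is a quick way to confirm whether GetURL received the same markup the browser inspector shows.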


            #6
            Jabrans, have you looked at using Jon00's script for this? He has quite a robust grabber with regex support that might save you some time and effort.
            Author of Highpeak Plugins | SMS-Gateway Plugin | Blue Iris Plugin | Paradox (Beta) Plugin | Modbus Plugin | Yamaha Plugin


              #7
              The page source suggests that the data is stored in a separate page shown in an iframe, so it is here: http://wc.pscleanair.org/burnban411/ (whether that URL changes I do not know; I'm not sure what the last three numbers signify). There is a real possibility that this will break at some point, but it seems to work for me at the minute, and you should be able to see how it holds up over the next few days. There are probably countless ways to do this better, but I am stuck in my ways.

              Code:
              Sub Main(ByVal Parms As Object)
              
                  Dim Page As String = hs.GetURL("http://wc.pscleanair.org/burnban411/", "/", False, 80)
                  Dim RowSplit() As String = Split(Page, "<div class=""Row"">")
                  Dim HowManyRows As Integer = RowSplit.GetUpperBound(0)
              
                  'hs.writelog("", "Row Count: " & HowManyRows)
                  'hs.writelog("", RowSplit(1))
              
                  Dim Start As Integer = InStr(RowSplit(1), "value=""")
                  Dim Finish As Integer = InStr(Start, RowSplit(1), Chr(34))
              
                  'hs.writelog("", "Start = " & Start)
                  'hs.writelog("", "Finish = " & Finish)
              
                  Dim FinalString As String = RowSplit(1).Substring(Start + 6, (Finish - Start + 1)).Trim
              
                  'hs.writelog("", FinalString)
              
                  If FinalString.ToLower = "stage 2" Then
                      hs.writelog("", "Stage 2 Is In Effect")
                  ElseIf FinalString.ToLower = "no ban" Then
                      hs.writelog("", "Stage 2 Is Not In Effect")
                  Else : hs.writelog("", "Unknown Data/Page Error")
                  End If
              End Sub


                #8
                Thanks mrhappy, you wrote this out the last time they changed it almost three years ago...

                Well, now that there is no ban for Snohomish it has moved to row 2, and the only area left with a ban is in row 1. I think the script needs to pick the row based on the area name "Snohomish", because there's no way to know which row it's on. Also, the value "no ban" is shorter than "stage 2", so with no ban I get no ban" (with a trailing quote), which doesn't match the last ElseIf condition. I modified a few of the other items; this is where I'm at now:

                Code:
                Sub Main(ByVal Parms As Object)
                
                    Dim Page As String = hs.GetURL("http://wc.pscleanair.org/burnban411/", "/", False, 80)
                    Dim RowSplit() As String = Split(Page, "<div class=""Row"">")
                    Dim HowManyRows As Integer = RowSplit.GetUpperBound(0)
                
                    hs.writelog("", "Row Count: " & HowManyRows)
                    hs.writelog("", RowSplit(2))
                
                    Dim Start As Integer = InStr(RowSplit(2), "value=""")
                    Dim Finish As Integer = InStr(Start, RowSplit(2), Chr(34))
                
                    hs.writelog("", "Start = " & Start)
                    hs.writelog("", "Finish = " & Finish)
                    hs.writelog("", "Page= " & Page)
                
                    Dim FinalString As String = RowSplit(2).Substring(Start + 6, (Finish - Start + 1)).Trim
                
                    hs.writelog("", "FinalString= " & FinalString)
                
                    hs.SetDeviceString ("V8", FinalString)
                    hs.SetDeviceLastChange ("V8", now)
                
                    If FinalString.ToLower = "stage 2" Then
                        hs.writelog("", "Stage 2 Is In Effect")
                        hs.SetDeviceStatus ("V9", 2)
                    ElseIf FinalString.ToLower = "stage 1" Then
                        hs.writelog("", "Stage 1 Is In Effect")
                        hs.SetDeviceStatus ("V9", 3)
                    ElseIf FinalString.ToLower = "no ban" Then
                        hs.writelog("", "Stage 2 Is Not In Effect")
                        hs.SetDeviceStatus ("V9", 3)
                    Else : hs.writelog("", "Unknown Data/Page Error")
                    End If
                End Sub


                  #9
                  OK, let me have a look. If it jumps around it should still be possible to get it to work, and I'll have a look at the other issue.


                    #10
                    Whilst I don't want to discourage Adam's never-ending enthusiasm to help, as BeeryGaz states, there is an easy way to do this with my Datascraper script.

                    Just download it from my site and add the following entry to the Jon00Datascraper.ini file (to replace the existing [Grab2] entry):

                    Code:
                    [Grab2]
                    Path=http://wc.pscleanair.org/burnban411/
                    TextFile=0
                    Encoding=
                    Username=
                    Password=
                    Options=
                    UserAgent=
                    Devicemode=0
                    Pattern1=(?s)Snohomish</p>.*?value="(.*?)"
                    DeviceName1=Air Quality Snohomish
                    DeviceText1=[0]
                    DeviceValue1=
                    DeviceImage1=
                    Speakbutton1=
                    Create an event to run the script as shown and a virtual device is automatically created with the data you need.
                    Jon
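The Pattern1 value reads as an ordinary .NET regular expression, so its behaviour can be tried outside the Datascraper script. The sketch below is only an illustration: how the script applies the pattern internally is not shown in the thread, and the page fragment is invented around the few pieces of markup quoted above (`<div class="Row">`, `</p>` and `value="`).

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternSketch
    Sub Main()
        ' Invented fragment; the live markup is an assumption.
        Dim page As String = "<div class=""Row""><p>Pierce</p><input value=""Stage 1"" /></div>"
        page &= vbLf & "<div class=""Row""><p>Snohomish</p>" & vbLf & "<input value=""No Ban"" /></div>"

        ' (?s) lets "." match line breaks; ".*?" is lazy, so the first value="
        ' after the county name is used and the capture stops at the next quote.
        Dim pattern As String = "(?s)Snohomish</p>.*?value=""(.*?)"""
        Dim m As Match = Regex.Match(page, pattern)

        If m.Success Then
            Console.WriteLine("Snohomish status: " & m.Groups(1).Value)
        Else
            Console.WriteLine("Pattern did not match")
        End If
    End Sub
End Module
```

Anchoring on the county name rather than a row number is what lets this approach keep working when the rows change order, and capturing up to the next quote handles values of any length.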


                      #11
                      I think I'd go with Jon's suggestion and use his script. If the data is jumping around between rows, you could test each row for the county name and then use that row, but in reality these sorts of things frequently break, and Jon's way of using RegEx is going to be more reliable.
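For anyone staying with the split-by-row approach, here is a rough sketch of testing each row for the county name. It also addresses the trailing-quote symptom: in the earlier script the search for the closing quote starts at the position of `value="`, so it appears to find the opening quote and the Substring length is always 7, which fits "stage 2" but not "no ban". The sketch starts the search after the marker instead. `StatusForCounty` is a hypothetical helper and the page text is made up.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module RowSketch
    ' Hypothetical helper: finds the row that mentions the county
    ' and returns the text inside its value="..." attribute.
    Function StatusForCounty(ByVal page As String, ByVal county As String) As String
        Dim marker As String = "value="""
        For Each row As String In Split(page, "<div class=""Row"">")
            If InStr(row, county) = 0 Then Continue For

            Dim start As Integer = InStr(row, marker)
            If start = 0 Then Continue For

            ' Begin searching after the marker so the opening quote is skipped.
            Dim valueStart As Integer = start + Len(marker)
            Dim finish As Integer = InStr(valueStart, row, Chr(34))
            If finish = 0 Then Continue For

            Return Mid(row, valueStart, finish - valueStart).Trim()
        Next
        Return ""   ' county or value not found
    End Function

    Sub Main()
        ' Made-up page; the live markup is an assumption.
        Dim page As String = "<div class=""Row""><p>Pierce</p><input value=""Stage 1"" /></div>" & _
                             "<div class=""Row""><p>Snohomish</p><input value=""No Ban"" /></div>"
        Console.WriteLine("[" & StatusForCounty(page, "Snohomish") & "]")
    End Sub
End Module
```

An empty return value can be routed to the existing "Unknown Data/Page Error" branch so a layout change is logged rather than silently misread.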


                        #12
                        Jon,
                        Thanks for the suggestion. Will your script run under HS2?

                        Jabran


                          #13
                          Sorry, I did not realize you were still on HS2. This is for HS3 only.
                          Jon
