Simple Extract String from Website.

    Yeah, this sort of thing can happen. Some parsing gurus may have a better way to do it. I did think that extracting the whole table, replacing the HTML tags with commas, and then splitting on the commas might be a slightly better solution. This works for now, though...

    Imports System.Text.RegularExpressions
    Sub Main(ByVal Parms As Object)
        Dim WebStr As String = hs.GetURL("","/content/rivers_and_creeks/rainfall_and_river_level_data/site.asp?SiteID=16&bhcp=1", False, 80)
        ' Split the page on closing cell tags; each element then ends with one cell's content
        Dim SplitStr() As String = Split(WebStr, "</td>")
        ' Indexes 2 and 5 are read below, so require at least six elements
        If UBound(SplitStr) >= 5 Then
            ' Strip the HTML tags to log the cell text; strip everything but digits and dots to log the number
            hs.writelog("Flow", Regex.Replace(SplitStr(2), "<[^>]*>", ""))
            hs.writelog("FlowValue", Convert.ToDecimal(Regex.Replace(SplitStr(2), "[^0-9.]", "")))
            hs.writelog("Level", Regex.Replace(SplitStr(5), "<[^>]*>", ""))
            hs.writelog("LevelValue", Convert.ToDecimal(Regex.Replace(SplitStr(5), "[^0-9.]", "")))
        End If
    End Sub
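The alternative mentioned above (replace the HTML tags with commas, then split on the commas) might look something like the sketch below. It is written as a standalone console module rather than a HomeSeer script, the helper name CellsFromHtml is made up for illustration, and the sample markup is invented rather than taken from the real page. It would also break on values that contain commas themselves (for example "1,234"), so treat it as a starting point only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TableSplitSketch
    ' Hypothetical helper: collapses each run of HTML tags into one comma,
    ' then splits the result on commas.
    Function CellsFromHtml(ByVal Html As String) As String()
        ' A run of one or more tags (plus surrounding whitespace) becomes a single comma
        Dim Flat As String = Regex.Replace(Html, "(?:\s*<[^>]*>)+\s*", ",")
        ' Leading and trailing tags leave empty entries, so drop those
        Return Flat.Split(New Char() {","c}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' Invented sample markup, not the real page
        Dim Sample As String = "<table><tr><td>Flow</td><td>12.5 ML/day</td></tr>" &
                               "<tr><td>Level</td><td>0.43 m</td></tr></table>"
        For Each Cell As String In CellsFromHtml(Sample)
            Console.WriteLine(Cell)
        Next
    End Sub
End Module
```

With the sample above, each cell's text comes out as its own array element, so the values can be picked by index without first stripping tags from each piece.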


      Thanks mrhappy, that works perfectly.

      I think what has happened is that the rainfall data is temporarily unavailable, and because of that the table layout changes.

      On a side note, I have a question. This method of parsing works well because there is always more than one set of values I am looking for, so it makes sense to break the page down by splitting on a string such as td. When there is only one variable you are after, though, what is the easiest way to get it out?

      I am starting to understand GetURL, so I could pull the website into a variable like webstr or websolar. Once I have that information in the string, how can I use two markers to extract the value between them? Secondly, and more confusingly, how can I do this when one of the markers has " in it, which seems to confuse the string significantly?

      The example I am thinking of is extracting solar radiation. I did find a source that has it as a text value, but instead of extracting lots of information by splitting the string, all I am really after is that one bit of information.

      The page is at:

      and looking at the source, I can see

      <li><b>Sun:</b><span class="ajax" id="ajaxsolar">
      Whilst on the other side I can see

      The variable (in this case 0) sits between those two values, but I am unsure how to pull it out, and also how to use the first piece of markup as a start point when it has " in it.

      Thanks again for all your help!
      HS3 PRO, Win10, WeatherXML, HSTouch, Pushover, UltraGCIR, Heaps of Jon00 Plugins, Just sold and about to move so very slim system.


        These (I think) are old VB6 commands, but they work fine (until HS3?)...

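        Sub Main(ByVal Parms As Object)
            Dim WebStr As String = hs.GetURL("", "/soil.php", False, 80)
            ' Each "" inside the literal stands for one literal quote character
            Dim Find1 As String = "<li><b>Sun:</b><span class=""ajax"" id=""ajaxsolar"">"
            Dim Find2 As String = "</span>"
            ' InStr gives the position of the marker's first character, so add its length to skip past it
            Dim Pos1 As Integer = InStr(WebStr, Find1) + Len(Find1)
            ' Look for the closing marker, starting from the end of the first one
            Dim Pos2 As Integer = InStr(Pos1, WebStr, Find2)
            Dim RStr As String = Mid(WebStr, Pos1, Pos2 - Pos1)
            hs.writelog("", "The Value For The Sun is: " & Convert.ToInt32(RStr) & " W/m²")
        End Sub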
        Doubling a quote mark inside a string (writing "" instead of ") leaves a single quote mark in the string.
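A quick way to convince yourself of how the doubled quotes behave is a tiny standalone test like the one below (a console module, not a HomeSeer script). It also shows Chr(34), which is another way to write the same quote character if the doubled form gets hard to read.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module QuoteSketch
    Sub Main()
        ' Each "" inside the literal becomes one " in the resulting string
        Dim Doubled As String = "<span class=""ajax"">"
        ' Chr(34) is the double-quote character, so this builds the same text
        Dim WithChr As String = "<span class=" & Chr(34) & "ajax" & Chr(34) & ">"

        Console.WriteLine(Doubled)            ' prints <span class="ajax">
        Console.WriteLine(Doubled = WithChr)  ' prints True
    End Sub
End Module
```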

        The InStr command finds the position of one string inside another. I find the first set of HTML tags and get that position, then, starting from there, I look for the other set of tags. Once I have both positions I can use the Mid command to take out the bit in the middle, and lastly convert it to an integer.

        It can get a little confusing, because when you find a string inside another string, InStr reports the position of its first character, so you need to add the length of the string to get past it (not sure if there is an option to find the last character).
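The InStr and Mid steps described above could be wrapped in a small reusable helper, roughly as sketched below. The name ExtractBetween is made up for illustration, the module is a standalone console program rather than a HomeSeer script, and the sample page text is invented. The helper also checks for InStr returning 0 (marker not found), which the script above does not do. As a worked example of the position arithmetic: InStr("abcXYZdef", "XYZ") is 4, Len("XYZ") is 3, and 4 + 3 = 7 is the position of the "d" that follows the marker.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ExtractSketch
    ' Hypothetical helper: returns the text between StartMarker and EndMarker,
    ' or Nothing if either marker is missing.
    Function ExtractBetween(ByVal Source As String, ByVal StartMarker As String, ByVal EndMarker As String) As String
        Dim Found As Integer = InStr(Source, StartMarker)   ' 1-based; 0 means not found
        If Found = 0 Then Return Nothing
        Dim Pos1 As Integer = Found + Len(StartMarker)       ' first character after the start marker
        Dim Pos2 As Integer = InStr(Pos1, Source, EndMarker) ' search from there for the end marker
        If Pos2 = 0 Then Return Nothing
        ' Trim removes any line breaks or spaces around the value
        Return Mid(Source, Pos1, Pos2 - Pos1).Trim()
    End Function

    Sub Main()
        ' Invented sample, shaped like the markup quoted earlier in the thread
        Dim Sample As String = "<li><b>Sun:</b><span class=""ajax"" id=""ajaxsolar"">" & vbCrLf & "123</span></li>"
        Dim Value As String = ExtractBetween(Sample, "id=""ajaxsolar"">", "</span>")
        If Value Is Nothing Then
            Console.WriteLine("Markers not found")
        Else
            Console.WriteLine(Convert.ToInt32(Value))   ' prints 123
        End If
    End Sub
End Module
```

Returning Nothing when a marker is missing means a layout change on the page shows up as a clear "not found" case instead of a conversion error further down.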

        As the value is 0 at the minute I can't really test it any more, but keep an eye on it and see if it parses values correctly.
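Since the live value is 0 right now, one way to check the parsing in the meantime is to run it against a hard-coded sample with a non-zero value. The sketch below does that, and it uses a regular expression instead of InStr and Mid, which is a different technique from the script above. The function name SolarFromHtml and the sample text are made up, and the pattern assumes the value is a whole number sitting directly between the ajaxsolar span tags.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexSketch
    ' Hypothetical helper: returns the number inside the ajaxsolar span,
    ' or Nothing if the page does not match the assumed layout.
    Function SolarFromHtml(ByVal Html As String) As Integer?
        ' "" inside the pattern is a literal quote; (\d+) captures the digits
        Dim M As Match = Regex.Match(Html, "id=""ajaxsolar"">\s*(\d+)\s*</span>")
        If M.Success Then Return Convert.ToInt32(M.Groups(1).Value)
        Return Nothing
    End Function

    Sub Main()
        ' Invented sample with a non-zero value, to exercise the parsing offline
        Dim Sample As String = "<li><b>Sun:</b><span class=""ajax"" id=""ajaxsolar"">" & vbCrLf & "457</span></li>"
        Dim Solar As Integer? = SolarFromHtml(Sample)
        If Solar.HasValue Then
            Console.WriteLine("Sun: " & Solar.Value & " W/m²")
        Else
            Console.WriteLine("Value not found")
        End If
    End Sub
End Module
```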