Simple Extract String from Website.

  • mrhappy
    replied
    These (I think) are old vb6 commands but work fine (until HS3?)...

    Code:
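    Sub Main(ByVal Parms As Object)

        ' Fetch the page source.
        Dim WebStr As String = hs.GetURL("http://www.britanniacreek.com", "/soil.php", False, 80)

        ' A doubled quote ("") inside a literal stands for one quote character.
        Dim Find1 As String = "<li><b>Sun:</b><span class=""ajax"" id=""ajaxsolar"">"
        Dim Find2 As String = "</span>"

        ' InStr returns the 1-based position of the first matching character, or 0 if not found.
        Dim Start1 As Integer = InStr(WebStr, Find1)
        If Start1 = 0 Then
            hs.writelog("", "Start marker not found")
            Exit Sub
        End If

        ' Skip past the start marker, then look for the end marker from there.
        Dim Pos1 As Integer = Start1 + Len(Find1)
        Dim Pos2 As Integer = InStr(Pos1, WebStr, Find2)
        If Pos2 = 0 Then
            hs.writelog("", "End marker not found")
            Exit Sub
        End If

        ' Take out the text between the two markers.
        Dim RStr As String = Mid(WebStr, Pos1, Pos2 - Pos1)

        hs.writelog("", "The Value For The Sun is: " & Convert.ToInt32(RStr) & " W/m")

    End Sub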
    Doubling a quote mark inside a string ("") puts a single literal quote mark into it, which is how the HTML marker above can contain quotes.

    The InStr command finds the position of one string inside another. I find the first set of HTML tags and get that position, then, starting from there, look for the closing tag. Once I have both positions I use the Mid command to take out the bit in the middle, and finally convert it to an integer.

    It can get a little confusing: InStr reports the position of the first character of the string it found, so you need to add the length of that string to get to the end of it (not sure if there is an option to find the last character).

    As the value is 0 at the minute I can't really test it any more, but keep an eye on it and see if it parses values correctly.
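
    The find-both-markers-then-cut steps described above could also be wrapped in a small reusable function. This is only a sketch: ExtractBetween is a made-up name (not a HomeSeer or .NET function), it uses the .NET IndexOf and Substring methods instead of InStr and Mid, and the sample HTML is just the fragment quoted in this thread rather than a live download.

    ```vbnet
    Imports System

    Module ExtractDemo

        ' Hypothetical helper (not part of HomeSeer): returns the text between
        ' startMarker and endMarker, or Nothing if either marker is missing.
        Function ExtractBetween(ByVal source As String, ByVal startMarker As String, ByVal endMarker As String) As String
            Dim startPos As Integer = source.IndexOf(startMarker, StringComparison.Ordinal)
            If startPos < 0 Then Return Nothing

            ' IndexOf reports where the marker begins, so skip past it.
            startPos += startMarker.Length

            Dim endPos As Integer = source.IndexOf(endMarker, startPos, StringComparison.Ordinal)
            If endPos < 0 Then Return Nothing

            Return source.Substring(startPos, endPos - startPos)
        End Function

        Sub Main()
            ' Sample fragment modelled on the page source quoted in this thread.
            Dim html As String = "<li><b>Sun:</b><span class=""ajax"" id=""ajaxsolar"">0</span></li>"
            Dim marker As String = "<span class=""ajax"" id=""ajaxsolar"">"

            Dim raw As String = ExtractBetween(html, marker, "</span>")
            Dim value As Integer
            If raw IsNot Nothing AndAlso Integer.TryParse(raw.Trim(), value) Then
                Console.WriteLine("Sun: " & value & " W/m")
            Else
                Console.WriteLine("Marker not found or value not numeric")
            End If
        End Sub

    End Module
    ```

    In a HomeSeer script the function would presumably sit next to Sub Main(ByVal Parms As Object), with hs.writelog in place of Console.WriteLine. Using TryParse rather than Convert.ToInt32 means a blank or non-numeric value is reported instead of raising an exception.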


  • travisdh
    replied
    Thanks mrhappy, that works perfectly.

    I think what has happened is that the rainfall data is temporarily unavailable, and because of that the table layout changes.

    On a side note / question: this method of parsing works well when there is more than one value I am looking for, so it makes sense to break the page down by splitting on a string such as td. When there is only one value I am after, though, what is the easiest way to get it out?

    I am starting to understand GetURL, so I can pull the website into a variable like WebStr or WebSolar. Once I have the page in a string, how can I use two markers to extract the value in the middle? Secondly, and more confusingly, how can I do this when one of the markers has " in it, which seems to confuse the string significantly?

    The example I am thinking of is extracting solar radiation. I did actually find a source which has it as a text value, but now, instead of extracting lots of information by splitting the string, all I am really after is the one bit of information.

    The page is at: http://www.britanniacreek.com/soil.php

    and looking at the source, i can see

    Code:
    <li><b>Sun:</b><span class="ajax" id="ajaxsolar">
    Whilst on the other side i can see

    Code:
    </span>
    The value (in this case 0) sits between those two markers, but I am unsure how to pull it out, and also how to use the first snippet as a start point when it has " in it.

    Thanks again for all your help!


  • mrhappy
    replied
    Yeah, this sort of thing can happen. Some parsing gurus may have an idea of how to do it a bit better. I did think that extracting the whole table, replacing the HTML tags with commas and then splitting on the comma might be a slightly better solution. This works for now though...

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Dim WebStr As String = hs.GetURL("http://www.melbournewater.com.au","/content/rivers_and_creeks/rainfall_and_river_level_data/site.asp?SiteID=16&bhcp=1", False, 80)
        Dim SplitStr() As String = Split(WebStr, "</td>")
    
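        ' Indexes 2 and 5 are read below, so make sure the split produced at least six parts.
        If UBound(SplitStr) >= 5 Then

            ' Strip the HTML tags for the text, and everything but digits and dots for the value.
            hs.writelog("Flow", Regex.Replace(SplitStr(2), "<[^>]*>", ""))
            hs.writelog("FlowValue", Convert.ToDecimal(Regex.Replace(SplitStr(2), "[^0-9.]", "")))
            hs.writelog("Level", Regex.Replace(SplitStr(5), "<[^>]*>", ""))
            hs.writelog("LevelValue", Convert.ToDecimal(Regex.Replace(SplitStr(5), "[^0-9.]", "")))

        End If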
    
    End Sub
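    A different approach from the split-by-index script above, offered only as a sketch: rather than relying on the value being in a fixed cell, look for a number that sits directly in front of its unit. TryReadValue is a hypothetical helper name, and the sample text is the log line posted further down this thread, not a fresh download. It assumes the page keeps printing the units as "m" and "Ml/day".

    ```vbnet
    Imports System
    Imports System.Globalization
    Imports System.Text.RegularExpressions

    Module UnitParseDemo

        ' Hypothetical helper: finds the first number that is directly followed
        ' by the given unit. Returns False instead of throwing when nothing matches.
        Function TryReadValue(ByVal text As String, ByVal unit As String, ByRef value As Decimal) As Boolean
            value = 0D
            If String.IsNullOrEmpty(text) Then Return False

            ' Group 1 is the number; the unit must follow it and must not run on
            ' into another word character.
            Dim pattern As String = "(\d+(?:\.\d+)?)\s*" & Regex.Escape(unit) & "(?!\w)"
            Dim m As Match = Regex.Match(text, pattern)
            If Not m.Success Then Return False

            Return Decimal.TryParse(m.Groups(1).Value, NumberStyles.Number, CultureInfo.InvariantCulture, value)
        End Function

        Sub Main()
            ' Text as it appeared in the HomeSeer log once the tags were stripped.
            Dim text As String = "Level 0.250 m Flow as at 29-Apr-2012 2 PM : 39.300 Ml/day"

            Dim level As Decimal
            Dim flow As Decimal

            If TryReadValue(text, "m", level) Then
                Console.WriteLine("Level: " & level.ToString(CultureInfo.InvariantCulture))
            End If

            If TryReadValue(text, "Ml/day", flow) Then
                Console.WriteLine("Flow: " & flow.ToString(CultureInfo.InvariantCulture))
            End If
        End Sub

    End Module
    ```

    The date and time in the middle of the text are skipped because neither is followed by one of the units, and Decimal.TryParse means an unexpected layout gives a False result rather than a FormatException.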


  • travisdh
    replied
    Hi MrHappy,

    Just a (hopefully quick) question: the script has started to throw errors, and I am guessing the string has changed slightly. Would you be able to take a look and make some suggestions on how it could be fixed?

    Thanks

    4/29/2012 3:22:38 PM Error Scripting runtime error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.FormatException: Input string was not in a correct format.
       at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
       at System.Number.ParseDecimal(String value, NumberStyles options, NumberFormatInfo numfmt)
       at System.Convert.ToDecimal(String value)
       at scriptcode5.scriptcode5.Main(Object Parms)
       --- End of inner exception stack trace ---
       at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
       at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
       at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
       at Scheduler.VsaScriptHost.Invoke(String ModuleName, String MethodName, Object[] Arguments)
    4/29/2012 3:22:38 PM Level 0.250 m Flow as at 29-Apr-2012 2 PM : 39.300 Ml/day Chart View Detailed historical data Period Max Min Period Max Min
    4/29/2012 3:22:38 PM FlowValue 39.300
    4/29/2012 3:22:38 PM Flow 39.300 Ml/day


  • travisdh
    replied
    Thanks,

    That is perfect!


  • mrhappy
    replied
    Try this;

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Dim WebStr As String = hs.GetURL("http://www.melbournewater.com.au","/content/rivers_and_creeks/rainfall_and_river_level_data/site.asp?SiteID=16&bhcp=1", False, 80)
        Dim SplitStr() As String = Split(WebStr, "<td>")
    
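        ' Indexes 3 and 6 are read below, so make sure the split produced at least seven parts.
        If UBound(SplitStr) >= 6 Then

            ' Strip the HTML tags for the text, and everything but digits and dots for the value.
            hs.writelog("Flow", Regex.Replace(SplitStr(3), "<[^>]*>", ""))
            hs.writelog("FlowValue", Convert.ToDecimal(Regex.Replace(SplitStr(3), "[^0-9.]", "")))
            hs.writelog("Level", Regex.Replace(SplitStr(6), "<[^>]*>", ""))
            hs.writelog("LevelValue", Convert.ToDecimal(Regex.Replace(SplitStr(6), "[^0-9.]", "")))

        End If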
    
    End Sub
    You will need to add the device setting back in. Because the readings are decimals, you will need to multiply them up to keep an accurate figure in the device value, or if you are happy with less precision you could just set the value from there. I think what was happening is that some HTML was being left in the device strings, which might have been causing the formatting issues.
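
    The "multiply them" step might look something like the sketch below. It assumes, as the paragraph above implies, that the device value can only hold a whole number. ToScaledInteger is a made-up helper name and the scale factors are only examples; the readings are the ones from the log posted in this thread.

    ```vbnet
    Imports System

    Module ScaleDemo

        ' Hypothetical helper: scales a decimal reading to a whole number so
        ' that no precision is lost when it is stored as an integer.
        Function ToScaledInteger(ByVal reading As Decimal, ByVal factor As Integer) As Integer
            Return CInt(Math.Round(reading * factor))
        End Function

        Sub Main()
            Dim level As Decimal = 0.25D   ' metres, as logged in this thread
            Dim flow As Decimal = 39.3D    ' Ml/day, as logged in this thread

            ' 0.250 m stored as millimetres, 39.3 Ml/day stored in tenths.
            Console.WriteLine(ToScaledInteger(level, 1000))
            Console.WriteLine(ToScaledInteger(flow, 10))
        End Sub

    End Module
    ```

    Whatever reads the device value back then has to divide by the same factor, so it is worth keeping the factor in one place.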


  • travisdh
    replied
    Hi mrhappy,

    I slightly modified your code example for another website (Melbourne Water data), so that I could use the water levels in my home automation setup to warn me if the creek near me is getting too high. The code I used is below, and I was hoping you might be able to take a look. I am sure it will be simple: I am trying to set the device string to the extracted string, but I think that because I am pulling out the whole string including whitespace it is making the device look horrible.

    I had also hoped to put the values in the device value section of the HomeSeer devices, but for some reason whenever that was in, it errored due to a wrong input type.

    Would you be able to take a look and suggest any changes needed to neaten up the extracted string and to extract the value for the device value?

    Thanks!

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Dim WebStr As String = hs.GetURL("http://www.melbournewater.com.au","/content/rivers_and_creeks/rainfall_and_river_level_data/site.asp?SiteID=16&bhcp=1", False, 80)
    
        'hs.writelog("WATER-RAW", WebStr)
        Dim SplitStr() As String = Split(WebStr, "<td>")
    
        For i As Integer = 0 To UBound(SplitStr) - 1
        'hs.writelog(i, SplitStr(i))
        Next
    
        If UBound(SplitStr) > 0 Then
    
            hs.writelog("Flow", SplitStr(3))
            hs.SetDeviceString("Y13","Flow: " & SplitStr(3))
            hs.writelog("Level", SplitStr(6))
            hs.SetDeviceString("Y14","Level: " & SplitStr(6))
    
           
        End If
    
    End Sub


  • travisdh
    replied
    Thanks for the tip(s)!

    Murphy's law says I am more than 200 km away from home (for the next few days) and the Arduino in the greenhouse is not responding, so I can't test it.

    I will give it a go when I get back. I was slack when running the network cable, which runs about 50 m outside, and I am sure there is a break in the cable which prevents it from responding. Even when it does respond the speed is about 1 Mbps (on LAN), so it always takes quite a while.


  • mrhappy
    replied
    There are probably ways of doing this slightly better but here might be something for you to try;

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Try

            Dim FilePath As String = hs.getapppath & "\adstation.txt"

            ' Download the page with a 10 second (10000 ms) timeout, overwriting any old copy.
            My.Computer.Network.DownloadFile("http://192.168.1.178/", FilePath, "", "", False, 10000, True)

            If System.IO.File.Exists(FilePath) Then

                ' ReadAllText opens, reads and closes the file in one call.
                Dim WebStr As String = System.IO.File.ReadAllText(FilePath)

                hs.writelog("RAW", WebStr)
                Dim SplitStr() As String = Split(WebStr, "#")

                ' Indexes 2 to 4 are read below, so check that they exist.
                If UBound(SplitStr) >= 4 Then

                    hs.writelog("Light", Convert.ToDouble(Regex.Replace(SplitStr(2), "[^\d|\.\-]", "")) & " lux")
                    hs.writelog("Temp", Convert.ToDouble(Regex.Replace(SplitStr(3), "[^\d|\.\-]", "")) & " C")
                    hs.writelog("Humidity", Convert.ToDouble(Regex.Replace(SplitStr(4), "[^\d|\.\-]", "")) & " %")

                End If

                ' Delete the file, not absolutely needed but prevents the data caching.
                System.IO.File.Delete(FilePath)

            End If

        Catch Ex As Exception

            hs.writelog("Arduino Web", "Error: " & Ex.Message)

        End Try
    
    End Sub
    Basically it tries to download the page with a 10 second timeout, saves it to your root HS directory as a text file, attempts to parse it and then deletes the file. There is no absolute need to delete the file (the download will overwrite it if it finds it), but deleting it means that if the webserver goes down the script cannot carry on reading old data without knowing that it is old.


  • mrhappy
    replied
    I'm not sure that you can with the HS scripting command for downloading a URL. It would be possible to switch to a .NET method of downloading a file (the simplest being My.Computer.Network.DownloadFile), which accepts a timeout; you would download the page to a file and then reopen it in the script. What sort of time delay are you experiencing?


  • travisdh
    replied
    Timeout

    Just a quick question: is there any way to extend the amount of time the script will wait for the page to load? The page is normally slow to load (because it is served by an Arduino, over a long cable, etc.) and I get lots of timeouts. If I could get it to wait a bit longer it would work more reliably.

    Thanks


  • mrhappy
    replied
    Originally posted by travisdh
    Thank you very much,
    the dark art of regex appears to have worked well. So to understand, the d allows for extraction of any digit, does this mean that it can be positive or negative, or would it just extract the digit regardless of if positive or negative?

    Once again thanks for your help, this is perfect!
    Very valid point and one I forgot about, try this;

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Dim WebStr As String = "Arduino powered webserver#Serving temperature and humidity values from a DHT22 sensor#Light (Lux): 0#Temperature (oC): -24.40#Humidity (%): +54.00#"
    
        'Dim Path As String = "http://192.168.1.178/"
        'Dim WebStr As String = hs.geturl(Path, "/", FALSE, 80)
    
        hs.writelog("RAW", WebStr)
        Dim SplitStr() As String = Split(WebStr, "#")
    
        'For i As Integer = 0 To UBound(SplitStr) - 1
        'hs.writelog(i, SplitStr(i))
        'Next
    
        ' Indexes 2 to 4 are read below, so check that they exist.
        If UBound(SplitStr) >= 4 Then
    
            hs.writelog("Light", convert.todouble(Regex.Replace(SplitStr(2), "[^\d|\.\-]", "")) & " lux")
            hs.writelog("Temp", convert.todouble(Regex.Replace(SplitStr(3), "[^\d|\.\-]", "")) & " C")
            hs.writelog("Humidity", convert.todouble(Regex.Replace(SplitStr(4), "[^\d|\.\-]", "")) & " %")
    
    
        End If
    
    End Sub
    Different regex string: it now strips everything except digits, the decimal point and the minus sign, so it should handle negative numbers and decimals.
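
    To make the difference between the regex strings concrete, here is a small stand-alone sketch that runs both of the patterns used in this thread over the sample readings, plus a third pattern that is my own suggestion rather than anything from the scripts here: it matches one optionally negative number instead of deleting everything around it.

    ```vbnet
    Imports System
    Imports System.Globalization
    Imports System.Text.RegularExpressions

    Module SignDemo

        Sub Main()
            ' Sample fields taken from the test string in the script above.
            Dim samples() As String = {"Light (Lux): 0", "Temperature (oC): -24.40", "Humidity (%): +54.00"}

            For Each s As String In samples
                ' \D removes every non-digit, so the sign and decimal point are lost.
                Dim digitsOnly As String = Regex.Replace(s, "\D", "")

                ' Keeps digits, '.' and '-' (and a literal '|'), so the sign survives.
                Dim keepSign As String = Regex.Replace(s, "[^\d|\.\-]", "")

                ' Suggested alternative: match a single, optionally negative, number.
                Dim m As Match = Regex.Match(s, "-?\d+(\.\d+)?")
                Dim parsed As Double = Double.Parse(m.Value, CultureInfo.InvariantCulture)

                Console.WriteLine(digitsOnly & " | " & keepSign & " | " & parsed.ToString(CultureInfo.InvariantCulture))
            Next
        End Sub

    End Module
    ```

    So \d on its own only ever means a single digit character; the sign is kept or lost depending on what else the pattern allows. The strip-everything-else approach would also keep a stray '-' or '.' from the label text if one ever appeared there, which is where matching the number directly may be safer.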


  • travisdh
    replied
    Thank you very much,
    the dark art of regex appears to have worked well. So to understand, the d allows for extraction of any digit, does this mean that it can be positive or negative, or would it just extract the digit regardless of if positive or negative?

    Once again thanks for your help, this is perfect!


  • mrhappy
    replied
    It was poor on my part really, as I knew this was going to come up. I would say you have a few options:

    1) A further split of each return on the space character, so you would have element 0 as the title of the return and element 1 as the value.
    2) A Replace that replaces the label text with nothing.
    3) The dark art of regex; below is an example which may work with a regex.

    Code:
    Imports System.Text.RegularExpressions
    
    Sub Main(ByVal Parms As Object)
    
        Dim WebStr As String = "Arduino powered webserver#Serving temperature and humidity values from a DHT22 sensor#Light (Lux): 0#Temperature (oC): 24.40#Humidity (%): 54.00#"
    
        'Dim Path As String = "http://192.168.1.178/"
        'Dim WebStr As String = hs.geturl(Path, "/", FALSE, 80)
    
        hs.writelog("RAW", WebStr)
        Dim SplitStr() As String = Split(WebStr, "#")
    
        'For i As Integer = 0 To UBound(SplitStr) - 1
        'hs.writelog(i, SplitStr(i))
        'Next
    
        If UBound(SplitStr) > 0 Then
    
            hs.writelog("Light", convert.todouble(Regex.Replace(SplitStr(2), "[\D]", "")) & " lux")
            hs.writelog("Temp", convert.todouble(Regex.Replace(SplitStr(3), "[\D]", "")) / 100 & " C")
            hs.writelog("Humidity", convert.todouble(Regex.Replace(SplitStr(4), "[\D]", "")) / 100 & " %")
    
    
        End If
    
    End Sub
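    For completeness, options 1 and 2 from the list above could be sketched as below. Note that this splits on the colon rather than the space, since a label like "Temperature (oC):" contains a space itself; that swap is my own assumption about what would work best with these strings. The field text is one piece of the sample string used above.

    ```vbnet
    Imports System
    Imports System.Globalization

    Module SplitDemo

        Sub Main()
            Dim field As String = "Temperature (oC): 24.40"

            ' Option 1 variant: split on the colon; part 0 is the label, part 1 the value.
            Dim parts() As String = field.Split(":"c)
            If parts.Length = 2 Then
                Dim value As Double
                If Double.TryParse(parts(1).Trim(), NumberStyles.Float, CultureInfo.InvariantCulture, value) Then
                    Console.WriteLine(parts(0).Trim() & " = " & value.ToString(CultureInfo.InvariantCulture))
                End If
            End If

            ' Option 2: replace the known label with nothing and trim what is left.
            Dim stripped As String = field.Replace("Temperature (oC):", "").Trim()
            Console.WriteLine(stripped)
        End Sub

    End Module
    ```

    Neither version cares how many digits the value has, which is the main advantage over trimming a fixed number of characters from the right.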


  • travisdh
    replied
    Thanks for your help so far. For some reason the script was fine extracting the parts from your test string but struggled with the real web page, so I modified the website code (run from an Arduino) to include # before the line breaks, and it now splits the string perfectly.

    Quick question: how can I parse the string if the length of the number changes? Light might be one digit (9) or go up to four digits. Temperature is normally five characters (e.g. 22.30), but it could be six to include a negative sign, or four if it is 6.50 deg C, and the same goes for humidity.

    For example, light at the moment is 9 because it is dark in the room, but when it is sunny outside it might go up to 1024, so I can't apply a simple trim to a fixed number of characters.

    Code:
    Arduino powered webserver#<br /> Serving temperature and humidity values from a DHT22 sensor#<br /> Light : 16 #<br /> Temperature (oC): 22.30 #<br /> Humidity (%): 56.50 #<br />


    The modified vb code is below:

    Code:
    Sub Main(ByVal Parms As Object)

        'Dim WebStr As String = "Arduino powered webserver<br>Serving temperature and humidity values from a DHT22 sensor<br>Light (Lux): 0<br>Temperature (oC): 24.40<br>Humidity (%): 54.00<br>"

        Dim Path As String = "http://192.168.1.178/"
        Dim WebStr As String = hs.geturl(Path, "/", FALSE, 80)

        hs.writelog("RAW", WebStr)
        Dim SplitStr() As String = Split(WebStr, "#")

        ' Log each piece (the last, empty piece after the final # is skipped).
        For i As Integer = 0 To UBound(SplitStr) - 1
            hs.writelog(i, SplitStr(i))
        Next

        ' Indexes 2 to 4 are read below, so check that they exist.
        If UBound(SplitStr) >= 4 Then

            ' Take the last 6 characters of each piece and trim the spaces.
            hs.writelog("Light", Trim(Right(SplitStr(2), 6)))
            hs.writelog("Temp", Trim(Right(SplitStr(3), 6)) & " C")
            hs.writelog("Humidity", Trim(Right(SplitStr(4), 6)) & " %")

        End If

    End Sub
    This results in the log below:

    Code:
    3/3/2012 10:01:55 AM 	Humidity 	56.50 %
    3/3/2012 10:01:55 AM 	Temp 	22.30 C
    3/3/2012 10:01:55 AM 	Light 	: 15
    3/3/2012 10:01:55 AM 	4 	Humidity (%): 56.50
    3/3/2012 10:01:55 AM 	3 	Temperature (oC): 22.30
    3/3/2012 10:01:55 AM 	2 	Light : 15
    3/3/2012 10:01:55 AM 	1 	Serving temperature and humidity values from a DHT22 sensor
    3/3/2012 10:01:55 AM 	0 	Arduino powered webserver
    3/3/2012 10:01:55 AM 	RAW 	Arduino powered webserver#
                                            Serving temperature and humidity values from a DHT22 sensor#
                                            Light : 15 #
                                            Temperature (oC): 22.30 #
                                            Humidity (%): 56.50 #
    Once again, thanks for all the help. I am starting to understand Right and Left, just not how to find the position of a certain character so that I can work out how long the number or string is.
