Regular Expressions in html with multiline

Thread: Regular Expressions in html with multiline

  1. #1
    Join Date
    Jul 2006
    Posts
    4

    Regular Expressions in html with multiline

    Hi everyone,

    I have an HTML page that contains:

    Code:
    ....
                                    <TR ^M
                                            class="regular
    ">^M
                                            <TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&amp
    ;symbol=1010"^M
                                                    >Title1</A></TD>^M
                                            <TD class="table_numbers" height="18">98.00</TD>^M
                                            <TD class="table_numbers" height="18">228</TD>^M
                                            <TD class="index_down" height="18"> -2.50</TD>^M
                                            <TD class="index_down" height="18">-2.49</TD>^M
                                            <TD class="table_numbers" height="18">342</TD>^M
                                            <TD class="table_numbers" height="18">209,005 </TD>^M
                                            <TD class="table_numbers" height="18">98.50</TD>^M
                                            <TD class="table_numbers" height="18">100</TD>^M
                                            <TD class="table_numbers" height="18">99.00</TD>^M
                                            <TD class="table_numbers" height="18">1,847</TD>^M
                                            <TD class="table_numbers" height="18">100.00</TD>^M
                                            <TD class="table_numbers" height="18">100.00</TD>^M
                                            <TD class="table_numbers" height="18">98.00</TD> ^M
                                    </TR>^M
                                    ^M
                                    <TR ^M
                                            class="table_back
    ">^M
                                            <TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&amp;symbol=1020"^M
                                                    >Title2</A></TD>^M
                                            <TD class="table_numbers" height="18">369.00</TD>^M
                                            <TD class="table_numbers" height="18">4,822</TD>^M
                                            <TD class="index_up" height="18"> 25.25</TD>^M
                                            <TD class="index_up" height="18">7.35</TD>^M
                                            <TD class="table_numbers" height="18">3,294</TD>^M
                                            <TD class="table_numbers" height="18">1,620,903 </TD>^M
                                            <TD class="table_numbers" height="18">355.50</TD>^M
                                            <TD class="table_numbers" height="18">139</TD>^M
                                            <TD class="table_numbers" height="18">354.00</TD>^M
                                            <TD class="table_numbers" height="18">219</TD>^M
                                            <TD class="table_numbers" height="18">359.50</TD>^M
                                            <TD class="table_numbers" height="18">375.00</TD>^M
                                            <TD class="table_numbers" height="18">352.00</TD> ^M
                                    </TR>^M
                                    ^M
                                    <TR ^M
    ...
    ....

    Is it possible to get every title and its numbers into an array?

    For example:

    Code:
                                    <TR ^M
                                            class="table_back
    ">^M
                                            <TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&amp;symbol=1020"^M
                                                    >Title2</A></TD>^M
                                            <TD class="table_numbers" height="18">369.00</TD>^M
                                            <TD class="table_numbers" height="18">4,822</TD>^M
                                            <TD class="index_up" height="18"> 25.25</TD>^M
                                            <TD class="index_up" height="18">7.35</TD>^M
                                            <TD class="table_numbers" height="18">3,294</TD>^M
                                            <TD class="table_numbers" height="18">1,620,903 </TD>^M
                                            <TD class="table_numbers" height="18">355.50</TD>^M
                                            <TD class="table_numbers" height="18">139</TD>^M
                                            <TD class="table_numbers" height="18">354.00</TD>^M
                                            <TD class="table_numbers" height="18">219</TD>^M
                                            <TD class="table_numbers" height="18">359.50</TD>^M
                                            <TD class="table_numbers" height="18">375.00</TD>^M
                                            <TD class="table_numbers" height="18">352.00</TD> ^M
                                    </TR>^M
    will be:

    Title   ==> Title2
    price1  ==> 25.25
    price2  ==> 7.35
    price3  ==> 3,294
    price4  ==> 1,620,903
    price5  ==> 355.50
    price6  ==> 139
    price7  ==> 354.00
    price8  ==> 219
    price9  ==> 359.50
    price10 ==> 375.00
    price11 ==> 352.00

    I can't think of any way to do it. Can you?

  2. #2
    Join Date
    Jun 2006
    Posts
    6

    Re: Regular Expressions in html with multiline

    You can use XML for this.

    Example:

    catelog.xml
    ----------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <CATALOG>
    <ITEM>
    <TITLE>Park Avenue</TITLE>
    <PRICE>110.00</PRICE>

    </ITEM>
    <ITEM>
    <TITLE>Pears</TITLE>
    <PRICE>120.00</PRICE>

    </ITEM>
    </CATALOG>

    test.html
    -----------------
    <html>
    <body>

    <xml id="catelog" src="catelog.xml"></xml>

    <table border="1" datasrc="#catelog">
    <tr>
    <td><span datafld="TITLE"></span></td>
    <td><span datafld="PRICE"></span></td>
    </tr>
    </table>

    </body>
    </html>

  3. #3
    Join Date
    Jul 2006
    Posts
    4

    Re: Regular Expressions in html with multiline

    Thanks, but I can't control the HTML page.

    Basically, I fetch the page from a website, so I can't control the format.

  4. #4
    Join Date
    Jul 2006
    Posts
    4

    Re: Regular Expressions in html with multiline

    Hi everyone,

    I wrote this code and it seems to work.

    First, split the HTML on the opening tag <TD class="regular" height="18" width="81">, which starts each stock row:

    Code:
    		String myContent = new String (Content);
    		// (?i) makes the match case-insensitive; ary[0] is everything before the first stock
    		String[] ary = myContent.split("(?i)<td class=\"regular\" height=\"18\" width=\"81\">");
    		System.out.println("# of Stocks: " + (ary.length-1));
    Then loop over the pieces and pass each one to another method, getStockItem(ary[i]), to get the title and prices:

    Code:
    		for(int i=1;i<ary.length;i++){ List_Of_Stocks.add(getStockItem(ary[i]));}
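
    To check the split step in isolation, here is a minimal self-contained sketch. The class name SplitRowsSketch, the method splitRows, and the two-row sample HTML are made up for illustration; only the marker tag comes from the page above. It assumes every stock row starts with exactly that tag.

```java
import java.util.regex.Pattern;

public class SplitRowsSketch {
    // Marker tag from the page; (?i) at the front makes the whole pattern case-insensitive.
    private static final Pattern ROW_START = Pattern.compile(
            "(?i)<td class=\"regular\" height=\"18\" width=\"81\">");

    // Element 0 is whatever precedes the first marker; each later element is one stock row.
    static String[] splitRows(String html) {
        return ROW_START.split(html);
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample with one upper-case and one lower-case marker.
        String html = "<table><TR><TD class=\"regular\" height=\"18\" width=\"81\">A</TD></TR>"
                + "<TR><td class=\"regular\" height=\"18\" width=\"81\">B</td></TR></table>";
        String[] parts = splitRows(html);
        System.out.println("# of Stocks: " + (parts.length - 1));
    }
}
```

    If the marker never appears, the array has a single element and the stock count comes out as 0.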

    Code:
    	// Requires: import java.util.*; import java.util.regex.*;
    	private static Collection<String> getStockItem(String Item){
    		Collection<String> ItemInfo = new ArrayList<String>();
    		
    		// Drop line breaks and tabs so every tag sits on a single line
    		Item = Item.replaceAll("[\n\t\r]","");
    		
    		// Title: the text of the first link in this piece
    		Pattern pattern = Pattern.compile( "<A href=[^>]*+>([^><]*)</a>", Pattern.CASE_INSENSITIVE );
    		Matcher matcher = pattern.matcher( Item );
    		
    		if(matcher.find()){
    			// group(1) is the captured text between the tags
    			ItemInfo.add(matcher.group(1).trim());
    		}else{
    			ItemInfo.add("N/A");
    			System.err.println("no name!!");
    		}
    		
    		// Prices: the text of every <td class=...> cell that follows
    		pattern = Pattern.compile( "<td class=[^>]*+>([^><]*)</td>", Pattern.CASE_INSENSITIVE );
    		matcher = pattern.matcher( Item );
    		
    		while(matcher.find()){
    			ItemInfo.add(matcher.group(1).trim());
    		}
    		
    		return ItemInfo;
    	}


    What do you think about the code?
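
    One possible variation, as a rough sketch rather than a finished parser: a negated character class such as [^>]* also matches line breaks, so the patterns can run on the raw multiline text without removing \r and \n first. The class name StockRowSketch, the method parseRow, and the shortened sample piece are hypothetical; the sketch assumes each piece looks like the rows shown in the first post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StockRowSketch {
    // [^>]* spans line breaks, so a tag split over several lines still matches.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*>([^<]*)</a>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CELL = Pattern.compile(
            "<td\\s[^>]*>([^<]*)</td>", Pattern.CASE_INSENSITIVE);

    // Returns the title (or "N/A") followed by the text of every plain cell.
    static List<String> parseRow(String piece) {
        List<String> info = new ArrayList<>();
        Matcher link = LINK.matcher(piece);
        info.add(link.find() ? link.group(1).trim() : "N/A");
        Matcher cell = CELL.matcher(piece);
        while (cell.find()) {
            info.add(cell.group(1).trim());
        }
        return info;
    }

    public static void main(String[] args) {
        // Hypothetical piece, shaped like one element produced by the split step.
        String piece = "<A href=\"/x?symbol=1020\"\r\n   >Title2</A></TD>\r\n"
                + "<TD class=\"table_numbers\" height=\"18\">369.00</TD>\r\n"
                + "<TD class=\"index_up\" height=\"18\"> 25.25</TD>\r\n";
        System.out.println(parseRow(piece));
    }
}
```

    Compiling the patterns once as constants also avoids recompiling them for every row.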
