-
July 3rd, 2006, 04:10 PM
#1
Regular Expressions in html with multiline
Hi everyone,
I have an HTML page that contains:
Code:
....
<TR ^M
class="regular
">^M
<TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&symbol=1010"^M
>Title1</A></TD>^M
<TD class="table_numbers" height="18">98.00</TD>^M
<TD class="table_numbers" height="18">228</TD>^M
<TD class="index_down" height="18"> -2.50</TD>^M
<TD class="index_down" height="18">-2.49</TD>^M
<TD class="table_numbers" height="18">342</TD>^M
<TD class="table_numbers" height="18">209,005 </TD>^M
<TD class="table_numbers" height="18">98.50</TD>^M
<TD class="table_numbers" height="18">100</TD>^M
<TD class="table_numbers" height="18">99.00</TD>^M
<TD class="table_numbers" height="18">1,847</TD>^M
<TD class="table_numbers" height="18">100.00</TD>^M
<TD class="table_numbers" height="18">100.00</TD>^M
<TD class="table_numbers" height="18">98.00</TD> ^M
</TR>^M
^M
<TR ^M
class="table_back
">^M
<TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&symbol=1020"^M
>Title2</A></TD>^M
<TD class="table_numbers" height="18">369.00</TD>^M
<TD class="table_numbers" height="18">4,822</TD>^M
<TD class="index_up" height="18"> 25.25</TD>^M
<TD class="index_up" height="18">7.35</TD>^M
<TD class="table_numbers" height="18">3,294</TD>^M
<TD class="table_numbers" height="18">1,620,903 </TD>^M
<TD class="table_numbers" height="18">355.50</TD>^M
<TD class="table_numbers" height="18">139</TD>^M
<TD class="table_numbers" height="18">354.00</TD>^M
<TD class="table_numbers" height="18">219</TD>^M
<TD class="table_numbers" height="18">359.50</TD>^M
<TD class="table_numbers" height="18">375.00</TD>^M
<TD class="table_numbers" height="18">352.00</TD> ^M
</TR>^M
^M
<TR ^M
...
....
Is it possible to get every title and its numbers into an array?
For example,
Code:
<TR ^M
class="table_back
">^M
<TD class="regular" height="18" width="81"><A href="/wps/portal/!ut/p/_s.7_0_A/7_0_4BC?tabOrder=1&symbol=1020"^M
>Title2</A></TD>^M
<TD class="table_numbers" height="18">369.00</TD>^M
<TD class="table_numbers" height="18">4,822</TD>^M
<TD class="index_up" height="18"> 25.25</TD>^M
<TD class="index_up" height="18">7.35</TD>^M
<TD class="table_numbers" height="18">3,294</TD>^M
<TD class="table_numbers" height="18">1,620,903 </TD>^M
<TD class="table_numbers" height="18">355.50</TD>^M
<TD class="table_numbers" height="18">139</TD>^M
<TD class="table_numbers" height="18">354.00</TD>^M
<TD class="table_numbers" height="18">219</TD>^M
<TD class="table_numbers" height="18">359.50</TD>^M
<TD class="table_numbers" height="18">375.00</TD>^M
<TD class="table_numbers" height="18">352.00</TD> ^M
</TR>^M
should become:
Title ==> Title2
price1 ==> 25.25
price2 ==> 7.35
price3 ==> 3,294
price4 ==> 1,620,903
price5 ==> 355.50
price6 ==> 139
price7 ==> 354.00
price8 ==> 219
price9 ==> 359.50
price10 ==> 375.00
price11 ==> 352.00
I can't think of any way to do this. Can you?
-
July 4th, 2006, 01:09 AM
#2
Re: Regular Expressions in html with multiline
You can use XML for this.
Example:
catelog.xml
----------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<CATALOG>
<ITEM>
<TITLE>Park Avenue</TITLE>
<PRICE>110.00</PRICE>
</ITEM>
<ITEM>
<TITLE>Pears</TITLE>
<PRICE>120.00</PRICE>
</ITEM>
</CATALOG>
test.html
-----------------
<html>
<body>
<xml id="catelog" src="catelog.xml"></xml>
<table border="1" datasrc="#catelog">
<tr>
<td><span datafld="TITLE"></span></td>
<td><span datafld="PRICE"></span></td>
</tr>
</table>
</body>
</html>
-
July 4th, 2006, 06:35 AM
#3
Re: Regular Expressions in html with multiline
Thanks, but I can't control the HTML page.
I fetch the page from a website, so I can't control its format.
-
July 4th, 2006, 01:54 PM
#4
Re: Regular Expressions in html with multiline
Hi everyone,
I wrote this code and it seems to work.
First, split the HTML on the tag <TD class="regular" height="18" width="81">:
Code:
String myContent = new String (Content);
String[] ary = myContent.split("<(?i)td class=\"regular\" height=\"18\" width=\"81\">");
System.out.println("# of Stocks: " + (ary.length-1));
Then loop over the pieces and pass each one to another method, getStockItem(ary[i]), to get the prices:
Code:
// ary[0] is the text before the first stock, so start at index 1.
for (int i = 1; i < ary.length; i++) {
    List_Of_Stocks.add(getStockItem(ary[i]));
}
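To show why the loop starts at 1 and the count is ary.length-1, here is a small self-contained sketch of the split step. The sample HTML, the class name SplitDemo and the helper stockChunks are made up for illustration and are not part of the real page or my program. split() returns whatever precedes the first title cell as element 0, so that element is dropped.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: returns only the chunks that follow a title cell.
    static String[] stockChunks(String html) {
        String[] parts = html.split("(?i)<td class=\"regular\" height=\"18\" width=\"81\">");
        // parts[0] is the text before the first title cell, so skip it.
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        // Made-up sample with two title cells, one upper case and one lower case.
        String html = "<TR><TD class=\"regular\" height=\"18\" width=\"81\"><A href=\"x\">Title1</A></TD></TR>"
                    + "<TR><td class=\"regular\" height=\"18\" width=\"81\"><A href=\"y\">Title2</A></TD></TR>";
        String[] chunks = stockChunks(html);
        System.out.println("# of Stocks: " + chunks.length);
    }
}
```

Note that the split only works while the title cell keeps exactly these attributes in exactly this order. If the site changes the width or the attribute order, no chunk is found.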
Code:
import java.util.ArrayList;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns the stock name followed by the text of every <td class=...> cell in one chunk.
private static Collection<String> getStockItem(String item) {
    Collection<String> itemInfo = new ArrayList<String>();
    // Remove line breaks and tabs so that each tag sits on a single line.
    item = item.replaceAll("[\n\t\r]", "");
    // The stock name is the text of the first link; group(1) is that text.
    Pattern pattern = Pattern.compile("<a href=[^>]*+>([^><]*)</a>", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(item);
    if (matcher.find()) {
        itemInfo.add(matcher.group(1).trim());
    } else {
        itemInfo.add("N/A");
        System.err.println("no name!!");
    }
    // Each remaining <td class=...> cell holds one number.
    pattern = Pattern.compile("<td class=[^>]*+>([^><]*)</td>", Pattern.CASE_INSENSITIVE);
    matcher = pattern.matcher(item);
    while (matcher.find()) {
        itemInfo.add(matcher.group(1).trim());
    }
    return itemInfo;
}
What do you think about the code?
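One possible alternative, offered only as a sketch: instead of splitting on the exact title cell, match each <TR>...</TR> block with Pattern.DOTALL so that '.' also crosses the line breaks, then pull every cell out of the row. The class name StockRows, the method parse and the shortened sample row are assumptions for illustration. It does not decode entities such as &amp;nbsp; and would be confused by nested tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StockRows {
    // One table row, from <TR ...> to </TR>; DOTALL lets '.' match line breaks.
    private static final Pattern ROW =
        Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any cell; group 1 is the inner HTML of the cell.
    private static final Pattern CELL =
        Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one list per row: the title first, then the numbers in page order.
    static List<List<String>> parse(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Strip nested tags such as <A ...> and the surrounding whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]*>", "").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened, made-up row in the same shape as the page, including the \r\n breaks.
        String html = "<TR \r\nclass=\"table_back\r\n\">\r\n"
            + "<TD class=\"regular\" height=\"18\" width=\"81\"><A href=\"/x?symbol=1020\"\r\n>Title2</A></TD>\r\n"
            + "<TD class=\"table_numbers\" height=\"18\">369.00</TD>\r\n"
            + "<TD class=\"index_up\" height=\"18\"> 25.25</TD>\r\n"
            + "</TR>\r\n";
        for (List<String> r : parse(html)) {
            System.out.println("Title ==> " + r.get(0));
            for (int i = 1; i < r.size(); i++) {
                System.out.println("price" + i + " ==> " + r.get(i));
            }
        }
    }
}
```

Because the row pattern does not depend on the class, height or width attributes, it should keep working if the site changes the styling, at the cost of also picking up any other table rows on the page.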