needed website info not shown in html

November 2nd, 2009, 07:59 AM
dre003

needed website info not shown in html

Hi,
I'm trying to make a simple program to track internet ******** using Java. The idea is to paste the auction URL into the program; Java then opens the URL, reads the HTML, and extracts the needed info (item name, last bidder, and so on) into strings.
Up to this point everything is working fine, but I've come across a problem.
When I open the URL and try to extract the "auction timer", the field of the HTML where the time is supposed to be contains "--:--:--", while the website shows the correct countdown timer.
Since I've been stuck on this problem for a while now, I tried viewing the contents with Firefox's View Page Source, and it still says "--:--:--". But when I open the page with Firebug, I get the correct time from the timer in the part of the HTML where it should be.
I uploaded a pic to http://img230.imageshack.us/img230/1671/53801451.png , where you can see that for the same site I get different HTML depending on whether it is opened with Firefox View Source (or the Java URL) or with Firebug's Inspect Element.

Any ideas on how to reach the needed info (the timer) so I can parse it into a string in Java?

So far I've tried HTML parsers and various DOM manipulators, and nothing seems to do the trick.

Thanks in advance for the answers!
November 2nd, 2009, 09:38 AM
ProgramThis

Re: needed website info not shown in html

I can't open up the pic you presented (blocked at work) but can you actually see the DOM element id for the time/date field? When you try to pull the element, are you doing it in a servlet or script?

getElementById("")

If you are doing it in a servlet, are you using:
request.getAttribute("time")
or
request.getParameter("time")
November 2nd, 2009, 09:50 AM
dre003

Re: needed website info not shown in html

Well, the thing is I opened the URL using

URL s = new URL(URL);
BufferedReader in = new BufferedReader(new InputStreamReader(s.openStream()));
String line;

// append each line of the response to the HTML string
while ((line = in.readLine()) != null)
    HTML += line;

and then used regex to split out the info I needed. The problem is that info such as the current top bidder and the time isn't available in the HTML via Java or via the View Source option in Firefox. BUT I downloaded Firebug, and when I view the source there or use Firebug's "Inspect Element" option, all the info I need to grab appears in the fields where it needs to be. The pic explains a lot.

I get the same results with

try
{
    Parser parser = new Parser("http://www.whatever");
    NodeList list = parser.parse(null);
    for (NodeIterator i = list.elements(); i.hasMoreNodes();) {
        System.out.print(list.asString());
        d.processMyNodes(i.nextNode());
    }
}
catch (ParserException e)
{
    e.printStackTrace();
}

Whatever I do, I get --:--:-- where the time displayed on the website needs to be.
November 2nd, 2009, 04:53 PM
jcaccia

Re: needed website info not shown in html

I think what you see in firefox is the actual contents of the page you are reading from the site (the same you get with your app). The time must be updated by javascript once the page is loaded in the browser (--:--:-- is the placeholder for the time value). I don't think you will be able to get the time, except for perhaps the initial value for the timer div (if you can find where it is set).
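To illustrate the point about the placeholder and the initial value, here is a minimal sketch in Java. It assumes, purely for illustration, that the page sets the countdown in an inline script through a variable such as secondsLeft; that name, and the idea that the value is a plain number of seconds, are hypothetical and would have to be checked against the real page source.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InitialTimerValue {
    // The text the static HTML contains before any script has run.
    static final String PLACEHOLDER = "--:--:--";

    // True when the extracted timer text is still the unfilled placeholder.
    static boolean isPlaceholder(String timerText) {
        return PLACEHOLDER.equals(timerText.trim());
    }

    // Looks for "<variableName> = <digits>" anywhere in the downloaded page,
    // for example inside an inline <script>. The variable name is an assumption.
    static OptionalLong findInitialSeconds(String html, String variableName) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(variableName) + "\\s*=\\s*(\\d{1,18})");
        Matcher m = p.matcher(html);
        return m.find()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }

    // Formats a number of seconds as HH:MM:SS.
    static String formatCountdown(long totalSeconds) {
        return String.format("%02d:%02d:%02d",
                totalSeconds / 3600, (totalSeconds % 3600) / 60, totalSeconds % 60);
    }
}
```

If the timer text is still the placeholder, findInitialSeconds(html, "secondsLeft") can be tried as a fallback. If the site loads the value through a separate request instead, nothing will be found in the page and the result is empty.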
November 3rd, 2009, 08:40 AM
ProgramThis

Re: needed website info not shown in html

He's right. If you are reading the static HTML with a BufferedReader and the time is updated via JS or jQuery (which is most likely the case), you are never going to get it that way.

Instead of reading the static html, why don't you (in Java) perform a POST and read the response? You 'should' be able to get the time element from there (assuming that they change the initial --:--:-- to an initial time on first entry to the page).
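A POST from plain Java could look roughly like the sketch below, which uses only HttpURLConnection from the JDK. The class and method names are made up for the example, and the form field names a given site expects are unknown here. Whether the response actually contains the timer depends entirely on the site.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class FormPost {
    // Encodes the fields as application/x-www-form-urlencoded (key=value&key=value).
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return body.toString();
    }

    // Sends the fields as a POST and returns the response body as text.
    static String post(String url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // we are going to write a request body
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("UTF-8");
        } finally {
            connection.disconnect();
        }
    }
}
```

The string returned by post(...) can then be searched for the timer in the same way as the HTML from a GET.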
November 3rd, 2009, 04:02 PM
dre003

Re: needed website info not shown in html

Interesting...
So far I get that the HTML is dynamic and JavaScript driven.
So basically what I need to do is, for example, import a browser of some sort, open the page via that browser, and while it's open read the HTML: not the source HTML, but the current HTML.

Something like ProgramThis said... hm...
Thank you both for the answers, you've been a great help!!!
November 4th, 2009, 08:40 AM
ProgramThis

Re: needed website info not shown in html

Actually that is not what I am saying. What I am saying is that you can perform a GET or a POST and read the response in a Java class without a browser. You can read the same data that a browser would as text.

The code below should work. I am not sure why the Parser class you are passing the URL to doesn't work. With the code below I seem to be getting dynamic data from a few test sites.

Code:

public void connect() {
    try {
        String url = "http://www.stackoverflow.com/",
               proxy = "proxy.mydomain.com",
               port = "8080";
        URL server = new URL(url);
        Properties systemProperties = System.getProperties();
        systemProperties.setProperty("http.proxyHost", proxy);
        systemProperties.setProperty("http.proxyPort", port);
        HttpURLConnection connection = (HttpURLConnection) server.openConnection();
        connection.connect();
        InputStream in = connection.getInputStream();
        readResponse(in);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void readResponse(InputStream is) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(is);
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int result = bis.read();
    while (result != -1) {
        byte b = (byte) result;
        buf.write(b);
        result = bis.read();
    }
    System.out.println(buf.toString());
}

Now, instead of printing the output to the command line (as in my example) you can parse the string for the element you are looking for. What is the URL you are trying to get data from? I would like to test it.
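Parsing the string for a single element could be sketched as follows. This is a rough regex-based approach, not a real HTML parser. The element id "timer" in the usage note is a guess, since the actual id on the auction page is not known here, and the pattern will not cope with nested tags of the same name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ElementTextExtractor {
    // Returns the text between the opening and closing tag of the first
    // element whose id attribute equals the given id, if there is one.
    static Optional<String> textOfElementWithId(String html, String id) {
        Pattern p = Pattern.compile(
                "<(\\w+)[^>]*\\bid\\s*=\\s*[\"']" + Pattern.quote(id)
                        + "[\"'][^>]*>(.*?)</\\1>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(2).trim()) : Optional.empty();
    }
}
```

Calling ElementTextExtractor.textOfElementWithId(response, "timer") on the response string gives whatever text the server sent for that element. If JavaScript fills the value in later, that text will still be the placeholder.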