  1. #1
    Join Date
    Nov 2003
    Posts
    15

    About getHTMLSource

    Dear All,
    I am new to Java. I would like to write a program that can download the HTML file at a URL that I specify.
    However, I have no idea how to use getHTMLSource(). I am using J2SDK1.4.2.1. Could anyone here show me a simple getHTMLSource() program so that I can save the HTML file into a text file? Thanks

  2. #2
    Join Date
    Jan 2002
    Location
    Halifax, NS, Canada
    Posts
    985
    You could use URL.openStream() and read the contents using an InputStream, but I hope you are not trying to steal copyrighted material.
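
    A minimal sketch of the URL.openStream() approach, assuming a JDK newer than 1.4.2 (it uses try-with-resources, which arrived in Java 7). The class name PageDownloader, the example.com address and the page.html file name are placeholders for illustration, not anything from this thread:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

class PageDownloader {
    // Copies every byte from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Opens the URL and saves whatever the server returns into the given file.
    static long download(String address, String fileName) throws IOException {
        try (InputStream in = new URL(address).openStream();
             OutputStream out = new FileOutputStream(fileName)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address and file name; replace with your own.
        long bytes = download("http://www.example.com/", "page.html");
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

    Copying raw bytes rather than characters means the saved file is exactly what the server sent, whatever character encoding the page uses.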

  3. #3
    Join Date
    Nov 2003
    Posts
    15
    Thanks for your reply,
    I also want to know how I can take the downloaded and modified HTML and "reply" to the web site. For example, I download a form from a web site, and after my program fills in the form (adding contents into the HTML) I want to send it back to the web site. How can I do it?

  4. #4
    Join Date
    Mar 2000
    Location
    Vancouver, BC, Canada
    Posts
    278
    are you doing it in a servlet? You may be able to figure out a way to submit the info using the request dispatch object somehow. That's an interesting question you have...when I get home tonight I'll check my servlet books to see if they have a sample for that

  5. #5
    Join Date
    Nov 2003
    Posts
    15
    Thanks a lot

  6. #6
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104
    I recommend you look up one of the following components:

    HTTPUnit - http://httpunit.sourceforge.net/

    HTMLUnit - http://htmlunit.sourceforge.net/

    HTTPClient - http://jakarta.apache.org/commons/httpclient/


    all of these simplify the way to interact with an HTTP server on some level..

    however, if the academic exercise is to interact with the HTTP server yourself, i recommend either the mentioned approach of URL, URLConnection, getContent() etc.. (though there are specifics to this that must be observed, mentioned later)

    or Socket programming directly (not as hard as you think)


    In submitting forms you need to be aware that it too, is not as technical as it seems. There are 2 form types:

    GET
    POST

    a GET type form causes the browser to tag the information onto the end of the URL, separating the first pair from the URL with a ? and every later pair with an &.. like this:

    (fill in the form on the forums page and press submit..)
    Code:
     GET http://www.codeguru.com/whatever/forums/newreply.php?username=FRED&password=SMITH&postID=123456
    with the other method, POST, the browser sends the data as name=value pairs after the HTTP headers. This is an example of what the browser transmits:

    Code:
    POST http://www.codeguru.com/whatever/forums/newreply.php HTTP/1.0
    Referer: http://www.somedomain.com/Direcory/file.html
    User-Agent: blah MSIE6
    Accept: */*
    Content-type: application/x-www-form-urlencoded
    Content-length: 42
    
    username=FRED&password=SMITH&postID=123456
    the content length, 42, is equal to the length of the string we are sending the server (username=FRED&password=SMITH&postID=123456)
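
    As a rough illustration of how that name=value string and its length come about, here is a small sketch. The class name FormEncoder is made up for this example; URLEncoder is the standard java.net class, and the generics assume Java 5 or later:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncoder {
    // Joins the fields as name=value pairs separated by '&', URL-encoding each part.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                sb.append('=');
                sb.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they were added.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "FRED");
        fields.put("password", "SMITH");
        fields.put("postID", "123456");

        String data = encode(fields);
        System.out.println(data);                                // username=FRED&password=SMITH&postID=123456
        System.out.println("Content-length: " + data.length());  // Content-length: 42
    }
}
```

    The encoded string only ever contains ASCII characters, so here the character count and the byte count are the same number.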


    another note at this point.. the URL set of classes is for handling ANY type of url, not just http/html ones.
    You should examine HttpURLConnection:

    http://java.sun.com/j2se/1.4.2/docs/...onnection.html

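    here's how you choose the POST/GET method we were talking about earlier:
    http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)

    here's how you set request headers as name/value pairs (such as Content-type). the form's own name=value pairs, like username=FRED in our example, go on the URL or in the request body instead:
    http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)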

    -

    so what you do, if using this way, is:

    Make a new HttpURLConnection (you get one from URL.openConnection()) for whatever URL you want, set its method to GET, and tell it to go. Then get the content, read through it, and parse out the forms..

    Find out the form's METHOD (the tag looks like <FORM action=URL method=METHOD>) from the method= parameter.. find out the form's TARGET from the action= parameter, then open a new HttpURLConnection whose method is METHOD and whose url is the form TARGET..
    then build a name=value pair for every field in the form (setRequestProperty is for request headers such as Content-type; the form data itself goes on the URL for GET, or in the request body for POST):

    Code:
    <FORM action="http://somesite.com/prog/adduser" method="post">
        <P>
        <LABEL for="firstname">First name: </LABEL>
                  <INPUT type="text" id="firstname" name="firstname"><BR>
        <LABEL for="lastname">Last name: </LABEL>
                  <INPUT type="text" id="lastname" name="lastname"><BR>
        <LABEL for="email">email: </LABEL>
                  <INPUT type="text" id="email" name="email"><BR>
        <INPUT type="radio" name="sex" value="Male"> Male<BR>
        <INPUT type="radio" name="sex" value="Female"> Female<BR>
        <INPUT type="submit" value="Send"> <INPUT type="reset">
        </P>
     </FORM>
    you can identify the form fragments:
    firstname=(something)
    lastname=(something)
    email=(something)

    work out how to make your java program read this form, "fill it out" by building the name=value parameters into the request data, and then send it
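
    One possible way to pull the action, the method and the field names out of a page is sketched below with java.util.regex (available since 1.4; the generics assume Java 5 or later). FormScanner, attribute() and tags() are names invented for this example. Regular expressions are only a rough stand-in for a real HTML parser, and this one will be fooled by things like an attribute name appearing inside another attribute's value:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormScanner {
    // Returns the value of the named attribute inside one tag, or null if it is absent.
    // Handles double-quoted, single-quoted and unquoted values.
    static String attribute(String tag, String name) {
        Pattern p = Pattern.compile(
            "\\b" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    // Returns every opening tag of the given element, e.g. all <INPUT ...> tags.
    static List<String> tags(String html, String element) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile("<" + element + "\\b[^>]*>", Pattern.CASE_INSENSITIVE)
                           .matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<FORM action=\"http://somesite.com/prog/adduser\" method=\"post\">"
                    + "<INPUT type=\"text\" name=\"firstname\">"
                    + "<INPUT type=\"radio\" name=\"sex\" value=\"Male\">"
                    + "</FORM>";

        String form = tags(html, "form").get(0);
        System.out.println("action = " + attribute(form, "action"));
        System.out.println("method = " + attribute(form, "method"));
        for (String input : tags(html, "input")) {
            System.out.println("field  = " + attribute(input, "name"));
        }
    }
}
```

    Only controls that carry a name attribute contribute a name=value pair when a browser submits the form, which is why the sketch looks for name rather than id.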
    Last edited by cjard; December 1st, 2003 at 05:29 AM.
    "it's a fax from your dog, Mr Dansworth. It looks like your cat" - Gary Larson...DW1: Data Walkthroughs 1.1...DW2: Data Walkthroughs 2.0...DDS: The DataSet Designer Surface...ANO: ADO.NET2 Orientation...DAN: Deeper ADO.NET...DNU...PQ

  7. #7
    Join Date
    Nov 2003
    Posts
    15
    Thanks for your reply, you help me a lot.
    Now I can get the HTML source code from the URL and add the content I want to add. However, I am still confused about how to send my modified HTML back to the server.
    Is there any API i can use ?

    Thanks a lot

  8. #8
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104
    Originally posted by geo23
    Thanks for your reply, you help me a lot.
    Now i can get the HTML source code from the URL and add the content i want to add. However, i still feel confusing about how to sent back my modified HTML back to the server?
    Is there any API i can use ?

    Thanks a lot
    i gave it to you with these links:

    http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)
    http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)

  9. #9
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104
    also, you don't ever send HTML to the server; it neither knows nor cares about html

    re-read my post about how forms actually work

  10. #10
    Join Date
    Nov 2003
    Posts
    15
    Thanks for your help again , and you are very nice...

    However, as I am still a beginner in Java (also because I am stupid), I don't really understand setRequestMethod(),
    or the principle behind it. Would you mind telling me a bit more detail?

  11. #11
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104
    Originally posted by geo23
    Thanks for your help again , and you are very nice...

    However, as i am still the beginner of Java.( also because i am stupid) . I am not too understand the setRequestMethod().
    And also the principle behind. Would you mind telling me a bit more detail ?
    you're using a web browser
    the web browser emits some text to the CodeGuru.com webserver
    the webserver sends a response
    the text that the browser emits looks like this (at its very simplest):

    GET http://www.host.com/path/file.html HTTP/1.0

    the server responds with:

    HTTP/1.0 200 OK

    and that's it.
    someone decided that it would also be good to give more information ABOUT the file, BEFORE the file is sent

    HTTP/1.0 200 OK
    content-length: 1234
    content-type: text/html

    these extra bits of information are called RESPONSE HEADERS.
    someone also decided that it would be good if the browser could give some info to the web server. the webserver MIGHT use this to change what it sends back.

    GET http://www.host.com/path/file.html HTTP/1.0
    accept: */*
    user-agent: Mozilla (MSIE 6.0)

    these are called REQUEST HEADERS
    then, someone thought, well there might be a situation where someone might send something to the web server. i.e. something might be POSTed to it.

    POST http://www.host.com/path/file.jsp HTTP/1.0

    so POST and GET exist; POST is to send stuff, GET is to retrieve stuff. Both methods can send bits of info to the web server in order to change what is returned, using the REQUEST HEADERS

    with a POST request, however, there is also (usually) some data sent to the web server. the data is the first thing after the first blank line:

    POST url HTTP/1.0
    header: value
    header: value
    header: value

    data data data data - this is the data that is being posted
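
    To see that layout as actual text, here is a sketch that assembles a POST request by hand and can push it down a plain Socket. RawPost, build() and send() are invented names, and the code assumes Java 7 or later. It writes the path on the request line and the host in a Host header, which is the form a browser uses when talking straight to a server rather than through a proxy; www.host.com and /path/file.jsp are just the placeholders used above:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawPost {
    // Builds the request text: request line, headers, one blank line, then the data.
    // HTTP ends each header line with CR LF.
    static String build(String host, String path, String data) {
        int length = data.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + data;
    }

    // Sends the request over a plain socket and returns the raw response (headers and body).
    static String send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.UTF_8));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toString("UTF-8");
        }
    }

    public static void main(String[] args) {
        String request = build("www.host.com", "/path/file.jsp",
                               "username=FRED&password=SMITH&postID=123456");
        System.out.println(request);
        // To actually transmit it: send("www.host.com", 80, request)
    }
}
```

    Printing the request before sending it is a cheap way to compare your program's output with what a browser transmits.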


    HttpURLConnection takes care of much of this for you, by formatting and presenting the HEADERS correctly..
    you create the data you want to post, and post it by writing it to the HttpURLConnection's output stream.
    it is important to note that the data should be URL-encoded (the application/x-www-form-urlencoded format), using UTF-8 as the character encoding..

    -

    so what is the process?

    issue a GET request (first link in my first post)
    read the html, get the url for the form, and the method
    build a string of the elements you find in the form
    mimic the form by opening a new HttpURLConnection to that url, and set the request method (GET or POST) to the form's method (what you read in the html)
    if the method is GET, append a ? and your data line to the URL
    if the method is POST, use the normal URL and call setDoOutput(true); HttpURLConnection normally works out the content-length header from what you write, so you don't need to set it by hand (if you ever do, it is the number of bytes, not characters)
    then get a handle on the output stream, and write your data line to it with a writer such as OutputStreamWriter or PrintWriter
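
    The steps above could be sketched roughly like this. FormSubmitter, urlForGet() and submit() are made-up names; the code assumes Java 7 or later, assumes the data string is already URL-encoded, assumes the form's action has no query string of its own, and assumes the response can be read as UTF-8. The address in main is a placeholder:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class FormSubmitter {
    // For a GET form the data rides on the URL after a '?'.
    static String urlForGet(String action, String data) {
        return action + "?" + data;
    }

    // Submits the data the way the form's method attribute says and returns the response body.
    static String submit(String action, String method, String data) throws IOException {
        boolean post = "POST".equalsIgnoreCase(method);
        URL url = new URL(post ? action : urlForGet(action, data));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod(post ? "POST" : "GET");

        if (post) {
            // For a POST form the data goes in the request body instead of on the URL.
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(data.getBytes(StandardCharsets.UTF_8));
            }
        }

        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder action URL, method and data; use the values read from the real form.
        String response = submit("http://somesite.com/prog/adduser", "post",
                                 "firstname=FRED&lastname=SMITH");
        System.out.println(response);
    }
}
```

    Note that setRequestProperty appears only for the Content-Type header; the form fields themselves never go through it.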

    here's some code:

    http://javaalmanac.com/egs/java.net/Post.html


    the following

  12. #12
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104
    i recommend you go to www.simtec.ltd.uk and download HTTPWatch 3.0

    then go to amazon.com and, in the "search the web" box on the left, type HELLO

    now start httpwatch: click the icon at the top of the screen, to the far right of your icon bar with the back, forward, stop and refresh buttons. it looks like a paper towel (!) with a magnifying glass on top. the httpwatch window appears. click start

    now click GO on amazon

    and have a LOOK at what your browser sends/does...

  13. #13
    Join Date
    Nov 2003
    Posts
    15
    Thanks again and again ,
    I got the basic idea now.
    One further question: what does "UTF-8 encoded" mean?

  14. #14
    Join Date
    Nov 2003
    Posts
    15
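    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;

    public class PostExample {
        public static void main(String[] args) {
            try {
                // Construct data: URL-encoded name=value pairs joined with '&'
                String data = URLEncoder.encode("key1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
                data += "&" + URLEncoder.encode("key2", "UTF-8") + "=" + URLEncoder.encode("value2", "UTF-8");

                // Send data (setDoOutput(true) turns the request into a POST)
                URL url = new URL("http://hostname:80/cgi");
                URLConnection conn = url.openConnection();
                conn.setDoOutput(true);
                OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
                wr.write(data);
                wr.flush();
                wr.close();

                // Get the response
                BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                String line;
                while ((line = rd.readLine()) != null) {
                    // Process line...
                    System.out.println(line);
                }
                rd.close();
            } catch (IOException e) {
                // Report the failure instead of silently swallowing it
                e.printStackTrace();
            }
        }
    }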



    In this sample code, what are key1, value1, key2 and value2?
    Are they something like: key1=name, value1=geo23
    key2=sex, value2=male..........
    In my POST request, I should fill in all the keys in the form, right?

  15. #15
    Join Date
    Nov 2003
    Posts
    15
    <TD width=150>NAME:</TD>
    <TD width=300><INPUT maxLength=30 size=30
    name=eng_name></TD></TR>


    in this example, key1=eng_name, and value1=geo23?
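
    Going by the earlier replies about name=value pairs, the key would be the INPUT's name attribute (eng_name) and the value whatever gets typed into the box. A small sketch of that one field, which also shows what encoding with "UTF-8" does to a space and to a non-ASCII letter. EngNameField and pair() are names invented for this example, and the values are made up:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EngNameField {
    // Encodes one form field as name=value. Characters that are not safe in a URL
    // are replaced by %XX escapes of their UTF-8 bytes; a space becomes '+'.
    static String pair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pair("eng_name", "geo23"));      // eng_name=geo23
        System.out.println(pair("eng_name", "Geo Smith"));  // eng_name=Geo+Smith
        System.out.println(pair("eng_name", "Ren\u00e9"));  // eng_name=Ren%C3%A9
    }
}
```

    Plain letters, digits and the underscore pass through unchanged, so for a value like geo23 the encoded text looks the same as the original.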
