-
November 30th, 2003, 09:38 AM
#1
About getHTMLSource
Dear All,
I am new to Java. I would like to write a program that downloads the HTML file at a URL that I specify.
However, I have no idea how to use getHTMLSource(). I am using J2SDK1.4.2.1. Could anyone show me a simple getHTMLSource() program so that I can save the HTML file to a text file? Thanks.
-
November 30th, 2003, 10:35 AM
#2
You could use URL.openStream() and read the contents using an InputStream, but I hope you are not trying to steal copyrighted material.
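A minimal sketch of the URL.openStream() approach, assuming Java 7 or later (java.nio.file and try-with-resources do not exist in J2SDK 1.4.2, where you would loop over the stream and write to a FileOutputStream instead). The class name PageDownloader and its command-line arguments are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class PageDownloader {
    /** Copies whatever the URL serves into the target file and returns the number of bytes written. */
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java PageDownloader http://www.example.com/ page.txt
        long bytes = download(new URL(args[0]), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

The bytes are copied unchanged, so the saved file keeps whatever character encoding the server used.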
-
November 30th, 2003, 11:59 AM
#3
Thanks for your reply.
I also want to know how I can take the downloaded and modified HTML and "reply" to the web site. For example, I download a form from a web site, my program fills in the form (adding content to the HTML), and then I want to send it back to the web site. How can I do that?
-
November 30th, 2003, 10:02 PM
#4
Are you doing it in a servlet? You may be able to figure out a way to submit the info using the request dispatch object somehow. That's an interesting question you have... when I get home tonight I'll check my servlet books to see if they have a sample for that.
-
December 1st, 2003, 12:05 AM
#5
Thanks a lot
-
December 1st, 2003, 05:27 AM
#6
I recommend you look up one of the following components:
HTTPUnit - http://httpunit.sourceforge.net/
HTMLUnit - http://htmlunit.sourceforge.net/
HTTPClient - http://jakarta.apache.org/commons/httpclient/
all of these simplify interacting with an HTTP server on some level..
however, if the academic exercise is to interact with the HTTP server yourself, i recommend either the approach already mentioned (URL, URLConnection, getContent() etc., though there are specifics to this that must be observed, mentioned later)
or Socket programming directly (not as hard as you think)
Submitting forms, too, is not as technical as it seems. A form uses one of 2 methods:
GET
POST
with a GET type form, the browser tags the information onto the end of the URL, separating it from the URL with a ? and separating each further pair with an &.. like this:
(fill in the form on the forums page and press submit..)
Code:
GET http://www.codeguru.com/whatever/forums/newreply.php?username=FRED&password=SMITH&postID=123456
with the other method, POST, the browser sends the data as name=value pairs after the HTTP headers, separated from them by a blank line. This is an example of what the browser transmits:
Code:
POST http://www.codeguru.com/whatever/forums/newreply.php HTTP/1.0
Referer: http://www.somedomain.com/Direcory/file.html
User-Agent: blah MSIE6
Accept: */*
Content-type: application/x-www-form-urlencoded
Content-length: 42

username=FRED&password=SMITH&postID=123456
the content length, 42, is the length in bytes of the string we are sending the server (username=FRED&password=SMITH&postID=123456)
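To make the two form methods concrete, here is a small sketch that builds the name=value string from the example and uses it both ways. FormEncoder is a made-up helper name, and the code assumes Java 7 or later; the URL is the example one from this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncoder {
    /** Joins the fields as name=value pairs separated by '&', URL-encoding each name and value. */
    static String encode(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(field.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // LinkedHashMap keeps the fields in the order they were added
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "FRED");
        fields.put("password", "SMITH");
        fields.put("postID", "123456");
        String data = encode(fields);

        // GET: the pairs ride on the URL after a '?'
        System.out.println("http://www.codeguru.com/whatever/forums/newreply.php?" + data);

        // POST: the same string is the request body, and Content-length counts its bytes
        System.out.println("Content-length: " + data.getBytes(StandardCharsets.UTF_8).length); // 42
    }
}
```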
another note at this point: the URL set of classes is for handling ANY type of url, not just http/html ones
You should examine HttpURLConnection:
http://java.sun.com/j2se/1.4.2/docs/...onnection.html
here's the POST/GET method we were talking about earlier:
http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)
here's how you add the name=value pairs.. in our example: username=FRED:
http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)
-
so what you do, if using this way, is:
Make a new HttpURLConnection, set its method to GET, the URL to whatever, and tell it to go. Then get the content, read through it, parse out the forms..
Find out the form's METHOD from the method= parameter (the tag looks like <FORM action=URL method=METHOD>).. find out the form's TARGET from the action= parameter, then set up a new HttpURLConnection whose method is METHOD and whose url is the form TARGET..
then supply all the name/value pairs in the form (setRequestProperty only sets request headers; the pairs themselves go on the URL for GET or in the request body for POST):
Code:
<FORM action="http://somesite.com/prog/adduser" method="post">
<P>
<LABEL for="firstname">First name: </LABEL>
<INPUT type="text" id="firstname" name="firstname"><BR>
<LABEL for="lastname">Last name: </LABEL>
<INPUT type="text" id="lastname" name="lastname"><BR>
<LABEL for="email">email: </LABEL>
<INPUT type="text" id="email" name="email"><BR>
<INPUT type="radio" name="sex" value="Male"> Male<BR>
<INPUT type="radio" name="sex" value="Female"> Female<BR>
<INPUT type="submit" value="Send"> <INPUT type="reset">
</P>
</FORM>
you can identify the form fragments:
firstname=(something)
lastname=(something)
email=(something)
work out how to make your java program read this form, "fill it out" by adding the parameters to the request, and then send it
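One rough way to read the form is with regular expressions, sketched below. FormScanner and its methods are invented for illustration, and the sample string is a shortened, hypothetical form in which every input carries a name attribute, since browsers submit controls by name. Regex scanning is crude and will trip over unusual markup; a real HTML parser (or one of the libraries listed earlier) is more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormScanner {
    private static final Pattern FORM_TAG = Pattern.compile("<form\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern INPUT_TAG = Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the first opening FORM tag in the page, or null if there is none. */
    static String formTag(String html) {
        Matcher m = FORM_TAG.matcher(html);
        return m.find() ? m.group() : null;
    }

    /** Reads attr="value", attr='value' or attr=value out of a single tag; null if absent. */
    static String attribute(String tag, String attr) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(attr) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    /** Collects the distinct name attributes of all INPUT tags, in document order. */
    static List<String> fieldNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String name = attribute(m.group(), "name");
            if (name != null && !names.contains(name)) {
                names.add(name); // radio buttons share one name, so keep it once
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<FORM action=\"http://somesite.com/prog/adduser\" method=\"post\">"
                + "<INPUT type=\"text\" name=\"firstname\">"
                + "<INPUT type=\"radio\" name=\"sex\" value=\"Male\">"
                + "<INPUT type=\"radio\" name=\"sex\" value=\"Female\">"
                + "<INPUT type=\"submit\" value=\"Send\"></FORM>";
        String form = formTag(html);
        System.out.println(attribute(form, "action"));
        System.out.println(attribute(form, "method"));
        System.out.println(fieldNames(html));
    }
}
```

Inputs without a name (such as the submit button here) are skipped, which matches what a browser would leave out of the submission.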
Last edited by cjard; December 1st, 2003 at 05:29 AM.
-
December 1st, 2003, 06:18 AM
#7
Thanks for your reply, you helped me a lot.
Now I can get the HTML source code from the URL and add the content I want. However, I am still confused about how to send my modified HTML back to the server.
Is there any API I can use?
Thanks a lot
-
December 1st, 2003, 06:31 AM
#8
i gave it to you with these links:
http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)
http://java.sun.com/j2se/1.4.2/docs/...a.lang.String)
-
December 1st, 2003, 06:51 AM
#9
also, you don't ever send HTML to the server; it neither knows nor cares about html
re-read my post about how forms actually work
-
December 1st, 2003, 10:50 AM
#10
Thanks for your help again, you are very kind...
However, as I am still a beginner in Java (and also because I am stupid), I do not really understand setRequestMethod()
or the principle behind it. Would you mind explaining in a bit more detail?
-
December 1st, 2003, 12:08 PM
#11
you're using a web browser
the web browser emits some text to the CodeGuru.com webserver
the webserver sends a response
the text that the browser emits looks like this (at its very simplest):
GET http://www.host.com/path/file.html HTTP/1.0
the server responds with:
HTTP/1.0 200 OK
followed by a blank line and the file itself, and that's it.
someone decided that it would also be good to give more information ABOUT the file, BEFORE the file is sent
HTTP/1.0 200 OK
content-length: 1234
content-type: text/html
these extra bits of information are called RESPONSE HEADERS.
someone also decided that it would be good if the browser could give some info to the web server. the webserver MIGHT use this to change what it sends back.
GET http://www.host.com/path/file.html HTTP/1.0
accept: */*
user-agent: Mozilla (MSIE 6.0)
these are called REQUEST HEADERS
then, someone thought, well there might be a situation where someone might send something to the web server. i.e. something might be POSTed to it.
POST http://www.host.com/path/file.jsp HTTP/1.0
so POST and GET exist; POST is to send stuff, GET is to retrieve stuff. Both methods can send bits of info to the web server in order to change what is returned, using the REQUEST HEADERS
with a POST request, however, there is also (usually) some data sent to the web server. the data is the first thing after the first blank line:
POST url HTTP/1.0
header: value
header: value
header: value

data data data data - this is the data that is being posted
HttpURLConnection takes care of much of this for you, by formatting and presenting the HEADERS correctly..
you create the data you want to post, and post it by writing it to the HttpURLConnection's output stream.
it is important to note that the data should be URL-encoded, using UTF-8 as the character encoding..
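A quick illustration of what that encoding step does, using java.net.URLEncoder. EncodingDemo and enc are made-up names and the sample values are arbitrary. Characters that have a special meaning in a form submission (space, &, =) are escaped, and non-ASCII characters are first turned into bytes using the named character encoding, which is where UTF-8 comes in.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodingDemo {
    /** URL-encodes one name or value, using UTF-8 to turn characters into bytes. */
    static String enc(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(enc("Fred Smith")); // Fred+Smith  (space becomes +)
        System.out.println(enc("a&b=c"));      // a%26b%3Dc   (& and = would otherwise split pairs)
        System.out.println(enc("caf\u00e9"));  // caf%C3%A9   (e-acute is two bytes in UTF-8)

        // The same character under a different encoding gives a different result,
        // so client and server need to agree on the encoding.
        System.out.println(URLEncoder.encode("caf\u00e9", "ISO-8859-1")); // caf%E9
    }
}
```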
-
so what is the process?
issue a GET request (first link in my first post)
read the html, get the url for the form, and the method
build a string of the elements you find in the form
mimic the form by setting the URL of your HttpURLConnection to that url, set the request method (GET or POST) to the form's method (what you read in the html)
if the method is GET, append your data line to the URL after a ?
if the method is POST, use the normal URL, call setDoOutput(true), and set the request headers (content-type: application/x-www-form-urlencoded, and content-length: the number of bytes in your data string)
then get a handle on the output stream, and write your data string to it with a writer such as OutputStreamWriter or PrintWriter
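The process above could be sketched roughly like this with HttpURLConnection. FormSubmitter, targetUrl and submit are invented names; the sketch assumes Java 7 or later, that the data string is already URL-encoded, and that the response is text in UTF-8 (a real page may declare a different encoding). It has no error handling beyond letting IOException propagate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class FormSubmitter {
    /** For POST the action URL is used unchanged; otherwise (GET) the data rides on the URL. */
    static String targetUrl(String action, String method, String data) {
        if ("POST".equalsIgnoreCase(method)) {
            return action;
        }
        return action + (action.contains("?") ? "&" : "?") + data;
    }

    /** Sends already URL-encoded data to the form's action URL and returns the response text. */
    static String submit(String action, String method, String data) throws IOException {
        boolean post = "POST".equalsIgnoreCase(method);
        URL url = new URL(targetUrl(action, method, data));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod(post ? "POST" : "GET");
        if (post) {
            byte[] body = data.getBytes(StandardCharsets.UTF_8);
            conn.setDoOutput(true); // we intend to write a request body
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setFixedLengthStreamingMode(body.length); // Content-Length is the byte count
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
        }
        // Read whatever the server sends back
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }
}
```

A call might look like FormSubmitter.submit("http://somesite.com/prog/adduser", "post", "firstname=Fred&lastname=Smith"), where the action and method are the values read from the FORM tag.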
here's some code:
http://javaalmanac.com/egs/java.net/Post.html
-
December 1st, 2003, 12:28 PM
#12
i recommend you go to www.simtec.ltd.uk and download HTTPWatch 3.0
then go to amazon.com and in the "search the web" box on the left, type HELLO
now start httpwatch (click the icon at the top of the screen, to the far right of your icon bar with the back, forward, stop and refresh buttons. it looks like a paper towel (!) with a magnifying glass on top). the httpwatch window appears. click start
now click GO on amazon
and have a LOOK at what your browser sends/does...
-
December 1st, 2003, 01:27 PM
#13
Thanks again and again,
I get the basic idea now.
One further question... what does "UTF-8 encoded" mean?
-
December 1st, 2003, 02:11 PM
#14
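import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class PostExample {
    public static void main(String[] args) {
        try {
            // Construct data: each key and value is URL-encoded, pairs are joined with '&'
            String data = URLEncoder.encode("key1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
            data += "&" + URLEncoder.encode("key2", "UTF-8") + "=" + URLEncoder.encode("value2", "UTF-8");
            // Send data (enabling output lets us write a request body)
            URL url = new URL("http://hostname:80/cgi");
            URLConnection conn = url.openConnection();
            conn.setDoOutput(true);
            OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
            wr.write(data);
            wr.flush();
            // Get the response
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = rd.readLine()) != null) {
                // Process line...
            }
            wr.close();
            rd.close();
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}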
In this sample code, what are key1, value1, key2 and value2?
Are they something like key1=name, value1=geo23,
key2=sex, value2=male...?
In my POST request, I should fill in all the keys in the form, right?
-
December 1st, 2003, 02:30 PM
#15
<TD width=150>NAME:</TD>
<TD width=300><INPUT maxLength=30 size=30
name=eng_name></TD></TR>
In this example, is key1=eng_name and value1=geo23?