Help required with Java programming


ark_123
December 31st, 2009, 12:17 AM
I need to know how to read an HTML file, find the index of a string using the indexOf function, and use the replace function to replace that string, ideally with an example.
I am working with an HTML file now. I need to find the index of "<h1>" and "</h1>" and then replace the string between the two with another string.
When I tried, the index value was returned as -1. The file was not read.
Need help urgently!

Londbrok
December 31st, 2009, 03:54 AM
You can approach this in two ways.

1) Parse the HTML document as XML using DOM or SAX. This approach is a bit more demanding, and might not work, because not all HTML is well formed. When it does work, however, it is beneficial, because you get a stronger guarantee of a well-formed, working result. The String approach is more error prone.

Suppose you use the SAX parsing approach. You could write a parser that is given a list of corrections. For example, an h1 tag with "Some content" as its content would be replaced by an h1 tag with "Some other content". These corrections would be made on the fly as you read the document, producing XML that includes the alterations. That might suit your needs.

2) Read the whole mess into a String. Create a regex pattern that searches for what you need to find, and replace the matched content with your correction. Note that the tiniest error is likely to corrupt the HTML utterly. Hence, test the code with a variety of HTML files, and make sure your regex patterns are on the money.
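
A minimal sketch of approach 2, assuming the HTML is already in a String. The class name H1RegexReplacer and the method replaceH1 are made up for illustration. The pattern is only one possible choice: it tolerates upper-case tags, attributes on the opening tag, and line breaks inside the heading, but it is not a real HTML parser and can still be fooled (for example by "<h1>" inside a comment or script).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class H1RegexReplacer {
    // (?is): case-insensitive, and '.' also matches line breaks.
    // Group 1 is the opening tag (with any attributes), group 2 the closing tag.
    private static final Pattern H1 =
            Pattern.compile("(?is)(<h1\\b[^>]*>).*?(</h1\\s*>)");

    // Replaces the content of every h1 element with newText, keeping the tags.
    static String replaceH1(String html, String newText) {
        Matcher m = H1.matcher(html);
        // quoteReplacement keeps '$' and '\' in newText from being
        // interpreted as group references.
        return m.replaceAll("$1" + Matcher.quoteReplacement(newText) + "$2");
    }
}
```

The reluctant ".*?" matters here: with a greedy ".*" a document containing two headings would be matched from the first "<h1>" to the last "</h1>", wiping out everything in between.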

And, of course, you need to read the file.
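
For the file reading and the indexOf question from the first post, something along these lines might do. It is only a sketch: the class and method names are invented, UTF-8 is an assumed encoding, and it only handles a plain lower-case "<h1>" without attributes. Note that indexOf returns -1 whenever the search string is not found, which is also what you get if the file content never made it into the String, so it is worth printing the String once to check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
class H1IndexReplacer {
    // Reads the whole file into one String (UTF-8 is an assumption).
    static String readFile(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    // Replaces the text between the first "<h1>" and the next "</h1>".
    // Returns the input unchanged if either tag is missing.
    static String replaceFirstH1(String html, String newText) {
        String open = "<h1>";
        String close = "</h1>";
        int start = html.indexOf(open);
        if (start < 0) {
            return html; // -1: opening tag not found
        }
        int contentStart = start + open.length();
        int end = html.indexOf(close, contentStart);
        if (end < 0) {
            return html; // -1: no closing tag after the opening one
        }
        // Keep everything up to and including "<h1>", then the new text,
        // then everything from "</h1>" onwards.
        return html.substring(0, contentStart) + newText + html.substring(end);
    }
}
```

Usage would be roughly: String html = H1IndexReplacer.readFile("page.html"); followed by H1IndexReplacer.replaceFirstH1(html, "New heading"), where "page.html" stands in for your own file.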

Google "Java SAX" and "Java regex" for more information. Both are well covered on the net.
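
As a rough illustration of the SAX idea from approach 1, here is a sketch of a handler that is given a map of corrections and writes the document back out as it reads it. All names (H1SaxRewriter, rewrite, corrections) are made up. It assumes the input is well-formed XHTML without a DOCTYPE, that each h1 contains plain text only (no nested tags), and it ignores comments and processing instructions, so treat it as a starting point rather than a finished tool.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: copies the document to a buffer, swapping h1 text
// that appears as a key in 'corrections' for the mapped value.
class H1SaxRewriter extends DefaultHandler {
    private final Map<String, String> corrections; // old h1 text -> new h1 text
    private final StringBuilder out = new StringBuilder();
    private StringBuilder h1Text; // non-null only while inside an h1

    H1SaxRewriter(Map<String, String> corrections) {
        this.corrections = corrections;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
        if (qName.equalsIgnoreCase("h1")) {
            h1Text = new StringBuilder(); // start collecting heading text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (h1Text != null) {
            h1Text.append(ch, start, length); // held back until </h1>
        } else {
            out.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (h1Text != null && qName.equalsIgnoreCase("h1")) {
            String old = h1Text.toString();
            String replacement = corrections.get(old);
            out.append(escape(replacement != null ? replacement : old));
            h1Text = null;
        }
        out.append("</").append(qName).append('>');
    }

    // Re-escapes characters that the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Parses the given XHTML and returns it with the corrections applied.
    static String rewrite(String xhtml, Map<String, String> corrections) {
        H1SaxRewriter handler = new H1SaxRewriter(corrections);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
        return handler.out.toString();
    }
}
```

If the parser throws on your file, that is the "not all HTML is well formed" problem mentioned above, and the String approach may be the only practical option for that file.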