-
January 26th, 2011, 09:43 PM
#1
Looping Issue While Parsing...Help Please.
Hello, I'm trying to write an HTML parser that finds links and stores them. However, when given a simple input page, it finds only the first link and then quits. I assume it's a simple looping issue. Any help is appreciated, thanks.
The Input:
Code:
<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><a href="index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
The code where I'm trying to extract them from the string the HTML is saved in:
Code:
for(index1=0;index1<savedLines.length();)
{
    index1=savedLines.indexOf("href",index1);
    System.out.println("First href" + index1);
    index1=savedLines.indexOf("\"", index1);
    System.out.println("First Quote: " + index1);
    index2=savedLines.indexOf("\"", (index1+1));
    System.out.println("Second Quote: " + index2);
    String NewSTR=savedLines.substring((index1+1),index2);
    System.out.println("New String: "+ NewSTR);
    if(index1==index2);
    {
        dispose();
        System.exit(0);
    }
    index1=index2;
}
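A minimal sketch of the same loop with the progress problem addressed, assuming every href value is wrapped in double quotes. One thing worth checking in the code above: the semicolon in if(index1==index2); ends the if statement, so the braces after it form an ordinary block, and dispose() and System.exit(0) run on the very first pass. That would likely produce exactly the "finds the first link, then quits" symptom. The sketch drops the exit, stops as soon as indexOf() returns -1, and resumes each search just past the closing quote. HrefLoop and extractHrefs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class HrefLoop {

    // Hypothetical helper: collects the quoted value that follows each "href".
    // Assumes href values are always wrapped in double quotes.
    static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        int index1 = 0;
        while (true) {
            // Find the next "href"; -1 means there are no more links.
            int hrefPos = html.indexOf("href", index1);
            if (hrefPos < 0) {
                break;
            }
            int firstQuote = html.indexOf('"', hrefPos);
            if (firstQuote < 0) {
                break;
            }
            int secondQuote = html.indexOf('"', firstQuote + 1);
            if (secondQuote < 0) {
                break;
            }
            links.add(html.substring(firstQuote + 1, secondQuote));
            // Resume just past the closing quote so each link is visited once.
            index1 = secondQuote + 1;
        }
        return links;
    }

    public static void main(String[] args) {
        String savedLines = "<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY>"
                + "<a href=\"index.html\">Hello World.</a>"
                + "<a href=\"secondURL.html\">Another Link</a>"
                + "<a href=\"yetanotherone2.html\">Last One</a></BODY></HTML>";
        for (String link : extractHrefs(savedLines)) {
            System.out.println(link);
        }
    }
}
```

With the test page from this post it prints index.html, secondURL.html and yetanotherone2.html, one per line.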
-
January 27th, 2011, 01:56 AM
#2
Re: Looping Issue While Parsing...Help Please.
You seem to have some very odd counting going on there. First off, you aren't incrementing the loop. You do set index1 to the value of index2, which I assume is an attempt to move the loop toward savedLines.length(), but nothing guarantees that it steps systematically toward that value rather than jumping right past it on the first pass through the loop. Plus, you are using your counting variable inside the loop: index1 gets whatever value is in index2, while index2 got its value from an indexOf() call that used index1 as its starting point.
Code:
index2=savedLines.indexOf("\"", (index1+1)); //seems odd to use the count this way
What is savedLines anyway? A String? A Collection?
I think you are trying to get your loop to do too much. I'd suggest using the enhanced for loop to iterate over your data structure or, if it is a collection, using an iterator.
Good luck.
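For comparison, a small sketch of the two loop styles suggested above, run over a hypothetical List<String> of already-extracted links. Note that the enhanced for loop works over arrays and anything that implements Iterable; a String is neither, so for savedLines itself you would loop over savedLines.toCharArray() or keep searching with indexOf(). An explicit Iterator is mainly useful when you need to remove items while iterating. LoopStyles and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LoopStyles {

    // Enhanced for loop: read-only pass over every element.
    static int countHtmlForEach(List<String> links) {
        int count = 0;
        for (String link : links) {
            if (link.endsWith(".html")) {
                count++;
            }
        }
        return count;
    }

    // Explicit Iterator: the safe way to remove elements while iterating.
    static void keepOnlyHtml(List<String> links) {
        Iterator<String> it = links.iterator();
        while (it.hasNext()) {
            if (!it.next().endsWith(".html")) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> links = new ArrayList<>();
        links.add("index.html");
        links.add("secondURL.html");
        links.add("mailto:someone@example.com");
        System.out.println(countHtmlForEach(links)); // 2
        keepOnlyHtml(links);
        System.out.println(links); // [index.html, secondURL.html]
    }
}
```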
-
January 27th, 2011, 02:29 AM
#3
Re: Looping Issue While Parsing...Help Please.
savedLines is a String where I stored the HTML I pulled from the website. I've progressed to the following code, but I'm still not sure how to get my loop to work correctly. In a nutshell, what I am trying to get it to do is:
- Grab index1 (start of the link)
- Grab index2 (end of the link)
- Take the substring of the link
- Add the grabbed link to the JList on the GUI
- Delete the junk before the link
- Rinse and repeat until the end of the String
It does all of this, but I can't get it to move on to the next link. How would I go about switching to an enhanced for loop? I'm not sure I quite understand what I've looked up so far. I want the loop to keep looking until the end of the string, then quit.
Code:
StringBuffer parseBuffer = new StringBuffer(savedLines);
System.out.println("String Length: "+ parseBuffer.length());
for(int i=0; i<parseBuffer.length();i++)
{
    index1=parseBuffer.indexOf("href",index1);
    System.out.println("First href" + index1);
    index1=parseBuffer.indexOf("\"", index1);
    System.out.println("First Quote: " + index1);
    index2=parseBuffer.indexOf("\"", (index1+1));
    System.out.println("Second Quote: " + index2);
    String NewSTR=parseBuffer.substring((index1+1),index2);
    linksList.add(NewSTR);
    System.out.println("New String: "+ NewSTR);
    parseBuffer.delete(0,(index1+1));
    System.out.println(parseBuffer);
    if(index1==index2);
    {
        dispose();
        System.exit(0);
    }
}
And the output:
Code:
<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><a href="index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
String Length: 178
First href47
First Quote: 52
Second Quote: 63
New String: index.html
index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
Last edited by MadaNoswad; January 27th, 2011 at 02:32 AM.
-
January 27th, 2011, 09:06 AM
#4
Re: Looping Issue While Parsing...Help Please.
I can't see all of the code, but if the HTML is all on one line then it may only be going through the loop once. Have you printed out a "hello" inside the loop to see how many times it runs? Do you ever reach the System.exit(0) inside the if statement?
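A small, self-contained way to try the "print a hello" check, assuming the loop is shaped like the one in the question. The method passesBeforeExit is hypothetical: it returns at the point where the original calls dispose() and System.exit(0), so the number of passes can be inspected. The index values 52 and 63 are taken from the debug output earlier in the thread.

```java
public class StraySemicolonDemo {

    // Mimics the question's loop: prints a "hello" on every pass and returns
    // where the original code would call dispose(); System.exit(0);
    static int passesBeforeExit(int maxPasses) {
        int passes = 0;
        for (int i = 0; i < maxPasses; i++) {
            passes++;
            System.out.println("hello from pass " + passes);
            int index1 = 52;
            int index2 = 63;
            // The ';' ends the if statement, so the braces below are a plain
            // block that runs on every pass even though index1 != index2.
            if (index1 == index2);
            {
                return passes; // stands in for dispose(); System.exit(0);
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("Loop ran " + passesBeforeExit(3) + " time(s)");
    }
}
```

Running it prints "hello from pass 1" followed by "Loop ran 1 time(s)". javac accepts the empty if statement silently by default; compiling with -Xlint:empty makes it emit a warning.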
-
January 28th, 2011, 04:27 PM
#5
Re: Looping Issue While Parsing...Help Please.
My guess would be that parseBuffer.delete(0,(index1+1)); is messing with your indexing... remove this line.
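To see why deleting from the front breaks index-based searching: StringBuffer.delete(0, n) shifts every remaining character n positions to the left, so any index computed before the delete no longer points at the same text. The sketch uses a shortened, hypothetical two-link input in which the first quote is at index 8, so delete(0, 9) is the same delete that parseBuffer.delete(0,(index1+1)) would perform.

```java
public class DeleteShiftDemo {

    // Returns {second "href" before delete, after delete, search from stale index}.
    static int[] secondHrefPositions() {
        StringBuffer parseBuffer = new StringBuffer(
                "<a href=\"index.html\">A</a><a href=\"two.html\">B</a>");

        // Position of the second "href" before anything is deleted.
        int before = parseBuffer.indexOf("href", 4);

        // The first quote is at index 8, so this mirrors delete(0, index1 + 1).
        parseBuffer.delete(0, 9);

        // The same "href" text has moved 9 positions to the left...
        int after = parseBuffer.indexOf("href");

        // ...so resuming the search from the old position finds nothing.
        int stale = parseBuffer.indexOf("href", before);

        return new int[] {before, after, stale};
    }

    public static void main(String[] args) {
        int[] p = secondHrefPositions();
        System.out.println("before delete: " + p[0]);    // 29
        System.out.println("after delete: " + p[1]);     // 20
        System.out.println("from stale index: " + p[2]); // -1
    }
}
```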
-
January 28th, 2011, 10:10 PM
#6
Re: Looping Issue While Parsing...Help Please.
The problem is that you are partly using two different indexing strategies, and they are interfering with each other.
You are iterating over the string char by char, so when you find a match you need to either jump past the end of the found text or delete the found text.
To jump past the found text, set 'i' to index2 and do not delete anything from the string.
Or
If you want to delete the found text, you need to set both indexing variables, 'index1' and 'i', to 0 so they start searching from the new beginning of the string, and you need to delete everything up to 'index2'.
With either strategy, you should test whether the search for 'href' succeeded before proceeding.
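A minimal sketch of the second strategy (delete the found text and restart the search from 0), including the check that the search for "href" succeeded. It assumes quoted href values, as in the test page from the first post; DeleteStrategy and extractByDeleting are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteStrategy {

    // Hypothetical helper: after each match, delete everything through the
    // closing quote and search again from the new beginning (index 0).
    static List<String> extractByDeleting(String savedLines) {
        List<String> links = new ArrayList<>();
        StringBuffer parseBuffer = new StringBuffer(savedLines);
        while (true) {
            int hrefPos = parseBuffer.indexOf("href");
            if (hrefPos < 0) {
                break; // the search for "href" failed: no more links
            }
            int index1 = parseBuffer.indexOf("\"", hrefPos);
            int index2 = (index1 < 0) ? -1 : parseBuffer.indexOf("\"", index1 + 1);
            if (index2 < 0) {
                break; // no complete quoted value left to extract
            }
            links.add(parseBuffer.substring(index1 + 1, index2));
            // Drop the found text; the next search starts at 0 again.
            parseBuffer.delete(0, index2 + 1);
        }
        return links;
    }

    public static void main(String[] args) {
        String savedLines = "<a href=\"index.html\">Hello World.</a>"
                + "<a href=\"secondURL.html\">Another Link</a>";
        System.out.println(extractByDeleting(savedLines)); // [index.html, secondURL.html]
    }
}
```

Each delete copies the rest of the buffer, so on large pages the jump-past strategy (leave the text intact and advance a start index) does less work.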
-
February 1st, 2011, 07:47 PM
#7
Re: Looping Issue While Parsing...Help Please.
Thanks for the suggestions, but I managed to solve it on my own. I was having an "off" day when I started this project. I managed to prevail, and now I just need to set up the mailto search for e-mail addresses and I should be done.
Have a look; any suggestions are appreciated.
Code:
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;
import java.net.*;
import java.util.Vector;
import java.util.*;
//=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
public class ADProjectOne
{
public static void main(String args[])
{
JFrame f;
f=new ADProjectOneFrame();
f.setVisible(true);
}
}
//=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
class ADProjectOneFrame extends JFrame implements ActionListener
{
JTextField URLtf;
JButton button;
JLabel label;
JLabel linkLabel;
JLabel emailLabel;
JLabel blank;
JPanel mainPanel;
public JList linkList;
public JList emailList;
Container cp;
public String getURL="";
public URL myURL;
public URLConnection myURLConnection;
public InputStream myInputStream;
public InputStreamReader myInputStreamReader;
public BufferedReader myHTMLReader;
public static Vector<String> linksVector=new Vector<String>();
public boolean allDone=false;
public DefaultListModel lm;
String savedLines="";
ADProjectOneFrame()
{
displayGUI();
setupMainFrame();
}
//=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
void setupMainFrame()
{
Toolkit tk;
Dimension d;
tk = Toolkit.getDefaultToolkit();
d = tk.getScreenSize();
setSize(d.width/2, d.height/2);
setLocation(d.width/5,d.height/5);
setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
setTitle("Project #1 - URL Search");
setVisible(true);
}
public void actionPerformed(ActionEvent e)
{
if(e.getActionCommand().equals("GO"))
{
grabURL();
parseURL();
}
}
//=========================( Getting the URL Entered )================================
public void grabURL()
{
try
{
getURL=URLtf.getText();
System.out.println("URL to be grabbed: "+ getURL);
}
catch(Exception e)
{
System.out.println("Problem in getURL(), Check for Errors");
}
}
public void displayGUI()
{
label = new JLabel("URL to search");
linkLabel = new JLabel("Links Found:");
emailLabel = new JLabel("E-mails Found: ");
blank = new JLabel();
URLtf = new JTextField("http://www2.fairmontstate.edu/users/adawson/");
button = new JButton("Go!");
button.setActionCommand("GO");
button.addActionListener(this);
lm = new DefaultListModel();
linkList = new JList(lm);
linkList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
linkList.setSelectedIndex(0);
emailList = new JList();
emailList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
emailList.setSelectedIndex(0);
mainPanel=new JPanel(new GridLayout(4,2));
mainPanel.add(label);
mainPanel.add(URLtf);
mainPanel.add(linkLabel);
mainPanel.add(emailLabel);
mainPanel.add(linkList);
mainPanel.add(emailList);
mainPanel.add(blank);
mainPanel.add(button);
cp=getContentPane();
cp.add(mainPanel);
}
//=========================( Parsing the URL Entered )================================
public void parseURL()
{
int index1=0;
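int index2=0;
// Reset state so a second click on "Go!" starts a fresh search
savedLines="";
allDone=false;
linksVector.clear();
lm.clear();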
try
{
myURL = new URL(getURL);
System.out.println(myURL);
myURLConnection = myURL.openConnection();
myInputStream= myURLConnection.getInputStream();
myInputStreamReader=new InputStreamReader(myInputStream);
myHTMLReader=new BufferedReader(myInputStreamReader);
String inputLine;
while((inputLine = myHTMLReader.readLine()) != null)
{
savedLines+=inputLine;
}
myHTMLReader.close();
System.out.println(savedLines);
}
catch(MalformedURLException e)
{
System.out.println("Sorry, Problem with URL. Try Again.");
}
catch(IOException e)
{
System.out.println("Problem with openConnection...Check it Out Fool");
}
StringBuffer parseBuffer = new StringBuffer(savedLines);
System.out.println("String Length: "+ parseBuffer.length());
int cnt;
while(index1<parseBuffer.length() && allDone!=true)
{
index1=parseBuffer.indexOf("<a",index1);
if(index1<0)
{
System.out.println("Parse Complete.");
allDone=true;
break; // no more "<a" tags: stop before a bogus entry is added
}
System.out.println("First href" + index1);
index1=parseBuffer.indexOf("\"", index1);
System.out.println("First Quote: " + index1);
index2=parseBuffer.indexOf("\"", (index1+1));
System.out.println("Second Quote: " + index2);
String NewSTR=parseBuffer.substring((index1+1),index2);
linksVector.add(NewSTR); // NewSTR added to Vector
System.out.println("New String: "+ NewSTR);
index1=index2;
System.out.println("---------------------------------------");
}
System.out.println("Links Vector Size:" + linksVector.size());
for (int v = 0; v < linksVector.size(); v++)
{
lm.addElement(linksVector.elementAt(v));
System.out.println("Vector Added: " + linksVector.elementAt(v));
}
mainPanel.repaint();
}
}
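For the mailto search mentioned above, one possible approach, sketched under the assumption that e-mail addresses appear as href="mailto:..." values, is to reuse the existing link extraction and then sort the results into links and e-mail addresses. MailtoSplit and split are hypothetical names; the prefix check is case-insensitive, and anything after a '?' (such as ?subject=...) is dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class MailtoSplit {

    // Hypothetical helper: sends "mailto:" hrefs to emails (scheme and query
    // stripped) and everything else to links.
    static void split(List<String> hrefs, List<String> links, List<String> emails) {
        for (String href : hrefs) {
            if (href.regionMatches(true, 0, "mailto:", 0, 7)) {
                String address = href.substring(7);
                int query = address.indexOf('?');
                if (query >= 0) {
                    address = address.substring(0, query);
                }
                emails.add(address);
            } else {
                links.add(href);
            }
        }
    }

    public static void main(String[] args) {
        List<String> hrefs = new ArrayList<>();
        hrefs.add("index.html");
        hrefs.add("mailto:someone@example.com?subject=Hello");
        List<String> links = new ArrayList<>();
        List<String> emails = new ArrayList<>();
        split(hrefs, links, emails);
        System.out.println("Links: " + links);    // Links: [index.html]
        System.out.println("E-mails: " + emails); // E-mails: [someone@example.com]
    }
}
```

In the frame above, emailList is built with new JList(), which has an empty, read-only model, so it would need its own DefaultListModel (as linkList has with lm) before addresses can be added to it.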