  1. #1
    Join Date
    Nov 2010
    Posts
    15

    Looping Issue While Parsing...Help Please.

    Hello, I'm trying to write an HTML parser that will find links and then store them. However, when given a simple example page as input, it only finds the first link and then quits. I assume it's a simple looping issue. Any help is appreciated, thanks.

    The Input:
    Code:
    <HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><a href="index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
    Code where I'm trying to grab them from a string that HTML is saved to.
    Code:
    for(index1=0;index1<savedLines.length();)
    {
    	index1=savedLines.indexOf("href",index1);
    	System.out.println("First href" + index1);
    	index1=savedLines.indexOf("\"", index1);
    	System.out.println("First Quote: " + index1);
    	index2=savedLines.indexOf("\"", (index1+1));
    	System.out.println("Second Quote: " + index2);
    	String NewSTR=savedLines.substring((index1+1),index2);
    	System.out.println("New String: "+ NewSTR);
    
    	if(index1==index2);
    	{
    			dispose();
    			System.exit(0);
    	}
    	index1=index2;
    }

  2. #2
    Join Date
    Jan 2011
    Location
    Tacoma, Washington
    Posts
    31

    Re: Looping Issue While Parsing...Help Please.

    You seem to have some very odd counting going on there. First off, you aren't incrementing the loop variable in the for statement. You do set index1 to the value of index2, which I assume is an attempt to move the loop toward savedLines.length(), but nothing guarantees that it steps systematically toward that value without jumping right past it on the first run through the loop. You are also modifying your counting variable inside the loop body: index1 gets whatever value is in index2, while index2 gets its value from an indexOf() call that uses index1 as its starting point.

    Code:
    index2=savedLines.indexOf("\"", (index1+1));  //seems odd to use the count this way
    What is savedLines anyway? A String? A Collection?

    I think you are trying to get your loop to do too much. I would suggest using the enhanced for loop to iterate over your data structure or, if it is a collection, an iterator.

    Good luck.

  3. #3
    Join Date
    Nov 2010
    Posts
    15

    Re: Looping Issue While Parsing...Help Please.

    savedLines is a String where I stored the HTML I pulled from the website. I've progressed to the following code, but I'm still not sure how to get my loop to work correctly. What I am trying to get it to do, in a nutshell, is:
    - Grab index1(start of the link)
    - Grab index2(end of the link)
    - Take subString of link
    - Add grabbed link to JList on GUI
    - Delete Junk Before the link
    - Rinse and repeat until the end of the String

    It does all of this, but I can't get it to progress past the first link. How would I go about switching to an enhanced for loop? I'm not sure I quite understand what I've looked up so far. I want the loop to search until the end of the string, then quit.
    Code:
    	StringBuffer parseBuffer = new StringBuffer(savedLines);
    	System.out.println("String Length: "+ parseBuffer.length());
    
    	for(int i=0; i<parseBuffer.length();i++)
    	{
    		index1=parseBuffer.indexOf("href",index1);
    		System.out.println("First href" + index1);
    
    		index1=parseBuffer.indexOf("\"", index1);
    		System.out.println("First Quote: " + index1);
    
    		index2=parseBuffer.indexOf("\"", (index1+1));
    		System.out.println("Second Quote: " + index2);
    
    
    		String NewSTR=parseBuffer.substring((index1+1),index2);
    		linksList.add(NewSTR);
    
    		System.out.println("New String: "+ NewSTR);
    
    		parseBuffer.delete(0,(index1+1));
    		System.out.println(parseBuffer);
    		if(index1==index2);
    		{
    			dispose();
    			System.exit(0);
    		}
    	}
    And the output:
    Code:
    <HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><a href="index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
    String Length: 178
    First href47
    First Quote: 52
    Second Quote: 63
    New String: index.html
    index.html">Hello World.</a><a href="secondURL.html">Another Link</a><a href="yetanotherone2.html">Last One</a></BODY></HTML>
    Last edited by MadaNoswad; January 27th, 2011 at 02:32 AM.

  4. #4
    Join Date
    Feb 2008
    Posts
    966

    Re: Looping Issue While Parsing...Help Please.

    I can't see all of the code, but if the HTML is all on one line then it may only be going through the loop once. Have you printed out a "hello" inside the loop to see how many times it runs? Do you ever reach the System.exit(0) inside the if statement?
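
One thing worth checking when answering that last question: both posted versions of the loop have a semicolon directly after the if condition (`if(index1==index2);`). In Java that semicolon is an empty statement, so the braces that follow form an ordinary block that runs on every pass, which would explain the program exiting after the first link. A minimal sketch of the effect; the class and method names are made up, and a counter stands in for dispose() and System.exit(0):

```java
class StraySemicolonDemo {
    // Counts how often the "guarded" block runs when the if ends in a semicolon.
    static int runsWithSemicolon(int index1, int index2) {
        int runs = 0;
        if (index1 == index2);   // empty statement: the if guards nothing
        {
            runs++;              // plain block, executed unconditionally
        }
        return runs;
    }

    // Same check without the semicolon: the block now belongs to the if.
    static int runsWithoutSemicolon(int index1, int index2) {
        int runs = 0;
        if (index1 == index2) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // 52 and 63 are the two quote positions from the output posted above.
        System.out.println(runsWithSemicolon(52, 63));    // prints 1
        System.out.println(runsWithoutSemicolon(52, 63)); // prints 0
    }
}
```

Compiling with `javac -Xlint` should flag the empty statement after the if, which is a quick way to catch this kind of slip.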

  5. #5
    Join Date
    Jan 2011
    Posts
    1

    Re: Looping Issue While Parsing...Help Please.

    My guess would be that parseBuffer.delete(0,(index1+1)); is messing with your indexing. Try removing this line.
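
To illustrate why the delete call can interfere, here is a small sketch using the sample page from the first post (the class and method names are made up). After delete(0, n) every remaining character moves n positions to the left, so an index computed before the delete no longer points at the same text. Searching again from the old index1 skips the second link, while restarting from 0 finds it:

```java
class DeleteShiftDemo {
    // Sample page from the first post.
    static final String HTML =
        "<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY>"
      + "<a href=\"index.html\">Hello World.</a>"
      + "<a href=\"secondURL.html\">Another Link</a>"
      + "<a href=\"yetanotherone2.html\">Last One</a></BODY></HTML>";

    // Finds the first link's opening quote, deletes everything up to and
    // including it (as the posted loop does), then returns the next link found.
    static String nextLinkAfterDelete(String html, boolean resetIndex) {
        StringBuffer buf = new StringBuffer(html);
        int index1 = buf.indexOf("href");
        index1 = buf.indexOf("\"", index1);
        buf.delete(0, index1 + 1);            // remaining text shifts left by index1 + 1
        int from = resetIndex ? 0 : index1;   // fresh start vs. stale pre-delete index
        int href = buf.indexOf("href", from);
        if (href < 0) return null;            // nothing left to find
        int open = buf.indexOf("\"", href);
        int close = buf.indexOf("\"", open + 1);
        return buf.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(nextLinkAfterDelete(HTML, false)); // yetanotherone2.html
        System.out.println(nextLinkAfterDelete(HTML, true));  // secondURL.html
    }
}
```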

  6. #6
    Join Date
    May 2006
    Location
    UK
    Posts
    4,473

    Re: Looping Issue While Parsing...Help Please.

    The problem is that you are partly using two different indexing strategies and they are interfering with each other.

    You are iterating over the string char by char and so when you find a match you need to either jump past the end of the found text or delete the found text.

    To jump past the found text, set 'i' to index2 and do not delete anything from the string.
    Or
    If you want to delete the found text, you need to set the indexing variables 'index1' and 'i' to 0 so they start searching from the new beginning of the string, and you need to delete everything up to 'index2'.

    With either strategy you should be testing to see if the search for 'href' succeeded or not before proceeding.
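
The two strategies described above could look roughly like this. This is only a sketch, not a full HTML parser: it assumes every href value is wrapped in double quotes, and the class and method names are made up for illustration. Both methods check whether the search for "href" succeeded before going any further:

```java
import java.util.ArrayList;
import java.util.List;

class LinkScanStrategies {
    // Sample page from the first post.
    static final String SAMPLE =
        "<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY>"
      + "<a href=\"index.html\">Hello World.</a>"
      + "<a href=\"secondURL.html\">Another Link</a>"
      + "<a href=\"yetanotherone2.html\">Last One</a></BODY></HTML>";

    // Strategy 1: leave the text alone and move the search position past each match.
    static List<String> byJumping(String html) {
        List<String> links = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int href = html.indexOf("href", pos);
            if (href < 0) break;                  // search failed: no more links
            int open = html.indexOf('"', href);
            int close = (open < 0) ? -1 : html.indexOf('"', open + 1);
            if (close < 0) break;                 // unterminated attribute value
            links.add(html.substring(open + 1, close));
            pos = close + 1;                      // jump past the found text
        }
        return links;
    }

    // Strategy 2: delete everything up to the closing quote and always search from 0.
    static List<String> byDeleting(String html) {
        List<String> links = new ArrayList<>();
        StringBuffer buf = new StringBuffer(html);
        while (buf.length() > 0) {
            int href = buf.indexOf("href");       // always from the new beginning
            if (href < 0) break;
            int open = buf.indexOf("\"", href);
            int close = (open < 0) ? -1 : buf.indexOf("\"", open + 1);
            if (close < 0) break;
            links.add(buf.substring(open + 1, close));
            buf.delete(0, close + 1);             // drop the text already handled
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(byJumping(SAMPLE));
        System.out.println(byDeleting(SAMPLE));
    }
}
```

Both methods should return the same three links for the sample page. The jumping version avoids copying the buffer on every delete, so it is probably the simpler choice unless the shortened text is needed for something else.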

  7. #7
    Join Date
    Nov 2010
    Posts
    15

    Re: Looping Issue While Parsing...Help Please.

    Thanks for the suggestions, but I managed to solve it on my own. I was having an "off" day when I started this project. I managed to prevail, and now I just need to set up the mailto search for e-mail addresses and I should be done.

    Have a look, and any suggestions are appreciated.
    Code:
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import java.io.*;
    import java.net.*;
    import java.util.Vector;
    import java.util.*;
    
    
    //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    public class ADProjectOne
    {
    	public static void main(String args[])
    	{
    		JFrame	f;
    		f=new ADProjectOneFrame();
    		f.setVisible(true);
    	}
    }
    //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    class ADProjectOneFrame extends JFrame implements ActionListener
    {
    JTextField URLtf;
    JButton	button;
    JLabel	label;
    JLabel	linkLabel;
    JLabel	emailLabel;
    JLabel blank;
    JPanel mainPanel;
    public JList linkList;
    public JList emailList;
    Container cp;
    public String getURL="";
    public URL myURL;
    public URLConnection myURLConnection;
    public InputStream myInputStream;
    public InputStreamReader myInputStreamReader;
    public BufferedReader myHTMLReader;
    public static Vector<String> linksVector=new Vector<String>();
    public boolean allDone=false;
    public DefaultListModel lm;
    String savedLines="";
    
    
    ADProjectOneFrame()
    {
    	displayGUI();
    	setupMainFrame();
    }
    //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    void setupMainFrame()
    {
    	    Toolkit tk;
    	    Dimension   d;
    
    	    tk = Toolkit.getDefaultToolkit();
    	    d = tk.getScreenSize();
    	    setSize(d.width/2, d.height/2);
    	    setLocation(d.width/5,d.height/5);
    
    	    setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    	    setTitle("Project #1 - URL Search");
       		setVisible(true);
    }
    public void actionPerformed(ActionEvent e)
    {
    	if(e.getActionCommand().equals("GO"))
    	{
    		grabURL();
    		parseURL();
    	}
    }
    
    //=========================( Getting the URL Entered )================================
    public void grabURL()
    {
    try
    	{
    	getURL=URLtf.getText();
    	System.out.println("URL to be grabbed: "+ getURL);
    	}
    catch(Exception e)
    	{
    	System.out.println("Problem in getURL(), Check for Errors");
    	}
    }
    public void displayGUI()
    {
    	label = new JLabel("URL to search");
    	linkLabel = new JLabel("Links Found:");
    	emailLabel = new JLabel("E-mails Found: ");
    	blank = new JLabel();
    	URLtf = new JTextField("http://www2.fairmontstate.edu/users/adawson/");
    	button = new JButton("Go!");
    	button.setActionCommand("GO");
    	button.addActionListener(this);
    
    	lm = new DefaultListModel();
    	linkList = new JList(lm);
    	linkList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
    	linkList.setSelectedIndex(0);
    	emailList = new JList();
    	emailList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
    	emailList.setSelectedIndex(0);
    
    
    	mainPanel=new JPanel(new GridLayout(4,2));
    	mainPanel.add(label);
    	mainPanel.add(URLtf);
    	mainPanel.add(linkLabel);
    	mainPanel.add(emailLabel);
    	mainPanel.add(linkList);
    	mainPanel.add(emailList);
    	mainPanel.add(blank);
    	mainPanel.add(button);
    	cp=getContentPane();
        cp.add(mainPanel);
    }
    //=========================( Parsing the URL Entered )================================
    public void parseURL()
    {
    	int index1=0;
    	int index2=0;
    	try
    	{
    		myURL = new URL(getURL);
    		System.out.println(myURL);
    		myURLConnection = myURL.openConnection();
    		myInputStream= myURLConnection.getInputStream();
    		myInputStreamReader=new InputStreamReader(myInputStream);
    		myHTMLReader=new BufferedReader(myInputStreamReader);
    
    		String inputLine;
    
    		while((inputLine = myHTMLReader.readLine()) != null)
    		{
    			savedLines+=inputLine;
    		}
    		myHTMLReader.close();
    		System.out.println(savedLines);
    	}
    	catch(MalformedURLException e)
    	{
    		System.out.println("Sorry, Problem with URL. Try Again.");
    	}
    	catch(IOException e)
    	{
    		System.out.println("Problem with openConnection...Check it Out Fool");
    	}
    	StringBuffer parseBuffer = new StringBuffer(savedLines);
    	System.out.println("String Length: "+ parseBuffer.length());
    
    	int cnt;
    	while(index1<parseBuffer.length() && allDone!=true)
    	{
    		index1=parseBuffer.indexOf("<a",index1);
    		if(index1<0)
    		{
    			System.out.println("Parse Complete.");
    			allDone=true;
    			break;	// stop here; otherwise the quote searches below restart from the beginning of the buffer
    		}
    		System.out.println("First href" + index1);
    
    		index1=parseBuffer.indexOf("\"", index1);
    		System.out.println("First Quote: " + index1);
    
    		index2=parseBuffer.indexOf("\"", (index1+1));
    		System.out.println("Second Quote: " + index2);
    
    		String NewSTR=parseBuffer.substring((index1+1),index2);
    		linksVector.add(NewSTR);														//		NewSTR added to Vector
    
    		System.out.println("New String: "+ NewSTR);
    		index1=index2;
    		System.out.println("---------------------------------------");
    	}
    		System.out.println("Links Vector Size:" + linksVector.size());
    		for (int v = 0; v < linksVector.size(); v++)
    			{
    				lm.addElement(linksVector.elementAt(v));
    				System.out.println("Vector Added: " + linksVector.elementAt(v));
    			}
    		mainPanel.repaint();
    }
    }
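
For the mailto search mentioned above, the same indexOf pattern could probably be reused. A rough sketch, assuming the addresses appear as double-quoted href="mailto:..." values; the class name, method name and sample address are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class MailtoScan {
    // Collects the text between "mailto: and the closing quote of each match.
    static List<String> findEmails(String html) {
        List<String> emails = new ArrayList<>();
        String marker = "\"mailto:";              // opening quote plus scheme
        int pos = 0;
        while (true) {
            int start = html.indexOf(marker, pos);
            if (start < 0) break;                 // no further mailto links
            int addrStart = start + marker.length();
            int end = html.indexOf('"', addrStart);
            if (end < 0) break;                   // unterminated attribute value
            emails.add(html.substring(addrStart, end));
            pos = end + 1;                        // continue after this match
        }
        return emails;
    }

    public static void main(String[] args) {
        String html = "<a href=\"mailto:someone@example.com\">Mail</a>"
                    + "<a href=\"index.html\">Home</a>";
        System.out.println(findEmails(html));     // [someone@example.com]
    }
}
```

Anything after the address inside the quotes, such as a ?subject=... part, would be kept as-is by this sketch and would need trimming separately.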
