CodeGuru Forums

  1. #1
    Join Date: Sep 2009
    Posts: 9

    search and match

    Hi all

    I have a problem. An input file (Input_sentences.txt) contains one sentence per line. I take each sentence and generate sub-sentences of 2 to 6 words. Two more files, Combined_Man_phrase.txt and Combined_Eng_phrase.txt, store sub-sentence patterns aligned in parallel, so line n of one file corresponds to line n of the other. For each sub-sentence that matches an entry in Combined_Man_phrase.txt, I want to write the corresponding pattern from Combined_Eng_phrase.txt to the output.

    Matching starts at the beginning of the input sentence with a two-word sub-sentence and grows up to six words. Once a sub-sentence matches, I remove it and continue with the remaining part of the sentence, again trying 2 up to 6 words. Even if there are multiple matches, I want to pick only the first one and avoid writing multiple matched patterns to the output file.

    I have written the program in Java below, but it does not find matches for all sub-sentences, even assuming every sub-sentence has an entry in the phrase files. Any help is appreciated.
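Generating the candidate sub-sentences is easier to get right by indexing the words directly and joining a slice, rather than building one string by repeated concatenation. A minimal sketch, assuming whitespace-separated words and the 2-to-6-word limits described above (the class and method names `SubSentences` and `windowsAt` are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the 2-to-6-word sub-sentences that start at one word position.
class SubSentences {

    static final int MIN_WORDS = 2;
    static final int MAX_WORDS = 6; // inclusive

    // Shortest window first; windows that would run past the end of the sentence are skipped.
    static List<String> windowsAt(String[] words, int pos) {
        List<String> windows = new ArrayList<>();
        for (int len = MIN_WORDS; len <= MAX_WORDS && pos + len <= words.length; len++) {
            windows.add(String.join(" ", Arrays.copyOfRange(words, pos, pos + len)));
        }
        return windows;
    }

    public static void main(String[] args) {
        String[] words = "This classic best seller has been".trim().split("\\s+");
        for (String window : windowsAt(words, 0)) {
            System.out.println(window);
        }
    }
}
```

For a six-word sentence this yields five windows at position 0 ("This classic" through the whole sentence), a single window at position 4, and none at the last word, since a one-word window is below the minimum.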

    Code:
    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    public class search_Match {

        /** Creates a new instance of search_Match */
        public search_Match() {
        }

        public void replace() throws FileNotFoundException, IOException {

            // Create the output file writer
            FileWriter fr = new FileWriter("replaced_output.txt");

            // Read the input sentences to be replaced
            BufferedReader br = new BufferedReader(new FileReader("Input_sentences.txt"));

            String sentence = null;
            boolean found = false;
            while ((sentence = br.readLine()) != null) {
                String[] choppedUpString = sentence.trim().split(" ");

                // Sub-sentence files, reopened once per input sentence
                BufferedReader br4 = new BufferedReader(new FileReader("Combined_Ma_phrase.txt"));
                BufferedReader br5 = new BufferedReader(new FileReader("Combined_En_phrase.txt"));

                String ManPhrase = null;
                String EngPhrase = null;
                String tempString = "";
                int start = 0;
                int index = -1;

                for (int i = 0; i < choppedUpString.length; i++) {
                    // Sub-sentence lengths tried: j = 2..5 (j < 6)
                    for (int j = 2; j < 6; j++) {
                        // Join words start .. start+j-1 (only this line is inside the k loop)
                        for (int k = start; k < (start + j) && k < choppedUpString.length; k++)
                            tempString = tempString + " " + choppedUpString[k];
                        tempString = tempString.replaceFirst("\\s", "");
                        tempString = tempString.replaceAll("\\s+$", "");

                        // Scan both phrase files in parallel; br4/br5 are never reopened or
                        // rewound, so once they reach end of file later passes read nothing
                        while ((ManPhrase = br4.readLine()) != null) {
                            EngPhrase = br5.readLine();

                            // matches() treats tempString as a regular expression
                            if (ManPhrase.matches(tempString)) {
                                fr.write(EngPhrase);
                                fr.write(" ");
                                start = 0;
                                found = true;
                                // Cut everything up to the end of the match off the sentence
                                index = sentence.indexOf(tempString);
                                if (index != -1) sentence = sentence.substring(index + tempString.length(), sentence.length());
                                choppedUpString = sentence.trim().split(" ");
                            }
                        }
                        tempString = "";
                    }
                    start++;
                }
                sentence = sentence.replaceFirst("\\s", "");
                choppedUpString = sentence.split(" ");
                fr.write("\r\n");
                br4.close();
                br5.close();
            }
            fr.close();
        }

        public static void main(String[] args) throws IOException {
            search_Match m = new search_Match();
            m.replace();
        }
    }

    This program works for the first sub-sentence. However, it does not work for the following sub-sentences of the remaining sentence after the first one is removed, which is my main problem.
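Reading the posted code, one likely reason later sub-sentences never match: br4 and br5 are opened once per input sentence, and the inner while loop reads them to end of file the first time it runs. Nothing reopens or rewinds them, so every later sub-sentence of that sentence is compared against no lines at all. Two smaller issues: `j < 6` only tries 2 to 5 words, and `ManPhrase.matches(tempString)` treats the sub-sentence as a regular expression instead of testing equality. The sketch below loads both phrase files once into a map (first occurrence wins), looks windows up by exact string equality, and advances past each match. Assumptions: UTF-8 files, whitespace-separated words, and an unmatched word is copied through unchanged (the posted code drops it); the class name `SearchMatchFixed` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SearchMatchFixed {

    static final int MIN_WORDS = 2;
    static final int MAX_WORDS = 6; // inclusive

    // Pairs line n of the Ma file with line n of the En file; extra lines in the longer
    // file are ignored. putIfAbsent keeps the first English pattern if a Ma phrase repeats.
    static Map<String, String> buildTable(List<String> maLines, List<String> enLines) {
        Map<String, String> table = new LinkedHashMap<>();
        int n = Math.min(maLines.size(), enLines.size());
        for (int i = 0; i < n; i++) {
            table.putIfAbsent(maLines.get(i).trim(), enLines.get(i).trim());
        }
        return table;
    }

    // Greedy left-to-right search: at each position try 2, 3, ..., 6 words and keep the
    // first window found in the table (first match only), then continue after it.
    static String translate(String sentence, Map<String, String> table) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < words.length) {
            int matched = 0;
            for (int len = MIN_WORDS; len <= MAX_WORDS && pos + len <= words.length; len++) {
                String window = String.join(" ", Arrays.copyOfRange(words, pos, pos + len));
                String eng = table.get(window); // exact string lookup, no regex
                if (eng != null) {
                    out.add(eng);
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                pos += matched;      // drop the matched words and continue after them
            } else {
                out.add(words[pos]); // assumption: copy an unmatched word through as-is
                pos++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) throws IOException {
        // Phrase files are read once, not once per sub-sentence
        Map<String, String> table = buildTable(
                Files.readAllLines(Paths.get("Combined_Ma_phrase.txt"), StandardCharsets.UTF_8),
                Files.readAllLines(Paths.get("Combined_En_phrase.txt"), StandardCharsets.UTF_8));

        try (BufferedReader in = Files.newBufferedReader(Paths.get("Input_sentences.txt"), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                     Paths.get("replaced_output.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(translate(line, table)); // one output line per input sentence
            }
        }
    }
}
```

Loading the phrase pairs into a map turns each sub-sentence lookup into a single hash lookup instead of a full file scan, and it removes the need to keep two readers in step. If unmatched words should be dropped, as in the posted code, delete the `out.add(words[pos])` line.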

    The sample files are attached. The first four words of Input_sentences.txt ("This classic best seller") match the fourth sub-sentence of Combined_Ma_phrase.txt, so the fourth sub-sentence of Combined_En_phrase.txt, in the same position, is picked up and written to replaced_output.txt ("thisthis classicclassic bestbest sellerseller"). Next, the following sub-sentence ("has been") is picked up from the remaining words of the input sentence, again starting at two words, since the sub-sentence length ranges from 2 to 6. The first match is taken, and the corresponding sub-sentence from Combined_En_phrase.txt ("hashas beenbeen") is written to replaced_output.txt. This process continues until the end of the sentence.
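The walk-through above can be replayed in code to check the expected order of attempts. This is a sketch: the two-entry table is hypothetical, built only from the phrases quoted in the paragraph (the real contents of the attached files are not shown); the input is truncated to the six quoted words; and `ExampleTrace`, `sampleTable` and `trace` are illustrative names. It logs every window it tries, so the 2, 3, 4-word attempts and the restart at two words after the match are visible:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExampleTrace {

    // Hypothetical table: only the two phrase pairs quoted in the post.
    static Map<String, String> sampleTable() {
        Map<String, String> table = new HashMap<>();
        table.put("This classic best seller", "thisthis classicclassic bestbest sellerseller");
        table.put("has been", "hashas beenbeen");
        return table;
    }

    // Greedy 2..6-word search that records every window tried in the returned log;
    // the translated sentence is appended to output.
    static List<String> trace(String sentence, Map<String, String> table, StringBuilder output) {
        String[] words = sentence.trim().split("\\s+");
        List<String> log = new ArrayList<>();
        int pos = 0;
        while (pos < words.length) {
            String eng = null;
            int len = 2;
            for (; len <= 6 && pos + len <= words.length; len++) {
                String window = String.join(" ", Arrays.copyOfRange(words, pos, pos + len));
                eng = table.get(window);
                log.add("try \"" + window + "\"" + (eng != null ? " -> match" : ""));
                if (eng != null) {
                    break; // first match only; len now holds the matched length
                }
            }
            if (output.length() > 0) {
                output.append(' ');
            }
            if (eng != null) {
                output.append(eng);
                pos += len;                 // skip the matched words
            } else {
                output.append(words[pos]);  // assumption: unmatched word copied through
                pos++;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        StringBuilder output = new StringBuilder();
        List<String> log = trace("This classic best seller has been", sampleTable(), output);
        log.forEach(System.out::println);
        System.out.println(output);
    }
}
```

Running `main` prints four attempts ("This classic", "This classic best", "This classic best seller -> match", "has been -> match") followed by `thisthis classicclassic bestbest sellerseller hashas beenbeen`, the same output described in the paragraph.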
    Last edited by TDX; October 8th, 2009 at 01:40 PM. Reason: Indentation
