I am having a problem with a phrase-matching program. I have an input file, Input_sentences.txt, that contains one sentence per line. I take each sentence and generate sub-sentences of 2 to 6 words. Two more files, Combined_Ma_phrase.txt and Combined_En_phrase.txt, store sub-sentence patterns aligned line by line. For each sub-sentence that matches an entry in Combined_Ma_phrase.txt, I want to write the pattern on the same line of Combined_En_phrase.txt to the output file. Starting at the beginning of the input sentence, I try sub-sentences of two words up to six words; when one matches, I remove it and continue with the remaining part of the sentence, again starting with 2 words and going up to 6. I have written the program in Java as given below, but I am unable to get matches for all sub-sentences (assuming that every sub-sentence has an entry in the phrase files). If there are multiple matches, I want to pick only the first one and avoid writing several matched patterns to the output file. Any help is appreciated.
Code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class search_Match {
    /** Creates a new instance of search_Match */
    public search_Match() {
    }

    public void replace() throws FileNotFoundException, IOException {
        //Create output file writer
        FileWriter fr=new FileWriter("replaced_output.txt");
        //Read Input sentences to be replaced
        BufferedReader br=new BufferedReader(new FileReader("Input_sentences.txt"));
        String sentence=null;
        boolean found=false;
        while((sentence=br.readLine())!=null){
            String[] choppedUpString = sentence.trim().split(" ");
            // Sub-sentence files
            BufferedReader br4 = new BufferedReader(new FileReader("Combined_Ma_phrase.txt"));
            BufferedReader br5 = new BufferedReader(new FileReader("Combined_En_phrase.txt"));
            String ManPhrase=null;
            String EngPhrase=null;
            String tempString="";
            int start=0;
            int index=-1;
            for (int i=0; i< choppedUpString.length ; i++)
            {
                for(int j=2; j<6; j++)
                {
                    for (int k=start; k<(start+j) && k < choppedUpString.length ; k++)
                        tempString= tempString+" "+choppedUpString[k];
                    tempString = tempString.replaceFirst("\\s", "");
                    tempString= tempString.replaceAll("\\s+$","");
                    while((ManPhrase=br4.readLine())!=null)
                    {
                        EngPhrase=br5.readLine();
                        if(ManPhrase.matches(tempString))
                        {
                            fr.write(EngPhrase);
                            fr.write(" ");
                            start=0;
                            found=true;
                            index = sentence.indexOf(tempString);
                            if (index!=-1) sentence = sentence.substring(index+tempString.length(), sentence.length());
                            choppedUpString = sentence.trim().split(" ");
                        }
                    }
                    tempString="";
                }
                start++;
            }
            sentence=sentence.replaceFirst("\\s", "");
            choppedUpString = sentence.split(" ");
            fr.write("\r\n");
            br4.close();
            br5.close();
        }
        fr.close();
    }

    public static void main(String[] args) throws IOException{
        search_Match m=new search_Match();
        m.replace();
    }
}
This program works for the first sub-sentence. However, it does not work for the sub-sentences in the rest of the sentence once the first sub-sentence has been removed, which is my main problem.
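One thing that stands out in the listing: br4 and br5 are opened once per sentence, and the inner while loop reads them to the end for the first candidate sub-sentence, so every later candidate sees no lines at all. A minimal sketch of one way around this, assuming Java 8 or later and assuming the phrase files fit in memory, is to read the two aligned files once into a map. The class name PhraseTable and the method load are hypothetical, introduced only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original program.
class PhraseTable {
    /**
     * Reads two line-aligned sources and returns a map from each source
     * phrase to the target phrase found on the same line number.
     */
    static Map<String, String> load(Reader source, Reader target) {
        Map<String, String> table = new LinkedHashMap<>();
        try (BufferedReader src = new BufferedReader(source);
             BufferedReader tgt = new BufferedReader(target)) {
            String s = src.readLine();
            String t = tgt.readLine();
            // Stop as soon as either source runs out, so the pairs stay aligned.
            while (s != null && t != null) {
                // putIfAbsent keeps the first entry when a phrase is listed twice.
                table.putIfAbsent(s.trim(), t.trim());
                s = src.readLine();
                t = tgt.readLine();
            }
        } catch (IOException e) {
            // Rethrown unchecked here only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```

With the files from this post, the call would look like PhraseTable.load(new FileReader("Combined_Ma_phrase.txt"), new FileReader("Combined_En_phrase.txt")), made once before the sentence loop rather than once per sentence.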
The sample files are attached. The first four words of Input_sentences.txt (i.e., This classic best seller) match the fourth sub-sentence in Combined_Ma_phrase.txt, so the fourth sub-sentence of Combined_En_phrase.txt is picked up and written to replaced_output.txt (i.e., thisthis classicclassic bestbest sellerseller). Next, from the remaining words of the input sentence, the sub-sentence has been should be picked up, because sub-sentence lengths run from 2 to 6 words and it is the first match; the corresponding sub-sentence from Combined_En_phrase.txt (i.e., hashas beenbeen) is written to replaced_output.txt. This process should continue in the same way until the end of the sentence.
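The walk-through above can be sketched as a left-to-right scan that tries 2 words, then 3, up to 6, and takes the first length that has an entry. This is a sketch under stated assumptions, not a drop-in fix: it assumes the phrase pairs are already held in a map, it compares phrases literally (String.matches in the listing treats its argument as a regular expression), and it skips one word when no phrase starts at the current position, which the post does not specify. The class name GreedyPhraseMatcher and the method translate are hypothetical. Note that the loop bound is len <= 6, whereas j<6 in the listing never tries six words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical class; the names are illustrative, not from the original program.
class GreedyPhraseMatcher {
    static final int MIN_WORDS = 2;
    static final int MAX_WORDS = 6;

    /** Replaces sub-sentences of 2 to 6 words, shortest match first, scanning left to right. */
    static String translate(String sentence, Map<String, String> table) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < words.length) {
            boolean found = false;
            for (int len = MIN_WORDS; len <= MAX_WORDS && start + len <= words.length; len++) {
                // Build the candidate from words[start .. start+len-1].
                String candidate = String.join(" ", Arrays.copyOfRange(words, start, start + len));
                String replacement = table.get(candidate);
                if (replacement != null) {
                    out.add(replacement);
                    start += len;   // continue with the rest of the sentence
                    found = true;
                    break;          // first match only
                }
            }
            if (!found) {
                start++;            // assumption: no phrase starts here, so skip one word
            }
        }
        return String.join(" ", out);
    }
}
```

Given a map containing only the two pairs quoted above, translate("This classic best seller has been", table) returns "thisthis classicclassic bestbest sellerseller hashas beenbeen".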
Last edited by TDX; October 8th, 2009 at 01:40 PM.
Reason: Indentation