-
October 31st, 2009, 01:53 PM
#1
Java IO problem
Hi, I need help. I'm trying to read a file line by line from the input text file and then write it word by word into the output text file. If a word exactly matches an abbreviation name in the database object, this method additionally outputs the full names. The max_linewidth argument has to be at least 30, and translateFile has to ensure that the number of characters (including spaces) in every line of the output text file does not exceed max_linewidth. Below is my code so far. I have not been able to solve the max_linewidth problem so that it prints out properly.
txt file
Code:
The MOE has given out a total of 420 teaching scholarships and
awards, the highest number of scholarships and awards given out
since 2003.
MOE said that it has a bumper harvest this year because of the
strong interest in teaching among quality candidates.
Four candidates have been awarded the prestigious Overseas Merit
Scholarships for Teaching by the Public Service Commission,
while 17 others have been given the coveted Education Merit
Scholarship.
Code:
import java.io.*;
import java.util.StringTokenizer;
public class Translate{
private Data db;
public Translate(Database database){
this.database=database;
}
public int translateF(String ifilename,String filename,int max_lwidth){
int numAbbr=0;
try{
BufferedReader br =new BufferedReader(new FileReader(input_filename+".txt"));
PrintWriter pw =new PrintWriter(new BufferedWriter(new FileWriter(output_filename+".txt")));
String returnStr=processToken(br.readLine());
if(returnStr.length()-1<max_linewidth)
pw.write(returnStr.substring(0,returnStr.length()-1));
//test
System.out.println("check:"+returnStr.substring(returnStr.length()-1));
//writer.write();
br.close();
pw.close();
}
catch(FileNotFoundException e){
System.out.println("TOError: File not found! -"+input_filename);
System.exit(0);
}
catch(IOException e){
System.out.println(e.getMessage());
}
return -1;
}
public String processToken(String file){
StringTokenizer st=new StringTokenizer(file);
int count=0;
String str;//store token
String line="";
while(st.hasMoreTokens()){
str=st.nextToken();
line+= str+" ";
//loop for the total number of abbreviations
for(int i=0;i<database.getNumAbbreviations();i++){
//loop for total number of fullname in a abbreviation
for(int y=0;y<database.getAbbreviationByIndex(i).getNumFullNames();y++){
System.out.println(database.getAbbreviationByIndex(i).getAbbrName().trim());
if(database.getAbbreviationByIndex(i).getAbbrName().trim().equalsIgnoreCase(str.trim())){
line+= "("+database.getAbbreviationByIndex(i).getFullName(y)+") ";
count++;
}
}//end for
}//end for
}//end while
System.out.println(line);
if(count!=0){
return line+count;
}
else
return null;
}
}
Last edited by hugo84; November 3rd, 2009 at 12:24 AM.
-
October 31st, 2009, 02:17 PM
#2
Re: Java IO problem
Originally Posted by hugo84
Hi i need help.. i have not been able to solve the max_linewidth problem to print out properly
So you want to know how to limit the output so it isn't greater than max_linewidth ?
Every time you have a word to print, get the current length of the line, add the length of the word, and if the result is greater than the max_linewidth, start a new line. Simples...
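A minimal sketch of that rule, assuming the words arrive in a single String and the wrapped lines are collected in a list rather than written to a file. The class and method names (WrapSketch, wrap) are made up for illustration and are not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;

public class WrapSketch {
    // Greedy word wrap: start a new line whenever the next word would not fit.
    // A single word longer than maxWidth still gets a line to itself.
    static List<String> wrap(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the text is blank
            }
            // current length + one separating space + the new word
            if (line.length() > 0 && line.length() + 1 + word.length() > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString()); // don't lose the last, partly filled line
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "The MOE has given out a total of 420 teaching scholarships and awards";
        for (String l : wrap(text, 30)) {
            System.out.println(l);
        }
    }
}
```

Note the flush after the loop: whatever is left in the buffer when the words run out still has to be emitted.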
Time is an excellent teacher; but eventually it kills all its students...
Anon.
Please use [CODE]...your code here...[/CODE] tags when posting code. If you get an error, please post the full error message and stack trace, if present.
-
October 31st, 2009, 02:46 PM
#3
Re: Java IO problem
Yes, I did it here. The problem is that it only prints the 1st line, which is what's shown below. It doesn't print to the file, and no matter what line width I set, the comparison doesn't seem to happen. What could be the problem?
Output
The MOE (Ministry of Education) has given out a total of 420 teaching scholarships and
if(returnStr.length()-1<max_linewidth)
pw.write(returnStr.substring(0,returnStr.length()-1));
//test
System.out.println("check:"+returnStr.substring(returnStr.length()-1));
Last edited by hugo84; October 31st, 2009 at 02:53 PM.
-
October 31st, 2009, 07:09 PM
#4
Re: Java IO problem
Originally Posted by hugo84
What could be the problem?
Your code is probably wrong.
I can't say more because you haven't supplied any useful information. If you want help, why not make an effort to supply the necessary information? You didn't answer any questions I asked in my first post, and now you post a snippet of code with no context at all. Although you say "yah.. i did it here..", you haven't posted the relevant code.
Why do we never have time to do it right, but always have time to do it over?
Anon.
-
November 1st, 2009, 12:03 AM
#5
Re: Java IO problem
Originally Posted by dlorde
Your code is probably wrong.
I can't say more because you haven't supplied any useful information. If you want help, why not make an effort to supply the necessary information? You didn't answer any questions I asked in my first post, and now you post a snippet of code with no context at all. Although you say "yah.. i did it here..", you haven't posted the relevant code.
"So you want to know how to limit the output so it isn't greater than max_linewidth?": Yes.
"now you post a snippet of code with no context at all": I posted my whole code in the 1st post.
-
November 1st, 2009, 01:11 PM
#6
Re: Java IO problem
Oops, my mistake. I'm sorry hugo84, it was late, and I confused your post with someone else's.
I've had another look at the code and the task, and by way of apology, I'd like to suggest a more structured approach.
It often helps to take a little thinking time to break the problem down into its key parts and concepts. A key concept in this task is that you need to write your own output lines, regardless of the input file lines. To do this, you need to assemble lines word by word, so you can check whether each word will make the line too long. If it will, you write the line and use the word to start the next line. This part doesn't care whether the words come from the input file or the database, so it can be separated from the word input section.
The word input section must supply the next word to the line assembly & output section. Each time it's called, it should return a word from the input file or from the full name of an abbreviation. The full names are effectively 'nested' inside the abbreviations: all the words in a full name must be returned before the next input file word. When reading the input file, the lines are irrelevant, because we're just interested in the next word, so a Scanner is more appropriate than a BufferedReader. If we assume a database with a simple query and retrieval API, we can create a method to return the next word, something like this:
Code:
Scanner fileWords; // scanner for the input file
Scanner abbrFullName = new Scanner(""); // scanner for the full name (empty until an abbreviation is found)
...
public String getNextWord() {
    String word = null;
    // first check if we are processing an abbreviation
    if (abbrFullName.hasNext()) {
        word = abbrFullName.next();
    }
    // otherwise read the next word from the input file
    else if (fileWords.hasNext()) {
        word = fileWords.next();
        // check for an abbreviation
        if (database.containsKey(word)) {
            // retrieve the full name, and put it in parentheses
            String afn = '(' + database.get(word) + ')';
            // create a new full name scanner with it
            abbrFullName = new Scanner(afn);
        }
    }
    return word;
}
Each time it's called, it returns a word, either from the file or from the current full name, and it returns null when everything has been read.
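To try this method out in isolation, here is a self-contained sketch. It assumes a plain Map<String, String> as a stand-in for the database (the real Database class and its API are not shown in this thread) and reads the words from a String instead of a file. The names WordSourceSketch and expand are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordSourceSketch {
    private final Map<String, String> database;     // stand-in for the real Database
    private final Scanner fileWords;                // words of the input text
    private Scanner abbrFullName = new Scanner(""); // empty until an abbreviation is seen

    WordSourceSketch(String text, Map<String, String> database) {
        this.fileWords = new Scanner(text);
        this.database = database;
    }

    // Returns the next word, or null when everything has been read.
    String getNextWord() {
        // words of a pending full name come before the next input word
        if (abbrFullName.hasNext()) {
            return abbrFullName.next();
        }
        if (!fileWords.hasNext()) {
            return null;
        }
        String word = fileWords.next();
        String fullName = database.get(word);
        if (fullName != null) {
            // queue the full name, in parentheses, for the following calls
            abbrFullName = new Scanner("(" + fullName + ")");
        }
        return word;
    }

    // Joins all words with single spaces, to make the word order visible.
    static String expand(String text, Map<String, String> database) {
        WordSourceSketch source = new WordSourceSketch(text, database);
        StringBuilder out = new StringBuilder();
        for (String w = source.getNextWord(); w != null; w = source.getNextWord()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(w);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("MOE", "Ministry of Education");
        // prints: The MOE (Ministry of Education) has given out awards
        System.out.println(expand("The MOE has given out awards", db));
    }
}
```

Like the version above, this matches abbreviations by exact token, so a word with attached punctuation such as "MOE," would not be recognised.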
Now we can write the line assembly and output section, using the getNextWord() method:
Code:
// use a StringBuilder for concatenating strings in a loop
StringBuilder line = new StringBuilder();
String word = getNextWord(); // get first word
while (word != null) { // finished yet?
    // will this word make the line too long?
    if ((line.length() + 1 + word.length()) > max_linewidth) {
        // yes - print the line and start a new one
        pw.println(line);
        line.setLength(0);
    }
    // add the latest word to the line, with a space
    line.append(word).append(" ");
    // get the next word and loop
    word = getNextWord();
}
So, putting it all together, you get something like:
Code:
Scanner fileWords;
Scanner abbrFullName = new Scanner("");
...
public int translateFile(String input_filename, String output_filename, int max_linewidth) {
    try {
        PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(output_filename + ".txt")));
        fileWords = new Scanner(new File(input_filename + ".txt"));
        StringBuilder line = new StringBuilder();
        String word = getNextWord();
        while (word != null) {
            if ((line.length() + 1 + word.length()) > max_linewidth) {
                pw.println(line);
                System.out.println(line);
                line.setLength(0);
            }
            line.append(word).append(" ");
            word = getNextWord();
        }
        pw.close();
    }
    catch (FileNotFoundException e) {
        System.out.println("TOError: File not found! -" + input_filename);
        System.exit(0);
    }
    catch (IOException e) {
        System.out.println(e.getMessage());
    }
    return -1;
}

public String getNextWord() {
    String word = null;
    if (abbrFullName.hasNext()) {
        word = abbrFullName.next();
    }
    else if (fileWords.hasNext()) {
        word = fileWords.next();
        if (database.containsKey(word)) {
            String afn = '(' + database.get(word) + ')';
            abbrFullName = new Scanner(afn);
        }
    }
    return word;
}
There are ways to make this simpler and remove some duplicate code, and there are alternative ways to do the task (e.g. reading word by word from the input file and writing word by word to the output file, counting line length and throwing in a newline every so often). But I thought that this way shows how to separate the concerns into different, independent methods or modules, which are flexible and make coding and maintenance easier. For example, this code doesn't allow for paragraphs in the input file, but by changing the getNextWord() method, you could read the input file a paragraph at a time, storing and scanning each one as is done with the abbreviation full name...
The key to performance is elegance, not battalions of special cases...
J. Bentley & D. McIlroy
Last edited by dlorde; November 1st, 2009 at 01:20 PM.
-
November 2nd, 2009, 06:41 AM
#7
Re: Java IO problem
Thanks dlorde for the effort you put in to help me. Appreciate it, dude. No issues about you mistaking my post for another one. =)
One more question: how do I keep the paragraphs as they were in the original text file? See the example below. Also, why is it not printing the last 3 words from the text file?
Original Text
Code:
The MOE has given out a total of 420 teaching scholarships and
awards, the highest number of scholarships and awards given out
since 2003.
MOE said that it has a bumper harvest this year because of the
strong interest in teaching among quality candidates.
Four candidates have been awarded the prestigious Overseas Merit
Scholarships for Teaching by the Public Service Commission,
while 17 others have been given the coveted Education Merit
Scholarship.
After Translation
Code:
The MOE (Ministry of Education) has
given out a total of 420 teaching
scholarships and awards, the highest
number of scholarships and awards given
out since 2003.
MOE (Ministry of
Education) said that it has a bumper
harvest this year because of the strong
interest in teaching among quality
candidates.
Four candidates have been
awarded the prestigious Overseas Merit
Scholarships for Teaching by the Public
Service Commission, while 17 others
have been given the coveted Education
Merit Scholarship.
Code:
import java.io.*;
import java.util.Scanner;
import java.util.StringTokenizer;
public class TranslateApp{
private Database database;
Scanner fileWords;
Scanner abbrFullName = new Scanner("");
public TranslateApp(Database database){
this.database=database;
}
public int translateFile(String input_filename, String output_filename, int max_linewidth) {
try {
PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(output_filename + ".txt")));
fileWords = new Scanner(new File(input_filename + ".txt"));
StringBuilder line = new StringBuilder();
String word = getNextWord();
while (word != null) {
if ((line.length() + 1 + word.length()) > max_linewidth) {
pw.println(line);
System.out.println(line);
line.setLength(0);
}
else if (word ==null){
System.out.println();
System.out.println();
}
line.append(word).append(" ");
word = getNextWord();
}//end while
pw.close();
}
catch (FileNotFoundException e) {
System.out.println("TOError: File not found! -" + input_filename);
System.exit(0);
}
catch (IOException e) {
System.out.println(e.getMessage());
}
return -1;
}
public String getNextWord() {
String word = null;
if (abbrFullName.hasNext()) {
word = abbrFullName.next();
}
else if (fileWords.hasNext()) {
word = fileWords.next();
return word;
}//end getNextWord()
}
Last edited by hugo84; November 2nd, 2009 at 10:09 AM.
-
November 2nd, 2009, 09:15 AM
#8
Re: Java IO problem
Duplicating the paragraphs is a bit more complicated, because it means you do need to keep track of the lines in the input file - which means reading line by line, converting each line to words, and handling a paragraph when you come to a blank or empty line.
To keep things tidy, and manageable, it's worth encapsulating that part in a separate method that returns individual words from the file and returns a word consisting of a newline character sequence whenever a paragraph (blank line) is encountered:
Code:
static final String NEWLINE = System.getProperty("line.separator");
Scanner lineWords = new Scanner("");
Scanner fileLines = new Scanner(new File(input_filename + ".txt"));
...
String getNextFileWord() {
    String word = null;
    // still processing the line?
    if (lineWords.hasNext()) {
        // return next word in line
        word = lineWords.next();
    }
    // line is empty - get the next one
    else if (fileLines.hasNextLine()) {
        String line = fileLines.nextLine().trim();
        // is it a paragraph ?
        if (line.isEmpty()) {
            word = NEWLINE;
        }
        // nope, set up the line scanner and return the first word
        else {
            lineWords = new Scanner(line);
            word = lineWords.next();
        }
    }
    return word;
}
You may have realised that this method replaces the fileWords scanner we used before, so it just plugs in where fileWords was used.
We also need to make some changes to the main printing loop because the newline word means the current line must be printed, followed by the newline. As it is, the newline word will be used to start the next line, which is OK, but we don't want to follow it with a space like a normal word, so we need to fix that:
Code:
while (word != null) {
    // check for NEWLINE and line at max length
    if (word.equals(NEWLINE) || (line.length() + 1 + word.length()) > max_linewidth) {
        pw.println(line);
        line.setLength(0);
    }
    // don't use a space if it's a paragraph
    String space = word.equals(NEWLINE) ? "" : " ";
    // start the next line
    line.append(word).append(space);
    // get the next word and loop
    word = getNextWord();
}
Put that all together and you get:
Code:
private static final String NEWLINE = System.getProperty("line.separator");
Scanner lineWords = new Scanner("");
Scanner abbrFullName = new Scanner("");
Scanner fileLines;
...
public int translateFile(String input_filename, String output_filename, int max_linewidth) {
    try {
        PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(output_filename + ".txt")));
        fileLines = new Scanner(new File(input_filename + ".txt"));
        String word = getNextWord();
        StringBuilder line = new StringBuilder();
        while (word != null) {
            if (word.equals(NEWLINE) || (line.length() + 1 + word.length()) > max_linewidth) {
                pw.println(line);
                System.out.println(line);
                line.setLength(0);
            }
            String space = word.equals(NEWLINE) ? "" : " ";
            line.append(word).append(space);
            word = getNextWord();
        }
        // print whatever is left in the last, partly filled line
        // (without this, the final few words are never written)
        pw.println(line);
        System.out.println(line);
        pw.close();
    }
    catch (FileNotFoundException e) {
        System.out.println("TOError: File not found! -" + input_filename);
        System.exit(0);
    }
    catch (IOException e) {
        System.out.println(e.getMessage());
    }
    return -1;
}

public String getNextWord() {
    String word = null;
    // first check if we are processing an abbreviation
    if (abbrFullName.hasNext()) {
        word = abbrFullName.next();
    }
    // otherwise read the next word from the input file
    else {
        word = getNextFileWord();
        if (word != null) {
            // check for an abbreviation
            if (database.containsKey(word)) {
                // retrieve the full name and initialise the full name scanner
                String afn = '(' + database.get(word) + ')';
                abbrFullName = new Scanner(afn);
            }
        }
    }
    return word;
}

String getNextFileWord() {
    String word = null;
    if (lineWords.hasNext()) {
        word = lineWords.next();
    }
    else if (fileLines.hasNextLine()) {
        String line = fileLines.nextLine().trim();
        if (line.isEmpty()) {
            word = NEWLINE;
        }
        else {
            lineWords = new Scanner(line);
            word = lineWords.next();
        }
    }
    return word;
}
Now you can see that assembling the output a line at a time can get fiddly when paragraphs are involved, and it might be worth thinking about changing this so we output a single word at a time (using PrintWriter.print(String)). Then we may get simpler and clearer handling of the words and spaces. This is an important part of coding - every so often, you should take a step back and consider if there is a clearer, more elegant way to do things. With experience, you will get a feel for whether things could be improved. If code looks complicated, consider breaking it down into simpler methods. Don't be afraid to expand dense code to make it more readable - the number of lines isn't as important as simplicity and readability, and the source code is compiled anyway, so it probably won't take up any more room.
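As a rough sketch of that word-at-a-time idea, under a few assumptions: the input is passed in as a String, abbreviation lookup is left out to keep it short, the output goes to an in-memory StringWriter, and "\n" is used as a fixed line ending so the result is the same on every platform (for a real file you would probably call println() instead). The names WordAtATimeSketch and reflow are made up:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Scanner;

public class WordAtATimeSketch {
    static String reflow(String input, int maxWidth) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        Scanner lines = new Scanner(input);
        int column = 0;                 // characters already written on the current output line
        boolean paragraphBreak = false; // set when a blank input line has been seen
        while (lines.hasNextLine()) {
            String line = lines.nextLine().trim();
            if (line.isEmpty()) {
                paragraphBreak = true;
                continue;
            }
            Scanner words = new Scanner(line);
            while (words.hasNext()) {
                String word = words.next();
                if (paragraphBreak && column > 0) {
                    pw.print("\n\n"); // end the current line, then leave one blank line
                    column = 0;
                } else if (column > 0 && column + 1 + word.length() > maxWidth) {
                    pw.print("\n");   // the word does not fit: start a new line
                    column = 0;
                }
                paragraphBreak = false;
                if (column > 0) {
                    pw.print(' ');    // separating space, never at the start of a line
                    column++;
                }
                pw.print(word);
                column += word.length();
            }
        }
        pw.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = "MOE said that it has a bumper harvest this year.\n\nFour candidates have been awarded scholarships.";
        System.out.println(reflow(text, 30));
    }
}
```

Because every word is written as soon as it is read, there is no buffered last line to forget, and a paragraph break is just a flag checked before the next word.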
Every now and then go away, have a little relaxation, for when you come back to your work your judgment will be surer. Go some distance away because then the work appears smaller and more of it can be taken in at a glance and a lack of harmony and proportion is more readily seen...
Leonardo Da Vinci
Last edited by dlorde; November 2nd, 2009 at 09:18 AM.
-
November 2nd, 2009, 09:44 AM
#9
Re: Java IO problem
Omg... I really have nothing to say about your coding. Let me digest some of your code. I will be back! Thanks for your help once again.
-
November 2nd, 2009, 12:10 PM
#10
Re: Java IO problem
Q1: Sorry dlorde, could you explain how private static final String NEWLINE = System.getProperty("line.separator"); works?
Q2: Scanner abbrFullName = new Scanner("");
if (abbrFullName.hasNext()) {
word = abbrFullName.next();
}
If the Scanner is created from "", how did you scan for the abbreviation? Wouldn't word = abbrFullName.next(); return nothing?
Last edited by hugo84; November 2nd, 2009 at 12:19 PM.
-
November 2nd, 2009, 12:20 PM
#11
Re: Java IO problem
Originally Posted by hugo84
Sorry dlorde could you explain to me how does private static final String NEWLINE = System.getProperty("line.separator"); this work
Different operating systems use different conventions to mark the ends of lines in text files. For example, Unix/Linux uses newline (line feed): nl, 10, 0x0a, '\n'. DOS/Win 3.1/W95/W98/Me/NT/W2K/XP/W2K3/Vista/W7 use carriage-return:line-feed: CrLf, 13:10, 0x0d 0x0a, "\r\n". Classic Mac OS (before OS X) used carriage-return: Cr, 13, 0x0d, '\r'; OS X itself uses '\n', like Unix.
Source from http://mindprod.com/jgloss/lineseparator.html
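A small sketch that shows which separator the current JVM reports. System.getProperty("line.separator") is the call used in the code above; System.lineSeparator() is the equivalent convenience method added in Java 7. The helper describe and the class name are made up for this example:

```java
public class LineSeparatorSketch {
    // Renders each character of the separator as a hex code, e.g. "0x0d 0x0a".
    static String describe(String separator) {
        StringBuilder sb = new StringBuilder();
        for (char c : separator.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromProperty = System.getProperty("line.separator");
        String fromMethod = System.lineSeparator(); // Java 7 and later
        System.out.println("line.separator  = " + describe(fromProperty));
        System.out.println("lineSeparator() = " + describe(fromMethod));
    }
}
```

The output depends on the platform the program runs on, which is the reason for asking the JVM instead of hard-coding "\n".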
------
If you are satisfied with the responses, add to the user's rep!
-
November 3rd, 2009, 06:02 PM
#12
Re: Java IO problem
Originally Posted by hugo84
Q1: Sorry dlorde could you explain to me how does private static final String NEWLINE = System.getProperty("line.separator"); this work
Look up the API Javadocs for System.getProperty(..) and System.getProperties().
If Scanner is "" how did you scan for the abbreviation? wouldn't word = abbrFullName.next(); be nothing?
Sure - the first time through, no word has been read yet, so there is no abbreviation or full name to process. The full name scanner has no next word, so processing continues on to read a word from the first line. If an abbreviation is found, it looks it up in the database and initialises the scanner with the full name that is returned. Next time round, the full name scanner does have a next word, so that gets output, and so on.
Try stepping through the code by hand with pencil & paper - you should be able to see how it works fairly easily. Pencil and paper are the best debugging tools on the market.
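If you would rather check this at the keyboard, the following sketch counts how many words a Scanner built from a given String will hand back. The class and method names (EmptyScannerSketch, countTokens) are made up, and the full name is hard-coded instead of coming from the database:

```java
import java.util.Scanner;

public class EmptyScannerSketch {
    // Counts the whitespace-separated tokens a Scanner finds in the text.
    static int countTokens(String text) {
        Scanner scanner = new Scanner(text);
        int count = 0;
        while (scanner.hasNext()) { // false straight away for an empty String
            scanner.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // An empty scanner has nothing to return, so hasNext() guards the call to next()
        System.out.println(countTokens(""));
        // Once the field is replaced by a scanner over a full name, the words come out one by one
        System.out.println(countTokens("(Ministry of Education)"));
    }
}
```

Calling next() on the empty scanner without the hasNext() check would throw a NoSuchElementException, which is why the check comes first in getNextWord().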
Any fool can write code that a computer can understand. Good programmers write code that humans can understand...
M. Fowler