Hi, I need to tokenize my input text file and then categorize each word as a noun, verb, adjective, or adverb. Here is my code. Can anyone help me?
public class Grammar extends JFrame implements ActionListener
{
final static boolean shouldFill = true;
final static boolean shouldWeightX = true;
final static boolean RIGHT_TO_LEFT = false;
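private void openFile(File file) //open file with the specified file instance (method name assumed; its header is missing from this post)
{
try
{
BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
byte[] b = new byte[in.available()];
in.read(b, 0, b.length);
jtaOpen.append(new String(b, 0, b.length));
in.close();
}
catch (IOException ex)
{
ex.printStackTrace(); //placeholder error handling
}
}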
private void saveFile(File file) //save file with the specified file instance
{
try
{
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(file));
byte[] b = (jtaOpen.getText()).getBytes();
out.write(b, 0,b.length);
out.close();
}
catch (IOException ex)
{
ex.printStackTrace(); //placeholder error handling
}
}
private void createMap()
{
String open = jtaOpen.getText();
StringTokenizer tokenizer = new StringTokenizer(open, delim); //tokenizes the string
map = new TreeMap();
while (tokenizer.hasMoreTokens())
{
String word = tokenizer.nextToken().toLowerCase().trim();
if(!word.equals(""))
{
if ( map.containsKey(word))
{
Integer count = (Integer) map.get(word);
//------------------increment value
map.put(word, new Integer(count.intValue() + 1 ));
}
else //otherwise add word with a value of 1 to map
{
map.put(word, new Integer(1));
}
}
}
}
public static void main(String[] args) throws Exception
{
Grammar g = new Grammar();
g.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
g.setVisible(true);
}
}
Please use [CODE]...[/CODE] tags when posting code, so it stays readable.
What, specifically, are you stuck on?
Experience is a poor teacher: it gives its tests before it teaches its lessons...
Anon.
Please use [CODE]...your code here...[/CODE] tags when posting code. If you get an error, please post the full error message and stack trace, if present.
Sorry for the unreadable code. This is my first time posting on a forum. There are two things I need to do:
(1) tokenize the words in jtaOpen, and
(2) categorize each word by its part of speech (noun, verb, adjective, or adverb).
The problem is that to categorize the words I need to use WordNet, but I'm working with Malay words and no WordNet exists for Malay. What should I do?
Figure out the code tags. If you accomplish this, continue. If you cannot accomplish this, quit (your programming classes, because if that's too difficult....).
Sun have designated StringTokenizer a 'legacy' class (sort of half-way to being deprecated, I guess). They suggest using String.split(..) or the regex package classes. But anyway, what's the problem tokenizing the input string?
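For what it's worth, here is a minimal sketch of the String.split(..) route, doing the same word count as createMap(). The value of delim isn't shown in the post, so the regex below (split on anything that isn't a letter or digit) is only a guess, and the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class SplitWordCount {
    // Assumed delimiter pattern: any run of characters that are not letters or digits.
    // The original 'delim' string is not shown in the post, so adjust as needed.
    private static final String DELIM_REGEX = "[^\\p{L}\\p{N}]+";

    // Counts how often each lower-cased word occurs in the text.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split(DELIM_REGEX)) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a delimiter
            }
            counts.merge(token, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample sentence chosen only for illustration.
        System.out.println(countWords("Saya makan nasi. Saya minum air."));
    }
}
```

In the posted code you would pass jtaOpen.getText() to countWords(..) instead of the sample sentence.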
(2) categorize each word by its part of speech (noun, verb, adjective, or adverb).
The problem is that to categorize the words I need to use WordNet, but I'm working with Malay words and no WordNet exists for Malay. What should I do?
I don't know - your statement is paradoxical. If you believe you need to use WordNet and WordNet doesn't meet your requirements, you need to rethink either what you believe you need, or your requirements.
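If the real requirement is just "look up a word's category", one possible rethink is a word list you build yourself. The sketch below assumes exactly that: LexiconTagger, its methods, and the handful of Malay entries are hypothetical and only illustrate the lookup, they are not a real dictionary or any existing library.

```java
import java.util.HashMap;
import java.util.Map;

public class LexiconTagger {
    public enum Category { NOUN, VERB, ADJECTIVE, ADVERB, UNKNOWN }

    // Hand-built lexicon: lower-cased word -> category (hypothetical data source).
    private final Map<String, Category> lexicon = new HashMap<>();

    public void add(String word, Category category) {
        lexicon.put(word.toLowerCase(), category);
    }

    // Returns the stored category, or UNKNOWN when the word is not in the lexicon.
    public Category categorize(String word) {
        return lexicon.getOrDefault(word.toLowerCase(), Category.UNKNOWN);
    }

    public static void main(String[] args) {
        LexiconTagger tagger = new LexiconTagger();
        // Illustrative entries only; a usable lexicon would be loaded from a file.
        tagger.add("buku", Category.NOUN);        // "book"
        tagger.add("makan", Category.VERB);       // "eat"
        tagger.add("cantik", Category.ADJECTIVE); // "beautiful"
        tagger.add("sangat", Category.ADVERB);    // "very"

        for (String word : new String[] {"Buku", "makan", "komputer"}) {
            System.out.println(word + " -> " + tagger.categorize(word));
        }
    }
}
```

A plain lookup like this cannot handle words that belong to more than one category or words it has never seen; whether that is acceptable depends on what you actually need.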
If you don't think carefully, you might believe that programming is just typing statements in a programming language...
W. Cunningham