-
January 15th, 2012, 07:30 AM
#1
Large text document processing in vb6
Hi all, can anyone advise me on how to process large text files in VB6? How can I read multiple text files from a directory? How can I read Unicode text files from a directory? Thanks in advance.
-
January 15th, 2012, 01:35 PM
#2
Re: Large text document processing in vb6
Depends on what you'd like to do with the data. You should SEARCH the forums for examples
-
January 15th, 2012, 01:37 PM
#3
Re: Large text document processing in vb6
Originally Posted by dglienna
Depends on what you'd like to do with the data. You should SEARCH the forums for examples
I want to read 2000 Unicode text files, each about 13-15 KB. The files fall into 4 categories. I want to find the frequency of each word in each category, then put the result in a database table or another text file. What I want to do is preprocess the text files so they are ready for text categorization. One more thing: can I use a hash table for that, and how?
-
January 15th, 2012, 02:46 PM
#4
Re: Large text document processing in vb6
13-15kB is not a large file.
In fact, you could easily store one file in ONE string.
So you could possibly store 2000 files in an array of strings.
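A minimal sketch of that idea, assuming all files are little-endian UTF-16 with a 2-byte byte order mark and end in .txt. LoadFolder and folderPath are made-up names for illustration, not something from this thread:

```vb
' Sketch: load every *.txt file in a folder into an array of strings.
' Assumption: each file is UTF-16 LE and starts with a 2-byte byte order mark.
Public Function LoadFolder(ByVal folderPath As String) As String()
    Dim contents() As String
    Dim fileCount As Long
    Dim fileName As String
    Dim fileNum As Integer
    Dim bom As Integer

    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"
    ReDim contents(0 To 0)                    ' stays a single empty entry if no file matches

    fileName = Dir$(folderPath & "*.txt")     ' first matching file name
    Do While Len(fileName) > 0
        ReDim Preserve contents(0 To fileCount)
        fileNum = FreeFile
        Open folderPath & fileName For Binary Access Read As #fileNum
        If LOF(fileNum) > 2 Then
            Get #fileNum, , bom               ' skip the byte order mark (not checked here)
            contents(fileCount) = InputB(LOF(fileNum) - 2, #fileNum)
        End If
        Close #fileNum
        fileCount = fileCount + 1
        fileName = Dir$                       ' next matching file name, "" when done
    Loop

    LoadFolder = contents
End Function
```

At roughly 15 KB per file, 2000 files come to about 30 MB of string data, which VB6 should cope with, but processing one file at a time inside the loop would keep memory use lower.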
You do have to watch out for how you treat Unicode files, however.
I have some cool links for Unicode processing at my office. I shall post them tomorrow.
Generally, once Unicode strings are loaded properly, processing should be no problem. You only have to be careful when writing Unicode back to a file.
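The reason for the care when writing is that Put with a String variable converts the text to the ANSI code page, which loses characters. One possible way around it, sketched here under the assumption that little-endian UTF-16 with a byte order mark is the wanted output, is to write a Byte array instead. WriteUtf16File, filePath and content are hypothetical names:

```vb
' Sketch: write a VB6 string to disk as UTF-16 LE with a byte order mark.
Public Sub WriteUtf16File(ByVal filePath As String, ByVal content As String)
    Dim fileNum As Integer
    Dim bom(0 To 1) As Byte
    Dim rawBytes() As Byte

    bom(0) = &HFF: bom(1) = &HFE              ' UTF-16 little-endian byte order mark

    ' Binary mode does not truncate an existing file, so remove it first.
    If Len(Dir$(filePath)) > 0 Then Kill filePath

    fileNum = FreeFile
    Open filePath For Binary Access Write As #fileNum
    Put #fileNum, , bom
    If Len(content) > 0 Then
        rawBytes = content                    ' String-to-Byte() assignment copies the raw UTF-16 bytes
        Put #fileNum, , rawBytes              ' in Binary mode only the array data is written
    End If
    Close #fileNum
End Sub
```

Note that the Dir$ call inside this routine resets any Dir$ loop that is running at the same time, so it should not be called from within such a loop as written.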
-
January 15th, 2012, 04:37 PM
#5
Re: Large text document processing in vb6
Thank you very much, really appreciated. I will wait for your links tomorrow.
By the way, how about finding the frequency of each word across the whole text? How can I store them? What is a suitable data structure for storing each word and its frequency?
-
January 15th, 2012, 06:23 PM
#6
Re: Large text document processing in vb6
Kind of like GOOGLE does?
-
January 15th, 2012, 06:25 PM
#7
Re: Large text document processing in vb6
Originally Posted by dglienna
Kind of like GOOGLE does?
It is automatic text categorization (ATC) for a specific language.
-
January 16th, 2012, 09:04 AM
#8
Re: Large text document processing in vb6
You have to be aware that there are several Unicode encodings.
I only have experience with UTF-16, which stores two bytes per character for nearly all everyday text (characters outside the Basic Multilingual Plane take four). It is very easy to handle.
The first two bytes of a Unicode file usually identify the encoding (the byte order mark). If they are hex FF FE, it is little-endian UTF-16, which VB6 can handle.
In short, this is how to read a file:
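Code:
Dim sig%, i$, f%
f = FreeFile
Open FileName For Binary As #f
Get #f, , sig                        ' read the 2-byte byte order mark
If sig = &HFEFF Then                 ' bytes FF FE on disk, read as a little-endian Integer
    i$ = InputB(LOF(f) - 2, #f)      ' raw bytes become the string's UTF-16 data
Else
    MsgBox "No UTF16"
End If
Close #f
i$ then contains the complete file as a Unicode string.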
Look at this rather good tutorial:
http://www.cyberactivex.com/UnicodeT...lVb.htm#FileIO
In the FileIO section, download modUnicodeRW.bas.
It contains several routines for reading and writing Unicode, some of them using API calls.
You can study them and choose.
I recommend going through all the interesting parts of the tutorial, too.
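As for the hash table question raised earlier in the thread: once a file is loaded into a string, one option is the Scripting.Dictionary object from the Microsoft Scripting Runtime, which behaves like a hash table keyed by word. The following is only a rough sketch. CountWords and IsSeparator are made-up names, and the separator list is an assumption that would need adjusting for the language being categorized:

```vb
' Sketch: count how often each word occurs in a string.
Public Function CountWords(ByVal content As String) As Object
    Dim freq As Object
    Dim i As Long
    Dim ch As String
    Dim word As String

    Set freq = CreateObject("Scripting.Dictionary")
    freq.CompareMode = vbTextCompare          ' treat "Word" and "word" as the same key

    content = content & " "                   ' sentinel so the last word is flushed
    For i = 1 To Len(content)
        ch = Mid$(content, i, 1)
        If IsSeparator(ch) Then
            If Len(word) > 0 Then
                ' Reading a missing key creates it as Empty, which counts as 0.
                freq.Item(word) = freq.Item(word) + 1
                word = vbNullString
            End If
        Else
            word = word & ch
        End If
    Next i

    Set CountWords = freq
End Function

' Assumed separator set: whitespace and common Latin punctuation.
Private Function IsSeparator(ByVal ch As String) As Boolean
    IsSeparator = (InStr(1, " .,;:!?()[]""'" & vbTab & vbCr & vbLf, ch, vbBinaryCompare) > 0)
End Function
```

Keeping one dictionary per category would give the per-category frequencies. The results can then be walked with For Each over freq.Keys and written to a database table or a text file.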
-
January 16th, 2012, 11:09 AM
#9
Re: Large text document processing in vb6
Thank you very much