-
June 10th, 1999, 02:08 PM
#1
How to determine if file is binary or text?
Can someone tell me how to determine whether a file is a text file or a binary file? Any sample source
code would be greatly appreciated!
Thanks in advance,
Mango
-
June 11th, 1999, 01:47 AM
#2
Re: How to determine if file is binary or text?
The easiest way to do this is to sample the file and count the number of "nulls" (zero bytes). By convention, text files do NOT contain nulls, while binary files usually do. I usually use a threshold of 2.5% and a minimum sample of 1KB: as soon as the proportion of nulls in the sampled data exceeds the threshold, the file is declared binary; otherwise it is text. Whether or not you choose to read the whole file is up to you. I usually randomly sample at least 10% of the file's content.
So: open the file and read in the first 1KB. Count the number of nulls, and if still below the threshold, select a random offset and read in another 1KB. Continue until you have either exceeded the threshold (it's binary) or have read enough samples to be confident in the result (it's text).
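The steps above can be sketched roughly as follows. This is only an illustration, not tested production code: the function name looksBinaryByNulls and its parameter names are made up for the example, the defaults simply mirror the 2.5% / 1KB / 10% figures mentioned above, and treating an unreadable or empty file as "not binary" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: returns true when the sampled blocks contain more
// zero bytes than the threshold allows.
bool looksBinaryByNulls(const std::string& path,
                        double nullThreshold = 0.025,  // 2.5% of sampled bytes
                        long long blockSize = 1024,    // 1KB per sample
                        double sampleFraction = 0.10)  // sample about 10% of the file
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // assumption: unreadable files are not flagged as binary

    in.seekg(0, std::ios::end);
    const std::streamoff endPos = in.tellg();
    const long long size = endPos;
    if (size <= 0) return false;  // assumption: empty files count as text

    // Read at least one block and roughly sampleFraction of the file,
    // but never more bytes than the file holds.
    const long long wanted =
        std::max(blockSize, static_cast<long long>(size * sampleFraction));
    const long long target = std::min(size, wanted);
    const long long maxOffset = std::max(0LL, size - blockSize);

    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> pickOffset(0, maxOffset);

    std::vector<char> buf(static_cast<std::size_t>(blockSize));
    long long total = 0, nulls = 0, offset = 0;  // first block starts at offset 0

    while (total < target) {
        in.clear();  // reset EOF/fail state left by a short read
        in.seekg(offset);
        in.read(buf.data(), blockSize);
        const long long got = in.gcount();
        if (got <= 0) break;

        nulls += std::count(buf.begin(), buf.begin() + got, '\0');
        total += got;
        if (nulls > nullThreshold * total) return true;  // over threshold: binary

        offset = pickOffset(rng);  // next block comes from a random position
    }
    return false;  // enough samples stayed under the threshold: text
}
```

Random blocks may overlap, so some bytes can be counted twice; for a rough heuristic like this that should not matter much.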
Cheers!
Humble Programmer
,,,^..^,,,
-
June 11th, 1999, 11:56 AM
#3
Re: How to determine if file is binary or text?
Thanks for your suggestion. But I wonder whether MFC or the SDK provides a standard function to do this more easily and consistently?
-
June 11th, 1999, 03:18 PM
#4
Re: How to determine if file is binary or text?
I'd take that threshold advice. There are no functions IsFileBinary() or IsFileText() since all files are binary; text files are just a subclass of those. A null character is a very good indication of a non-text file, but you might also want to check for other non-printable characters. Read in 1KB again, and then call isprint() on each character. Keep in mind that isprint() returns false for tabs and newlines, so let ordinary whitespace through as well. Count how many characters are non-printable and then make the call on whether or not the file is binary.
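A minimal sketch of that check might look like the following. The names nonPrintableFraction and looksBinaryByPrintable are invented for the example, and the 1% cutoff is only an illustrative default. Because isprint() rejects tabs, carriage returns and newlines, the sketch also accepts anything isspace() reports as whitespace, otherwise ordinary text would be counted as non-printable.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: fraction of the first sampleSize bytes that are
// neither printable nor whitespace (0.0 if the file is empty or unreadable).
double nonPrintableFraction(const std::string& path,
                            std::size_t sampleSize = 1024)  // 1KB sample
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0.0;

    std::vector<char> buf(sampleSize);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    if (got == 0) return 0.0;

    std::size_t bad = 0;
    for (std::size_t i = 0; i < got; ++i) {
        // The <cctype> functions need a value representable as unsigned char.
        const unsigned char c = static_cast<unsigned char>(buf[i]);
        if (!std::isprint(c) && !std::isspace(c)) ++bad;
    }
    return static_cast<double>(bad) / static_cast<double>(got);
}

// Hypothetical wrapper: the cutoff is an assumption, tune it to taste.
bool looksBinaryByPrintable(const std::string& path,
                            double maxNonPrintable = 0.01)
{
    return nonPrintableFraction(path) >= maxNonPrintable;
}
```

With the default "C" locale, bytes above 0x7F count as non-printable here, so text in an 8-bit code page or in UTF-8 could be flagged as binary unless a suitable locale is set or the cutoff is raised.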
-
June 11th, 1999, 06:51 PM
#5
Re: How to determine if file is binary or text?
What he said. And if your program is to be compatible with multibyte and unicode character sets, use _istprint. I did this to automatically determine the transfer type for a file in an FTP program, and it worked fine. You generally want to err on the side of it being binary. So if 1% or more, say, are non-printable characters, treat it as binary.