  1. #1
    Join Date
    Jun 1999
    Posts
    7

    How to determine if file is binary or text?

    Can someone tell me how to determine whether a file is a text file or a binary file? Any sample source
    code would be greatly appreciated!

    Thanks in advance,
    Mango


  2. #2
    Guest

    Re: How to determine if file is binary or text?

    The easiest way to do this is to sample the file and count the null (zero) bytes. By convention, text files do NOT contain nulls, while binary files usually do. I usually use a threshold of 2.5% and a minimum sample of 1KB: as soon as the proportion of nulls exceeds the threshold, the file is declared binary; otherwise it is text. Whether or not you read the whole file is up to you. I usually randomly sample at least 10% of the file's content.

    So: open the file and read in the first 1KB. Count the nulls, and if the count is still below the threshold, pick a random offset and read in another 1KB. Continue until you have either exceeded the threshold (it's binary) or read enough samples to be confident in the result (it's text).
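
    A minimal sketch of that sampling loop in standard C++ might look like the following. The function name looksBinaryByNulls and its parameters are hypothetical, the threshold is interpreted here as a fraction of the bytes read so far, and an empty or unseekable stream is assumed to count as text:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <random>
#include <sstream>   // only needed when sampling an in-memory stream
#include <vector>

// Hypothetical helper: samples a seekable stream in fixed-size chunks and
// reports "binary" as soon as zero bytes exceed `threshold` of the bytes read.
bool looksBinaryByNulls(std::istream& in,
                        double threshold = 0.025,        // 2.5%, as in the post
                        std::size_t chunkSize = 1024,    // 1KB per sample
                        double sampleFraction = 0.10) {  // read about 10% of the data
    assert(chunkSize > 0);

    // Find the stream length by seeking to the end.
    in.clear();
    in.seekg(0, std::ios::end);
    const std::streamoff end = in.tellg();
    if (end <= 0) return false;  // assumption: empty or unseekable input counts as text
    const std::size_t size = static_cast<std::size_t>(end);

    // Enough chunks to cover roughly sampleFraction of the stream, at least one.
    const std::size_t chunks =
        static_cast<std::size_t>(size * sampleFraction / chunkSize) + 1;

    std::mt19937 rng(std::random_device{}());
    std::vector<char> buf(chunkSize);
    std::size_t nulls = 0, total = 0;

    for (std::size_t i = 0; i < chunks; ++i) {
        // The first chunk comes from the start; later ones from random offsets.
        std::size_t offset = 0;
        if (i > 0 && size > chunkSize) {
            std::uniform_int_distribution<std::size_t> pick(0, size - chunkSize);
            offset = pick(rng);
        }
        in.clear();  // a short read sets eofbit/failbit; reset before seeking
        in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
        in.read(buf.data(), static_cast<std::streamsize>(chunkSize));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;

        nulls += static_cast<std::size_t>(
            std::count(buf.begin(), buf.begin() + got, '\0'));
        total += got;
        if (nulls > threshold * total) return true;  // threshold exceeded: binary
    }
    return false;  // stayed under the threshold for every sample: text
}
```

    To use it on a file, open a std::ifstream with std::ios::binary and pass it in. Keep in mind that UTF-16 text is full of zero bytes, so a null-count test like this one would report such a file as binary.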

    Cheers!
    Humble Programmer
    ,,,^..^,,,


  3. #3
    Join Date
    Jun 1999
    Posts
    7

    Re: How to determine if file is binary or text?

    Thanks for your suggestion. But I wonder whether MFC or the SDK provides a standard function to do this more easily and consistently?


  4. #4
    Join Date
    Apr 1999
    Posts
    396

    Re: How to determine if file is binary or text?

    I'd take that threshold advice. There are no functions IsFileBinary() or IsFileText(), since all files are binary; text files are just a subclass of those. A null character is a very good indication of a non-text file, but you might also want to check for other non-printable characters. Read in 1KB again, and then call isprint() on each character. Keep in mind that isprint() reports newlines and tabs as non-printable, so leave whitespace out of the count. Count how many characters are non-printable, and then make the call on whether or not the file is binary.
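
    As a rough sketch of that check on a buffer that has already been read in, assuming the same 2.5% threshold suggested earlier in the thread (the names nonTextFraction and looksBinaryByPrintable are hypothetical). Whitespace is excluded with isspace(), and each char is cast to unsigned char first, because passing a negative value to isprint() is undefined behavior:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Fraction of bytes in `data` that are neither printable nor whitespace.
double nonTextFraction(const std::string& data) {
    if (data.empty()) return 0.0;
    std::size_t bad = 0;
    for (char c : data) {
        // <cctype> functions require a value representable as unsigned char.
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isprint(uc) && !std::isspace(uc)) ++bad;
    }
    return static_cast<double>(bad) / static_cast<double>(data.size());
}

// Hypothetical helper: true when the non-text fraction exceeds `threshold`.
bool looksBinaryByPrintable(const std::string& data, double threshold = 0.025) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    return nonTextFraction(data) > threshold;
}
```

    In the default "C" locale, isprint() rejects every byte above 0x7F, so text with accented or other non-ASCII characters would push the count up under this sketch.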


  5. #5
    Join Date
    Apr 1999
    Posts
    52

    Re: How to determine if file is binary or text?

    What he said. And if your program needs to be compatible with multibyte and Unicode character sets, use _istprint(). I did this to automatically determine the transfer type for a file in an FTP program, and it worked fine. You generally want to err on the side of treating the file as binary. So if, say, 1% or more of the characters are non-printable, treat it as binary.
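
    A small sketch of that conservative rule for picking a transfer type. Because _istprint() is a Microsoft generic-text macro from <tchar.h>, this version swaps in the standard std::iswprint() on wide characters instead. TransferType and chooseTransferType are hypothetical names, and treating an empty sample as ASCII is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

enum class TransferType { Ascii, Binary };

// Hypothetical helper: picks Binary when `threshold` (1% by default) or more
// of the sampled wide characters are neither printable nor whitespace.
TransferType chooseTransferType(const std::wstring& sample,
                                double threshold = 0.01) {
    assert(threshold > 0.0);
    if (sample.empty()) return TransferType::Ascii;  // assumption: nothing to judge

    std::size_t bad = 0;
    for (wchar_t wc : sample) {
        const std::wint_t c = static_cast<std::wint_t>(wc);
        if (!std::iswprint(c) && !std::iswspace(c)) ++bad;
    }
    // "1% or more" means the comparison is inclusive, which leans toward Binary.
    const bool binary =
        static_cast<double>(bad) >= threshold * static_cast<double>(sample.size());
    return binary ? TransferType::Binary : TransferType::Ascii;
}
```

    The sample still has to be decoded into wide characters before calling this, and how to do that depends on the file's encoding, which the sketch does not try to detect.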


