What is this file?

Quote:

Originally Posted by strych9

what is the best way to read it?

You read the file in the same way you read other files. If this is a Unicode text file, it is perfectly valid.

The question is this: what is the purpose of your application? Why can't it handle Unicode files read in as-is, without messing around with removing NULL characters? If the reason is "I didn't think about Unicode", then maybe you should add the ability (either by user option or some other means) to read in and process files based on 16-bit characters in a "natural" way, where you are not removing NULL characters.

Otherwise, just read the file in as-is, and if it is a 16-bit character file, inform the user: "Sorry, my app only handles files based on a single-byte character set". Then take it from there as to what to do next in terms of supporting files based on 16-bit characters.
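One way to decide whether a file is 16-bit is to look for a byte order mark (BOM) at the start. The following is only a minimal sketch: the names `TextEncoding`, `detectBom` and `readAllBytes` are made up for illustration, and it assumes the file actually starts with a BOM, which not every Unicode file does. A file without a BOM comes back as `Unknown` and would need some other check.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical names for this sketch; none of them come from an existing library.
enum class TextEncoding { Utf8WithBom, Utf16LE, Utf16BE, Unknown };

// Inspect the first bytes of a buffer for a Unicode byte order mark.
// Caveat: FF FE followed by 00 00 is also the UTF-32LE BOM; this sketch
// does not try to tell the two apart.
TextEncoding detectBom(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return TextEncoding::Utf8WithBom;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return TextEncoding::Utf16LE;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return TextEncoding::Utf16BE;
    return TextEncoding::Unknown;   // no BOM found; the encoding is undetermined
}

// Read the whole file as raw bytes, with no text-mode translation.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readAllBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

With something like this, the app can call `detectBom(readAllBytes(path))` and either process the file as 16-bit text or show the "sorry" message when the result is `Utf16LE` or `Utf16BE`.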

Also, you want to do this without removing the NULL characters, since I can produce a file of 16-bit characters where you should not remove NULL bytes (e.g. text in an Asian language). What are you going to do when the file contains Chinese or Japanese 16-bit characters? Either you support 16-bit characters, or you don't. If you start removing NULL bytes from a "true" 16-bit character set, you are not only changing the characters, you are turning the text into nonsense and then storing that nonsense in your database.
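To make the damage concrete, here is a small sketch of what stripping NULL bytes does to 16-bit text. The helper names `stripNullBytes` and `sampleUtf16LE` are hypothetical, and the sample assumes little-endian UTF-16 without a BOM. The sample holds two characters: the letter "A" (bytes 41 00) and the Chinese character U+4E00 (bytes 00 4E).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: the "remove the NULLs" approach being warned against.
std::string stripNullBytes(std::string bytes)
{
    bytes.erase(std::remove(bytes.begin(), bytes.end(), '\0'), bytes.end());
    return bytes;
}

// Two UTF-16LE characters as raw bytes: 'A' (U+0041) and U+4E00.
// For U+4E00 the zero byte is the low half of the character, not padding.
std::string sampleUtf16LE()
{
    const char raw[] = { 0x41, 0x00, 0x00, 0x4E };
    return std::string(raw, sizeof raw);
}
```

Calling `stripNullBytes(sampleUtf16LE())` leaves the two bytes 41 4E, which read as the single-byte text "AN". The Chinese character has silently turned into the letter "N", and nothing in the result tells you that anything went wrong.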

Regards,

Paul McKenzie