-
how to detect NULL bytes in a char array ?
Hi All,
I want to detect if any NULL bytes exist in a char array.
Consider the following code snippet:
Code:
char buffer[1000000];
ifstream fstr("myfile", ios::binary);
fstr.read(buffer, 1000000);
//check if buffer has any NULL characters
for(int i = 0; i < 1000000; i++)
{
if(!buffer[i]) detected = true;
}
Assuming that fstr has 1000000 bytes, constructing a for loop like above seems inefficient (in terms of speed) to me to detect NULL characters.
Buffer length could be much higher.
Which methodology do you suggest to detect NULL characters in the fastest way in a char array ?
Thanks.
-
Re: how to detect NULL bytes in a char array ?
Just use strlen! :rolleyes:
-
Re: how to detect NULL bytes in a char array ?
I'm not sure that you've got much choice other than to iterate through the chars. Even if you were to have an array of the appropriate length and treated it as an array of int or long you would only end up having to compare each one against four masks.
Better than using an explicit loop, use something like the following.
Code:
#include <algorithm>
#include <fstream>
using namespace std;
int main()
{
char buffer[1000000];
ifstream fstr("myfile", ios::binary);
fstr.read(buffer, 1000000);
const char *BEGIN = buffer;
const char *END = buffer + 1000000;
bool detected = (find(BEGIN, END, 0) != END);
}
You can adapt it for finding any value.
-
Re: how to detect NULL bytes in a char array ?
Again:
Code:
char buffer[1000000];
....
bool detected = strlen(buffer) < 1000000;
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
VictorN
Again:
Code:
char buffer[1000000];
....
bool detected = strlen(buffer) < 1000000;
What if I want to count for the number of NULL characters in a char array ?
Given the following value for buffer with no terminating null character:
Code:
buffer[0] = 'a'
buffer[1] = 0
buffer[2] = 0
buffer[3] = 'a'
How can I calculate the number of NULL characters (e.g., 2 in above example) without iterating through the array members ?
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
aryan1
How can I calculate the number of NULL characters (e.g., 2 in above example) without iterating through the array members ?
Is this case you have to iterate...
-
Re: how to detect NULL bytes in a char array ?
1) your original loop is OK, except you should break out once a NULL
is detected.
2) I would use std::find
3) strlen assumes that there is a NULL ... if there is not a NULL in
the buffer, it would continue checking untill it finds one.
4) I would expect that the read takes up more time than the
checking for NULL. You might want to break the read up into
blocks (512 or 1024) and check each block after you read it.
If a NULL is found early, it would save having to do a lot of reads.
-
Re: how to detect NULL bytes in a char array ?
If you want to count all the NULLs in the array, use std::count ...
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
Philip Nicoletti
3) strlen assumes that there is a NULL ... if there is not a NULL in
the buffer, it would continue checking untill it finds one.
Good point, Philip!
I missed the fact that buffer might be full filled with data without terminated NULL!
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
Philip Nicoletti
If you want to count all the NULLs in the array, use std::count ...
So the following code snippet will do the job:
Code:
const char *BEGIN = buffer;
const char *END = buffer + 1000000;
size_t numberOfNullChars = count(BEGIN, END, 0);
-
Re: how to detect NULL bytes in a char array ?
If you were to use std::vector then you can do this.
Code:
#include <algorithm>
#include <fstream>
#include <vector>
using namespace std;
int main()
{
vector<char> buffer(1000000);
ifstream fstr("myfile", ios::binary);
fstr.read(&buffer[0], buffer.size());
vector<char>::size_type null_count = count(buffer.begin(), buffer.end(), 0);
}
-
Re: how to detect NULL bytes in a char array ?
Also, note ... if you are only interested in the number of NULLs
in the file and are looking at the entire file, you do not need a buffer:
Code:
#include <algorithm>
#include <fstream>
#include <iterator>
using namespace std;
int main()
{
ifstream fstr("myfile", ios::binary);
size_t null_count = count(istreambuf_iterator<char>(fstr), istreambuf_iterator<char>(), 0);
}
-
Re: how to detect NULL bytes in a char array ?
Isn't the STL wonderful! :D
-
Re: how to detect NULL bytes in a char array ?
It's also probably worthy of note that every one of the posted solutions except for Philip's latest solution have the same horrible bug. That is that the number of bytes read from the file may be far less than the number of bytes in the buffer, yet the check for null characters covers the whole buffer. So what about the values found in the remainder of the buffer? The data may have no null characters but the remainder of the buffer might, thus an incorrect result may be produced.
-
Re: how to detect NULL bytes in a char array ?
Point taken, but I worked on the assumption that this is a very simplified example to demonstrate the algorithm for finding nulls, not robust file handling.
-
Re: how to detect NULL bytes in a char array ?
If you have to find whether there is a NULL.
You could do soemthing like the following..
char buffer[1000000] = {'\0'};
....
bool detected = strlen(buffer) < 1000000;
I think that initialization is not a costly affair.
-
Re: how to detect NULL bytes in a char array ?
Whichever method you choose, under the hood it's going to look at one character at a time looking for NULL. There's no other way to do it. Your original for loop is going to be just as efficient as anything else.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
JohnW@Wessex
Point taken, but I worked on the assumption that this is a very simplified example to demonstrate the algorithm for finding nulls, not robust file handling.
Fair enough - your solution did answer what the OP was asking for. I just thought I'd point it out though... just for completeness.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
GCDEF
Whichever method you choose, under the hood it's going to look at one character at a time looking for NULL. There's no other way to do it. Your original for loop is going to be just as efficient as anything else.
Oh, I'm sure I could come up with something less efficient if I tried....
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
Lindley
Oh, I'm sure I could come up with something less efficient if I tried....
I'm sure you could, but I doubt any of the other methods mentioned would be any more efficient that the OP for loop, which IIRC is what is being asked here.
-
Re: how to detect NULL bytes in a char array ?
Seriously though, while probably no better in terms of straight efficiency, there are good arguments for preferring std::find() to a hand-written loop. For one thing, it more explicitly self-documents what you're trying to do.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
Lindley
Seriously though, while probably no better in terms of straight efficiency, there are good arguments for preferring std::find() to a hand-written loop. For one thing, it more explicitly self-documents what you're trying to do.
I guess, although the original loop is about as basic as it gets.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
Lindley
Seriously though, while probably no better in terms of straight efficiency, there are good arguments for preferring std::find() to a hand-written loop. For one thing, it more explicitly self-documents what you're trying to do.
Well, what with the up coming new micro-threading libraries, there are also better chances for a "find" to be threaded than a simple for.
There are so many little optimizations possible under the hood with stl algorithms, it is really a good idea to use them.
For example:
copy will call memcopy if objects are POD
inplace_merge will attempt to create a buffer for moving stuff around
sort... will just plain be efficient.
etc.
Even if your for loop gets as dumb as it gets, there are always micro (and not-so-micro) gains.
In the case of OP, he would have read the ENTIRE buffer, even though found is equal to true. find would have aborted on the first null, at no extra branching cost.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
GCDEF
I'm sure you could, but I doubt any of the other methods mentioned would be any more efficient that the OP for loop, which IIRC is what is being asked here.
Right, the methods mentioned in this thread might not be more efficient.
And for just 1,000,000 bytes there might be no point of talking about efficiency at all.
But with 200,000,000 we are getting into “measurable” time – about 125ms.
Looking at this as a sport (or academic challenge), we can save 10% by replacing byte compare with int bitmask tests:
Code:
int* p = (int*)pBuffer;
//check if buffer has any NULL characters
for(int i = 0; i < arr_size/4; i++, p++)
{
if(!(*p & 0xFF000000))
count++;
if(!(*p & 0x00FF0000))
count++;
if(!(*p & 0x0000FF00))
count++;
if(!(*p & 0x000000FF))
count++;
}
Further micro-optimization (like loop unrolling) made no measurable difference. Looks like optimizing compiler somehow knew that it wouldn’t :)
There are still two ways how this code can be improved (in terms of efficiency): use multiple threads and utilize SSE.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
VladimirF
Right, the methods mentioned in this thread might not be more efficient.
And for just 1,000,000 bytes there might be no point of talking about efficiency at all.
But with 200,000,000 we are getting into “measurable” time – about 125ms.
Looking at this as a sport (or academic challenge), we can save 10% by replacing byte compare with int bitmask tests:
Code:
int* p = (int*)pBuffer;
//check if buffer has any NULL characters
for(int i = 0; i < arr_size/4; i++, p++)
{
if(!(*p & 0xFF000000))
count++;
if(!(*p & 0x00FF0000))
count++;
if(!(*p & 0x0000FF00))
count++;
if(!(*p & 0x000000FF))
count++;
}
Further micro-optimization (like loop unrolling) made no measurable difference. Looks like optimizing compiler somehow knew that it wouldn’t :)
There are still two ways how this code can be improved (in terms of efficiency): use multiple threads and utilize SSE.
Pretty sure it is IO bound anyways, given the origin is supposed to be an fstream. No matter what you do, it'll be starved for data.
If all the data is placed in memory before starting... Then I still think the program is IO bound to the slow memory (compared to what the processor does).
Maybe if the buffer was on the stack, but in this case, you can't have it big enough to be interesting.
My personal conclusion is that a good'ol loop (or std find/count) will get the job done at 99.99% max efficiency.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
monarch_dodra
My personal conclusion is that a good'ol loop (or std find/count) will get the job done at 99.99% max efficiency.
Not if you compare it with SSE. Check out this link, for example.
In my quick-and-dirty test I got this zeros counting in half the time (on in-memory data, of course).
-
Re: how to detect NULL bytes in a char array ?
That link is rather intimidating....
Still, good call on the memchr() function, I'd forgotten about that one. It does precisely what the OP wants, no fuss.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
GCDEF
Whichever method you choose, under the hood it's going to look at one character at a time looking for NULL. There's no other way to do it. Your original for loop is going to be just as efficient as anything else.
That's not quite true. It's possible to look for many characters at a time. Most processors today have multiple cores so it's possible to perform true concurrent searches.
-
Re: how to detect NULL bytes in a char array ?
Quote:
Originally Posted by
nuzzle
That's not quite true. It's possible to look for many characters at a time. Most processors today have multiple cores so it's possible to perform true concurrent searches.
Exactly. One thread per core will get you the best result.