OK, you're right: it does seem that the last example program (the pure C code) really ran out of memory.
My question is still not answered, though. The original program bad_allocs with 250 threads even though it has the whole system to itself. Another test program can easily allocate (as in malloc + memset) 1.5 GB in the exact same setup, so even with 2 MB stacks (250 x 2 MB = 500 MB in total) it shouldn't run out of memory.
More than that, it sometimes crashes (SIGSEGV or SIGABRT).

Neither of those problems seems to indicate that it is running out of memory. OOM problems also shouldn't go away after protecting the allocation with a mutex, or am I confused again?
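
To make the memory argument concrete, here is a minimal sketch of the arithmetic and of the kind of malloc + memset probe I mean. It is not the actual test program: the names stack_budget_bytes and probe_alloc are made up for illustration, and it assumes stack sizes are given in bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: total bytes reserved for thread stacks,
 * e.g. 250 threads with 2 MB each. */
size_t stack_budget_bytes(size_t threads, size_t stack_bytes)
{
    return threads * stack_bytes;
}

/* Hypothetical probe: malloc `bytes` and memset the block so the pages
 * are written to, not just reserved. Returns 1 on success, 0 if the
 * size is zero or malloc fails. */
int probe_alloc(size_t bytes)
{
    if (bytes == 0)
        return 0;

    unsigned char *p = malloc(bytes);
    if (p == NULL)
        return 0;

    memset(p, 0xA5, bytes);

    /* Read back through a volatile pointer so an optimizing compiler
     * is less likely to drop the allocation and the memset. */
    volatile unsigned char *vp = p;
    int ok = (vp[0] == 0xA5 && vp[bytes - 1] == 0xA5);

    free(p);
    return ok;
}
```

Calling probe_alloc((size_t)1536 << 20) would correspond to the 1.5 GB case, and stack_budget_bytes(250, (size_t)2 << 20) gives the 500 MB stack total. Note that the probe runs in a single thread, so it says nothing about how the allocator behaves with 250 threads allocating at once.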