Interesting thought. I wrote a test program that was supposed to create a lot of threads, all trying to lock the same mutex. Locking the mutex never failed, but it turns out I'm not even able to spawn as many threads as I try, I can only spawn 254.
That is until I reduce the stack size for the threads, which seems to be set to 2MB per default.
This would indicate I'm actually running out of memory, although top doesn't show any memory consumption (because stack doesn't count).
This doesn't explain why the mutex in my second sample fixes the problem, but that may be due to changes in the timing?
However, the system has far more than 500 MB Memory (4GB actually). This also doesn't explain the problems I have in my actual application, which already sets the stack to a lower amount.