c++ allocators thread-safe?
Tannin
December 6th, 2008, 04:33 AM
Hi,
I have been experiencing crashes in a heavily multi-threaded application (up to 1000 threads), that were always related to memory allocation (usually a std::string being resized).
Now, if I execute the following example on the target system (Red Hat 9, i.e. the old thread model), it causes std::bad_alloc exceptions as well as SIGABRT and sometimes segmentation faults if I set the number of threads high enough (>=250).
On my development machine (an up-to-date Linux) it doesn't.
EDIT: Almost forgot: All systems I tested on have multi-core CPUs.
My question is: Is the code correct and there is something wrong with the target system or am I doing something wrong?
Sorry that I couldn't reduce the code further, but if I remove anything, the errors disappear or become a lot less frequent.
#include <pthread.h>
#include <cstdlib>   // rand, srand
#include <cstring>   // memset
#include <ctime>     // time
#include <string>
#include <iostream>
#include <sstream>
pthread_t Threads[NUM_THREADS];
// Forces a few stream and string allocations on every call.
void f(std::string &a)
{
    std::istringstream test(a);
    char i;
    test >> i;
    std::string r = '\"'+std::string("d")+'\"';
}
// Thread body: allocate, fill and free buffers of random size, forever.
void * startThread(void *)
{
    for(;;) {
        std::string s2("test");
        f(s2);
        int size = rand() % 100000 + 1;
        try {
            char* b = new char[size + 1];
            memset(b, rand() % 255, size);
            b[size] = '\0';
            s2 = b;   // copies the buffer, so the string reallocates
            delete [] b;
        } catch (const std::exception& e) {
            std::cout << "failed to allocate array of size " << size + 1 << ":" << e.what() << std::endl;
        }
    }
    return NULL;
}
int main()
{
    srand(time(NULL));
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_create(&Threads[i], NULL, startThread, NULL);
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(Threads[i], NULL);
    }
    return 0;
}
compile with "g++ -DNUM_THREADS=250 -Wall -o test test.cpp -pthread"
Any help would be greatly appreciated.
Tannin
December 6th, 2008, 05:55 AM
I extended the sample a bit with overridden versions of new and new[] (and the matching deletes) that now explicitly lock a mutex before allocating or freeing memory:
#include <new>       // std::bad_alloc
#include <cstdlib>   // malloc, free
// Statically initialized so it is valid before the first allocation.
static pthread_mutex_t NewMutex = PTHREAD_MUTEX_INITIALIZER;
void* operator new(size_t size) throw(std::bad_alloc)
{
#ifdef USE_MUTEX
    pthread_mutex_lock(&NewMutex);
#endif
    void* Res = malloc(size);
#ifdef USE_MUTEX
    pthread_mutex_unlock(&NewMutex);
#endif
    if (Res == NULL) {
        throw std::bad_alloc();
    } else {
        return Res;
    }
}
void* operator new[](size_t size) throw(std::bad_alloc)
{
#ifdef USE_MUTEX
    pthread_mutex_lock(&NewMutex);
#endif
    void* Res = malloc(size);
#ifdef USE_MUTEX
    pthread_mutex_unlock(&NewMutex);
#endif
    if (Res == NULL) {
        throw std::bad_alloc();
    } else {
        return Res;
    }
}
void operator delete(void* ptr) throw()
{
#ifdef USE_MUTEX
    pthread_mutex_lock(&NewMutex);
#endif
    free(ptr);
#ifdef USE_MUTEX
    pthread_mutex_unlock(&NewMutex);
#endif
}
void operator delete[](void* ptr) throw()
{
#ifdef USE_MUTEX
    pthread_mutex_lock(&NewMutex);
#endif
    free(ptr);
#ifdef USE_MUTEX
    pthread_mutex_unlock(&NewMutex);
#endif
}
If USE_MUTEX is defined, it doesn't crash; if it isn't, it crashes.
Therefore, malloc doesn't seem to be thread-safe.
From googling I found that malloc is supposed to be thread-safe if I link to libpthread.
Again, is this a bug or am I misunderstanding something?
Codeplug
December 6th, 2008, 11:18 AM
>> Therefore, malloc doesn't seem to be thread-safe.
It should be. Can you reproduce the error with the first two lines after the "for(;;)" commented out?
gg
Tannin
December 6th, 2008, 12:12 PM
Hi,
Unfortunately I won't be able to test again until Monday. As far as I remember, I wasn't able to reproduce the problem with the function call commented out.
Also, from googling a bit, it seems that the regular malloc is not thread-safe, but once you link in the thread library it's supposed to be replaced with a thread-safe version.
Tannin
December 8th, 2008, 02:12 AM
I've rewritten the test to be less obscure and C-only:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>   /* time */
pthread_t Threads[NUM_THREADS];
/* Thread body: malloc, fill and free a randomly sized block, forever. */
void * startThread(void* Data)
{
    for(;;) {
        int Size = rand() % 10000 + 1;
        void* Test = malloc(Size);
        if (Test != NULL) {
            memset(Test, rand() % 255, Size);
            free(Test);
        } else {
            printf("allocation error: %s\n", strerror(errno));
        }
    }
    return NULL;
}
int main(int argc,char **argv)
{
    srand(time(NULL));
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_create(&Threads[i], NULL, startThread, NULL);
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(Threads[i],NULL);
    }
    return 0;
}
It reports allocation errors, always with errno = ENOMEM (which according to the man-page is the only errno malloc is allowed to report). Of course the program doesn't come close to running out of available memory...
zerver
December 8th, 2008, 04:50 AM
It could be that you are hitting some kernel limit on the maximum number of locks per mutex.
http://forums.sun.com/thread.jspa?threadID=5084993
Tannin
December 8th, 2008, 02:12 PM
Interesting thought. I wrote a test program that creates a lot of threads, all trying to lock the same mutex. Locking the mutex never failed, but it turns out I'm not even able to spawn as many threads as I ask for: I can only spawn 254.
That is, until I reduce the stack size for the threads, which seems to be set to 2 MB by default.
This would indicate I'm actually running out of memory, although top doesn't show any memory consumption (because stack doesn't count).
This doesn't explain why the mutex in my second sample fixes the problem, but that may be due to changes in the timing?
However, the system has far more than 500 MB Memory (4GB actually). This also doesn't explain the problems I have in my actual application, which already sets the stack to a lower amount.
TheCPUWizard
December 8th, 2008, 02:21 PM
>> This would indicate I'm actually running out of memory, although top doesn't show any memory consumption (because stack doesn't count).
WHY do you think stack would not count????
HOW are you measuring memory consumption????
Remember that on a 32 bit system, the total RESERVED (but not necessarily allocated or committed at any point in time) FOR ALL PROCESSES/THREADS on the system is either 2GB or 3GB.
When all of these "might eventually need"'s are taken into account, the number will be MUCH larger than what is shown in TaskMgr as the amount of memory ACTUALLY used.
Tannin
December 8th, 2008, 03:35 PM
>> WHY do you think stack would not count????
Because top says the process has 0.0 - 0.1% memory consumption. ;)
>> HOW are you measuring memory consumption????
top
>> Remember that on a 32 bit system, the total RESERVED (but not necessarily allocated or committed at any point in time) FOR ALL PROCESSES/THREADS on the system is either 2GB or 3GB.
Umm, no? The maximum for EACH process is 3 GB. Each process has its own virtual address space of 2^32 bytes, 1 GB of which is reserved for the system (on Linux; I think Windows reserves 2 GB for the system in its default settings).
>> When all of these "might eventually need"'s are taken into account, the number will be MUCH larger than what is shown in TaskMgr as the amount of memory ACTUALLY used.
This is true. What I was trying to say is this: The Stack of a process/thread does not appear in top (and probably nowhere else) as used, but it's still unavailable for memory allocation. Still, I don't understand how the system could run out of memory with the allocations I make in the test. I also don't understand why the same test doesn't report errors on a more up-to-date system (for example my development system).
I'll run another test tomorrow where I allocate a fixed amount of memory in each thread to see if the allocation problems are reproducible then. This should prove whether it's actually running out of memory or whether timing plays into it somehow.
TheCPUWizard
December 8th, 2008, 04:00 PM
Top,
Wrong on so many (yet horribly common) points.
Yes, each process has an independent 4 GB address space (with 1-2 GB reserved and inaccessible).
But ALL of the processes on a system must be mapped into the SINGLE 4 GB (again 1-2 reserved) Processor address space.
Therefore a program which takes 550MB will FAIL when loading the 4th (5th if extended space is enabled) instance.
Next we get to the "actual" vs. "reserved" question. When a process is started (and possibly at additional points while it is running), it will request blocks of memory (but not actually utilize or "commit" them). As soon as this happens the TOTAL amount is removed from the available pool.
This is done so that future commits will always succeed in terms of virtual address mapping (although actual commits may fail if the physical/virtual memory size is insufficient).
This is the primary reason for moving to 64 bit architectures - even with 32 bit processes. Our theoretical 550MB program could run approximately 3-4 BILLION instances before the global address space was exhausted (of course other limits will apply before then).
Task Manager (and many PerfMon/WMI counters) do NOT reflect memory which has been reserved (i.e. eliminated from the free global address pool). This is what leads to confusion when people see things failing while memory appears to be available.
It is a simple matter to create a small program which reserves large amounts of memory so that multiple instances will quickly fail, yet have Task Manager (and most WMI counters) still show very low memory utilization.
Tannin
December 9th, 2008, 03:31 AM
Ok, you're right, it would seem as if that last example program (the pure C code) indeed ran out of memory.
My question is still not answered though: the original program throws bad_alloc with 250 threads and the whole system to itself (another test program can easily allocate (as in malloc + memset) 1.5 GB in the exact same setup), so even with 2 MB stacks it shouldn't run out of memory.
More than that, it sometimes crashes (SIGSEGV or SIGABRT).
Neither of those problems seems to indicate that it is running out of memory. OOM problems also shouldn't go away after protecting the allocation with a mutex, or am I confused again?
codeguru.com