Hello,
I have a very strange issue with a program I'm trying to debug. First of all, the program is a monte carlo simulation, which means that it does exactly the same process over and over again, involving random numbers, with different random seeds. From a purely algorithmic standpoint, if the code is correct than the simulation should always run fine, and if its wrong then it should fail on the very first simulation. However, I have a bizarre combination of issues.
First off, there appears to be a memory leak. The RAM the program uses increases rapidly until it hits about 400 megs. This is still only 20% of my system however. At this point it crashes. If it was purely a memory allocation issue however, you would expect that with different parameters it would crash at different points in the code. However it does not. It always crashes on the exact same line (verified using a debugger). This lame is a simple array copying statement, and it causes a seg fault, array out of bounds because the array that I am trying to assign values to is a null pointer. Hence even though I am accessing it at zero, it causes a seg fault.
Since the pointer is null, the obvious thing is that the array hasn't been new-ed yet or that it has already been deleted. However, it worked on the first n identical iterations of my simulation, but not on this one. What gives? Why would a memory leak cause an array out of bounds error? Could a memory leak eventually prevent an array from being allocated properly? Why always the same array?
On top of it the crashes are not even consistent in the sense that if i start at seed 1 with some givne settings, it will crash at seed 31. If i start at seed 20 however with the same settings, it no longer crashes at seed 31. This seems to imply that whatever the problem is, it is heavily memory related. Yet it is hard to reconcile that fact with the fact that it seems to always crash in the same place.
Does anyone have some general suggestions on how I can track this down, software I can use, things to look out for, anything? I've been hammering away at it with GDB and staring and staring at lines of source code but its very hard, everything seems to be deleted properly. I make sure that all the classes de-allocate all their memory in the destructors and I make sure all the destructors are called. Don't know what to do any more...
However, it worked on the first n identical iterations of my simulation, but not on this one. What gives?
It comes under the heading of 'Undefined Behaviour'
Posting code helps.
(Full compilable example that demonstrates the problem)
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman
Hello,
I have a very strange issue with a program I'm trying to debug.
A very flowing explanation, but guess what? Welcome to the world of C++ programming -- every programmer has run into this problem. This isn't strange, new, or unexpected when you write a program.
The answer is simply this -- debug the program. Where are your logging statements, or your debugging statements at various points in the program to diagnose the error?
In the world of C++, whenever you mishandle memory, access NULL pointers, the behaviour of the program is undefined. Therefore trying to explain why the behaviour shouldn't happen because of "algorithmic" reasons is not going to do you any good.
An algorithm is just steps on paper. If the computer program used to implement those steps violates in some way the language rules or runtime environment, then the algorithm becomes a moot point.
I make sure that all the classes de-allocate all their memory in the destructors and I make sure all the destructors are called.
We don't know that -- all we have is your word. The problem with making statements like that is that it locks us out from seeing the code you say is OK. Many times (and it has been demonstrated right here on CodeGuru), a poster that claims the code is OK starts us out on the wrong path looking for the problem elsewhere, when the problem turned out to be the code that was supposedly OK.
Paul, while it is helpful for you to tell me to "simply" debug the program, I have believe it or not been trying to do that. I also know that C++ does strange things. What I am asking is for more experienced programmers (who, as you said have ALL run into these problems) to weigh in on what kind of strange thing this might be and why. I do not understand memory allocation very well and I'm relatively ignorant of what can go wrong when you s crew it up. Can a memory leak for example cause a "new" statement later in the program to not allocate properly? Is there a reason why one new statement is consistently the one to fail? These are the questions that I'm asking. I understand that to give exact answers you need code, but I was hoping that I could get rough ideas or possibilities without asking someone to go through thousands of lines of my own code (which seems rude, to me).
I have to run to class now but I will be back shortly, I will zip the files and post them. Thanks so far guys. Thanks for the valgrind recommendation peter, it looks dead on like the sort app that could help me.
I do not understand memory allocation very well and I'm relatively ignorant of what can go wrong when you s crew it up. Can a memory leak for example cause a "new" statement later in the program to not allocate properly?
A simple memory overwrite of a single byte can mess up an allocator. If you have an array declared of 100 chars, and you attempt to write to the 101st entry, that in itself could cause havoc later on.
Double deletion, deletion of a pointer that doesn't point to dynamically allocated memory, using 'C' constructs and functions on non-POD types, I could go on. There are just too many things that can cause this behaviour.
Imagine if you built a search tree in your program, and somewhere your program tries to stuff 10,000 characters into a 100 character array in a seemingly unrelated part of the program. Where did those extra 9,900 characters go? Some may have overwritten your search tree, maybe they were not written in a "dangerous" part of the program and you won't notice, etc.
Regards,
Paul McKenzie
Last edited by Paul McKenzie; March 31st, 2009 at 10:16 AM.
but I was hoping that I could get rough ideas or possibilities without asking someone to go through thousands of lines of my own code (which seems rude, to me).
Can you distil it down to a simple version that exhibits the same problems?
The best guess anyone can make without seeing code is that you could be using memory that has been deleted and used elsewhere, not allocated enough memory for the data, written past the end of an allocated memory area.
I imagine that you are not using any STL containers.
Doing so would probably alleviate all of your memory allocation difficulties.
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman
Ok, here is the code. Some quick comments. First off, this is all being done in linux. Not all of it is relevant, I just zipped up the whole directory structure. If you look in the make file (in the Unix build directory), it compiles several executables. The only one that matters is "HysteresisConsole". The files for it are in HysteresisFiles directory and in Console. John, although I have narrowed down the crash, because of all the memory issues I said I don't know how I could distill the part that causes the crash because I'm not really sure where the misallocation is happening.
When you run the executable, in order to cause a crash (many methods cause crashes, but here's an example), use the commands
d 3 l 64 display - algorithm 1
run 40
This changes dimension to 3, length to 64, removes display, and changes to second algorithm. Then it runs the first 40 seeds. It should crash at seed 31 (at least that's where it did on mine, interesting if it crashes at different places). Thanks,
First impression: I don't think I've ever seen so many new/delete instances in a C++ application for years (or ever!)
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman
If were doing this I would have declared all of the dynamic arrays as vectors.
As an example, I would have written the SortedListHysteresisSimulation constructor/destructor something like this.
Code:
//////////////////////////////////////////////////////////////////////
// Construction/Destruction
// Using std::vector for dynamic arrays
//////////////////////////////////////////////////////////////////////
/// Construct a new SortedListHysteresisSimulation
SortedListHysteresisSimulation::SortedListHysteresisSimulation(double R, int D, int L, int seed)
: D(D),
L(L),
R(R)
{
Z = 2 * D;
rand.seed(seed);
int i;
N = 1;
// Calculate the number of spins
for(i = 0; i < D; i++) N *= L;
M = -N;
// Allocate space for spins, etc.
Z = 2 * D;
nextSpin.resize(Z + 1);
neighborLocs.resize(Z);
neighborCoords.resize(Z, std::vector<int>(D));
lattice.resize(N);
randomField.resize(N;
nUp.resize(N);
sortedSpins.resize(N);
nextSpin.resize(Z + 1);
// Set all the spins to point down
for(i=0;i<N;i++) lattice[i]=-1;
// Generate the random fields
//for(i=0;i<N;i++) randomField[i] = rand.gaussian(R);
//previous line replaced with my correlated field generator. Note that D must equal 3, otherwise expect some very strange results.
double eta = 1.0;
corrCube(L,R,eta,randomField,seed);
// Set the number of up neighbors to zero for each spin
for(i=0;i<N;i++) nUp[i]=0;
// Create a list of spins sorted by random field
for(i=0;i<N;i++) sortedSpins[i]=i;
SortFields();
// Set up strides array for converting between D-dimensional coordinates,
// and positions in the 1-dimensional lattice array
stride.resize(D);
stride[D-1] = 1;
for(i=D-2;i>=0;i--) {
stride[i] = stride[i+1]*L;
}
// Set the initial H to just flip the first spin
H=-randomField[sortedSpins[0]]+Z;
// Set up nextSpin for finding the starts of avalanches
startNup = 0;
stopNup = 2*D;
for(i=0;i<=2*D;i++) {
nextSpin[i]=0;
while((nextSpin[i]<N) && (2*(i-D)+randomField[sortedSpins[nextSpin[i]]]+H>0.0))
nextSpin[i]++;
if(nextSpin[i]==N) stopNup--;
}
}
// Free up memory from the SortedListHysteresisSimulation
SortedListHysteresisSimulation::~SortedListHysteresisSimulation()
{
// Notice the absence of the requirement to 'delete' anything
}
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Richard P. Feynman
There are many things fundamentally wrong with the code. First off, I agree with JohnWessex. There is an overuse of new[] in that application when std::vector would have alleviated a lot of this coding. Using new[] and delete[] to create dynamic arrays is using coding techniques that have been long been obsoleted by the std::vector class. Also, I wouldn't be the least bit surprised if std::vector increased the speed of the program, and not only provide much more safety.
The strange thing is that one of the CPP files includes <vector> from STL, so there is absolutely no excuse that std::vector wasn't used in all the other parts of the program also.
Also, none of the classes that I saw are safely copyable, or have the copy construction and assignment disabled. In other words, I can easily break any program you write with just two lines of code.
Bang, your dead. Hopefully you see that this calls delete[] twice on the same memory when s1 and s2 are destroyed. If I can break this entire code base with those two lines, can you imagine what a real main() program could do?
Either the copy and assignment has to be turned off by making these functions private and unimplemented, or you provide correct copy and assignment operators. If you have no idea what I'm talking about, then it is your priority to change to std::vector instead of new[] and delete[] (even if you did know what I'm talking about, still use std::vector).
But what I don't understand is again, vector is used in some parts, and totally ignored in other parts, and instead replaced by dangerous and faulty C++ programming using pointers and dynamically allocated memory.
* The Best Reasons to Target Windows 8
Learn some of the best reasons why you should seriously consider bringing your Android mobile development expertise to bear on the Windows 8 platform.