Debugging suspected memory overwrite
Hello,
My team and I have been creating some software from a large engine that we have complete access to but are not hugely familiar with.
The software crashes intermittently. When the crash does crop up with some reliability, any changes made to the software (to log function calls etc.) stop it from occurring.
This has led us to believe that we have some form of memory overwrite. I have personally been through every single memory allocation and array in the software to look for any obvious misuse or lack of bounds checking, to no avail. Fortunately, we mostly use STL containers, so this wasn't too rough for our own software; however, we don't have the luxury of time to do the same with the engine.
I was hoping that any readers of this thread would have some suggestions to help track down memory overwrites. All suggestions would be greatly appreciated.
One method I have seen a few times whilst searching is to overload new and delete, then add a buffer filled with a known pattern to the beginning and end of each allocation. Upon deletion of the object, if either buffer no longer holds that pattern, there has been an overwrite.
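For anyone who wants to try the buffer idea, here is a minimal sketch of the allocation side only. It assumes a fixed 16-byte guard on each side and a small header that remembers the requested size; every name in it (guarded_alloc, guards_intact, guarded_free, kGuardByte and so on) is made up for illustration and is not taken from any engine or library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: [header][front guard][user data][back guard].
// Header and guard sizes are multiples of 16 so the user pointer keeps
// whatever alignment malloc provided.
static const std::size_t kHeaderSize = 16;   // stores the requested size
static const std::size_t kGuardSize = 16;
static const unsigned char kGuardByte = 0xFD;

void* guarded_alloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeaderSize + 2 * kGuardSize + size));
    if (!raw) return 0;
    std::memcpy(raw, &size, sizeof(size));                    // remember the size
    std::memset(raw + kHeaderSize, kGuardByte, kGuardSize);   // front guard
    unsigned char* user = raw + kHeaderSize + kGuardSize;
    std::memset(user + size, kGuardByte, kGuardSize);         // back guard
    return user;
}

// True while both guards still hold the fill pattern.
bool guards_intact(const void* user_ptr) {
    assert(user_ptr != 0);
    const unsigned char* user = static_cast<const unsigned char*>(user_ptr);
    const unsigned char* raw = user - kGuardSize - kHeaderSize;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (raw[kHeaderSize + i] != kGuardByte) return false;  // underrun
        if (user[size + i] != kGuardByte) return false;        // overrun
    }
    return true;
}

// Frees the block; returns false if either guard had been overwritten.
bool guarded_free(void* user_ptr) {
    if (!user_ptr) return true;
    bool ok = guards_intact(user_ptr);
    std::free(static_cast<unsigned char*>(user_ptr) - kGuardSize - kHeaderSize);
    return ok;
}
```

An overloaded operator new would call guarded_alloc (throwing std::bad_alloc on a null result), and operator delete would call guarded_free and log or break into the debugger when it returns false. Calling guards_intact on suspect objects at intervals narrows down when the overwrite happened, not just that it happened.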
I am not very technical so this solution seems somewhat daunting, if anyone has used this technique or has any further/related reading on it, I would be equally appreciative of you sharing such information.
Thank you for reading,
nixius
Edit: using C++ and VS2010, Windows7.
Re: Debugging suspected memory overwrite
What other possible causes have you considered, and why have you rejected them? E.g. a threading issue.
I'm sure googling will throw up plenty of information. You may need to read up on 'placement new'.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
...
The software crashes intermittently. When the crash does crop up with some reliability, any changes made to the software (to log function calls etc.) stop it from occurring.
Have you tried debugging your app to find out what goes wrong, where, and why?
Re: Debugging suspected memory overwrite
Hi.
When the program exits, the log outputs:
CAUGHT[signal] : SIGSEGV : Segment violation (11)
CAUGHT[UnhandledException] : Access violation @ 0X7C9118CA: Bad read on 0x00000024 The thread tried to read from or write to a virtual address for which it does not have the appropriate access.
We don't do any multi-threading.
To be honest, a memory overwrite is the only thing I could think the issue to be.
Of course we have all tried debugging. I did omit to mention that the issue has only occurred on the 'live PC' (this machine goes out to clients with the software on it). None of our dev machines, in debug or release, have been able to reproduce the bug.
I suppose it could be the RAM on the live machine.
I have been googling a lot, but mostly I find people talking about code for memory leaks, which is not very helpful for us since 99% of our memory is assigned at the start and released at the end of the software running cycle, or commercial software which is not an option for us.
Thanks for the responses.
Re: Debugging suspected memory overwrite
Could you try using the map file to locate the place of the access violation in your code?
Finding crash information using the MAP file
Re: Debugging suspected memory overwrite
This has been mentioned, unfortunately on the latest occasion of the crash, the .Map file was not copied across to the live machine =(
We already have a rough idea of where the crash last happened from the game logging (i.e. it happens between the last message it logged and the next message it never reached). However, it runs through this same piece of code absolutely fine several times and does not always crash, which is odd in itself.
The crash has been around the same point in the software since it first started happening, but after code changes (bug fixes, adding logging etc) the crash doesn't occur (or the frequency drops) or it happens slightly later/earlier on.
When looking around that area for suspect code, all arrays are containers like std::vector/map/list (so no accessing outside of bounds). The only alarm bell is that there are 4 objects created with 'new' around that time, but even then the log reports either that it crashes at roughly the time the first two are allocated, or later on when creating the second two, or that it does not crash at all and runs through.
My point with all this information is that it is not really the location of the crash that is the problem; rather, it is the fact that it happens randomly, in different locations. This leads me to believe it is a memory overwrite. I think.
Mulling it over, if it is always around a 'new' call, perhaps it may be a problem allocating memory under certain circumstances... hmmm...
This one has had me frustrated for several days. I have told my colleagues to put .map files on for future builds, thanks for the reminder.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
This has been mentioned, unfortunately on the latest occasion of the crash, the .Map file was not copied across to the live machine =(
Why do you think you need to copy " .Map file ... across to the live machine"?
The map file is created while compiling and linking the exe on your dev machine. You can then use it to find the crash location...
Re: Debugging suspected memory overwrite
Because the project has been modified since, so the map file does not match the executable that crashed.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
Because the project has been modified since, so the map file does not match the executable that crashed.
Ah, well. Understood.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
Because the project has been modified since, so the map file does not match the executable that crashed.
Well, if you don’t preserve auxiliary files for each released version (you really should), you sure can pull that version from your source code version control system and rebuild it, right?
Re: Debugging suspected memory overwrite
It was a local version, not checked into any version control =(. Even so, it happened very infrequently, and the .map file will only point to a function, if I remember correctly?
Edit: I should mention we have a 'live' system for us to test on, this is not a final release procedure or anything, this is part of our testing process.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
When looking around that area for suspect code, all arrays are containers like std::vector/map/list (so no accessing outside of bounds). The only alarm bell is that there are 4 objects created with 'new' around that time, but even then the log reports either that it crashes at roughly the time the first two are allocated, or later on when creating the second two, or that it does not crash at all and runs through.
Since I think we can reasonably assume that the VC++ CRT's new is bug-free, I think there's some chance that the crash is caused by the constructors of the classes in question or by something they call, directly or indirectly. Perhaps it's worth focusing the search for the bug on that code. (This doesn't mean, though, that you should only look for the bug there.)
The address your app is trying to read from which causes the access violation indicates that the cause is the use of an invalid pointer. This may or may not be due to memory corruption. It may also be a logic bug, like, for instance, indexing a null pointer.
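To make the point about the address concrete: a fault at a small address such as 0x00000024 is what you would typically see when a member is read through a null pointer, because the address touched is 0 plus the member's offset. The sketch below assumes a made-up Entity type whose health member happens to sit at offset 0x24; the type and function names are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type: 36 bytes of name followed by an int.
struct Entity {
    char name[36];
    int health;      // lives at offset 36 == 0x24
};

// Address the CPU touches when a member at member_offset is read through
// a pointer whose numeric value is base.
std::size_t faulting_address(std::size_t base, std::size_t member_offset) {
    return base + member_offset;
}

// Address touched by "entity->health" when entity is null.
std::size_t null_read_address() {
    assert(offsetof(Entity, health) == 0x24);  // layout assumption of this sketch
    return faulting_address(0, offsetof(Entity, health));
}
```

If that reading is right, it may be worth looking for a class near the crash site that has a member at offset 0x24 and checking every pointer to it that could be null or stale.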
Re: Debugging suspected memory overwrite
Thank you for your response Eri.
I will take another look into that class.
I am still quite confident that it is a memory overwrite, so going back to the first post, any tips on how to debug those would still be appreciated.
Thanks.
Re: Debugging suspected memory overwrite
Does this software contain Windows-specific code? The best tool for debugging these sorts of issues is valgrind, but it's only available on Linux (and I think OSX).
There are similar tools available for Windows, but they aren't free (so far as I know). What you want is a dynamic code analysis tool.
By the way, usage of STL containers does not guarantee lack of out-of-bounds access. It's true that Microsoft's vector implementation does bounds-checking by default on both [] and at(), but this can be disabled (and often is for speed reasons), and checking of [] is actually not mandated by the standard so if you aren't using the default Dinkumware implementation, you may not even get this.
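As a small illustration of the difference, the sketch below uses two hypothetical helpers. checked_read relies on at(), which the standard requires to throw std::out_of_range for a bad index. debug_checked_read guards operator[] with an assert, which only fires in builds where NDEBUG is not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true and stores the element if index is valid, false otherwise.
// at() must throw std::out_of_range for an invalid index on every
// conforming implementation; operator[] with the same index is undefined
// behaviour unless the library adds its own checks.
bool checked_read(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Debug-only check in front of operator[]; compiled out when NDEBUG is set.
int debug_checked_read(const std::vector<int>& v, std::size_t index) {
    assert(index < v.size());
    return v[index];
}
```

Temporarily switching suspicious operator[] accesses to at() in a test build is a cheap way to turn a silent overwrite into an exception with a call stack.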
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
Of course we have all tried debugging. I did omit to mention that the issue has only occurred on the 'live PC' (this machine goes out to clients with the software on it). None of our dev machines, in debug or release, have been able to reproduce the bug.
Your application should have produced a crash dump. If not, then consider putting one in your application and have it enabled. Then when the application crashes, you get the dump from your clients and you can then debug the issue on your machine by loading the crash dump in Visual Studio.
Read this thread:
www.codeguru.com/forum/showthread.php?t=517072
Also, you have something called "remote debugging" if you have TCP/IP access to one of these machines.
Also, mentioning that it crashes only on live machines is not out of the ordinary. I bet that every one that has responded to you has had applications that "work" in the shop, but crash elsewhere.
Quote:
I suppose it could be the RAM on the live machine.
If you're a C++ programmer and said anything like that, in many companies, it could get you fired, or at the very least, have your stature as a C++ programmer greatly diminished.
A C++ (or C) programmer never blames hardware or anything else for their program crashing -- the blame always goes to the program having bugs. Unless you have verifiable proof that the problem is elsewhere, never mention anything as the cause of the problem except the program itself.
Another thing you don't do -- don't start moving code around until you've debugged the crash and know exactly why it happened. What I see some coders doing is rearrange some code here and there, and voila, the crash is gone without any explanation. For example, they introduce a couple more variables, or they remove a function, etc. Unfortunately, all that has happened is that the bug was moved to another part of the application, and who knows when it will appear again.
Regards,
Paul McKenzie
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
Paul McKenzie
Another thing you don't do -- don't start moving code around until you've debugged the crash and know exactly why it happened. What I see some coders doing is rearrange some code here and there, and voila, the crash is gone without any explanation. For example, they introduce a couple more variables, or they remove a function, etc. Unfortunately, all that has happened is that the bug was moved to another part of the application, and who knows when it will appear again.
In some cases, identifying which types of code movement tend to hide the bug can be helpful for solving it. But I agree: always return the code to the non-working state after such tests, until you have pinned down the cause.
Re: Debugging suspected memory overwrite
It does indeed contain Windows specific code. I have read a lot of the wonders of valgrind.
Quote:
By the way, usage of STL containers does not guarantee lack of out-of-bounds access. It's true that Microsoft's vector implementation does bounds-checking by default on both [] and at(), but this can be disabled (and often is for speed reasons), and checking of [] is actually not mandated by the standard so if you aren't using the default Dinkumware implementation, you may not even get this.
As far as I know, we don't do anything other than standard use but it's a good idea to investigate. Thanks!
Quote:
Your application should have produced a crash dump. If not, then consider putting one in your application and have it enabled. Then when the application crashes, you get the dump from your clients and you can then debug the issue on your machine by loading the crash dump in Visual Studio.
Hmm yes there was a .dmp file (I assume that is what you are referring to) however, it generated at 0 bytes. It would be a good idea to investigate why.
Quote:
Also, you have something called "remote debugging" if you have TCP/IP access to one of these machines.
Very true, I have mentioned this to management however, we have not got round to setting all this up correctly yet (some networking issues).
Quote:
If you're a C++ programmer and said anything like that, in many companies, it could get you fired, or at the very least, have your stature as a C++ programmer greatly diminished. A C++ (or C) programmer never blames hardware or anything else for their program crashing.
While I see your point, it is a perfectly valid possibility in my humble opinion and I am in no way trying to 'fob off' this problem onto the hardware. I have been fighting this crash for days. I was merely musing 'out loud'.
Quote:
Another thing you don't do -- don't start moving code around until you've debugged the crash and know exactly why it happened
Oh tell me about it! We are behind on the project, and management has one guy fixing bugs and two of us trying to track this crash; no matter how much I tell them we need a version with the crash in to test. After all, if you can't reproduce the bug with some frequency or certainty, how can you confidently say you have fixed it?
Another problem is we only have one live machine to test with.
Thank you both for your posts, they have given me some solid extra things to think about and pursue.
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
Hmm yes there was a .dmp file (I assume that is what you are referring to) however, it generated at 0 bytes. It would be a good idea to investigate why.
You need to actually write code to enable the crash dump. The issue is that relying solely on the OS to produce the dump on the crash (i.e. Dr. Watson), will not reliably produce a dump for you, at least that's my experience.
In your code, you should set up the "crash handler", and when a crash occurs, the code calls the functions in DBGHELP.DLL to write the dump file (if you ever saw this DBGHELP.DLL show up on Windows systems -- well, now you know what it does). There are examples here and on the Internet that show proper usage.
Also, for every build you generate that you release to the public -- save the PDB files and source code! They are your lifeline in debugging crash dumps.
If everything is set up correctly and you've saved the PDB and source files, the next crash you get will produce a crash dump. You then get the crash dump and load the dump file in Visual Studio (as a project). When you "run" the dump file in Visual Studio, the debugger will point to the place in the source where the bug occurred, or approximately where it occurred (since you are probably releasing an optimized build).
In any event, you must have a solid contingency plan when a crash occurs in the field that cannot be produced in the shop. Crash dumps and remote debugging have the most power when it comes to debugging these issues, log files are third on the list.
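A minimal sketch of such a crash handler is below, assuming a Windows build that links against dbghelp.lib. SetUnhandledExceptionFilter, CreateFileA and MiniDumpWriteDump are the real Win32/DbgHelp calls; the names write_minidump, install_crash_dump_handler and kDumpPath are hypothetical, and the non-Windows branch is only a stub so the file compiles elsewhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Hypothetical output path; pick a folder the application can write to.
static const char* const kDumpPath = "crash.dmp";

// Called by the OS when an exception reaches the top level unhandled.
static LONG WINAPI write_minidump(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileA(kDumpPath, GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;      // faulting context and record
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;      // let the process terminate
}

// Installs the filter; call once, early in start-up.
bool install_crash_dump_handler() {
    SetUnhandledExceptionFilter(write_minidump);
    return true;
}
#else
// Stub for other platforms: nothing is installed.
bool install_crash_dump_handler() { return false; }
#endif
```

Two things to keep in mind: the filter is not invoked while a debugger is attached, and only the most recently installed filter is active, so if the engine installs its own handler later it will replace this one. Checking the return value of MiniDumpWriteDump and logging GetLastError() on failure is a sensible addition when a dump comes out empty.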
Regards,
Paul McKenzie
Re: Debugging suspected memory overwrite
adding to what others said,
firstly, what were the results of overloading new/delete ? I ask, because you didn't mention that after post #1 ...
secondly and as a last resort, if introducing logging functions failed to reproduce the crash because of the compiler changing the code in proximity of the offending code or because of logging-induced changes in the state of the program heap memory, then you could try the Microsoft Detours library to inject the logging function logic at runtime ( possibly with its own heap ) to be as unintrusive as possible on the original code ...
Re: Debugging suspected memory overwrite
Thanks for the post Paul, I will definitely do some searching around the .dmp topic (I was told it should already be working but this blatantly isn't the case) and take on board what you said about the release procedure.
Thanks for the post also superbonzo. So far I have been having difficulties getting the new/delete overloading working because we rely on a lot of third-party libraries; this is causing some conflicts and I am looking into some possible workarounds. Nonetheless, I will do some searching on the Microsoft Detours library you mentioned and keep it in mind.
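One possible workaround for the third-party conflicts, assuming only your own classes need checking, is to leave the global operators alone and give a common base class its own operator new and delete. The sketch below is illustrative only: GuardedBase, Particle, g_overruns and the guard constants are hypothetical names, and it checks only a trailing guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

static const std::size_t kTailSize = 8;
static const unsigned char kTailByte = 0xFD;
int g_overruns = 0;   // count of objects whose trailing guard was damaged

struct GuardedBase {
    // Class-scoped allocation: only types derived from GuardedBase use it,
    // so third-party code keeps the normal global operators.
    static void* operator new(std::size_t size) {
        unsigned char* raw =
            static_cast<unsigned char*>(std::malloc(size + kTailSize));
        if (!raw) throw std::bad_alloc();
        std::memset(raw + size, kTailByte, kTailSize);  // guard after the object
        return raw;
    }
    // The sized form receives the object's size, which locates the guard.
    static void operator delete(void* p, std::size_t size) {
        if (!p) return;
        const unsigned char* tail = static_cast<unsigned char*>(p) + size;
        for (std::size_t i = 0; i < kTailSize; ++i) {
            if (tail[i] != kTailByte) { ++g_overruns; break; }
        }
        std::free(p);
    }
    // Virtual so that deleting through a base pointer passes the real size.
    virtual ~GuardedBase() {}
};

// Example derived type with a raw array that could be overrun.
struct Particle : GuardedBase {
    int values[4];
};
```

This does not cover new[] or objects created on the stack, and in a real build the counter would be replaced by logging or a debugger break at the point of detection.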
Re: Debugging suspected memory overwrite
Quote:
Originally Posted by
nixius
Thanks for the post Paul, I will definitely do some searching around the .dmp topic (I was told it should already be working but this blatantly isn't the case)
What you can do to test things is that somewhere in your program where you know the code will be executed, do this:
Code:
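volatile char *p = 0;  // volatile so the optimizer cannot remove the write
*p = 'x';              // deliberate write through a null pointer: access violation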
This should produce a crash. If the crash reporting system you have set up does work, then it should create a dump file. If a crash dump file is not produced, or one is produced with 0 bytes, then there is an issue that needs to be resolved.
But again, the only sure-fire way to resolve any issue is to write code to enable the crash dump mechanism. You control where to write the crash dump file, and how the crash is reported. For example, you can override the default dialog box stating stuff about "Do you want to report this to Microsoft" and replace it with your company's "crash dialog". If you've run your share of Windows apps, you must have seen apps that have their own crash reporting dialogs with the company's logo, copyright, etc. listed in the dialog -- this is what you should be striving for.
Regards,
Paul McKenzie
Re: Debugging suspected memory overwrite
I think you need to make a drawing and follow each &value you're using. I would perhaps try adding inline parameter checks on every branch, so that the same *value is checked for pointer consistency. For instance, a cerro class (class error reporting) could output and trace each value modification and display an error if values become greater than their initial values.
The second thing I can imagine is to avoid fully instantiating the program and strip it down to a start-up logo: first boot the skeleton, then load the logo, then instantiate class/value1 -> get confirmation, then instantiate class/values2 -> get confirmation, and so on. Most apps nowadays either start or produce a bluescreen. If the memory fault appears at startup, then Paul McKenzie is right about your programmer's credit, and if the error appears during operation, then you... must become a better programmer.
It's better to go through your program with drawings, to help you work with blocks/parts of it. Most users instantiate a fileopendialog() inline, by preference, in the midst of the program. A better way, which I used, was to write an fopen() procedure that takes a char argument and fills in the other structs automatically (the fileopenstruct and path structures are left out here).
====================================================================
kirchoff hadn`t write ebooks for branches:D
Re: Debugging suspected memory overwrite
In addition to what everyone said, and this is more of a preventive measure, make sure you set the warning level to the highest (/W4 or /Wall) and that "Treat warnings as errors" is on. Also use a static analysis tool, such as the free CppCheck, or preferably a more powerful commercial tool such as Parasoft's C++test.