[RESOLVED] Question about volatile registers

Hello all,

Ive got an intermittent problem and think Ive finally narrowed it down. In my library almost every function saves every register that it uses that is not also used to pass parameters, then restores them before returning. I did this to make it easier when calling a function so that the caller could expect as little state change as possible. Sometimes though, one of the volatile registers (which was saved at the beggining of the function and restored) will return changed. This problem usually happens rarely (usually taking on the order of at least a few million calls to various functions), it happens in random places after calls to apparantly random functions and I am wondering, if the system were to switch contexts say right at the point where it is returning from a function call, would it change the volatile registers?

An example:

Code:

Function1 PROC mov r11, rcx call Function2 ;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning? add r11, r12 mov rax, r11 RET Function1 endp Function2 PROC push r11 mov r11, rdx mov rax, rcx mul r11 pop r11 RET Function2 endp

just some bs code to illustrate an example (I could give a concrete example, but that would be a lot more code that I thought would confuse the concept). Anyways, in the example above, r11 is used in both functions, function 2 saves r11 before using it and restores it before returning, now 99.99999999% of the time everything runs fine, but every once in a great while r11 will be changed when function1 uses it after returning from the call to function2 it will be changed.

Ive been able to trap this problem by checking r11, which in most of my functions holds a pointer to the first array passed in to a function, immediately after returning from a function call with something like this:

Code:

Function1 PROC local _tr11:qword mov r11, rcx mov _tr11, r11 call Function2 ;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning? cmp r11, _tr11 je ItsGood mov rax, rax ItsGood: add r11, r12 mov rax, r11 RET Function1 endp Function2 PROC push r11 mov r11, rdx mov rax, rcx mul r11 pop r11 RET Function2 endp

Ill put a breakpoint on the mov rax, rax instruction so that it runs fine unless it returns from the function call changed. I cant figure out HOW its getting changed though, so I was wondering, since the system considers r11 volatile across function calls, would it save r11 if a context switch happens after completing a call to a function but before dropping back in to the calling function?

Thanks in advance.

Re: Question about volatile registers

One more thing, it appears to happen in multi-threaded mode but not when run in single threaded mode (only about 95% sure of this at the moment though). Im still running it in different modes to see if I can narrow the problem down more.

Re: Question about volatile registers

That pretty much looks like plain stack corruption to me: When the saved R11 value gets popped back, the stack memory holding the value has been unintentionally modified in the meantime by whatever. Context switches should in any case be entirely transparent to application code, so I wouln't even think of blaming them unless I'd suspect the OS to be buggy.

BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?

Re: Question about volatile registers

Quote:

That pretty much looks like plain stack corruption to me:

I am not sure where its going wrong, all I can say so far is that r11 is correct before the call to function2, and that every once in a great while it gets corrupted between the function call and its return. I am working on putting code in to verify r11 before it leaves function2 now. I just cant see why it happens so rarely. In all of my algorithms the only pushes and pops are at the very beggining of the alg and right before the ret, I even use macros for it, that all happens during setup and clean up of the stack frame, so it would seem to me it should do it everytime it hit the function. I am also pretty sure it is only happening in multi threaded mode, but the only objects that should be shared between threads are a few global constants which are initialized during startup and then dont change. Anyways, I thought context switches should be transparent to the program but wasnt sure if there was some subtle detail about it wasnt aware of.

Quote:

BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?

yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.

I forgot to mention in the original post, if I change r11 to r15 (either in function 1 or in both functions) the problem goes away, thats why I phrased it as a question about voltile registers. If I use a non-volatile (such as r15) the problem goes away.

I just verified that r11 is correct when leaving function2. Somehow it is changing between leaving function2 but before it drops back in to function1. Any ideas what would cause that?

Re: Question about volatile registers

Quote:

Originally Posted by AKRichard

yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.

Mostly out of curiosity, what MP was it you were using these days? I was mostly writing code for the Z80 in the early 80s, and that had the NOP encoded as 00h (just like the Intel 8080), a value that uninitialized RAM cells had on many hardware platforms back then. (Uuuh, nostalgia... :D) The x86 encoding of NOP, at leadt as far as 32-bit goes, is 90h, and I have a faint memory of there really being a reason behind that, I just don't recall what it was...

Any further reflection about your post not before I had some sleep... :)

Re: Question about volatile registers

I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.

As to the original question, Ive found a few articles that mention r10, r11 specifically are used by the syscall/sysret instructions, and one article that mentions those two must be preserved by the caller, but it doesnt mention if thats if the called function uses it or f it should allways be saved. http://gcc.gnu.org/ml/gcc/2008-04/msg00550.html. Its not just any volatiles it is specifically r10/r11, if either of them are used in function1, then sooner or later it hits an exception

Re: Question about volatile registers

It shouldn't ever fail with THAT exact snipped of code you posted. The only time it may not is if you also have a hardware interrupt handler that doesn't restore r11. (but then I'd expect you to see a whole lot of other issues as well).
If a context switch would cause it, this would wreck massive issues in your app as a whole.

If you're getting it with another bit of code which you didn't post. then... only you can tell...

do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.
(r15 is preserved through api calls btw, which could explain why changing your code to r15 solves the problem).

Re: Question about volatile registers

Ok, I have to break this post in two parts (it said Im almost double what it allows to post).

thats not the exact code, but illustrates the idea, the exact code where the exception pops up (because r11 no longer holds my pointer) is: this would be equivalent to function1 above:

Code:

APow64 PROC base:PTR qword, exp:PTR qword ;saves the registers we use and adjusts the stack pointer ENTERFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero ;prep by saving the base, exponent, and decimal place, moving rcx into r11, rdx into r12, clearing r8 and r10, then begginging our checks mov _v1, rcx ;save the base mov _v2, rdx ;save the exponent mov _var3, rax ;saves the decimal point (this will be kept as the original decimal place variable) mov _var4, rax ;save the decimal point again (this will keep the running total for where the decimal point is) mov r11, rcx ;move the base into r11 mov r12, rdx ;move the exponent into r12 mov r8, 0 ;clear r8 mov r10, 0 ;clear r10 cmp qword ptr[r12], 1 ;check if the exponents array has only one element ja NextCheck ;if its more than one jump to the next check because it has to be greater than one cmp qword ptr[r12], 0 ;check if the element count is 0 for the exponent je ReturnOne ;technically this should be an error since no array should have an element count of 0, but we treat as if the exponent is equal to 0 cmp qword ptr[r12+8], 0 ;checck the first value element for 0 je ReturnOne ;jump to return one if it does cmp qword ptr[r12+8], 1 ;check if the first vaalue element equals 1 je ReturnVal1 ;jump to returnval1 if it does (this will return a copy of the base) NextCheck: ;here we check the base for the same values we checked for the exponent cmp qword ptr[r11], 0 ;check if the element count is 0 (again, this should be an error but we treat it as a zero) je ReturnZero ;jump to returnzero if it does cmp qword ptr[r11], 1 ;check if the element count is one ja MoveOn ;cancel the rest of the checks if it is above cmp qword ptr[r11+8], 0 ;compare the first value element for 0 je ReturnZero ;jump to returnzero if it equals cmp qword ptr[r11+8], 1 ;compare the first value element for 1 je ReturnOne ;jump to return one if it does MoveOn: ;first thing to do after the checks is to count how many elements for the answer and temp arrays we need. I have special versions of the multiply and square routines that do not create ;a return array (they use the temp array that we create). Then is a lot faster then creating and destroying the arrays with each pass of the alg. We overestimate the element count ;by taking the number of elements in the base to start the running total at, each iteration we double the running total and add one, if the current bit in the exponent is 1 ;we add the base element count to the running total and add one. In the actual alg that extra byte will not allways be used. To start we move the exponent element count into rdi and save it in _var1 ;then we count the number of bits in the high byte of the exponent, load the bits array into r9 (this contains the powers of 2 from a power of 0 to a power of 63), move the element count of the ;base into rax, increase that count a little to give ourselves some room, make sure the bit count in the high byte of the exponent was not 0 (if its not we jump to decandgo if it is we ;decrement the index into the exponent, make sure the index is not 0, if it is the count is done, if not store 64 in rsiand move on) mov rdi, qword ptr[r12] ;move the element count of the exponent into rdi mov _var1, rdi ;save the element count in _var1 bsr rsi, qword ptr[r12+rdi*8] ;count the number of bits in the high byte of the exponent mov _var2, rsi ;save the bit count in _var2 lea r9, _bits ;load the address to the bits array into r9 (the bits array have the powers of two) mov rax, qword ptr[r11] ;move the element count of the base into rax add rax, 10 ;increase the element count some to give ourselves some room cmp rsi, 0 ;compare rsi to 0 ja DecAndGo ;if its above jump to decandgo cmp rdi, 1 ;if rsi is 0 compare rdi to 1 je CountDone ;if rsi equals 1 then the count is done mov rsi, 64 ;otherwise store 64 in rsi dec rdi ;decrement the index into the exponent mov _var1, rdi ;save the index in _var2 DecAndGo: dec rsi ;we need to decrement the index into the bits array by 1 because the most significant bit is allready counted mov _var2, rsi ;store the index into the bits array in _var2 CountLoop: add rax, rax ;double rax inc rax ;add one to rax mov rcx, qword ptr[r12+rdi*8] ;move the byte of the exponent we are working on into rcx and rcx, qword ptr[r9+rsi*8] ;and the byte will the value stored in the bits array pointed to by rsi jz NoMult1 ;if the result was 0 then dont add the base element count add rax, qword ptr[r11] ;add the element count of the base to rax inc rax ;add one to rax NoMult1: cmp rsi, 0 ;compare the index into the bits array to 0 ja NotZero ;if its above 0 jump to notzero cmp rdi, 1 ;otherwise to move to the next byte in the exponent, compare the index into the exponent to 1 jna CountDone ;if its not above the count is done mov rsi, 63 ;otherwise set the index into the bits array to 63 dec rdi ;decrement the index into the exponent jmp CountLoop ;jump to the head of the count loop NotZero: dec rsi ;the check above meant we still have bits in the current byte to go through so decrement the index to bits jmp CountLoop ;jump to the head of the count loop CountDone: ;now that we know how many elements we need for the arrays, we multiply the count by 8 (8 bytes in 64 bits), then make 2 calls to getmem and copy the base to one of them. mov _t1, rax ;move the result of the count into _t1 mul _eight ;multiply the result by 8 for 64 bits mov rcx, rax ;move the final result into rcx mov _var7, r8 ;temporarily here for debugging mov _t9, r11 ;temporarily here for debugging mov r8, r11 ;temporarily here for debugging mov rdx, 187679 ;temporarily here for debugging mov _r8, r8 ;this is here trying to fix the exception mov _r9, r9 ;this is here trying to fix the exception mov _r10, r10 ;this is here trying to fix the exception mov _r11, r11 ;this is here trying to fix the exception call GetMem64 ;call our getmem routine mov r11, _r11 ;this is here trying to fix the exception mov r10, _r10 ;this is here trying to fix the exception mov r9, _r9 ;this is here trying to fix the exception mov r8, _r8 ;this is here trying to fix the exception cmp r11, _t9 ;temporarily here for debugging je NoProb ;temporarily here for debugging mov rax, rax ;temporarily here for debugging NoProb: ;temporarily here for debugging cmp r11, _v1 ;temporarily here for debugging je NoProbA ;temporarily here for debugging mov rax, rax ;temporarily here for debugging NoProbA: ;temporarily here for debugging mov r8, _var7 cmp rax, 0 ;make sure what was returned from the routine is valid je MallocError ;if its not jump to mallocerror mov _retval, rax ;move the pointer into _retval mov rcx, qword ptr[r11] ;move the element count of the base into rcx inc rcx ;increment the count by one for the element count at position 0 mov rdi, rax ;move the pointer we just got into rdi mov rsi, _v1 ;move the pointer for the base into rsi pushf ;save the flags register cld ;clear the direction flag rep movsq ;copy all the elements from the base into the array we just got popf ;restore the flags register mov rax, _t1 ;move the size of our return array into rax mul _eight ;multiply it by 8 for the 64 bits mov rcx, rax ;move the result into rcx mov _var7, r8 ;temporarily here for debugging mov _rdx, 187679 ;temporarily here for debugging mov r8, r11 ;temporarily here for debugging mov _r8, r8 ;this is here trying to fix the exception mov _r9, r9 ;this is here trying to fix the exception mov _r10, r10 ;this is here trying to fix the exception mov _r11, r11 ;this is here trying to fix the exception call GetMem64 ;call our getmem routine mov r11, _r11 ;this is here trying to fix the exception mov r10, _r10 ;this is here trying to fix the exception mov r9, _r9 ;this is here trying to fix the exception mov r8, _r8 ;this is here trying to fix the exception cmp r11, _t9 ;temporarily here for debugging je NoProb2 ;temporarily here for debugging mov rax, rax ;temporarily here for debugging NoProb2: ;temporarily here for debugging cmp r11, _v1 ;temporarily here for debugging je NoProb2A ;temporarily here for debugging mov rax, rax ;temporarily here for debugging NoProb2A: ;temporarily here for debugging mov r8, _var7 cmp rax, 0 ;make sure we got a valid pointer back je MallocError ;if not jump to mallocerror mov r10, rax ;move the pointer into the r10 register mov _t9, r10 ;save the pointer in _t9 mov r8, _retval ;move _retval into r8 MainLoop: ;this is just the standard repeated square and multiply alg. On each iteration we will square the _retval (kept in r8), the special square alg will take r8 and store the result ;in the array into r10, next we check the current bit of the current byte in the exponent for 1, if it is 1 then we multiply the result from the squaring by the base. mov rax, _var4 ;move the running total for the decimal point into rax add _var4, rax ;add rax back into _var4 to double it mov rdi, r10 ;move the pointer to the temp array into rdi mov rcx, qword ptr[r8] ;move the element count in _retval into rcx add rcx, rcx ;add rcx to itself to double it inc rcx ;make room for the element count inc rcx ;give it one extra byte in case there is still a carry to do after the final multiply mov qword ptr[r10], rcx ;rcx has our (possible) new element count for the result of the squaring, so store it at position 0 of the temp array mov rax, 0 ;move 0 into rax add rdi, 8 ;add 8 to rdi so that it points to the element at position one of the temp array pushf ;save the flags register cld ;clear the direction flag rep stosq ;store 0 in all the value elements popf ;restore the flags register mov rdi, 1 ;set rdi to 1 mov rsi, 1 ;set rsi to 1 mov rcx, 1 ;set rcx to 1 mov rdx, 0 ;clear rdx SqLoop: mov rbx, rdx ;rdx holds if there was a carry from the last iteration, save it inrbx mov rax, qword ptr[r8+rdi*8] ;move the current byte of the result array into rax mul qword ptr[r8+rsi*8] ;multiply by the other current byte of the result array add qword ptr[r10+rcx*8], rax ;add the result to the current position of the temp array adc rdx, 0 ;if there was a carry from the add, this will add one to the carry from the multiply add qword ptr[r10+rcx*8], rbx ;add the carry from the last iteration to the current position in the temp array adc rdx, 0 ;if there was a carry from the add, this will add one to the carry from the multiply (only one at most of these adc instructions will inc rdx) inc rcx ;inc our index into the temp array inc rdi ;inc our first position in the _retval cmp rdi, qword ptr[r8] ;compare rdi to the element count jna SqLoop ;if its not larger than jump to the head of the sqloop mov qword ptr[r10+rcx*8], rdx ;the first position reached the end of retval so store the carry from the last iteration in the next element of the temp array (we add it here just in case the second position of the retval has been reached) mov rdx, 0 ;zero out the return so we dont add it again mov rdi, 1 ;point the first index into retval to the first element inc rsi ;inc the second index into retval mov rcx, rsi ;point the index into the temp array to the same position as the second index into the retval cmp rsi, qword ptr[r8] ;make sure the second index is not past the last element jna SqLoop ;if its not jump to the head of the square loop SqDone: mov rdi, qword ptr[r10] ;move the element count of the temp array into rdi ZCheck1: cmp qword ptr[r10+rdi*8], 0 ;compare the current element we are pointing at to 0 ja ZCheck1Done ;if its above jump out cmp rdi, 1 ;otherwise compare the index into the array to one jna ZCheck1Done ;if its not above jump out dec rdi ;otherwise decrement it by one mov qword ptr[r10], rdi ;and store it back in position 0 of the temp array jmp ZCheck1 ;jump to the head of the zero check loop ZCheck1Done: mov rax, r10 ;we switch the arrays so that the current result is allways in r8 mmove r10 into rax mov r10, r8 ;move r8 into r10 mov r8, rax ;move rax into r8 mov rdi, _var1 ;move the index into the exponent into rdi mov rsi, _var2 ;move the index into the bits array into rsi mov rax, qword ptr[r12+rdi*8] ;move the current byte of the exponent into rax and rax, qword ptr[r9+rsi*8] ;and rax with the value pointed to in bits jz NoMult2 ;if the result was 0 then jump past the multipliaction loop mov rcx, qword ptr[r8] ;otherwise move the element count of retval into rcx add rcx, qword ptr[r11] ;add the element count of the base to rcx inc rcx ;inc the count for the element count to store in the temp array inc rcx ;inc the element count in case the result of the multiply goes over mov qword ptr[r10], rcx ;store the calcd element count at position 0 of the temp array mov rdi, r10 ;move the pointer of the temp array into rdi add rdi, 8 ;add 8 to rdi so that we point to the element at position 1 mov rax, 0 ;move 0 into rax pushf ;save the flags register cld ;clear the direction flag rep stosq ;clear all the elements in the temp array popf ;restore the flags register mov rsi, 1 ;point to the first element of the retval mov rdi, 1 ;point to the first element of the base mov rcx, 1 ;point to the first element of the temp array mov rax, _var3 ;move the original decimal point into rax add _var4, rax ;add it to the running total mov rdx, 0 ;clear rdx MultLoop: mov rbx, rdx ;again rdx holds the carry from the last iteration so save it in rbx mov rax, qword ptr[r8+rsi*8] ;move the current element of retval into rax mul qword ptr[r11+rdi*8] ;multiply by the current element in the base add qword ptr[r10+rcx*8], rax ;add the result to the current element of the temp array adc rdx, 0 ;if the last add resulted in a carry this will add it to the carry from the multiplication add qword ptr[r10+rcx*8], rbx ;add the carry from the last iteration to the current element in the temp array adc rdx, 0 ;if the last add resulted in a carry than add it to the carry from the multiply (only one of the adc instructions will inc rdx) inc rcx ;point to the next element in the temp array inc rdi ;point to the next element in the base cmp rdi, qword ptr[r11] ;make sure we are not past the last element in the base jna MultLoop ;if we are good then jump to the head of the multiply loop mov qword ptr[r10+rcx*8], rdx ;otherwise move the carry from this iteration to the new current position in the temp array mov rdx, 0 ;clear rdx mov rdi, 1 ;point the index into the base at the first element inc rsi ;inc the index into the retval mov rcx, rsi ;point the index into the temp array to the same position as the index into retval cmp rsi, qword ptr[r8] ;make sure we are not past the last element of retval jna MultLoop ;if not jump to the head of the mult loop MultDone: mov rcx, r10 ;move the temp array into rcx CHECKFORZEROS ;check for zeros mov r10, r8 ;switch the two arrays, move the current retval into r10 mov r8, rax ;move the old temp array into retval NoMult2: cmp _var2, 0 ;check if the index into the bits array is 0 ja NotZ ;if its not jump to the notz cmp _var1, 1 ;otherwise make sure we are still above the least siginificant byte of the exponent jna GoOut ;if not then go out dec _var1 ;otherwise dec the index into the exponent mov _var2, 63 ;and move 63 into the index into bits jmp MainLoop ;jump to the head of the main loop NotZ: dec _var2 ;we still have bits to go sso decrement the index into the bits array jmp MainLoop ;jump to the head of the main loop ReturnVal1: ;the exponent must be equal to one so we return a copy of the base mov rax, qword ptr[r11] add rax, 5 mul _eight mov rcx, rax mov _r8, r8 mov _r9, r9 mov _r10, r10 mov _r11, r11 call GetMem64 ;call our getmem routine mov r11, _r11 mov r10, _r10 mov r9, _r9 mov r8, _r8 cmp rax, 0 je MallocError mov r8, rax mov r11, _v1 mov rcx, qword ptr[r11] inc rcx mov rsi, r11 mov rdi, r8 pushf cld rep movsq popf mov rax, qword ptr[r11] mov qword ptr[r8], rax jmp GoOut ReturnZero: ;the base must of been equal to 0 (and the exponent was greater than 0) so we return 0 mov rcx, 32 mov _r8, r8 mov _r9, r9 mov _r10, r10 mov _r11, r11 call GetMem64 ;call our getmem routine mov r11, _r11 mov r10, _r10 mov r9, _r9 mov r8, _r8 cmp rax, 0 je MallocError mov r8, rax mov qword ptr[r8], 1 mov qword ptr[r8+8], 0 mov _var4, 0 jmp GoOut ReturnOne: ;either the base was equal to 1 or the exponent was equal to 0 mov rcx, 32 mov _r8, r8 mov _r9, r9 mov _r10, r10 mov _r11, r11 call GetMem64 ;call our getmem routine mov r11, _r11 mov r10, _r10 mov r9, _r9 mov r8, _r8 cmp rax, 0 je MallocError mov r8, rax mov qword ptr[r8], 1 mov qword ptr[r8+8], 1 mov _var4, 0 jmp GoOut MallocError: ;if there was an error requesting memory try to get the error code, store it in the return value and send it back to managed code mov rcx, 32 mov _r8, r8 mov _r9, r9 mov _r10, r10 mov _r11, r11 call GetMem64 ;call our getmem routine mov r11, _r11 mov r10, _r10 mov r9, _r9 mov r8, _r8 cmp rax, 0 je GoOut mov r8, rax call GetLastError mov qword ptr[r8], 0 mov qword ptr[r8+8], 0 mov qword ptr[r8+16], rax GoOut: ;we are all done so we destroy the temp array if it was created then exit cmp r10, 0 je NotC mov rcx, r10 call Free64 NotC: mov rcx, _var4 mov rax, r8 EXITFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero RET APow64 ENDP

Re: Question about volatile registers

And the following would be equivalent to function2:

Code:

GetMem64 PROC _tsize:qword, _ch:qword, _nr:qword ;using a management table ;each table is specific to a memory set ;each entry in the table will have the following pattern ; ;all unused block will have all words _zerod ; ;a used block will have words 0 and 1 set to the index of the first block shifted to the left 16 ;words 2 and 3 will hold how many blocks are being used shifted to the left 16 ENTERFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero ;we check to make sure everything is initialized mov _v1, rcx ;save the amount of memory requested mov _var9, rdx mov _var8, r8 ;now we check if we are supposed to use malloc or not, if we do, then call it and exit cmp _usemalloc, 0 ;if the _usemalloc variable is 0 we use this alg, if it is not 0, we use malloc to fullfill the request je UseMine call malloc ;the call to malloc cmp rax, 0 ;if rax contains 0 then there was an error in the call je MallocError EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero cmp _var9, 187679 jne ItsGood3 cmp r11, _var8 je ItsGood3 mov rax, rax ItsGood3: RET UseMine: cmp _initialized, 0 ;check if everything is allready initialized, if not then run initialization je CallInitialize mov rcx, _v1 ;restore the requested amount to rcx Allready_initialized: ;we start by dividing the requested amount by our allocation size to see how many units to look for (we increment it by one to cover the case it equals zero) ;then we set everything up by moving _memslot into r13, then moving the first set into r13, set rdi and rsi to 1 rcx to zero (rdi is the current position in the set ;rsi marks the first block of the set we are looking at and rcx holds the count) mov rax, rcx ;mov the amount requested into rax for the division xor rdx, rdx ;clear the rdx register for the division div _allocsize ;divide the requested amount by the allocation size, this tells us how many blocks to look for cmp rdx, 0 ;check if it was an exact division (no remainder) cmovne rdx, _one ;if there was a remainder then move 1 into rdx add rax, rdx ;this increments the number of blocks we are looking for by one if there was a remainder mov _t1, rax ;move the number of blocks we are looking for into t1 ;add _t1, 1 mov r13, _memslot ;move the memslot array into r13, this array holds the pointers to the arrays that hold the mapping tables (which tells us which blocks are free and whoch are taken) mov rdi, 1 ;we intially set to the first table, this will also be used for the pointer into the table mov rsi, 1 ;rsi will keep track of the first available block for when we are ready to start marking mov _t2, 1 ;the t2 variable keeps track of which table we are using so that we can return a pointer from the correct memptr table mov r13, qword ptr[r13+rdi*8] ;load the first memslot table into r13 xor rcx, rcx ;clear rcx, this keeps track of how many contiguous free blocks we find SearchLoop: ;look at the slot, if it is 0 then inc rcx, compare rcx to how many units we are looking for, if its equal then jump to found ;else we inc rdi make sure rdi isnt pointing past the end, then jump to the top to do it again. ;if rdi is pointing past the end, then we load _memslot back into r13, load rdi from _t2 (which is keeping track of which set) inc rdi, make sure its pointing to a valid set ;load r13 with the next set, set rdi and rsi to 1, rcx to 0 and jump tp searchloop ;if it is pointing past the end of the sets, then we jump to create a new set cmp qword ptr[r13+rdi*8], 0 ;check the element pointed to is zero, (0 means the block is free) jne ItsTaken ;if its allready being used go to the next section inc rcx ;if its not taken increment rcx to count another free block cmp rcx, _t1 ;compare rcx to t1 (the number of free blocks we are looking for) jnb Found ;if we found enough go to prep for marking the blocks inc rdi ;increment rdi to point to the next block cmp rdi, _numberofallocs ;compare rdi to the number of allocations we keep in the table jna SearchLoop ;if its not above we jump to the head of the search loop to check the next block inc _t2 ;if it is above we need to load the next table mov r13, _memslot ;load the memslot pointer into r13 mov rax, _t2 ;move the index to the next table into rax cmp rax, qword ptr[r13] ;check we are not past the last table in there ja CreateNewSet ;if we are jump to the create new set routine mov r13, qword ptr[r13+rax*8] ;move the next table into r13 mov rdi, 1 ;set the index to the first entry mov rsi, 1 ;set rsi to point to the first entry (rsi holds the index to the first free block) xor rcx, rcx ;clear rcx (the counter) jmp SearchLoop ;jump to the head of the search loop ItsTaken: movzx rax, word ptr[r13+rdi*8+2] ;since the value wasnt 0 we know the value contains the info we need to jump past the taken blocks, we move the first block in this taken set into rax movzx rdi, word ptr[r13+rdi*8+6] ;we move the number of blocks taken in this set into rdi add rdi, rax ;we add rax to rdi so that now rdi points to the first block past this taken set mov rsi, rdi ;we set rsi = rdi so that rsi keeps track of the first block xor rcx, rcx ;we clear rcx since we need contiguous blocks cmp rdi, _numberofallocs ;make sure wwe are not pointing past the end of the table jna SearchLoop ;if we are good jump to the head of the search loop inc _t2 ;if not increment the index into the memslot table mov r13, _memslot ;move the memslot table into r13 mov rax, _t2 ;move the index into rax cmp rax, qword ptr[r13] ;make sure we are not pointing past the last table ja CreateNewSet ;if we are jump to the create new set routine mov r13, qword ptr[r13+rax*8] ;move the next table into r13 mov rdi, 1 ;point to the first entry mov rsi, 1 ;set rsi to the first entry xor rcx, rcx ;clear the counter jmp SearchLoop ;jump to the head of the search loop Found: ;ok we found enough empty units, now we prep to mark them, rax:rdx need to be zero, this is what the cmpxchg command compares against the memory location ;mov the starting block (rsi) into rcx, and the number of blocks we are reserving into rbx and shift both to the left 16 so that it uses the 32 bit form of the command mov rax, 0 ;have to prep the compare and exchange each loop mov rdx, 0 mov rbx, _t1 ;move the number of blocks into rbx mov rcx, rsi ;move the index of the first block into rsi shl rbx, 16 ;we have to shift the value so the correct form of cmpexchg is used shl rcx, 16 ;we have to shift the value so the correct form of cmpexchg is used MarkLoop: ;use the lock prefix so that we have exclusive access to the memory location and execute the command, if the zero flag is not set then that means the location no longer ;contains zero (another thread must have reserved it allready) so jump to allready marked and find another set of blocks. If this command executes the first run, it is gauranteed ;to succeed on all the rest. After marking all blocks we are using then we load _memptr into r13, and rdi with _t2 (so we point to the correct set) and load that set into r13 ;then we multiply the number of blocks between 0 and the set we started at by the allocation size, add that to r13, and that gives us the begginging address of the ;memory we just reserved and thats what gets returned lock cmpxchg8b qword ptr[r13+rsi*8] ;lock the memory location and fire the cmpexchg instruction jnz AllreadyMarked ;if the zero flag is not set, then the exchange failed (if its going to fail it will only fail on the first time through) inc rsi ;increment the index cmp rsi, rdi ;compare it against rdi (where to stop) jna MarkLoop ;if its not above jump to the head of the mark loop mov r13, _memptr ;load the memptr table inot r13 mov rdi, _t2 ;load the index of the memslot table we found the free blocks in to rdi mov r13, qword ptr[r13+rdi*8] ;load the correct memptr into r13 (the one that corresponds to the memslot table where we found the free blocks) mov rax, rcx ;move the index to the first free block into rax shr rax, 16 ;we have to shift it back to the right (because we shifted it to the left above) mul _allocsize ;multiply it by the allocation size add rax, r13 ;add it to the base pointer (r13) this gives us the pointer to the first address of the memory we will return mov _v1, rax ;save the pointer for when we return ;this is a new section to protect against the possibility that this alg assigns some memory, then the _usemalloc variable gets changed. With this set up we will keep track of ;how many requests that we filled are still out. If the number is above 0 then the free alg will ALLWAYS start by checking if the memory being returned was filled from the memory ;we are handing out, if it is then we use our free alg to free it, if its not then it falls through to call the windows free routine. ;In order to insure that we allways have the correct count we will use the cmpexchg instruction with the lock prefix, there may be an issue since both this alg and the free alg ;will be locking and accessing the _fm variable, if it slows it down to much Ill either remove this section or try something different. IncCount: mov r13, 0 ;prep to increment the count, set r13 to zero mov rax, _fm ;move the value in _fm to rax mov rdx, rax ;copy rax to rdx shr rdx, 32 ;shift rdx to the right by 32 bits (_fm should be looked at like two dwords) and rax, 0ffffffffh ;and rax by 0ffffffffh to keep just the first 32 bits (rdx:rax now holds the value we will compare to) mov rbx, rax ;copy rax to rbx mov rcx, rdx ;copy rdx to rcx add ebx, 1 ;add one to ebx cmovc r13, _one ;if the carry flag is set then set r13 to 1 add ecx, r13d ;add r13d to ecx (r13d will be either 0 or 1) lock cmpxchg8b _fm ;lock and fire the instruction jnz IncCount ;if the zero flag is not set then another thread beat us to it so jump to IncCount and try again mov rax, _v1 ;move the pointer back into rax EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero cmp _var9, 187679 jne ItsGood2 cmp r11, _var8 je ItsGood2 mov rax, rax ItsGood2: RET CallInitialize: call Initialize ;needs to be initialized so call the initialization routine (done this way because all but the first time the jmp statement above will fall through) jmp Allready_initialized ;jmp back to the beggining AllreadyMarked: ;ok our marking of that address failed, on failure the command moves the value of the address into edx:eax, so as above we compute how many blocks to jump ;make sure it is not past the end of the set, then jump to the search loop again. If it is past the end, load _memslot into r13, _t2 into rdi, inc rdi ;make sure it is pointing to a valid set, if it is set rdi and rsi to 1 rcx to 0 and jump to the search loop. If it is not pointing to a valid set jump to create new set shr edx, 16 ;the cmpexchg failed when the instruction fails it loads what it found at the memory location into eax/edx shr eax, 16 ;both of which needs to be shifted to the right 16 bits xor rdi, rdi ;then eax gets added to edx and moved into rdi, this will point to the first block past this set add edi, eax add edi, edx xor rcx, rcx ;the count is cleared mov rsi, rdi ;mov rdi into rsi so the first block is kept track of cmp rdi, _numberofallocs ;make sure rdi doesnt point past the end of the table jna SearchLoop ;if its good jump to the head of the search loop inc _t2 ;if its not incerement the index into the memslot array mov r13, _memslot ;mov the memslot array into r13 mov rax, _t2 ;mov the index into rax cmp rax, qword ptr[r13] ;make sure the index isnt pointing past the last table ja CreateNewSet ;if it is create a new set mov r13, qword ptr[r13+rax*8] ;otherwise move the next memslot table into r13 mov rdi, 1 ;point rdi to the first block mov rsi, 1 ;point rsi to the first block xor rcx, rcx ;clear the counter jmp SearchLoop ;jump to the head of the search loop CreateNewSet: ;we want to make sure we are the only thread creating a new set so we use the cmpxchg command again but this time on a global variable _gm ;if we fail jump to start bottom. If we succeed, get enough memory for another _memptr set and store it in the memptr table (updating the count as we go) ;then we request enough for another memslot table, we zero out all the elements, then store it in the memslot table (updating the count as we go) ;then we point _t2 to the new set, set rdi and rsi to 1 rcx to 0 and jump to the search loop setting _gm to zero right before we leave mov rax, 0 ;have to prep the cmpexchg instruction mov rdx, 0 ;we want to make sure the _gm global variable is 0 (which means no other thread is currently in this section of code) mov rbx, 1 ;if it is 0 we want to change it to 1 mov rcx, 0 lock cmpxchg8b _gm ;lock and fire the instruction jnz StartBottom ;if the zero flag is not set then another thread is allready in this section of code, so exit this part and wait till it is finished mov rcx, _allocreqsize ;move the amount to request for the memptr entry into rcx call malloc ;get the memory from the syste, cmp rax, 0 ;make sure there wasnt an error je MallocError ;if there was jump to the error routine mov r13, _memptr ;move the memptr array into r13 mov rdi, qword ptr[r13] ;move the array count (element 0) into rdi inc rdi ;increment the element count mov qword ptr[r13], rdi ;store the new count in element 0 mov qword ptr[r13+rdi*8], rax ;store the pointer we just requested in the new position mov rcx, _tablereqsize ;move the size of the memslot table into rcx call malloc ;request the memory from the system cmp rax, 0 ;make sure there wasnt an error filling the request je MallocError ;if there was jump to the error routine mov r13, _memslot ;move the memslot array into r13 mov rdi, qword ptr[r13] ;move the array count into rdi inc rdi ;increment the array coutnt mov qword ptr[r13+rdi*8], rax ;store the memory we just requested in the array mov qword ptr[r13], rdi ;store the new count at element 0 mov r13, rax ;move the pointer to the new table into r13 mov rdi, rax ;move the pointer into rdi mov rcx, _numberofallocs ;move the number of allocations (table size) into rcx mov rax, 0 ;move zero into rax (this is what will be stored in all the elements in the new memslot table since none of the blocks have been assigned yet) pushf ;save the flags register cld ;clear the direction flag rep stosq ;store 0 in all the elements of the new table popf ;restore the flags register mov _gm, 0 ;clear the _gm variable xor rcx, rcx ;clear the count mov rdi, 1 ;point at the first block mov rsi, 1 ;set rsi to point to the first block jmp SearchLoop ;jump to the head of the search loop StartBottom: ;we didnt get exclusive access to that part of the routine, we know there are not enough blocks in any of the other sets for us, so we wait till ;the other thread is done creating and initializing everything, at which point we point to the new set, set rdi and rsi to 1 and rcx to 0 then jump to the search loop cmp _gm, 1 ;see if the thread that is setting up the new tables is still working on it je StartBottom ;if it is jump to the head of this loop again mov r13, _memslot ;load the memslot array into r13 mov rdi, qword ptr[r13] ;get the index of the last table (we woudnt be here unless we allready searched all the other tables) mov r13, qword ptr[r13+rdi*8] ;load the new table into r13 mov rdi, 1 ;point rdi to the first block mov rsi, 1 ;set rsi to point to the first block xor rcx, rcx ;clear the counter jmp SearchLoop ;jump to the head of the search loop EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero cmp _var9, 187679 jne ItsGood1 cmp r11, _var8 je ItsGood1 mov rax, rax ItsGood1: RET MallocError: mov rax, rax GetMem64 ENDP

Re: Question about volatile registers

sorry took 3 posts to get everything in:

Now there is a bit of extra code in there trying to debug this problem, hopefully its commented enough to understand a few things though:

1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.

2. The variables that you dont see declared at the beggining of the function are text equates that point into the stack that was set up by the enterfunc/exitfunc macros.

3. you can tell where I have debugging code inserted, I mark those sections with extra blank spaces so that its easy to clean up.

4. In the APow64 function, immediately before and after the call to GetMem64 youll see me saving the registers and restoring them, this appears to work, if I comment them out then when it does throw an exception it will throw it at the very first instruction after the call GetMem64 that tries to use r11 as a pointer. When this happens, the value that it shows in r11 is Allways the same, I havent restarted my computer since last night, I was trying to get it to throw the exception when I saved the volatiles before the call, it didnt.

5. You can see the debugging code in GetMem64 that I use to check the value in r11 before it exits. This alg normally takes only one argument (the number of bytes being requested), and normally youd never see a reference to r11 in this algorithm, the only reason you see it now is so I can check it. I do not check r11 when it first enters the alg, I just check to make sure r11 still equals the value passed in (from APow64) to check it against.

6. I just looked at the way the code looks in here, it a lot easier to read in visual studio, sorry.

Re: Question about volatile registers

http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

I keep coming back to this article describing register usage and looking at r10/r11

Quote:

R10:R11
Volatile
Must be preserved as needed by caller; used in syscall/sysret instructions

r10 and r11 are the only (integer) registers marked as must be saved by the caller. As I mentioned before if I change r11 to something like r15 int the calling function the problem disappears. Normally, I would just make the changes and move on, but if there is a problem in the alg I would like to find it and fix it. The wording in the referenced article has me wondering though:

Quote:

Volatile registers are scratch registers presumed by the caller to be destroyed across a call

the "across a call" wording sounds like they are trying to convey that those register may be destroyed by more then the called function, in which case restoring them before returning from the called function may not allways work (which goes hand in hand with the must be saved by caller statement in same article), the problem is I can find nothing on the internet to confirm or deny my thinking.

Any ideas?

Re: Question about volatile registers

I don't think I can really add anything substantial to the quite competent comments already posted by OReubens, except perhaps this:

Quote:

Originally Posted by AKRichard

1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.

Already when reading http://forums.codeguru.com/showthrea...57#post2115357, I suspected some possible connection between macros/text equates and what looked like stack corruption (or mis-addressing data stored on the stack). Back then I abandoned that idea because I couldn't imagine how plain text replacement could cause something like that, but now I think it may be worth checking this direction. Be it just to confirm the macros are not to blame.

Re: Question about volatile registers

Quote:

Originally Posted by AKRichard

I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.

I never had one myself, but I know these single-board gadgets with binary direct-to-memory entry (already hex was luxury back then :D) and 7-seg display. With interest I followed a series of magazine articles about a system one had to assemble themselves (best thing available was a kit IIRC) based on the SC/MP CPU. (In fact that was in that very magazine referred to at the end of the Wikipedia article.)

My first computer (not counting programmable calculators) also was Tandy. But it was a Model I Level II with no color at all, instead "true" black and wite, i.e. bi-level, and the entire screen's "graphics" resolution was less than today's highest resolution Windows application icons. :D

However, I always thought any CPU would have a NOP instruction. After all that was particularly useful with that sort of hackish direct-to-core code editing/debugging of these days.

Re: Question about volatile registers

Quote:

Already when reading http://forums.codeguru.com/showthrea...57#post2115357, I suspected some possible connection between macros/text equates and what looked like stack corruption (or mis-addressing data stored on the stack).

Funny you should mention that post, I just reread it the other day. As this project has evolved every once in a while Ill see a way to reduce its complexity, code an idea or two and test it out, if it works as thought, and add clarity to the code, then Ill implement it. The macros mentioned solved a few problems, configuring the stack correctly, and making sure every push had a corresponding pop (in the version before the macros I used pushes and pops all over the place which was hard to follow sometimes).

The macros (appear) to be running correctly, I didnt run into this problem until I tackled redesigning my memory management routine. I am finding out that everything runs stably until the management routine needs more memory from the system, every single time this problem pops up, it does so after it grabs more memory from the system (thats the CreateNewSet section of GetMem), and thats why I come here, I think I just figure it out. Ill try it and let you know.

Re: Question about volatile registers

So, in the middle of typing that last reply, when I got done typing that last bit about how it doent throw an exception till it requests more memory when I remembered something OReubens said:

Quote:

do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.

The GetMem routine never uses r11, so I didnt have the macro saving and restoring r11. I have very few functions that call any of the windows api's directly. Anyways, everything clicked when I was typing, when it creates a new set it makes 2 calls into malloc, r11 wasnt being saved since it wasnt used in the called function, the reason it only did it in multi threaded mode was because in single threaded mode it doesnt run out of memory, what it grabs during initialization is usually enough. The only part that I havent figured out yet is I can set a global variable (_usemalloc) to 1 and the GetMem function will call malloc every request it recieves without ever throwing an exception. The only thing I can figure is when I have it set that way the requests are relatively small (usually less than a few hundred bytes), but when my management routine makes a request, it asks for about 4.5 megs all at once so maybe malloc is calling into some other routine that makes a syscall/sysret.

So, I set the macro in the GetMem routine to save and restore r11, now it doesnt throw an exception, though it still messes up the function that called GetMem on the thread that created the new set (messes it up as in it throws an incorrect answer), though everything goes back to running smoothly right after that. My memory management routine is now running almost twice as fast as it was before I redid it.

Thanks guys