[RESOLVED] Question about volatile registers
Hello all,
Ive got an intermittent problem and think Ive finally narrowed it down. In my library almost every function saves every register that it uses that is not also used to pass parameters, then restores them before returning. I did this to make it easier when calling a function so that the caller could expect as little state change as possible. Sometimes though, one of the volatile registers (which was saved at the beggining of the function and restored) will return changed. This problem usually happens rarely (usually taking on the order of at least a few million calls to various functions), it happens in random places after calls to apparantly random functions and I am wondering, if the system were to switch contexts say right at the point where it is returning from a function call, would it change the volatile registers?
An example:
Code:
Function1 PROC
mov r11, rcx
call Function2
;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning?
add r11, r12
mov rax, r11
RET
Function1 endp
Function2 PROC
push r11
mov r11, rdx
mov rax, rcx
mul r11
pop r11
RET
Function2 endp
just some bs code to illustrate an example (I could give a concrete example, but that would be a lot more code that I thought would confuse the concept). Anyways, in the example above, r11 is used in both functions, function 2 saves r11 before using it and restores it before returning, now 99.99999999% of the time everything runs fine, but every once in a great while r11 will be changed when function1 uses it after returning from the call to function2 it will be changed.
Ive been able to trap this problem by checking r11, which in most of my functions holds a pointer to the first array passed in to a function, immediately after returning from a function call with something like this:
Code:
Function1 PROC
local _tr11:qword
mov r11, rcx
mov _tr11, r11
call Function2
;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning?
cmp r11, _tr11
je ItsGood
mov rax, rax
ItsGood:
add r11, r12
mov rax, r11
RET
Function1 endp
Function2 PROC
push r11
mov r11, rdx
mov rax, rcx
mul r11
pop r11
RET
Function2 endp
Ill put a breakpoint on the mov rax, rax instruction so that it runs fine unless it returns from the function call changed. I cant figure out HOW its getting changed though, so I was wondering, since the system considers r11 volatile across function calls, would it save r11 if a context switch happens after completing a call to a function but before dropping back in to the calling function?
Thanks in advance.
Re: Question about volatile registers
One more thing, it appears to happen in multi-threaded mode but not when run in single threaded mode (only about 95% sure of this at the moment though). Im still running it in different modes to see if I can narrow the problem down more.
Re: Question about volatile registers
That pretty much looks like plain stack corruption to me: When the saved R11 value gets popped back, the stack memory holding the value has been unintentionally modified in the meantime by whatever. Context switches should in any case be entirely transparent to application code, so I wouln't even think of blaming them unless I'd suspect the OS to be buggy.
BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?
Re: Question about volatile registers
Quote:
That pretty much looks like plain stack corruption to me:
I am not sure where its going wrong, all I can say so far is that r11 is correct before the call to function2, and that every once in a great while it gets corrupted between the function call and its return. I am working on putting code in to verify r11 before it leaves function2 now. I just cant see why it happens so rarely. In all of my algorithms the only pushes and pops are at the very beggining of the alg and right before the ret, I even use macros for it, that all happens during setup and clean up of the stack frame, so it would seem to me it should do it everytime it hit the function. I am also pretty sure it is only happening in multi threaded mode, but the only objects that should be shared between threads are a few global constants which are initialized during startup and then dont change. Anyways, I thought context switches should be transparent to the program but wasnt sure if there was some subtle detail about it wasnt aware of.
Quote:
BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?
yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.
I forgot to mention in the original post, if I change r11 to r15 (either in function 1 or in both functions) the problem goes away, thats why I phrased it as a question about voltile registers. If I use a non-volatile (such as r15) the problem goes away.
I just verified that r11 is correct when leaving function2. Somehow it is changing between leaving function2 but before it drops back in to function1. Any ideas what would cause that?
Re: Question about volatile registers
Quote:
Originally Posted by
AKRichard
yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.
Mostly out of curiosity, what MP was it you were using these days? I was mostly writing code for the Z80 in the early 80s, and that had the NOP encoded as 00h (just like the Intel 8080), a value that uninitialized RAM cells had on many hardware platforms back then. (Uuuh, nostalgia... :D) The x86 encoding of NOP, at leadt as far as 32-bit goes, is 90h, and I have a faint memory of there really being a reason behind that, I just don't recall what it was...
Any further reflection about your post not before I had some sleep... :)
Re: Question about volatile registers
I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.
As to the original question, Ive found a few articles that mention r10, r11 specifically are used by the syscall/sysret instructions, and one article that mentions those two must be preserved by the caller, but it doesnt mention if thats if the called function uses it or f it should allways be saved. http://gcc.gnu.org/ml/gcc/2008-04/msg00550.html. Its not just any volatiles it is specifically r10/r11, if either of them are used in function1, then sooner or later it hits an exception
Re: Question about volatile registers
It shouldn't ever fail with THAT exact snipped of code you posted. The only time it may not is if you also have a hardware interrupt handler that doesn't restore r11. (but then I'd expect you to see a whole lot of other issues as well).
If a context switch would cause it, this would wreck massive issues in your app as a whole.
If you're getting it with another bit of code which you didn't post. then... only you can tell...
do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.
(r15 is preserved through api calls btw, which could explain why changing your code to r15 solves the problem).
Re: Question about volatile registers
Ok, I have to break this post in two parts (it said Im almost double what it allows to post).
thats not the exact code, but illustrates the idea, the exact code where the exception pops up (because r11 no longer holds my pointer) is: this would be equivalent to function1 above:
Code:
APow64 PROC base:PTR qword, exp:PTR qword
;saves the registers we use and adjusts the stack pointer
ENTERFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero
;prep by saving the base, exponent, and decimal place, moving rcx into r11, rdx into r12, clearing r8 and r10, then begginging our checks
mov _v1, rcx ;save the base
mov _v2, rdx ;save the exponent
mov _var3, rax ;saves the decimal point (this will be kept as the original decimal place variable)
mov _var4, rax ;save the decimal point again (this will keep the running total for where the decimal point is)
mov r11, rcx ;move the base into r11
mov r12, rdx ;move the exponent into r12
mov r8, 0 ;clear r8
mov r10, 0 ;clear r10
cmp qword ptr[r12], 1 ;check if the exponents array has only one element
ja NextCheck ;if its more than one jump to the next check because it has to be greater than one
cmp qword ptr[r12], 0 ;check if the element count is 0 for the exponent
je ReturnOne ;technically this should be an error since no array should have an element count of 0, but we treat as if the exponent is equal to 0
cmp qword ptr[r12+8], 0 ;checck the first value element for 0
je ReturnOne ;jump to return one if it does
cmp qword ptr[r12+8], 1 ;check if the first vaalue element equals 1
je ReturnVal1 ;jump to returnval1 if it does (this will return a copy of the base)
NextCheck:
;here we check the base for the same values we checked for the exponent
cmp qword ptr[r11], 0 ;check if the element count is 0 (again, this should be an error but we treat it as a zero)
je ReturnZero ;jump to returnzero if it does
cmp qword ptr[r11], 1 ;check if the element count is one
ja MoveOn ;cancel the rest of the checks if it is above
cmp qword ptr[r11+8], 0 ;compare the first value element for 0
je ReturnZero ;jump to returnzero if it equals
cmp qword ptr[r11+8], 1 ;compare the first value element for 1
je ReturnOne ;jump to return one if it does
MoveOn:
;first thing to do after the checks is to count how many elements for the answer and temp arrays we need. I have special versions of the multiply and square routines that do not create
;a return array (they use the temp array that we create). Then is a lot faster then creating and destroying the arrays with each pass of the alg. We overestimate the element count
;by taking the number of elements in the base to start the running total at, each iteration we double the running total and add one, if the current bit in the exponent is 1
;we add the base element count to the running total and add one. In the actual alg that extra byte will not allways be used. To start we move the exponent element count into rdi and save it in _var1
;then we count the number of bits in the high byte of the exponent, load the bits array into r9 (this contains the powers of 2 from a power of 0 to a power of 63), move the element count of the
;base into rax, increase that count a little to give ourselves some room, make sure the bit count in the high byte of the exponent was not 0 (if its not we jump to decandgo if it is we
;decrement the index into the exponent, make sure the index is not 0, if it is the count is done, if not store 64 in rsiand move on)
mov rdi, qword ptr[r12] ;move the element count of the exponent into rdi
mov _var1, rdi ;save the element count in _var1
bsr rsi, qword ptr[r12+rdi*8] ;count the number of bits in the high byte of the exponent
mov _var2, rsi ;save the bit count in _var2
lea r9, _bits ;load the address to the bits array into r9 (the bits array have the powers of two)
mov rax, qword ptr[r11] ;move the element count of the base into rax
add rax, 10 ;increase the element count some to give ourselves some room
cmp rsi, 0 ;compare rsi to 0
ja DecAndGo ;if its above jump to decandgo
cmp rdi, 1 ;if rsi is 0 compare rdi to 1
je CountDone ;if rsi equals 1 then the count is done
mov rsi, 64 ;otherwise store 64 in rsi
dec rdi ;decrement the index into the exponent
mov _var1, rdi ;save the index in _var2
DecAndGo:
dec rsi ;we need to decrement the index into the bits array by 1 because the most significant bit is allready counted
mov _var2, rsi ;store the index into the bits array in _var2
CountLoop:
add rax, rax ;double rax
inc rax ;add one to rax
mov rcx, qword ptr[r12+rdi*8] ;move the byte of the exponent we are working on into rcx
and rcx, qword ptr[r9+rsi*8] ;and the byte will the value stored in the bits array pointed to by rsi
jz NoMult1 ;if the result was 0 then dont add the base element count
add rax, qword ptr[r11] ;add the element count of the base to rax
inc rax ;add one to rax
NoMult1:
cmp rsi, 0 ;compare the index into the bits array to 0
ja NotZero ;if its above 0 jump to notzero
cmp rdi, 1 ;otherwise to move to the next byte in the exponent, compare the index into the exponent to 1
jna CountDone ;if its not above the count is done
mov rsi, 63 ;otherwise set the index into the bits array to 63
dec rdi ;decrement the index into the exponent
jmp CountLoop ;jump to the head of the count loop
NotZero:
dec rsi ;the check above meant we still have bits in the current byte to go through so decrement the index to bits
jmp CountLoop ;jump to the head of the count loop
CountDone:
;now that we know how many elements we need for the arrays, we multiply the count by 8 (8 bytes in 64 bits), then make 2 calls to getmem and copy the base to one of them.
mov _t1, rax ;move the result of the count into _t1
mul _eight ;multiply the result by 8 for 64 bits
mov rcx, rax ;move the final result into rcx
mov _var7, r8 ;temporarily here for debugging
mov _t9, r11 ;temporarily here for debugging
mov r8, r11 ;temporarily here for debugging
mov rdx, 187679 ;temporarily here for debugging
mov _r8, r8 ;this is here trying to fix the exception
mov _r9, r9 ;this is here trying to fix the exception
mov _r10, r10 ;this is here trying to fix the exception
mov _r11, r11 ;this is here trying to fix the exception
call GetMem64 ;call our getmem routine
mov r11, _r11 ;this is here trying to fix the exception
mov r10, _r10 ;this is here trying to fix the exception
mov r9, _r9 ;this is here trying to fix the exception
mov r8, _r8 ;this is here trying to fix the exception
cmp r11, _t9 ;temporarily here for debugging
je NoProb ;temporarily here for debugging
mov rax, rax ;temporarily here for debugging
NoProb: ;temporarily here for debugging
cmp r11, _v1 ;temporarily here for debugging
je NoProbA ;temporarily here for debugging
mov rax, rax ;temporarily here for debugging
NoProbA: ;temporarily here for debugging
mov r8, _var7
cmp rax, 0 ;make sure what was returned from the routine is valid
je MallocError ;if its not jump to mallocerror
mov _retval, rax ;move the pointer into _retval
mov rcx, qword ptr[r11] ;move the element count of the base into rcx
inc rcx ;increment the count by one for the element count at position 0
mov rdi, rax ;move the pointer we just got into rdi
mov rsi, _v1 ;move the pointer for the base into rsi
pushf ;save the flags register
cld ;clear the direction flag
rep movsq ;copy all the elements from the base into the array we just got
popf ;restore the flags register
mov rax, _t1 ;move the size of our return array into rax
mul _eight ;multiply it by 8 for the 64 bits
mov rcx, rax ;move the result into rcx
mov _var7, r8 ;temporarily here for debugging
mov _rdx, 187679 ;temporarily here for debugging
mov r8, r11 ;temporarily here for debugging
mov _r8, r8 ;this is here trying to fix the exception
mov _r9, r9 ;this is here trying to fix the exception
mov _r10, r10 ;this is here trying to fix the exception
mov _r11, r11 ;this is here trying to fix the exception
call GetMem64 ;call our getmem routine
mov r11, _r11 ;this is here trying to fix the exception
mov r10, _r10 ;this is here trying to fix the exception
mov r9, _r9 ;this is here trying to fix the exception
mov r8, _r8 ;this is here trying to fix the exception
cmp r11, _t9 ;temporarily here for debugging
je NoProb2 ;temporarily here for debugging
mov rax, rax ;temporarily here for debugging
NoProb2: ;temporarily here for debugging
cmp r11, _v1 ;temporarily here for debugging
je NoProb2A ;temporarily here for debugging
mov rax, rax ;temporarily here for debugging
NoProb2A: ;temporarily here for debugging
mov r8, _var7
cmp rax, 0 ;make sure we got a valid pointer back
je MallocError ;if not jump to mallocerror
mov r10, rax ;move the pointer into the r10 register
mov _t9, r10 ;save the pointer in _t9
mov r8, _retval ;move _retval into r8
MainLoop:
;this is just the standard repeated square and multiply alg. On each iteration we will square the _retval (kept in r8), the special square alg will take r8 and store the result
;in the array into r10, next we check the current bit of the current byte in the exponent for 1, if it is 1 then we multiply the result from the squaring by the base.
mov rax, _var4 ;move the running total for the decimal point into rax
add _var4, rax ;add rax back into _var4 to double it
mov rdi, r10 ;move the pointer to the temp array into rdi
mov rcx, qword ptr[r8] ;move the element count in _retval into rcx
add rcx, rcx ;add rcx to itself to double it
inc rcx ;make room for the element count
inc rcx ;give it one extra byte in case there is still a carry to do after the final multiply
mov qword ptr[r10], rcx ;rcx has our (possible) new element count for the result of the squaring, so store it at position 0 of the temp array
mov rax, 0 ;move 0 into rax
add rdi, 8 ;add 8 to rdi so that it points to the element at position one of the temp array
pushf ;save the flags register
cld ;clear the direction flag
rep stosq ;store 0 in all the value elements
popf ;restore the flags register
mov rdi, 1 ;set rdi to 1
mov rsi, 1 ;set rsi to 1
mov rcx, 1 ;set rcx to 1
mov rdx, 0 ;clear rdx
SqLoop:
mov rbx, rdx ;rdx holds if there was a carry from the last iteration, save it inrbx
mov rax, qword ptr[r8+rdi*8] ;move the current byte of the result array into rax
mul qword ptr[r8+rsi*8] ;multiply by the other current byte of the result array
add qword ptr[r10+rcx*8], rax ;add the result to the current position of the temp array
adc rdx, 0 ;if there was a carry from the add, this will add one to the carry from the multiply
add qword ptr[r10+rcx*8], rbx ;add the carry from the last iteration to the current position in the temp array
adc rdx, 0 ;if there was a carry from the add, this will add one to the carry from the multiply (only one at most of these adc instructions will inc rdx)
inc rcx ;inc our index into the temp array
inc rdi ;inc our first position in the _retval
cmp rdi, qword ptr[r8] ;compare rdi to the element count
jna SqLoop ;if its not larger than jump to the head of the sqloop
mov qword ptr[r10+rcx*8], rdx ;the first position reached the end of retval so store the carry from the last iteration in the next element of the temp array (we add it here just in case the second position of the retval has been reached)
mov rdx, 0 ;zero out the return so we dont add it again
mov rdi, 1 ;point the first index into retval to the first element
inc rsi ;inc the second index into retval
mov rcx, rsi ;point the index into the temp array to the same position as the second index into the retval
cmp rsi, qword ptr[r8] ;make sure the second index is not past the last element
jna SqLoop ;if its not jump to the head of the square loop
SqDone:
mov rdi, qword ptr[r10] ;move the element count of the temp array into rdi
ZCheck1:
cmp qword ptr[r10+rdi*8], 0 ;compare the current element we are pointing at to 0
ja ZCheck1Done ;if its above jump out
cmp rdi, 1 ;otherwise compare the index into the array to one
jna ZCheck1Done ;if its not above jump out
dec rdi ;otherwise decrement it by one
mov qword ptr[r10], rdi ;and store it back in position 0 of the temp array
jmp ZCheck1 ;jump to the head of the zero check loop
ZCheck1Done:
mov rax, r10 ;we switch the arrays so that the current result is allways in r8 mmove r10 into rax
mov r10, r8 ;move r8 into r10
mov r8, rax ;move rax into r8
mov rdi, _var1 ;move the index into the exponent into rdi
mov rsi, _var2 ;move the index into the bits array into rsi
mov rax, qword ptr[r12+rdi*8] ;move the current byte of the exponent into rax
and rax, qword ptr[r9+rsi*8] ;and rax with the value pointed to in bits
jz NoMult2 ;if the result was 0 then jump past the multipliaction loop
mov rcx, qword ptr[r8] ;otherwise move the element count of retval into rcx
add rcx, qword ptr[r11] ;add the element count of the base to rcx
inc rcx ;inc the count for the element count to store in the temp array
inc rcx ;inc the element count in case the result of the multiply goes over
mov qword ptr[r10], rcx ;store the calcd element count at position 0 of the temp array
mov rdi, r10 ;move the pointer of the temp array into rdi
add rdi, 8 ;add 8 to rdi so that we point to the element at position 1
mov rax, 0 ;move 0 into rax
pushf ;save the flags register
cld ;clear the direction flag
rep stosq ;clear all the elements in the temp array
popf ;restore the flags register
mov rsi, 1 ;point to the first element of the retval
mov rdi, 1 ;point to the first element of the base
mov rcx, 1 ;point to the first element of the temp array
mov rax, _var3 ;move the original decimal point into rax
add _var4, rax ;add it to the running total
mov rdx, 0 ;clear rdx
MultLoop:
mov rbx, rdx ;again rdx holds the carry from the last iteration so save it in rbx
mov rax, qword ptr[r8+rsi*8] ;move the current element of retval into rax
mul qword ptr[r11+rdi*8] ;multiply by the current element in the base
add qword ptr[r10+rcx*8], rax ;add the result to the current element of the temp array
adc rdx, 0 ;if the last add resulted in a carry this will add it to the carry from the multiplication
add qword ptr[r10+rcx*8], rbx ;add the carry from the last iteration to the current element in the temp array
adc rdx, 0 ;if the last add resulted in a carry than add it to the carry from the multiply (only one of the adc instructions will inc rdx)
inc rcx ;point to the next element in the temp array
inc rdi ;point to the next element in the base
cmp rdi, qword ptr[r11] ;make sure we are not past the last element in the base
jna MultLoop ;if we are good then jump to the head of the multiply loop
mov qword ptr[r10+rcx*8], rdx ;otherwise move the carry from this iteration to the new current position in the temp array
mov rdx, 0 ;clear rdx
mov rdi, 1 ;point the index into the base at the first element
inc rsi ;inc the index into the retval
mov rcx, rsi ;point the index into the temp array to the same position as the index into retval
cmp rsi, qword ptr[r8] ;make sure we are not past the last element of retval
jna MultLoop ;if not jump to the head of the mult loop
MultDone:
mov rcx, r10 ;move the temp array into rcx
CHECKFORZEROS ;check for zeros
mov r10, r8 ;switch the two arrays, move the current retval into r10
mov r8, rax ;move the old temp array into retval
NoMult2:
cmp _var2, 0 ;check if the index into the bits array is 0
ja NotZ ;if its not jump to the notz
cmp _var1, 1 ;otherwise make sure we are still above the least siginificant byte of the exponent
jna GoOut ;if not then go out
dec _var1 ;otherwise dec the index into the exponent
mov _var2, 63 ;and move 63 into the index into bits
jmp MainLoop ;jump to the head of the main loop
NotZ:
dec _var2 ;we still have bits to go sso decrement the index into the bits array
jmp MainLoop ;jump to the head of the main loop
ReturnVal1:
;the exponent must be equal to one so we return a copy of the base
mov rax, qword ptr[r11]
add rax, 5
mul _eight
mov rcx, rax
mov _r8, r8
mov _r9, r9
mov _r10, r10
mov _r11, r11
call GetMem64 ;call our getmem routine
mov r11, _r11
mov r10, _r10
mov r9, _r9
mov r8, _r8
cmp rax, 0
je MallocError
mov r8, rax
mov r11, _v1
mov rcx, qword ptr[r11]
inc rcx
mov rsi, r11
mov rdi, r8
pushf
cld
rep movsq
popf
mov rax, qword ptr[r11]
mov qword ptr[r8], rax
jmp GoOut
ReturnZero:
;the base must of been equal to 0 (and the exponent was greater than 0) so we return 0
mov rcx, 32
mov _r8, r8
mov _r9, r9
mov _r10, r10
mov _r11, r11
call GetMem64 ;call our getmem routine
mov r11, _r11
mov r10, _r10
mov r9, _r9
mov r8, _r8
cmp rax, 0
je MallocError
mov r8, rax
mov qword ptr[r8], 1
mov qword ptr[r8+8], 0
mov _var4, 0
jmp GoOut
ReturnOne:
;either the base was equal to 1 or the exponent was equal to 0
mov rcx, 32
mov _r8, r8
mov _r9, r9
mov _r10, r10
mov _r11, r11
call GetMem64 ;call our getmem routine
mov r11, _r11
mov r10, _r10
mov r9, _r9
mov r8, _r8
cmp rax, 0
je MallocError
mov r8, rax
mov qword ptr[r8], 1
mov qword ptr[r8+8], 1
mov _var4, 0
jmp GoOut
MallocError:
;if there was an error requesting memory try to get the error code, store it in the return value and send it back to managed code
mov rcx, 32
mov _r8, r8
mov _r9, r9
mov _r10, r10
mov _r11, r11
call GetMem64 ;call our getmem routine
mov r11, _r11
mov r10, _r10
mov r9, _r9
mov r8, _r8
cmp rax, 0
je GoOut
mov r8, rax
call GetLastError
mov qword ptr[r8], 0
mov qword ptr[r8+8], 0
mov qword ptr[r8+16], rax
GoOut:
;we are all done so we destroy the temp array if it was created then exit
cmp r10, 0
je NotC
mov rcx, r10
call Free64
NotC:
mov rcx, _var4
mov rax, r8
EXITFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero
RET
APow64 ENDP
Re: Question about volatile registers
And the following would be equivalent to function2:
Code:
GetMem64 PROC _tsize:qword, _ch:qword, _nr:qword
;using a management table
;each table is specific to a memory set
;each entry in the table will have the following pattern
;
;all unused block will have all words _zerod
;
;a used block will have words 0 and 1 set to the index of the first block shifted to the left 16
;words 2 and 3 will hold how many blocks are being used shifted to the left 16
ENTERFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
;we check to make sure everything is initialized
mov _v1, rcx ;save the amount of memory requested
mov _var9, rdx
mov _var8, r8
;now we check if we are supposed to use malloc or not, if we do, then call it and exit
cmp _usemalloc, 0 ;if the _usemalloc variable is 0 we use this alg, if it is not 0, we use malloc to fullfill the request
je UseMine
call malloc ;the call to malloc
cmp rax, 0 ;if rax contains 0 then there was an error in the call
je MallocError
EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
cmp _var9, 187679
jne ItsGood3
cmp r11, _var8
je ItsGood3
mov rax, rax
ItsGood3:
RET
UseMine:
cmp _initialized, 0 ;check if everything is allready initialized, if not then run initialization
je CallInitialize
mov rcx, _v1 ;restore the requested amount to rcx
Allready_initialized:
;we start by dividing the requested amount by our allocation size to see how many units to look for (we increment it by one to cover the case it equals zero)
;then we set everything up by moving _memslot into r13, then moving the first set into r13, set rdi and rsi to 1 rcx to zero (rdi is the current position in the set
;rsi marks the first block of the set we are looking at and rcx holds the count)
mov rax, rcx ;mov the amount requested into rax for the division
xor rdx, rdx ;clear the rdx register for the division
div _allocsize ;divide the requested amount by the allocation size, this tells us how many blocks to look for
cmp rdx, 0 ;check if it was an exact division (no remainder)
cmovne rdx, _one ;if there was a remainder then move 1 into rdx
add rax, rdx ;this increments the number of blocks we are looking for by one if there was a remainder
mov _t1, rax ;move the number of blocks we are looking for into t1
;add _t1, 1
mov r13, _memslot ;move the memslot array into r13, this array holds the pointers to the arrays that hold the mapping tables (which tells us which blocks are free and whoch are taken)
mov rdi, 1 ;we intially set to the first table, this will also be used for the pointer into the table
mov rsi, 1 ;rsi will keep track of the first available block for when we are ready to start marking
mov _t2, 1 ;the t2 variable keeps track of which table we are using so that we can return a pointer from the correct memptr table
mov r13, qword ptr[r13+rdi*8] ;load the first memslot table into r13
xor rcx, rcx ;clear rcx, this keeps track of how many contiguous free blocks we find
SearchLoop:
;look at the slot, if it is 0 then inc rcx, compare rcx to how many units we are looking for, if its equal then jump to found
;else we inc rdi make sure rdi isnt pointing past the end, then jump to the top to do it again.
;if rdi is pointing past the end, then we load _memslot back into r13, load rdi from _t2 (which is keeping track of which set) inc rdi, make sure its pointing to a valid set
;load r13 with the next set, set rdi and rsi to 1, rcx to 0 and jump tp searchloop
;if it is pointing past the end of the sets, then we jump to create a new set
cmp qword ptr[r13+rdi*8], 0 ;check the element pointed to is zero, (0 means the block is free)
jne ItsTaken ;if its allready being used go to the next section
inc rcx ;if its not taken increment rcx to count another free block
cmp rcx, _t1 ;compare rcx to t1 (the number of free blocks we are looking for)
jnb Found ;if we found enough go to prep for marking the blocks
inc rdi ;increment rdi to point to the next block
cmp rdi, _numberofallocs ;compare rdi to the number of allocations we keep in the table
jna SearchLoop ;if its not above we jump to the head of the search loop to check the next block
inc _t2 ;if it is above we need to load the next table
mov r13, _memslot ;load the memslot pointer into r13
mov rax, _t2 ;move the index to the next table into rax
cmp rax, qword ptr[r13] ;check we are not past the last table in there
ja CreateNewSet ;if we are jump to the create new set routine
mov r13, qword ptr[r13+rax*8] ;move the next table into r13
mov rdi, 1 ;set the index to the first entry
mov rsi, 1 ;set rsi to point to the first entry (rsi holds the index to the first free block)
xor rcx, rcx ;clear rcx (the counter)
jmp SearchLoop ;jump to the head of the search loop
ItsTaken:
movzx rax, word ptr[r13+rdi*8+2] ;since the value wasnt 0 we know the value contains the info we need to jump past the taken blocks, we move the first block in this taken set into rax
movzx rdi, word ptr[r13+rdi*8+6] ;we move the number of blocks taken in this set into rdi
add rdi, rax ;we add rax to rdi so that now rdi points to the first block past this taken set
mov rsi, rdi ;we set rsi = rdi so that rsi keeps track of the first block
xor rcx, rcx ;we clear rcx since we need contiguous blocks
cmp rdi, _numberofallocs ;make sure wwe are not pointing past the end of the table
jna SearchLoop ;if we are good jump to the head of the search loop
inc _t2 ;if not increment the index into the memslot table
mov r13, _memslot ;move the memslot table into r13
mov rax, _t2 ;move the index into rax
cmp rax, qword ptr[r13] ;make sure we are not pointing past the last table
ja CreateNewSet ;if we are jump to the create new set routine
mov r13, qword ptr[r13+rax*8] ;move the next table into r13
mov rdi, 1 ;point to the first entry
mov rsi, 1 ;set rsi to the first entry
xor rcx, rcx ;clear the counter
jmp SearchLoop ;jump to the head of the search loop
Found:
;ok we found enough empty units, now we prep to mark them, rax:rdx need to be zero, this is what the cmpxchg command compares against the memory location
;mov the starting block (rsi) into rcx, and the number of blocks we are reserving into rbx and shift both to the left 16 so that it uses the 32 bit form of the command
mov rax, 0 ;have to prep the compare and exchange each loop
mov rdx, 0
mov rbx, _t1 ;move the number of blocks into rbx
mov rcx, rsi ;move the index of the first block into rsi
shl rbx, 16 ;we have to shift the value so the correct form of cmpexchg is used
shl rcx, 16 ;we have to shift the value so the correct form of cmpexchg is used
MarkLoop:
;use the lock prefix so that we have exclusive access to the memory location and execute the command, if the zero flag is not set then that means the location no longer
;contains zero (another thread must have reserved it allready) so jump to allready marked and find another set of blocks. If this command executes the first run, it is gauranteed
;to succeed on all the rest. After marking all blocks we are using then we load _memptr into r13, and rdi with _t2 (so we point to the correct set) and load that set into r13
;then we multiply the number of blocks between 0 and the set we started at by the allocation size, add that to r13, and that gives us the begginging address of the
;memory we just reserved and thats what gets returned
lock cmpxchg8b qword ptr[r13+rsi*8] ;lock the memory location and fire the cmpexchg instruction
jnz AllreadyMarked ;if the zero flag is not set, then the exchange failed (if its going to fail it will only fail on the first time through)
inc rsi ;increment the index
cmp rsi, rdi ;compare it against rdi (where to stop)
jna MarkLoop ;if its not above jump to the head of the mark loop
mov r13, _memptr ;load the memptr table inot r13
mov rdi, _t2 ;load the index of the memslot table we found the free blocks in to rdi
mov r13, qword ptr[r13+rdi*8] ;load the correct memptr into r13 (the one that corresponds to the memslot table where we found the free blocks)
mov rax, rcx ;move the index to the first free block into rax
shr rax, 16 ;we have to shift it back to the right (because we shifted it to the left above)
mul _allocsize ;multiply it by the allocation size
add rax, r13 ;add it to the base pointer (r13) this gives us the pointer to the first address of the memory we will return
mov _v1, rax ;save the pointer for when we return
;this is a new section to protect against the possibility that this alg assigns some memory, then the _usemalloc variable gets changed. With this set up we will keep track of
;how many requests that we filled are still out. If the number is above 0 then the free alg will ALLWAYS start by checking if the memory being returned was filled from the memory
;we are handing out, if it is then we use our free alg to free it, if its not then it falls through to call the windows free routine.
;In order to insure that we allways have the correct count we will use the cmpexchg instruction with the lock prefix, there may be an issue since both this alg and the free alg
;will be locking and accessing the _fm variable, if it slows it down to much Ill either remove this section or try something different.
IncCount:
mov r13, 0 ;prep to increment the count, set r13 to zero
mov rax, _fm ;move the value in _fm to rax
mov rdx, rax ;copy rax to rdx
shr rdx, 32 ;shift rdx to the right by 32 bits (_fm should be looked at like two dwords)
and rax, 0ffffffffh ;and rax by 0ffffffffh to keep just the first 32 bits (rdx:rax now holds the value we will compare to)
mov rbx, rax ;copy rax to rbx
mov rcx, rdx ;copy rdx to rcx
add ebx, 1 ;add one to ebx
cmovc r13, _one ;if the carry flag is set then set r13 to 1
add ecx, r13d ;add r13d to ecx (r13d will be either 0 or 1)
lock cmpxchg8b _fm ;lock and fire the instruction
jnz IncCount ;if the zero flag is not set then another thread beat us to it so jump to IncCount and try again
mov rax, _v1 ;move the pointer back into rax
EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
cmp _var9, 187679
jne ItsGood2
cmp r11, _var8
je ItsGood2
mov rax, rax
ItsGood2:
RET
CallInitialize:
call Initialize ;needs to be initialized so call the initialization routine (done this way because all but the first time the jmp statement above will fall through)
jmp Allready_initialized ;jmp back to the beggining
AllreadyMarked:
;ok our marking of that address failed, on failure the command moves the value of the address into edx:eax, so as above we compute how many blocks to jump
;make sure it is not past the end of the set, then jump to the search loop again. If it is past the end, load _memslot into r13, _t2 into rdi, inc rdi
;make sure it is pointing to a valid set, if it is set rdi and rsi to 1 rcx to 0 and jump to the search loop. If it is not pointing to a valid set jump to create new set
shr edx, 16 ;the cmpexchg failed when the instruction fails it loads what it found at the memory location into eax/edx
shr eax, 16 ;both of which needs to be shifted to the right 16 bits
xor rdi, rdi ;then eax gets added to edx and moved into rdi, this will point to the first block past this set
add edi, eax
add edi, edx
xor rcx, rcx ;the count is cleared
mov rsi, rdi ;mov rdi into rsi so the first block is kept track of
cmp rdi, _numberofallocs ;make sure rdi doesnt point past the end of the table
jna SearchLoop ;if its good jump to the head of the search loop
inc _t2 ;if its not incerement the index into the memslot array
mov r13, _memslot ;mov the memslot array into r13
mov rax, _t2 ;mov the index into rax
cmp rax, qword ptr[r13] ;make sure the index isnt pointing past the last table
ja CreateNewSet ;if it is create a new set
mov r13, qword ptr[r13+rax*8] ;otherwise move the next memslot table into r13
mov rdi, 1 ;point rdi to the first block
mov rsi, 1 ;point rsi to the first block
xor rcx, rcx ;clear the counter
jmp SearchLoop ;jump to the head of the search loop
CreateNewSet:
;we want to make sure we are the only thread creating a new set so we use the cmpxchg command again but this time on a global variable _gm
;if we fail jump to start bottom. If we succeed, get enough memory for another _memptr set and store it in the memptr table (updating the count as we go)
;then we request enough for another memslot table, we zero out all the elements, then store it in the memslot table (updating the count as we go)
;then we point _t2 to the new set, set rdi and rsi to 1 rcx to 0 and jump to the search loop setting _gm to zero right before we leave
mov rax, 0 ;have to prep the cmpexchg instruction
mov rdx, 0 ;we want to make sure the _gm global variable is 0 (which means no other thread is currently in this section of code)
mov rbx, 1 ;if it is 0 we want to change it to 1
mov rcx, 0
lock cmpxchg8b _gm ;lock and fire the instruction
jnz StartBottom ;if the zero flag is not set then another thread is allready in this section of code, so exit this part and wait till it is finished
mov rcx, _allocreqsize ;move the amount to request for the memptr entry into rcx
call malloc ;get the memory from the syste,
cmp rax, 0 ;make sure there wasnt an error
je MallocError ;if there was jump to the error routine
mov r13, _memptr ;move the memptr array into r13
mov rdi, qword ptr[r13] ;move the array count (element 0) into rdi
inc rdi ;increment the element count
mov qword ptr[r13], rdi ;store the new count in element 0
mov qword ptr[r13+rdi*8], rax ;store the pointer we just requested in the new position
mov rcx, _tablereqsize ;move the size of the memslot table into rcx
call malloc ;request the memory from the system
cmp rax, 0 ;make sure there wasnt an error filling the request
je MallocError ;if there was jump to the error routine
mov r13, _memslot ;move the memslot array into r13
mov rdi, qword ptr[r13] ;move the array count into rdi
inc rdi ;increment the array coutnt
mov qword ptr[r13+rdi*8], rax ;store the memory we just requested in the array
mov qword ptr[r13], rdi ;store the new count at element 0
mov r13, rax ;move the pointer to the new table into r13
mov rdi, rax ;move the pointer into rdi
mov rcx, _numberofallocs ;move the number of allocations (table size) into rcx
mov rax, 0 ;move zero into rax (this is what will be stored in all the elements in the new memslot table since none of the blocks have been assigned yet)
pushf ;save the flags register
cld ;clear the direction flag
rep stosq ;store 0 in all the elements of the new table
popf ;restore the flags register
mov _gm, 0 ;clear the _gm variable
xor rcx, rcx ;clear the count
mov rdi, 1 ;point at the first block
mov rsi, 1 ;set rsi to point to the first block
jmp SearchLoop ;jump to the head of the search loop
StartBottom:
;we didnt get exclusive access to that part of the routine, we know there are not enough blocks in any of the other sets for us, so we wait till
;the other thread is done creating and initializing everything, at which point we point to the new set, set rdi and rsi to 1 and rcx to 0 then jump to the search loop
cmp _gm, 1 ;see if the thread that is setting up the new tables is still working on it
je StartBottom ;if it is jump to the head of this loop again
mov r13, _memslot ;load the memslot array into r13
mov rdi, qword ptr[r13] ;get the index of the last table (we woudnt be here unless we allready searched all the other tables)
mov r13, qword ptr[r13+rdi*8] ;load the new table into r13
mov rdi, 1 ;point rdi to the first block
mov rsi, 1 ;set rsi to point to the first block
xor rcx, rcx ;clear the counter
jmp SearchLoop ;jump to the head of the search loop
EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
cmp _var9, 187679
jne ItsGood1
cmp r11, _var8
je ItsGood1
mov rax, rax
ItsGood1:
RET
MallocError:
mov rax, rax
GetMem64 ENDP
Re: Question about volatile registers
sorry took 3 posts to get everything in:
Now there is a bit of extra code in there trying to debug this problem, hopefully its commented enough to understand a few things though:
1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.
2. The variables that you dont see declared at the beggining of the function are text equates that point into the stack that was set up by the enterfunc/exitfunc macros.
3. you can tell where I have debugging code inserted, I mark those sections with extra blank spaces so that its easy to clean up.
4. In the APow64 function, immediately before and after the call to GetMem64 youll see me saving the registers and restoring them, this appears to work, if I comment them out then when it does throw an exception it will throw it at the very first instruction after the call GetMem64 that tries to use r11 as a pointer. When this happens, the value that it shows in r11 is Allways the same, I havent restarted my computer since last night, I was trying to get it to throw the exception when I saved the volatiles before the call, it didnt.
5. You can see the debugging code in GetMem64 that I use to check the value in r11 before it exits. This alg normally takes only one argument (the number of bytes being requested), and normally youd never see a reference to r11 in this algorithm, the only reason you see it now is so I can check it. I do not check r11 when it first enters the alg, I just check to make sure r11 still equals the value passed in (from APow64) to check it against.
6. I just looked at the way the code looks in here, it a lot easier to read in visual studio, sorry.
Re: Question about volatile registers
http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
I keep coming back to this article describing register usage and looking at r10/r11
Quote:
R10:R11
Volatile
Must be preserved as needed by caller; used in syscall/sysret instructions
r10 and r11 are the only (integer) registers marked as must be saved by the caller. As I mentioned before if I change r11 to something like r15 int the calling function the problem disappears. Normally, I would just make the changes and move on, but if there is a problem in the alg I would like to find it and fix it. The wording in the referenced article has me wondering though:
Quote:
Volatile registers are scratch registers presumed by the caller to be destroyed across a call
the "across a call" wording sounds like they are trying to convey that those register may be destroyed by more then the called function, in which case restoring them before returning from the called function may not allways work (which goes hand in hand with the must be saved by caller statement in same article), the problem is I can find nothing on the internet to confirm or deny my thinking.
Any ideas?
Re: Question about volatile registers
I don't think I can really add anything substantial to the quite competent comments already posted by OReubens, except perhaps this:
Quote:
Originally Posted by
AKRichard
1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.
Already when reading http://forums.codeguru.com/showthrea...57#post2115357, I suspected some possible connection between macros/text equates and what looked like stack corruption (or mis-addressing data stored on the stack). Back then I abandoned that idea because I couldn't imagine how plain text replacement could cause something like that, but now I think it may be worth checking this direction. Be it just to confirm the macros are not to blame.
Re: Question about volatile registers
Quote:
Originally Posted by
AKRichard
I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.
I never had one myself, but I know these single-board gadgets with binary direct-to-memory entry (already hex was luxury back then :D) and 7-seg display. With interest I followed a series of magazine articles about a system one had to assemble themselves (best thing available was a kit IIRC) based on the SC/MP CPU. (In fact that was in that very magazine referred to at the end of the Wikipedia article.)
My first computer (not counting programmable calculators) also was Tandy. But it was a Model I Level II with no color at all, instead "true" black and wite, i.e. bi-level, and the entire screen's "graphics" resolution was less than today's highest resolution Windows application icons. :D
However, I always thought any CPU would have a NOP instruction. After all that was particularly useful with that sort of hackish direct-to-core code editing/debugging of these days.
Re: Question about volatile registers
Funny you should mention that post, I just reread it the other day. As this project has evolved every once in a while Ill see a way to reduce its complexity, code an idea or two and test it out, if it works as thought, and add clarity to the code, then Ill implement it. The macros mentioned solved a few problems, configuring the stack correctly, and making sure every push had a corresponding pop (in the version before the macros I used pushes and pops all over the place which was hard to follow sometimes).
The macros (appear) to be running correctly, I didnt run into this problem until I tackled redesigning my memory management routine. I am finding out that everything runs stably until the management routine needs more memory from the system, every single time this problem pops up, it does so after it grabs more memory from the system (thats the CreateNewSet section of GetMem), and thats why I come here, I think I just figure it out. Ill try it and let you know.
Re: Question about volatile registers
So, in the middle of typing that last reply, when I got done typing that last bit about how it doent throw an exception till it requests more memory when I remembered something OReubens said:
Quote:
do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.
The GetMem routine never uses r11, so I didnt have the macro saving and restoring r11. I have very few functions that call any of the windows api's directly. Anyways, everything clicked when I was typing, when it creates a new set it makes 2 calls into malloc, r11 wasnt being saved since it wasnt used in the called function, the reason it only did it in multi threaded mode was because in single threaded mode it doesnt run out of memory, what it grabs during initialization is usually enough. The only part that I havent figured out yet is I can set a global variable (_usemalloc) to 1 and the GetMem function will call malloc every request it recieves without ever throwing an exception. The only thing I can figure is when I have it set that way the requests are relatively small (usually less than a few hundred bytes), but when my management routine makes a request, it asks for about 4.5 megs all at once so maybe malloc is calling into some other routine that makes a syscall/sysret.
So, I set the macro in the GetMem routine to save and restore r11, now it doesnt throw an exception, though it still messes up the function that called GetMem on the thread that created the new set (messes it up as in it throws an incorrect answer), though everything goes back to running smoothly right after that. My memory management routine is now running almost twice as fast as it was before I redid it.
Thanks guys