CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    [RESOLVED] Question about volatile registers

    Hello all,

    Ive got an intermittent problem and think Ive finally narrowed it down. In my library almost every function saves every register that it uses that is not also used to pass parameters, then restores them before returning. I did this to make it easier when calling a function so that the caller could expect as little state change as possible. Sometimes though, one of the volatile registers (which was saved at the beggining of the function and restored) will return changed. This problem usually happens rarely (usually taking on the order of at least a few million calls to various functions), it happens in random places after calls to apparantly random functions and I am wondering, if the system were to switch contexts say right at the point where it is returning from a function call, would it change the volatile registers?

    An example:

    Code:
    	Function1 PROC 
    
    		mov		r11, rcx
    		call	Function2
    
    							;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning?
    
    		add		r11, r12
    
    		mov		rax, r11
    
    		RET
    
    	Function1 endp
    
    	Function2 PROC 
    
    		push	r11
    		mov		r11, rdx
    		mov		rax, rcx
    		mul		r11
    		pop		r11
    		RET
    
    	Function2 endp
    just some bs code to illustrate an example (I could give a concrete example, but that would be a lot more code that I thought would confuse the concept). Anyways, in the example above, r11 is used in both functions, function 2 saves r11 before using it and restores it before returning, now 99.99999999% of the time everything runs fine, but every once in a great while r11 will be changed when function1 uses it after returning from the call to function2 it will be changed.

    Ive been able to trap this problem by checking r11, which in most of my functions holds a pointer to the first array passed in to a function, immediately after returning from a function call with something like this:

    Code:
    	Function1 PROC 
    
    		local  _tr11:qword
    
    		mov		r11, rcx
    		mov		_tr11, r11
    		call	Function2
    
    							;<-----is it possible for r11 to change in this call even though function 2 restores r11 before returning?
    
    		cmp		r11, _tr11
    		je		ItsGood
    		mov		rax, rax
    
    ItsGood:
    
    		add		r11, r12
    
    		mov		rax, r11
    
    		RET
    
    	Function1 endp
    
    	Function2 PROC 
    
    		push	r11
    		mov		r11, rdx
    		mov		rax, rcx
    		mul		r11
    		pop		r11
    		RET
    
    	Function2 endp
    Ill put a breakpoint on the mov rax, rax instruction so that it runs fine unless it returns from the function call changed. I cant figure out HOW its getting changed though, so I was wondering, since the system considers r11 volatile across function calls, would it save r11 if a context switch happens after completing a call to a function but before dropping back in to the calling function?

    Thanks in advance.

  2. #2
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    One more thing, it appears to happen in multi-threaded mode but not when run in single threaded mode (only about 95% sure of this at the moment though). Im still running it in different modes to see if I can narrow the problem down more.

  3. #3
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Question about volatile registers

    That pretty much looks like plain stack corruption to me: When the saved R11 value gets popped back, the stack memory holding the value has been unintentionally modified in the meantime by whatever. Context switches should in any case be entirely transparent to application code, so I wouln't even think of blaming them unless I'd suspect the OS to be buggy.

    BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?
    Last edited by Eri523; June 20th, 2013 at 09:01 PM.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  4. #4
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    That pretty much looks like plain stack corruption to me:
    I am not sure where its going wrong, all I can say so far is that r11 is correct before the call to function2, and that every once in a great while it gets corrupted between the function call and its return. I am working on putting code in to verify r11 before it leaves function2 now. I just cant see why it happens so rarely. In all of my algorithms the only pushes and pops are at the very beggining of the alg and right before the ret, I even use macros for it, that all happens during setup and clean up of the stack frame, so it would seem to me it should do it everytime it hit the function. I am also pretty sure it is only happening in multi threaded mode, but the only objects that should be shared between threads are a few global constants which are initialized during startup and then dont change. Anyways, I thought context switches should be transparent to the program but wasnt sure if there was some subtle detail about it wasnt aware of.

    BTW, why are you using a MOV RAX,RAX instruction? Doen't x64 feature a NOP?
    yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.

    I forgot to mention in the original post, if I change r11 to r15 (either in function 1 or in both functions) the problem goes away, thats why I phrased it as a question about voltile registers. If I use a non-volatile (such as r15) the problem goes away.

    I just verified that r11 is correct when leaving function2. Somehow it is changing between leaving function2 but before it drops back in to function1. Any ideas what would cause that?
    Last edited by AKRichard; June 20th, 2013 at 08:21 PM.

  5. #5
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Question about volatile registers

    Quote Originally Posted by AKRichard View Post
    yes there is, its just an old habit from my microprocessor class from 1982, the processor in our lab setup didnt have a nop instruction so I used to do something like it to achieve a nop and been stuck on stupid ever since.
    Mostly out of curiosity, what MP was it you were using these days? I was mostly writing code for the Z80 in the early 80s, and that had the NOP encoded as 00h (just like the Intel 8080), a value that uninitialized RAM cells had on many hardware platforms back then. (Uuuh, nostalgia... ) The x86 encoding of NOP, at leadt as far as 32-bit goes, is 90h, and I have a faint memory of there really being a reason behind that, I just don't recall what it was...

    Any further reflection about your post not before I had some sleep...
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  6. #6
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.

    As to the original question, Ive found a few articles that mention r10, r11 specifically are used by the syscall/sysret instructions, and one article that mentions those two must be preserved by the caller, but it doesnt mention if thats if the called function uses it or f it should allways be saved. http://gcc.gnu.org/ml/gcc/2008-04/msg00550.html. Its not just any volatiles it is specifically r10/r11, if either of them are used in function1, then sooner or later it hits an exception

  7. #7
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Question about volatile registers

    It shouldn't ever fail with THAT exact snipped of code you posted. The only time it may not is if you also have a hardware interrupt handler that doesn't restore r11. (but then I'd expect you to see a whole lot of other issues as well).
    If a context switch would cause it, this would wreck massive issues in your app as a whole.


    If you're getting it with another bit of code which you didn't post. then... only you can tell...

    do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.
    (r15 is preserved through api calls btw, which could explain why changing your code to r15 solves the problem).
    Last edited by OReubens; June 21st, 2013 at 08:15 AM.

  8. #8
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    Ok, I have to break this post in two parts (it said Im almost double what it allows to post).

    thats not the exact code, but illustrates the idea, the exact code where the exception pops up (because r11 no longer holds my pointer) is: this would be equivalent to function1 above:

    Code:
    	APow64 PROC base:PTR qword, exp:PTR qword
    
    		;saves the registers we use and adjusts the stack pointer
    
    		ENTERFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero
    
    		;prep by saving the base, exponent, and decimal place, moving rcx into r11, rdx into r12, clearing r8 and r10, then begginging our checks
    
    		mov		_v1, rcx					;save the base
    		mov		_v2, rdx					;save the exponent
    		mov		_var3, rax					;saves the decimal point (this will be kept as the original decimal place variable)
    		mov		_var4, rax					;save the decimal point again (this will keep the running total for where the decimal point is)
    		mov		r11, rcx					;move the base into r11
    		mov		r12, rdx					;move the exponent into r12
    		mov		r8, 0						;clear r8
    		mov		r10, 0						;clear r10
    		cmp		qword ptr[r12], 1			;check if the exponents array has only one element
    		ja		NextCheck					;if its more than one jump to the next check because it has to be greater than one
    		cmp		qword ptr[r12], 0			;check if the element count is 0 for the exponent
    		je		ReturnOne					;technically this should be an error since no array should have an element count of 0, but we treat as if the exponent is equal to 0
    		cmp		qword ptr[r12+8], 0			;checck the first value element for 0
    		je		ReturnOne					;jump to return one if it does
    		cmp		qword ptr[r12+8], 1			;check if the first vaalue element equals 1
    		je		ReturnVal1					;jump to returnval1 if it does (this will return a copy of the base)
    
    NextCheck:
    
    		;here we check the base for the same values we checked for the exponent
    
    		cmp		qword ptr[r11], 0			;check if the element count is 0 (again, this should be an error but we treat it as a zero)
    		je		ReturnZero					;jump to returnzero if it does
    		cmp		qword ptr[r11], 1			;check if the element count is one
    		ja		MoveOn						;cancel the rest of the checks if it is above
    		cmp		qword ptr[r11+8], 0			;compare the first value element for 0
    		je		ReturnZero					;jump to returnzero if it equals
    		cmp		qword ptr[r11+8], 1			;compare the first value element for 1
    		je		ReturnOne					;jump to return one if it does
    
    MoveOn:
    
    		;first thing to do after the checks is to count how many elements for the answer and temp arrays we need.  I have special versions of the multiply and square routines that do not create
    		;a return array (they use the temp array that we create).  Then is a lot faster then creating and destroying the arrays with each pass of the alg.  We overestimate the element count
    		;by taking the number of elements in the base to start the running total at, each iteration we double the running total and add one, if the current bit in the exponent is 1
    		;we add the base element count to the running total and add one.  In the actual alg that extra byte will not allways be used.  To start we move the exponent element count into rdi and save it in _var1
    		;then we count the number of bits in the high byte of the exponent, load the bits array into r9 (this contains the powers of 2 from a power of 0 to a power of 63), move the element count of the
    		;base into rax, increase that count a little to give ourselves some room, make sure the bit count in the high byte of the exponent was not 0 (if its not we jump to decandgo if it is we
    		;decrement the index into the exponent, make sure the index is not 0, if it is the count is done, if not store 64 in rsiand move on)
    
    		mov		rdi, qword ptr[r12]			;move the element count of the exponent into rdi
    		mov		_var1, rdi					;save the element count in _var1
    		bsr		rsi, qword ptr[r12+rdi*8]	;count the number of bits in the high byte of the exponent
    		mov		_var2, rsi					;save the bit count in _var2
    		lea		r9, _bits					;load the address to the bits array into r9 (the bits array have the powers of two)
    		mov		rax, qword ptr[r11]			;move the element count of the base into rax
    		add		rax, 10						;increase the element count some to give ourselves some room
    		cmp		rsi, 0						;compare rsi to 0
    		ja		DecAndGo					;if its above jump to decandgo
    		cmp		rdi, 1						;if rsi is 0 compare rdi to 1
    		je		CountDone					;if rsi equals 1 then the count is done
    		mov		rsi, 64						;otherwise store 64 in rsi
    		dec		rdi							;decrement the index into the exponent
    		mov		_var1, rdi					;save the index in _var2
    
    DecAndGo:
    
    		dec		rsi							;we need to decrement the index into the bits array by 1 because the most significant bit is allready counted
    		mov		_var2, rsi					;store the index into the bits array in _var2
    
    CountLoop:
    
    		add		rax, rax					;double rax
    		inc		rax							;add one to rax
    		mov		rcx, qword ptr[r12+rdi*8]	;move the byte of the exponent we are working on into rcx
    		and		rcx, qword ptr[r9+rsi*8]	;and the byte will the value stored in the bits array pointed to by rsi
    		jz		NoMult1						;if the result was 0 then dont add the base element count
    		add		rax, qword ptr[r11]			;add the element count of the base to rax
    		inc		rax							;add one to rax
    
    NoMult1:
    
    		cmp		rsi, 0						;compare the index into the bits array to 0
    		ja		NotZero						;if its above 0 jump to notzero
    		cmp		rdi, 1						;otherwise to move to the next byte in the exponent, compare the index into the exponent to 1
    		jna		CountDone					;if its not above the count is done
    		mov		rsi, 63						;otherwise set the index into the bits array to 63
    		dec		rdi							;decrement the index into the exponent
    		jmp		CountLoop					;jump to the head of the count loop
    
    NotZero:
    
    		dec		rsi							;the check above meant we still have bits in the current byte to go through so decrement the index to bits
    		jmp		CountLoop					;jump to the head of the count loop
    
    CountDone:
    
    		;now that we know how many elements we need for the arrays, we multiply the count by 8 (8 bytes in 64 bits), then make 2 calls to getmem and copy the base to one of them.
    
    		mov		_t1, rax					;move the result of the count into _t1
    		mul		_eight						;multiply the result by 8 for 64 bits
    		mov		rcx, rax					;move the final result into rcx
    
    
    		mov		_var7, r8	;temporarily here for debugging
    		mov		_t9, r11	;temporarily here for debugging
    		mov		r8, r11		;temporarily here for debugging
    		mov		rdx, 187679	;temporarily here for debugging
    
    
    		mov		_r8, r8				;this is here trying to fix the exception
    		mov		_r9, r9				;this is here trying to fix the exception
    		mov		_r10, r10			;this is here trying to fix the exception
    		mov		_r11, r11			;this is here trying to fix the exception
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11			;this is here trying to fix the exception
    		mov		r10, _r10			;this is here trying to fix the exception
    		mov		r9, _r9				;this is here trying to fix the exception
    		mov		r8, _r8				;this is here trying to fix the exception
    
    
    		cmp		r11, _t9	;temporarily here for debugging
    		je		NoProb		;temporarily here for debugging
    		mov		rax, rax	;temporarily here for debugging
    
    NoProb:						;temporarily here for debugging
    
    		cmp		r11, _v1	;temporarily here for debugging
    		je		NoProbA		;temporarily here for debugging
    		mov		rax, rax	;temporarily here for debugging
    
    NoProbA:					;temporarily here for debugging
    
    		mov		r8, _var7
    		
    
    
    		cmp		rax, 0						;make sure what was returned from the routine is valid
    		je		MallocError					;if its not jump to mallocerror
    		mov		_retval, rax				;move the pointer into _retval
    		mov		rcx, qword ptr[r11]			;move the element count of the base into rcx
    		inc		rcx							;increment the count by one for the element count at position 0
    		mov		rdi, rax					;move the pointer we just got into rdi
    		mov		rsi, _v1					;move the pointer for the base into rsi
    		pushf								;save the flags register
    		cld									;clear the direction flag
    rep		movsq								;copy all the elements from the base into the array we just got
    		popf								;restore the flags register
    		mov		rax, _t1					;move the size of our return array into rax
    		mul		_eight						;multiply it by 8 for the 64 bits
    		mov		rcx, rax					;move the result into rcx
    
    
    		mov		_var7, r8		;temporarily here for debugging
    		mov		_rdx, 187679	;temporarily here for debugging
    		mov		r8, r11			;temporarily here for debugging
    
    
    		mov		_r8, r8				;this is here trying to fix the exception
    		mov		_r9, r9				;this is here trying to fix the exception
    		mov		_r10, r10			;this is here trying to fix the exception
    		mov		_r11, r11			;this is here trying to fix the exception
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11			;this is here trying to fix the exception
    		mov		r10, _r10			;this is here trying to fix the exception
    		mov		r9, _r9				;this is here trying to fix the exception
    		mov		r8, _r8				;this is here trying to fix the exception
    
    
    		cmp		r11, _t9	;temporarily here for debugging
    		je		NoProb2		;temporarily here for debugging
    		mov		rax, rax	;temporarily here for debugging
    
    NoProb2:					;temporarily here for debugging
    
    		cmp		r11, _v1	;temporarily here for debugging
    		je		NoProb2A	;temporarily here for debugging
    		mov		rax, rax	;temporarily here for debugging
    
    NoProb2A:					;temporarily here for debugging
    
    		mov		r8, _var7
    
    
    		cmp		rax, 0						;make sure we got a valid pointer back
    		je		MallocError					;if not jump to mallocerror
    		mov		r10, rax					;move the pointer into the r10 register
    		mov		_t9, r10					;save the pointer in _t9
    		mov		r8, _retval					;move _retval into r8
    
    MainLoop:
    
    		;this is just the standard repeated square and multiply alg.  On each iteration we will square the _retval (kept in r8), the special square alg will take r8 and store the result
    		;in the array into r10, next we check the current bit of the current byte in the exponent for 1, if it is 1 then we multiply the result from the squaring by the base.
    
    		mov		rax, _var4					;move the running total for the decimal point into rax
    		add		_var4, rax					;add rax back into _var4 to double it
    		mov		rdi, r10					;move the pointer to the temp array into rdi
    		mov		rcx, qword ptr[r8]			;move the element count in _retval into rcx
    		add		rcx, rcx					;add rcx to itself to double it
    		inc		rcx							;make room for the element count
    		inc		rcx							;give it one extra byte in case there is still a carry to do after the final multiply
    		mov		qword ptr[r10], rcx			;rcx has our (possible) new element count for the result of the squaring, so store it at position 0 of the temp array
    		mov		rax, 0						;move 0 into rax
    		add		rdi, 8						;add 8 to rdi so that it points to the element at position one of the temp array
    		pushf								;save the flags register
    		cld									;clear the direction flag
    rep		stosq								;store 0 in all the value elements
    		popf								;restore the flags register
    		mov		rdi, 1						;set rdi to 1
    		mov		rsi, 1						;set rsi to 1
    		mov		rcx, 1						;set rcx to 1
    		mov		rdx, 0						;clear rdx
    
    SqLoop:
    
    		mov		rbx, rdx					;rdx holds if there was a carry from the last iteration, save it inrbx
    		mov		rax, qword ptr[r8+rdi*8]	;move the current byte of the result array into rax
    		mul		qword ptr[r8+rsi*8]			;multiply by the other current byte of the result array
    		add		qword ptr[r10+rcx*8], rax	;add the result to the current position of the temp array
    		adc		rdx, 0						;if there was a carry from the add, this will add one to the carry from the multiply
    		add		qword ptr[r10+rcx*8], rbx	;add the carry from the last iteration to the current position in the temp array
    		adc		rdx, 0						;if there was a carry from the add, this will add one to the carry from the multiply (only one at most of these adc instructions will inc rdx)
    		inc		rcx							;inc our index into the temp array
    		inc		rdi							;inc our first position in the _retval
    		cmp		rdi, qword ptr[r8]			;compare rdi to the element count
    		jna		SqLoop						;if its not larger than jump to the head of the sqloop
    		mov		qword ptr[r10+rcx*8], rdx	;the first position reached the end of retval so store the carry from the last iteration in the next element of the temp array (we add it here just in case the second position of the retval has been reached)
    		mov		rdx, 0						;zero out the return so we dont add it again
    		mov		rdi, 1						;point the first index into retval to the first element
    		inc		rsi							;inc the second index into retval
    		mov		rcx, rsi					;point the index into the temp array to the same position as the second index into the retval
    		cmp		rsi, qword ptr[r8]			;make sure the second index is not past the last element
    		jna		SqLoop						;if its not jump to the head of the square loop
    
    SqDone:
    
    		mov		rdi, qword ptr[r10]			;move the element count of the temp array into rdi
    
    ZCheck1:
    
    		cmp		qword ptr[r10+rdi*8], 0		;compare the current element we are pointing at to 0
    		ja		ZCheck1Done					;if its above jump out
    		cmp		rdi, 1						;otherwise compare the index into the array to one
    		jna		ZCheck1Done					;if its not above jump out
    		dec		rdi							;otherwise decrement it by one
    		mov		qword ptr[r10], rdi			;and store it back in position 0 of the temp array
    		jmp		ZCheck1						;jump to the head of the zero check loop
    
    ZCheck1Done:
    
    		mov		rax, r10					;we switch the arrays so that the current result is allways in r8 mmove r10 into rax
    		mov		r10, r8						;move r8 into r10
    		mov		r8, rax						;move rax into r8
    		mov		rdi, _var1					;move the index into the exponent into rdi
    		mov		rsi, _var2					;move the index into the bits array into rsi
    		mov		rax, qword ptr[r12+rdi*8]	;move the current byte of the exponent into rax
    		and		rax, qword ptr[r9+rsi*8]	;and rax with the value pointed to in bits
    		jz		NoMult2						;if the result was 0 then jump past the multipliaction loop
    		mov		rcx, qword ptr[r8]			;otherwise move the element count of retval into rcx
    		add		rcx, qword ptr[r11]			;add the element count of the base to rcx
    		inc		rcx							;inc the count for the element count to store in the temp array
    		inc		rcx							;inc the element count in case the result of the multiply goes over
    		mov		qword ptr[r10], rcx			;store the calcd element count at position 0 of the temp array
    		mov		rdi, r10					;move the pointer of the temp array into rdi
    		add		rdi, 8						;add 8 to rdi so that we point to the element at position 1
    		mov		rax, 0						;move 0 into rax
    		pushf								;save the flags register
    		cld									;clear the direction flag
    rep		stosq								;clear all the elements in the temp array
    		popf								;restore the flags register
    		mov		rsi, 1						;point to the first element of the retval
    		mov		rdi, 1						;point to the first element of the base
    		mov		rcx, 1						;point to the first element of the temp array
    		mov		rax, _var3					;move the original decimal point into rax
    		add		_var4, rax					;add it to the running total
    		mov		rdx, 0						;clear rdx
    
    MultLoop:
    
    
    		mov		rbx, rdx					;again rdx holds the carry from the last iteration so save it in rbx
    		mov		rax, qword ptr[r8+rsi*8]	;move the current element of retval into rax
    		mul		qword ptr[r11+rdi*8]		;multiply by the current element in the base
    		add		qword ptr[r10+rcx*8], rax	;add the result to the current element of the temp array
    		adc		rdx, 0						;if the last add resulted in a carry this will add it to the carry from the multiplication
    		add		qword ptr[r10+rcx*8], rbx	;add the carry from the last iteration to the current element in the temp array
    		adc		rdx, 0						;if the last add resulted in a carry than add it to the carry from the multiply (only one of the adc instructions will inc rdx)
    		inc		rcx							;point to the next element in the temp array
    		inc		rdi							;point to the next element in the base
    		cmp		rdi, qword ptr[r11]			;make sure we are not past the last element in the base
    		jna		MultLoop					;if we are good then jump to the head of the multiply loop
    		mov		qword ptr[r10+rcx*8], rdx	;otherwise move the carry from this iteration to the new current position in the temp array
    		mov		rdx, 0						;clear rdx
    		mov		rdi, 1						;point the index into the base at the first element
    		inc		rsi							;inc the index into the retval
    		mov		rcx, rsi					;point the index into the temp array to the same position as the index into retval
    		cmp		rsi, qword ptr[r8]			;make sure we are not past the last element of retval
    		jna		MultLoop					;if not jump to the head of the mult loop
    
    MultDone:
    
    		mov		rcx, r10					;move the temp array into rcx
    		CHECKFORZEROS						;check for zeros
    		mov		r10, r8						;switch the two arrays, move the current retval into r10
    		mov		r8, rax						;move the old temp array into retval
    
    NoMult2:
    
    		cmp		_var2, 0					;check if the index into the bits array is 0
    		ja		NotZ						;if its not jump to the notz
    		cmp		_var1, 1					;otherwise make sure we are still above the least siginificant byte of the exponent
    		jna		GoOut						;if not then go out
    		dec		_var1						;otherwise dec the index into the exponent
    		mov		_var2, 63					;and move 63 into the index into bits
    		jmp		MainLoop					;jump to the head of the main loop
    
    NotZ:
    
    		dec		_var2						;we still have bits to go sso decrement the index into the bits array
    		jmp		MainLoop					;jump to the head of the main loop
    
    ReturnVal1:
    
    		;the exponent must be equal to one so we return a copy of the base
    
    		mov		rax, qword ptr[r11]
    		add		rax, 5
    		mul		_eight
    		mov		rcx, rax
    		mov		_r8, r8
    		mov		_r9, r9
    		mov		_r10, r10
    		mov		_r11, r11
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11
    		mov		r10, _r10
    		mov		r9, _r9
    		mov		r8, _r8
    		cmp		rax, 0
    		je		MallocError
    		mov		r8, rax
    		mov		r11, _v1
    		mov		rcx, qword ptr[r11]
    		inc		rcx
    		mov		rsi, r11
    		mov		rdi, r8
    		pushf
    		cld
    rep		movsq
    		popf
    		mov		rax, qword ptr[r11]
    		mov		qword ptr[r8], rax
    		jmp		GoOut
    
    ReturnZero:
    
    		;the base must of been equal to 0 (and the exponent was greater than 0) so we return 0
    
    		mov		rcx, 32
    		mov		_r8, r8
    		mov		_r9, r9
    		mov		_r10, r10
    		mov		_r11, r11
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11
    		mov		r10, _r10
    		mov		r9, _r9
    		mov		r8, _r8
    		cmp		rax, 0
    		je		MallocError
    		mov		r8, rax
    		mov		qword ptr[r8], 1
    		mov		qword ptr[r8+8], 0
    		mov		_var4, 0
    		jmp		GoOut
    
    ReturnOne:
    
    		;either the base was equal to 1 or the exponent was equal to 0
    
    		mov		rcx, 32
    		mov		_r8, r8
    		mov		_r9, r9
    		mov		_r10, r10
    		mov		_r11, r11
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11
    		mov		r10, _r10
    		mov		r9, _r9
    		mov		r8, _r8
    		cmp		rax, 0
    		je		MallocError
    		mov		r8, rax
    		mov		qword ptr[r8], 1
    		mov		qword ptr[r8+8], 1
    		mov		_var4, 0
    		jmp		GoOut
    
    MallocError:
    
    		;if there was an error requesting memory try to get the error code, store it in the return value and send it back to managed code
    
    		mov		rcx, 32
    		mov		_r8, r8
    		mov		_r9, r9
    		mov		_r10, r10
    		mov		_r11, r11
    		call	GetMem64					;call our getmem routine
    		mov		r11, _r11
    		mov		r10, _r10
    		mov		r9, _r9
    		mov		r8, _r8
    		cmp		rax, 0
    		je		GoOut
    		mov		r8, rax
    		call	GetLastError
    		mov		qword ptr[r8], 0
    		mov		qword ptr[r8+8], 0
    		mov		qword ptr[r8+16], rax
    
    GoOut:
    
    		;we are all done so we destroy the temp array if it was created then exit
    
    		cmp		r10, 0
    		je		NotC
    		mov		rcx, r10
    		call	Free64
    
    NotC:
    
    		mov		rcx, _var4
    		mov		rax, r8
    
    		EXITFUNC _zero, _one, _zero, _zero, _one, _one, _one, _one, _one, _one, _one, _zero, _one, _zero, _zero
    
    		RET
    	
    	APow64	ENDP
    Last edited by AKRichard; June 21st, 2013 at 11:15 PM.

  9. #9
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    And the following would be equivalent to function2:

    Code:
    	GetMem64 PROC _tsize:qword, _ch:qword, _nr:qword
    
    		;using a management table
    		;each table is specific to a memory set
    		;each entry in the table will have the following pattern
    		;
    		;all unused block will have all words _zerod
    		;
    		;a used block will have words 0 and 1 set to the index of the first block shifted to the left 16
    		;words 2 and 3 will hold how many blocks are being used shifted to the left 16
    
    		ENTERFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
    		
    		;we check to make sure everything is initialized
    
    		mov		_v1, rcx				;save the amount of memory requested
    
    
    
    
    
    
    
    
    
    		mov		_var9, rdx
    		mov		_var8, r8
    
    
    
    
    
    
    
    
    
    
    		;now we check if we are supposed to use malloc or not, if we do, then call it and exit
    
    		cmp		_usemalloc, 0			;if the _usemalloc variable is 0 we use this alg, if it is not 0, we use malloc to fullfill the request
    		je		UseMine
    		call	malloc					;the call to malloc
    		cmp		rax, 0					;if rax contains 0 then there was an error in the call
    		je		MallocError
    
    		EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
    
    		cmp		_var9, 187679
    		jne		ItsGood3
    		cmp		r11, _var8
    		je		ItsGood3
    		mov		rax, rax
    
    ItsGood3:
    
    		RET
    
    UseMine:
    
    		cmp		_initialized, 0			;check if everything is allready initialized, if not then run initialization
    		je		CallInitialize
    		mov		rcx, _v1				;restore the requested amount to rcx
    
    Allready_initialized:
    
    		;we start by dividing the requested amount by our allocation size to see how many units to look for (we increment it by one to cover the case it equals zero)
    		;then we set everything up by moving _memslot into r13, then moving the first set into r13, set rdi and rsi to 1 rcx to zero (rdi is the current position in the set
    		;rsi marks the first block of the set we are looking at and rcx holds the count)
    
    		mov		rax, rcx				;mov the amount requested into rax for the division
    		xor		rdx, rdx				;clear the rdx register for the division
    		div		_allocsize				;divide the requested amount by the allocation size, this tells us how many blocks to look for
    		cmp		rdx, 0					;check if it was an exact division (no remainder)
    		cmovne	rdx, _one				;if there was a remainder then move 1 into rdx
    		add		rax, rdx				;this increments the number of blocks we are looking for by one if there was a remainder
    		mov		_t1, rax				;move the number of blocks we are looking for into t1
    		;add		_t1, 1
    		mov		r13, _memslot			;move the memslot array into r13, this array holds the pointers to the arrays that hold the mapping tables (which tells us which blocks are free and whoch are taken)
    		mov		rdi, 1					;we intially set to the first table, this will also be used for the pointer into the table
    		mov		rsi, 1					;rsi will keep track of the first available block for when we are ready to start marking
    		mov		_t2, 1					;the t2 variable keeps track of which table we are using so that we can return a pointer from the correct memptr table
    		mov		r13, qword ptr[r13+rdi*8]	;load the first memslot table into r13
    		xor		rcx, rcx				;clear rcx, this keeps track of how many contiguous free blocks we find
    
    SearchLoop:
    
    		;look at the slot, if it is 0 then inc rcx, compare rcx to how many units we are looking for, if its equal then jump to found
    		;else we inc rdi make sure rdi isnt pointing past the end, then jump to the top to do it again.
    		;if rdi is pointing past the end, then we load _memslot back into r13, load rdi from _t2 (which is keeping track of which set) inc rdi, make sure its pointing to a valid set
    		;load r13 with the next set, set rdi and rsi to 1, rcx to 0 and jump tp searchloop
    		;if it is pointing past the end of the sets, then we jump to create a new set
    
    		cmp		qword ptr[r13+rdi*8], 0		;check the element pointed to is zero, (0 means the block is free)
    		jne		ItsTaken					;if its allready being used go to the next section
    		inc		rcx							;if its not taken increment rcx to count another free block
    		cmp		rcx, _t1					;compare rcx to t1 (the number of free blocks we are looking for)
    		jnb		Found						;if we found enough go to prep for marking the blocks
    		inc		rdi							;increment rdi to point to the next block
    		cmp		rdi, _numberofallocs		;compare rdi to the number of allocations we keep in the table
    		jna		SearchLoop					;if its not above we jump to the head of the search loop to check the next block
    		inc		_t2							;if it is above we need to load the next table
    		mov		r13, _memslot				;load the memslot pointer into r13
    		mov		rax, _t2					;move the index to the next table into rax
    		cmp		rax, qword ptr[r13]			;check we are not past the last table in there
    		ja		CreateNewSet				;if we are jump to the create new set routine
    		mov		r13, qword ptr[r13+rax*8]	;move the next table into r13
    		mov		rdi, 1						;set the index to the first entry
    		mov		rsi, 1						;set rsi to point to the first entry (rsi holds the index to the first free block)
    		xor		rcx, rcx					;clear rcx (the counter)
    		jmp		SearchLoop					;jump to the head of the search loop
    
    ItsTaken:
    
    		movzx	rax, word ptr[r13+rdi*8+2]	;since the value wasnt 0 we know the value contains the info we need to jump past the taken blocks, we move the first block in this taken set into rax
    		movzx	rdi, word ptr[r13+rdi*8+6]	;we move the number of blocks taken in this set into rdi
    		add		rdi, rax					;we add rax to rdi so that now rdi points to the first block past this taken set
    		mov		rsi, rdi					;we set rsi = rdi so that rsi keeps track of the first block
    		xor		rcx, rcx					;we clear rcx since we need contiguous blocks
    		cmp		rdi, _numberofallocs		;make sure wwe are not pointing past the end of the table
    		jna		SearchLoop					;if we are good jump to the head of the search loop
    		inc		_t2							;if not increment the index into the memslot table
    		mov		r13, _memslot				;move the memslot table into r13
    		mov		rax, _t2					;move the index into rax
    		cmp		rax, qword ptr[r13]			;make sure we are not pointing past the last table
    		ja		CreateNewSet				;if we are jump to the create new set routine
    		mov		r13, qword ptr[r13+rax*8]	;move the next table into r13
    		mov		rdi, 1						;point to the first entry
    		mov		rsi, 1						;set rsi to the first entry
    		xor		rcx, rcx					;clear the counter
    		jmp		SearchLoop					;jump to the head of the search loop
    
    Found:
    
    		;ok we found enough empty units, now we prep to mark them, rax:rdx need to be zero, this is what the cmpxchg command compares against the memory location
    		;mov the starting block (rsi) into rcx, and the number of blocks we are reserving into rbx and shift both to the left 16 so that it uses the 32 bit form of the command
    
    		mov		rax, 0					;have to prep the compare and exchange each loop
    		mov		rdx, 0
    		mov		rbx, _t1				;move the number of blocks into rbx
    		mov		rcx, rsi				;move the index of the first block into rsi
    		shl		rbx, 16					;we have to shift the value so the correct form of cmpexchg is used
    		shl		rcx, 16					;we have to shift the value so the correct form of cmpexchg is used
    
    MarkLoop:
    		
    		;use the lock prefix so that we have exclusive access to the memory location and execute the command, if the zero flag is not set then that means the location no longer
    		;contains zero (another thread must have reserved it allready) so jump to allready marked and find another set of blocks.  If this command executes the first run, it is gauranteed
    		;to succeed on all the rest.  After marking all blocks we are using then we load _memptr into r13, and rdi with _t2 (so we point to the correct set) and load that set into r13
    		;then we multiply the number of blocks between 0 and the set we started at by the allocation size, add that to r13, and that gives us the begginging address of the
    		;memory we just reserved and thats what gets returned
    
    lock	cmpxchg8b	qword ptr[r13+rsi*8]	;lock the memory location and fire the cmpexchg instruction
    		jnz		AllreadyMarked				;if the zero flag is not set, then the exchange failed (if its going to fail it will only fail on the first time through)
    		inc		rsi							;increment the index
    		cmp		rsi, rdi					;compare it against rdi (where to stop)
    		jna		MarkLoop					;if its not above jump to the head of the mark loop
    		mov		r13, _memptr				;load the memptr table inot r13
    		mov		rdi, _t2					;load the index of the memslot table we found the free blocks in to rdi
    		mov		r13, qword ptr[r13+rdi*8]	;load the correct memptr into r13 (the one that corresponds to the memslot table where we found the free blocks)
    		mov		rax, rcx					;move the index to the first free block into rax
    		shr		rax, 16						;we have to shift it back to the right (because we shifted it to the left above)
    		mul		_allocsize					;multiply it by the allocation size
    		add		rax, r13					;add it to the base pointer (r13) this gives us the pointer to the first address of the memory we will return
    		mov		_v1, rax					;save the pointer for when we return
    
    		;this is a new section to protect against the possibility that this alg assigns some memory, then the _usemalloc variable gets changed.  With this set up we will keep track of
    		;how many requests that we filled are still out.  If the number is above 0 then the free alg will ALLWAYS start by checking if the memory being returned was filled from the memory
    		;we are handing out, if it is then we use our free alg to free it, if its not then it falls through to call the windows free routine.  
    		;In order to insure that we allways have the correct count we will use the cmpexchg instruction with the lock prefix, there may be an issue since both this alg and the free alg
    		;will be locking and accessing the _fm variable, if it slows it down to much Ill either remove this section or try something different.
    		
    IncCount:
    
    		mov		r13, 0						;prep to increment the count, set r13 to zero
    		mov		rax, _fm					;move the value in _fm to rax
    		mov		rdx, rax					;copy rax to rdx
    		shr		rdx, 32						;shift rdx to the right by 32 bits (_fm should be looked at like two dwords)
    		and		rax, 0ffffffffh				;and rax by 0ffffffffh to keep just the first 32 bits (rdx:rax now holds the value we will compare to)
    		mov		rbx, rax					;copy rax to rbx
    		mov		rcx, rdx					;copy rdx to rcx
    		add		ebx, 1						;add one to ebx
    		cmovc	r13, _one					;if the carry flag is set then set r13 to 1
    		add		ecx, r13d					;add r13d to ecx (r13d will be either 0 or 1)
    lock	cmpxchg8b	_fm						;lock and fire the instruction
    		jnz		IncCount					;if the zero flag is not set then another thread beat us to it so jump to IncCount and try again
    		mov		rax, _v1					;move the pointer back into rax
    
    		EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
    
    		cmp		_var9, 187679
    		jne		ItsGood2
    		cmp		r11, _var8
    		je		ItsGood2
    		mov		rax, rax
    
    ItsGood2:
    
    		RET
    
    CallInitialize:
    
    		call	Initialize					;needs to be initialized so call the initialization routine (done this way because all but the first time the jmp statement above will fall through)
    		jmp		Allready_initialized		;jmp back to the beggining
    
    AllreadyMarked:
    
    		;ok our marking of that address failed, on failure the command moves the value of the address into edx:eax, so as above we compute how many blocks to jump
    		;make sure it is not past the end of the set, then jump to the search loop again.  If it is past the end, load _memslot into r13, _t2 into rdi, inc rdi
    		;make sure it is pointing to a valid set, if it is set rdi and rsi to 1 rcx to 0 and jump to the search loop.  If it is not pointing to a valid set jump to create new set
    
    		shr		edx, 16						;the cmpexchg failed when the instruction fails it loads what it found at the memory location into eax/edx
    		shr		eax, 16						;both of which needs to be shifted to the right 16 bits
    		xor		rdi, rdi					;then eax gets added to edx and moved into rdi, this will point to the first block past this set
    		add		edi, eax
    		add		edi, edx
    		xor		rcx, rcx					;the count is cleared
    		mov		rsi, rdi					;mov rdi into rsi so the first block is kept track of
    		cmp		rdi, _numberofallocs		;make sure rdi doesnt point past the end of the table
    		jna		SearchLoop					;if its good jump to the head of the search loop
    		inc		_t2							;if its not incerement the index into the memslot array
    		mov		r13, _memslot				;mov the memslot array into r13
    		mov		rax, _t2					;mov the index into rax
    		cmp		rax, qword ptr[r13]			;make sure the index isnt pointing past the last table
    		ja		CreateNewSet				;if it is create a new set
    		mov		r13, qword ptr[r13+rax*8]	;otherwise move the next memslot table into r13
    		mov		rdi, 1						;point rdi to the first block
    		mov		rsi, 1						;point rsi to the first block
    		xor		rcx, rcx					;clear the counter
    		jmp		SearchLoop					;jump to the head of the search loop
    
    CreateNewSet:
    
    		;we want to make sure we are the only thread creating a new set so we use the cmpxchg command again but this time on a global variable _gm
    		;if we fail jump to start bottom.  If we succeed, get enough memory for another _memptr set and store it in the memptr table (updating the count as we go)
    		;then we request enough for another memslot table, we zero out all the elements, then store it in the memslot table (updating the count as we go)
    		;then we point _t2 to the new set, set rdi and rsi to 1 rcx to 0 and jump to the search loop setting _gm to zero right before we leave
    
    		mov		rax, 0					;have to prep the cmpexchg instruction
    		mov		rdx, 0					;we want to make sure the _gm global variable is 0 (which means no other thread is currently in this section of code)
    		mov		rbx, 1					;if it is 0 we want to change it to 1
    		mov		rcx, 0
    lock	cmpxchg8b	_gm					;lock and fire the instruction
    		jnz		StartBottom				;if the zero flag is not set then another thread is allready in this section of code, so exit this part and wait till it is finished
    		mov		rcx, _allocreqsize		;move the amount to request for the memptr entry into rcx
    		call	malloc					;get the memory from the syste,
    		cmp		rax, 0					;make sure there wasnt an error
    		je		MallocError				;if there was jump to the error routine
    		mov		r13, _memptr			;move the memptr array into r13
    		mov		rdi, qword ptr[r13]		;move the array count (element 0) into rdi
    		inc		rdi						;increment the element count
    		mov		qword ptr[r13], rdi		;store the new count in element 0
    		mov		qword ptr[r13+rdi*8], rax	;store the pointer we just requested in the new position
    		mov		rcx, _tablereqsize		;move the size of the memslot table into rcx
    		call	malloc					;request the memory from the system
    		cmp		rax, 0					;make sure there wasnt an error filling the request
    		je		MallocError				;if there was jump to the error routine
    		mov		r13, _memslot			;move the memslot array into r13
    		mov		rdi, qword ptr[r13]		;move the array count into rdi
    		inc		rdi						;increment the array coutnt
    		mov		qword ptr[r13+rdi*8], rax	;store the memory we just requested in the array
    		mov		qword ptr[r13], rdi		;store the new count at element 0
    		mov		r13, rax				;move the pointer to the new table into r13
    		mov		rdi, rax				;move the pointer into rdi
    		mov		rcx, _numberofallocs	;move the number of allocations (table size) into rcx
    		mov		rax, 0					;move zero into rax (this is what will be stored in all the elements in the new memslot table since none of the blocks have been assigned yet)
    		pushf							;save the flags register
    		cld								;clear the direction flag
    rep		stosq							;store 0 in all the elements of the new table
    		popf							;restore the flags register
    		mov		_gm, 0					;clear the _gm variable
    		xor		rcx, rcx				;clear the count
    		mov		rdi, 1					;point at the first block
    		mov		rsi, 1					;set rsi to point to the first block
    		jmp		SearchLoop				;jump to the head of the search loop
    
    StartBottom:
    
    		;we didnt get exclusive access to that part of the routine, we know there are not enough blocks in any of the other sets for us, so we wait till
    		;the other thread is done creating and initializing everything, at which point we point to the new set, set rdi and rsi to 1 and rcx to 0 then jump to the search loop
    
    		cmp		_gm, 1					;see if the thread that is setting up the new tables is still working on it
    		je		StartBottom				;if it is jump to the head of this loop again
    		mov		r13, _memslot			;load the memslot array into r13
    		mov		rdi, qword ptr[r13]		;get the index of the last table (we woudnt be here unless we allready searched all the other tables)
    		mov		r13, qword ptr[r13+rdi*8]	;load the new table into r13
    		mov		rdi, 1					;point rdi to the first block
    		mov		rsi, 1					;set rsi to point to the first block
    		xor		rcx, rcx				;clear the counter
    		jmp		SearchLoop				;jump to the head of the search loop
    	
    		EXITFUNC _zero, _one, _zero, _one, _one, _one, _zero, _zero, _zero, _zero, _zero, _one, _zero, _zero, _zero
    
    		cmp		_var9, 187679
    		jne		ItsGood1
    		cmp		r11, _var8
    		je		ItsGood1
    		mov		rax, rax
    
    ItsGood1:
    
    		RET
    	
    MallocError:
    
    		mov		rax, rax					
    
    	GetMem64 ENDP

  10. #10
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    sorry took 3 posts to get everything in:


    Now there is a bit of extra code in there trying to debug this problem, hopefully its commented enough to understand a few things though:

    1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.

    2. The variables that you dont see declared at the beggining of the function are text equates that point into the stack that was set up by the enterfunc/exitfunc macros.

    3. you can tell where I have debugging code inserted, I mark those sections with extra blank spaces so that its easy to clean up.

    4. In the APow64 function, immediately before and after the call to GetMem64 youll see me saving the registers and restoring them, this appears to work, if I comment them out then when it does throw an exception it will throw it at the very first instruction after the call GetMem64 that tries to use r11 as a pointer. When this happens, the value that it shows in r11 is Allways the same, I havent restarted my computer since last night, I was trying to get it to throw the exception when I saved the volatiles before the call, it didnt.

    5. You can see the debugging code in GetMem64 that I use to check the value in r11 before it exits. This alg normally takes only one argument (the number of bytes being requested), and normally youd never see a reference to r11 in this algorithm, the only reason you see it now is so I can check it. I do not check r11 when it first enters the alg, I just check to make sure r11 still equals the value passed in (from APow64) to check it against.

    6. I just looked at the way the code looks in here, it a lot easier to read in visual studio, sorry.
    Last edited by AKRichard; June 21st, 2013 at 11:29 PM.

  11. #11
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

    I keep coming back to this article describing register usage and looking at r10/r11
    R10:R11
    Volatile
    Must be preserved as needed by caller; used in syscall/sysret instructions
    r10 and r11 are the only (integer) registers marked as must be saved by the caller. As I mentioned before if I change r11 to something like r15 int the calling function the problem disappears. Normally, I would just make the changes and move on, but if there is a problem in the alg I would like to find it and fix it. The wording in the referenced article has me wondering though:

    Volatile registers are scratch registers presumed by the caller to be destroyed across a call
    the "across a call" wording sounds like they are trying to convey that those register may be destroyed by more then the called function, in which case restoring them before returning from the called function may not allways work (which goes hand in hand with the must be saved by caller statement in same article), the problem is I can find nothing on the internet to confirm or deny my thinking.

    Any ideas?

  12. #12
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Question about volatile registers

    I don't think I can really add anything substantial to the quite competent comments already posted by OReubens, except perhaps this:

    Quote Originally Posted by AKRichard View Post
    1. EnterFunc/ExitFunc these are macros that set up each function that needs it, it save whatever registers it is told to (the _zero/_one after the name tells it which to save the order is rax, rbx, rcx, rdx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15, rsp). These macros also set up/clean up the stack frame. the enterfunc macro also reserves room for a bunch of temp variables if needed.
    Already when reading http://forums.codeguru.com/showthrea...57#post2115357, I suspected some possible connection between macros/text equates and what looked like stack corruption (or mis-addressing data stored on the stack). Back then I abandoned that idea because I couldn't imagine how plain text replacement could cause something like that, but now I think it may be worth checking this direction. Be it just to confirm the macros are not to blame.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  13. #13
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Question about volatile registers

    Quote Originally Posted by AKRichard View Post
    I dont remember what the mp was, it had the mp and a little bit of memory all on one chip, but it was part of a kit that had a piece of bread board, 2 of the 8 segment leds, a couple of slide switches and a couple of push button switches, you programmed the thing by punching in the op codes and hitting a button. Pretty cool kit at the time actually, if I remember right it only worked on 8 bits and I dont think it even had a div instruction. But I spent hours at that stupid thing wiring in seperate address decoders and different timing circuits. Today you can get a kit at radio shack that would blow it away, but at the time I was only 14 and was still a few months away from getting my first tandy. I still have the tandy by the way, one of the first color computers, it had a whole 8 colors and 64k memory woohooo lol.
    I never had one myself, but I know these single-board gadgets with binary direct-to-memory entry (already hex was luxury back then ) and 7-seg display. With interest I followed a series of magazine articles about a system one had to assemble themselves (best thing available was a kit IIRC) based on the SC/MP CPU. (In fact that was in that very magazine referred to at the end of the Wikipedia article.)

    My first computer (not counting programmable calculators) also was Tandy. But it was a Model I Level II with no color at all, instead "true" black and wite, i.e. bi-level, and the entire screen's "graphics" resolution was less than today's highest resolution Windows application icons.

    However, I always thought any CPU would have a NOP instruction. After all that was particularly useful with that sort of hackish direct-to-core code editing/debugging of these days.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  14. #14
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    Already when reading http://forums.codeguru.com/showthrea...57#post2115357, I suspected some possible connection between macros/text equates and what looked like stack corruption (or mis-addressing data stored on the stack).
    Funny you should mention that post, I just reread it the other day. As this project has evolved every once in a while Ill see a way to reduce its complexity, code an idea or two and test it out, if it works as thought, and add clarity to the code, then Ill implement it. The macros mentioned solved a few problems, configuring the stack correctly, and making sure every push had a corresponding pop (in the version before the macros I used pushes and pops all over the place which was hard to follow sometimes).

    The macros (appear) to be running correctly, I didnt run into this problem until I tackled redesigning my memory management routine. I am finding out that everything runs stably until the management routine needs more memory from the system, every single time this problem pops up, it does so after it grabs more memory from the system (thats the CreateNewSet section of GetMem), and thats why I come here, I think I just figure it out. Ill try it and let you know.

  15. #15
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: Question about volatile registers

    So, in the middle of typing that last reply, when I got done typing that last bit about how it doent throw an exception till it requests more memory when I remembered something OReubens said:

    do note that r11 is not guaranteed preserved after a windows API call. so if your actual code calls windows API functions, then that may be your cause. Another possible error is as said the stack getting overwritten.
    The GetMem routine never uses r11, so I didnt have the macro saving and restoring r11. I have very few functions that call any of the windows api's directly. Anyways, everything clicked when I was typing, when it creates a new set it makes 2 calls into malloc, r11 wasnt being saved since it wasnt used in the called function, the reason it only did it in multi threaded mode was because in single threaded mode it doesnt run out of memory, what it grabs during initialization is usually enough. The only part that I havent figured out yet is I can set a global variable (_usemalloc) to 1 and the GetMem function will call malloc every request it recieves without ever throwing an exception. The only thing I can figure is when I have it set that way the requests are relatively small (usually less than a few hundred bytes), but when my management routine makes a request, it asks for about 4.5 megs all at once so maybe malloc is calling into some other routine that makes a syscall/sysret.

    So, I set the macro in the GetMem routine to save and restore r11, now it doesnt throw an exception, though it still messes up the function that called GetMem on the thread that created the new set (messes it up as in it throws an incorrect answer), though everything goes back to running smoothly right after that. My memory management routine is now running almost twice as fast as it was before I redid it.

    Thanks guys

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured