[RESOLVED] A little advice on a memory management routine would be appreciated


  1. #1
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    [RESOLVED] A little advice on a memory management routine would be appreciated

    Hello all,

    I have a program that handles large numbers held in arrays of qwords. All of the math is implemented in assembly language. All of the routines request memory for at least the return value, and some of them request memory for temporary values as well. I found that my calls to malloc were slowing the algorithms down quite a bit, so I attempted to write my own memory management routine.

    The requirements for my allocation method were: 1. I don't know in advance how much memory will be requested. 2. I don't know how long an allocation will be used before it is returned to the pool (the return values can live indefinitely). I know this leaves me with the potential for fragmentation, and I am already thinking about how to handle that, though it has not been an issue as of yet.

    My problem is that in a single-threaded program it works great, and gave me a better boost in speed than I thought it would; in a multi-threaded app, however, it really bogs down. I was wondering if you all might be able to guide me on how to improve it, or what I should be doing differently.

    The first code listing gives just enough to show where the bottleneck is; the second gives the entire routine with an explanation of how I pictured it working. This is my first attempt at memory management, so please don't laugh too hard. Any feedback would be appreciated.

    I know my bottleneck is at the lock cmpxchg, but as far as I can see, only one thread at a time is allowed in that section of code, else I end up with multiple threads assigning themselves the same blocks of memory (as I found out the hard way).

    Code:
    LOCAL _size:qword
    		LOCAL _othersize:qword
    		LOCAL _firstpass:qword
    		LOCAL _startingset:qword
    		LOCAL _rollset:qword
    
    		push	rbx
    		push	rdx
    		push	rdi
    		push	rsi
    		push	r8
    		push	r9
    		push	r10
    		push	r11
    		push	r12
    		push	r13
    		push	r14
    		push	r15
    		cmp		Initialized, 0
    		jne		AllreadyInitialized
    		call	Initialize
    		mov		Initialized, 1
    
    AllreadyInitialized:
    
    		cmp		_usemalloc, 0				;use malloc or my algorithm
    		je		NotMalloc
    		sub		rsp, 28h
    		clc
    		call	malloc
    		add		rsp, 28h
    		pop		r15
    		pop		r14
    		pop		r13
    		pop		r12
    		pop		r11
    		pop		r10
    		pop		r9
    		pop		r8
    		pop		rsi
    		pop		rdi
    		pop		rdx
    		pop		rbx
    
    		RET
    
    NotMalloc:
    
    		mov		_firstpass, 1
    		mov		_rollset, 1
    		add		rcx, 16
    		mov		_size, rcx		
    		mov		rax, rcx					; mov amount req to rax
    		xor		rdx, rdx					;prep for div
    		div		_blocksize					;div by block size
    		cmp		rdx, 0						;if there is a remainder
    		je		NoInc
    		inc		rax							;inc rax by one
    
    NoInc:
    
    		push	rax
    
    TryAgain:
    
    		mov		rax, 0					;have to prep the compare and exchange each loop
    		mov		rdx, 0
    		mov		rbx, 1
    		mov		rcx, 0
    		lock	cmpxchg8b	_gm
    		jnz		TryAgain
    		inc		_memcnt					;get rid of the following instructions in the working module
    		cmp		_memcnt, 1
    		je		DontWorry
    		mov		rax, rax
    
    DontWorry:
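    For reference, the spin at the lock cmpxchg8b above is just an atomic compare-exchange spinlock. A minimal C11 sketch of the same pattern (the names gm, pool_lock and pool_unlock are mine, and this is not a drop-in replacement for the assembly):

```c
#include <stdatomic.h>

/* Hypothetical lock flag, playing the role of _gm in the listing:
 * 0 = free, 1 = held. */
static atomic_long gm;

/* Spin until we win the compare-and-exchange, like the TryAgain loop.
 * Note the expected value must be re-armed on every attempt, which is
 * why the listing reloads rax/rdx/rbx/rcx each pass. */
static void pool_lock(void)
{
    long expected;
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak(&gm, &expected, 1));
}

static void pool_unlock(void)
{
    atomic_store(&gm, 0);   /* the mov _gm, 0 on the success path */
}
```

    Every thread that loses the exchange burns cycles in this loop, which is exactly why contention makes the routine bog down.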
    Before any algorithm runs, there is an initialization routine that uses malloc to request memory (4194304 bytes, that's 65536 sets of 64-byte blocks) and initializes it. The address of the first byte is then stored in an array of pointers; I used an array of pointers so I could grow my pool on the fly.

    So I have a variable that I can change in the data section so that it skips my routine and just uses malloc; this is the very first check my routine does.

    After that, it takes the amount of memory requested and adds 16 to it (this is for management info and is used to make assigning and returning the memory faster; the layout will be explained below). This new number is then divided by 64 (the block size I chose to allocate), and if there is a remainder the result is incremented by one. This tells me how many blocks to look for.
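    The round-up arithmetic in that paragraph can be written out as a small C helper (a sketch; the function name is mine, and BLOCKSIZE reflects the 64-byte blocks described in the post):

```c
/* How many 64-byte blocks a request needs, once the 16 management
 * bytes are included; a remainder bumps the count by one, matching
 * the div/inc sequence in the listing. */
#define BLOCKSIZE 64ULL

static unsigned long long blocks_needed(unsigned long long request)
{
    unsigned long long total = request + 16;     /* management info */
    return (total + BLOCKSIZE - 1) / BLOCKSIZE;  /* round up */
}
```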

    I have two other variables that are then loaded. The first is an index into the array of pointers; this is the current set I am assigning from. The second variable points to the next block after the last assignment I made. That block is then checked for zero (zero indicates it is unused). If it is zero, the routine checks how many contiguous blocks are free; if there are enough to handle the request, it marks the first few bytes and the last two bytes, updates the next free block with how many free blocks are left, and the address is modified to point to the first byte after the management bytes and returned to the calling algorithm.

    If the first byte is not zero, it checks how many blocks in this set are in use, adds this to the base address, and checks the block following this set to see if it is free. If there are not enough free blocks to handle the request, it starts at block zero and checks for any run of free blocks big enough for the request. If none are found, it increments the index into the array of pointers and checks the next set. If there are no more sets, it calls malloc to get another set and marks the blocks.

    The management blocks are marked as follows:

    ;A used block will have the following pattern:
    ; the high bit of byte zero will be set if it is assigned from the experimental unit, and cleared otherwise
    ; bytes 0 and 1 will hold a word value of the memory set being used (the index into the memptr array)
    ; bytes 2 and 3 will hold a word value of the starting block
    ; bytes 4 and 5 will hold a word value of the number of blocks being used
    ; bytes (starting address + ((starting block + number of blocks used) * blocksize)) -1 and -2 will hold a word value of the number of blocks being used

    ;an unused block will have the following pattern:
    ; bytes 0 through 7 will be zeroed out
    ; bytes 8 through 15 hold a qword value of the number of contiguous blocks unused
    ; bytes (starting address + ((starting block + number of blocks) * blocksize)) -1 and -2 will hold a word value of the number of contiguous blocks left


    ;when a memory request comes in, 16 is added to the request right off the bat; this number is divided by the blocksize, and if rdx is not zero the result is incremented by one
    ;the current set is loaded into a register, and the current block is loaded into another register
    ;a check is made to make sure the first qword is zero, and if it is, the second qword is compared to the amount requested
    ;if that qword is equal or larger, then the first 6 bytes are set according to the pattern above
    ;8 is added to the beginning of the first block and this is what will be returned
    ;next, the requested number of blocks is subtracted from the value held in the second qword above
    ;the number of contiguous blocks is checked to make sure it is not zero
    ;if it is not, then the first qword after the blocks being returned is zeroed
    ;and the second qword has the number of contiguous blocks left moved into it
    ;bytes (starting address + ((starting block + number of blocks) * blocksize)) -1 and -2 will hold a word value of the number of contiguous blocks left



    ;for the freeing of the memory:
    ;8 will be subtracted from the starting address
    ;the word at bytes 0 and 1 will be moved into a register, memptr will be moved into another register, and the base address will be pulled from them
    ;the word at bytes 2 and 3 will be moved into a register, and the word at bytes 4 and 5 will be moved into another register
    ;the first qword will be zeroed, and the second qword will receive the qword representation of the word value pulled from bytes 4 and 5
    ;the last byte address will be calculated, and a word stored at bytes -1 and -2 will hold the word value pulled from bytes 4 and 5
    ;next, the starting address + (number of blocks * blocksize) -2 will be moved into a word value
    ;this will be subtracted from the base address of the blocks being freed
    ;if the first qword at this address is zero, then the word value at bytes 4 and 5 will be added to the qword value in the second spot
    ;that same value will be stored at the end of the blocks being freed as a word value
    ;next, the first qword after the end of the blocks being freed is checked for zero
    ;if it is, then the second qword here is added to the second qword at the beginning of the set (this will either be the second qword of the blocks being freed or the second qword from the new beginning)
    ;this will also be stored as a word value at the end of this upper set at -1 and -2
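    The used/unused patterns above could be expressed as C structs, which may make the layout easier to audit (a sketch only; the field names are mine, not from the code):

```c
#include <stdint.h>

#pragma pack(push, 1)
/* Start of a USED run: bytes 0..5 as described above.  The high bit
 * of the set word doubles as the "assigned from here" flag. */
typedef struct {
    uint16_t set;       /* index into the memptr array (+ flag bit) */
    uint16_t start;     /* starting block within the set */
    uint16_t nblocks;   /* number of blocks in use */
} used_hdr;

/* Start of an UNUSED run: first qword zero marks it free, second
 * qword counts the contiguous free blocks from here. */
typedef struct {
    uint64_t zero;      /* 0 => block is unused */
    uint64_t nfree;     /* contiguous free blocks */
} free_hdr;
#pragma pack(pop)
```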

    Just reading over what I wrote, it sounds complicated, but it works and it is faster than malloc; I'd like to make it better, though.

    Thanks in advance.


    Code:
    GetMem64	PROC	tsize:qword
    
    		LOCAL _size:qword
    		LOCAL _othersize:qword
    		LOCAL _firstpass:qword
    		LOCAL _startingset:qword
    		LOCAL _rollset:qword
    
    		push	rbx
    		push	rdx
    		push	rdi
    		push	rsi
    		push	r8
    		push	r9
    		push	r10
    		push	r11
    		push	r12
    		push	r13
    		push	r14
    		push	r15
    		cmp		Initialized, 0
    		jne		AllreadyInitialized
    		call	Initialize
    		mov		Initialized, 1
    
    AllreadyInitialized:
    
    		cmp		_usemalloc, 0				;use malloc or my algorithm
    		je		NotMalloc
    		sub		rsp, 28h
    		clc
    		call	malloc
    		add		rsp, 28h
    		pop		r15
    		pop		r14
    		pop		r13
    		pop		r12
    		pop		r11
    		pop		r10
    		pop		r9
    		pop		r8
    		pop		rsi
    		pop		rdi
    		pop		rdx
    		pop		rbx
    
    		RET
    
    NotMalloc:
    
    		mov		_firstpass, 1
    		mov		_rollset, 1
    		add		rcx, 16
    		mov		_size, rcx		
    		mov		rax, rcx					; mov amount req to rax
    		xor		rdx, rdx					;prep for div
    		div		_blocksize					;div by block size
    		cmp		rdx, 0						;if there is a remainder
    		je		NoInc
    		inc		rax							;inc rax by one
    
    NoInc:
    
    		push	rax
    
    TryAgain:
    
    		mov		rax, 0					;have to prep the compare and exchange each loop
    		mov		rdx, 0
    		mov		rbx, 1
    		mov		rcx, 0
    		lock	cmpxchg8b	_gm
    		jnz		TryAgain
    		inc		_memcnt					;get rid of the following instructions in the working module
    		cmp		_memcnt, 1
    		je		DontWorry
    		mov		rax, rax
    
    DontWorry:
    
    		mov		rdi, _cset
    		mov		_startingset, rdi
    		mov		r10, _memptr
    		mov		r10, qword ptr[r10+rdi*8]
    		mov		rdi, _cblock
    		xor		rsi, rsi
    		pop		rbx
    
    CheckLoop:
    
    		cmp		rdi, _setsize
    		jnb		Next
    		mov		rax, rdi
    		mul		_blocksize
    		add		rax, r10
    		cmp		qword ptr[rax], 0
    		ja		CheckNext
    		inc		rsi
    		cmp		rsi, rbx
    		jnb		FoundIt 
    		inc		rdi
    		jmp		CheckLoop
    
    CheckNext:
    
    		xor		rsi, rsi
    		add		rax, 4
    		xor		rdx, rdx
    		mov		dx,word ptr[rax]
    		add		rdi, rdx
    		inc		rdi
    		mov		_cblock, rdi
    		jmp		CheckLoop
    
    FoundIt:
    
    		mov		rcx, rax
    		mov		rdx, _cset
    		mov		word ptr[rax], dx			;save the memory set being used
    		or		word ptr[rax], 8000h		; set the high bit of the word to mark it as assigned from here
    		inc		rax
    		inc		rax
    		mov		rdx, _cblock
    		mov		word ptr[rax], dx			;save the starting block being used
    		inc		rax
    		inc		rax
    		mov		word ptr[rax], bx			;save the number of blocks being used
    		mov		rax, rcx
    		add		rax, 16
    		inc		rdi
    		mov		_cblock, rdi
    		mov		_gm, 0
    		pop		r15
    		pop		r14
    		pop		r13
    		pop		r12
    		pop		r11
    		pop		r10
    		pop		r9
    		pop		r8
    		pop		rsi
    		pop		rdi
    		pop		rdx
    		pop		rbx
    
    		RET
    
    Next:
    
    		cmp		_firstpass, 1
    		jne		NextSet
    		mov		_firstpass, 0
    		xor		rdi, rdi
    		xor		rsi, rsi
    		mov		_cblock, 0
    		jmp		CheckLoop
    
    NextSet:
    
    		mov		r10, _memptr
    		mov		rax, _cset
    		cmp		qword ptr[r10], rax
    		je		AddAnotherSet
    		cmp		rax, _startingset
    		jne		NotYet
    		cmp		_rollset, 1
    		jne		GetMore
    
    NotYet:
    
    		inc		rax
    		mov		_cset, rax
    		xor		rsi, rsi
    		xor		rdi, rdi
    		mov		_cblock, rdi
    		mov		r10, _memptr
    		mov		r10, qword ptr[r10+rax*8]
    		jmp		CheckLoop					;continue checking
    
    AddAnotherSet:
    
    		cmp		_rollset, 1
    		jne		GetMore
    		mov		rax, _startingset
    		cmp		rax, _cset
    		je		GetMore
    		mov		_cset, 0
    		mov		_rollset, 0
    		xor		rsi, rsi
    		xor		rdi, rdi
    		mov		_cblock, rdi
    		mov		r10, _memptr
    		mov		r10, qword ptr[r10+rdi*8]
    		jmp		CheckLoop
    
    GetMore:
    
    		push	rbx							;save amt req
    		mov		rcx, _memreq				;mov 65536 dwords into rcx
    		sub		rsp, 28h					;adjust stack
    		clc
    		call	malloc						;call malloc
    		add		rsp, 28h					;readjust stack
    		mov		r10, _memptr				;move memslot ptr into r10
    		mov		rbx, qword ptr[r10]			;move current number of memslots into rbx
    		inc		rbx							;inc rbx
    		mov		_cset, rbx
    		mov		_cblock, 0
    		mov		qword ptr[r10], rbx			;move new num memslots into memslot
    		mov		qword ptr[r10+rbx*8], rax
    		xor		rdi, rdi
    
    ClearEmLoop:
    
    		mov		qword ptr[rax+rdi*8], 0
    		add		rdi, 8
    		cmp		rdi, _setsize
    		jna		ClearEmLoop
    		xor		rdi, rdi
    		xor		rsi, rsi
    		mov		r10, rax
    		pop		rbx
    		jmp		CheckLoop
    
    	GetMem64 ENDP

  2. #2
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,583

    Re: A little advice on a memory management routine would be appreciated

    So let me recapitulate your memory management design to ensure I didn't terribly misunderstand something. At the same time, I'll suggest some candidate topics for discussion, or for you to simply experiment with in your code.

    Do I understand you right that instead of a plain memory pool, you're using a "pool of memory pools" to avoid the need to resize the pool when more memory is needed, i.e. you simply allocate an entire new pool in that case? Looks like an elegant strategy, but perhaps it doesn't gain as much as you expected.

    You probably know that, at least in 32-bit Windows, you can reserve address space without actually assigning real memory to it. I haven't ever dealt with this sort of low-level memory management myself, but I would be pretty surprised if something like that weren't possible under x64, and with plenty of address space available there, it could be even more useful. So perhaps you could generously pre-reserve address space for your memory pool and only commit (and perhaps decommit) actual virtual memory to blocks on demand. That way you can delegate much of the management you're doing yourself now to the OS, giving it the opportunity for quite some optimizations. Keep in mind that your app is already dealing with virtual addresses anyway, and the backing physical memory blocks don't even need to be consecutive just because they appear to be from the perspective of your app.

    However, this proposed design does not harmonize very well with your current design of holding the memory block state descriptors (free/in use, size, etc.) inside the memory blocks themselves (which reminds me a lot of how the 90's MS C runtimes managed their heap, and AFAICT they haven't changed much about that strategy in the meantime), because it requires you to keep virtual memory allocated to freed blocks just to store the sparsely scattered descriptors. So it may be worth considering changing that design to one using descriptor tables, and have the payload memory blocks contain just that: payload. That way you may gain the opportunity to release the virtual memory occupied by consecutive freed blocks while keeping the address space.

    You seem to perform consolidation of consecutive free blocks when you process a new memory request, if you find enough of these free blocks to satisfy the request. Doing the consolidation when freeing blocks instead may be advantageous, however: when you've freed a block, check whether the next block is free too, and in that case join the two blocks. And that's it. That way you'll never have any consecutive free blocks at all outside the freeing routine, and at most two inside it, which you then join immediately. Depending on how your program uses the memory pool, this may save lots of effort wasted on failed attempts to satisfy a memory request from an insufficiently sized sequence of consecutive free blocks.
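    That coalesce-on-free idea can be sketched over a toy block map (simplified: an out-of-band array instead of the in-band headers, and only the merge-with-next-run step; all names are mine):

```c
#include <string.h>

#define NBLOCKS 16

static int used[NBLOCKS];     /* nonzero = block in use */
static int runlen[NBLOCKS];   /* length of a free run, stored at its first block */

/* Free the run [start, start+n) and immediately join it with a free
 * run that begins right after it, so free space never stays split. */
static void free_run(int start, int n)
{
    memset(&used[start], 0, n * sizeof used[0]);
    int next = start + n;
    if (next < NBLOCKS && !used[next]) {  /* right neighbour free? */
        n += runlen[next];                /* join the two runs */
        runlen[next] = 0;
    }
    runlen[start] = n;
}
```

    (A full version would merge with the preceding run too; checking only the next block matches the suggestion above.)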

    Finally, you mentioned that synchronization may be a bottleneck. In this case you may gain a lot by giving each thread its own memory pool instead of sharing one between all of them, eliminating the need for much of the synchronization. The effect this design change would have, of course, would heavily depend on how much memory block data exchange is going on between the individual threads. Also, using a single memory pool for all the threads may promote fragmentation of the pool, which, of course, is counter-productive as well.
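    One way to sketch the per-thread pool in C is thread-local storage: each thread gets its own arena, so the hot allocation path needs no interlocked instruction at all (a minimal bump allocator under invented names; growth and freeing are omitted):

```c
#include <stddef.h>

#define ARENA_BYTES 65536

/* Each thread sees its own copy of these, so no locking is needed. */
static _Thread_local unsigned char arena[ARENA_BYTES];
static _Thread_local size_t arena_used;

static void *pool_alloc(size_t n)
{
    n = (n + 63) & ~(size_t)63;          /* round up to 64-byte blocks */
    if (arena_used + n > ARENA_BYTES)
        return NULL;                     /* a real pool would grow here */
    void *p = &arena[arena_used];
    arena_used += n;
    return p;
}
```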

    HTH
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  3. #3
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: A little advice on a memory management routine would be appreciated

    Quote Originally Posted by Eri523 View Post
    Do I understand you right that instead of a plain memory pool, you're using a "pool of memory pools" to avoid the need to resize the pool when more memory is needed, i.e. you simply allocate an entire new pool in this case? [...] So it may be worth considering to change that design to one using descriptor tables and have the payload memory blocks contain just that: payload. [...]

    HTH
    A pool of pools: that describes it exactly. I added the ability to add more pools on the fly because I figured sooner or later the need would arise; in practice, however, the routine has never needed to add another pool, even in a multi-threaded app. That makes sense: the largest numbers I deal with occupy a few million bits, and numbers that large are rare (even in the decimal implementation of the library). Most calls are only requesting one or two blocks, so there are enough blocks for a few tens of thousands of requests.

    As for the virtual address space, I understand the concepts but really haven't messed with it much.

    Using descriptor tables is not only a possibility; I actually used a very simplified version of descriptor tables in the original implementation. It consisted of an array of pointers to arrays (to be used in conjunction with the array of pointers to the pools); each element in the table simply stored how many blocks had been requested, and a zero meant the block was free. Using the management bytes in the blocks themselves was an attempt to simplify returning the memory to the pool. Without those bytes I was stuck comparing the address of the block being returned to the base address of each pool in use to figure out which pool it belonged to, but since I have yet to see a second pool added, it is just making things overly complicated, so yes, I could easily convert it back.

    Quote Originally Posted by Eri523 View Post
    You seem to perform consolidation of consecutive free blocks when you process a new memory space request if you find enough of these free blocks to satisfy the request. Doing the consolidation when freeing blocks instead may be advantageous, however [...]
    It actually is handled in the free routine; what you're looking at is me overthinking the request routine. I was trying to make the routine handle the possibility that somehow the management bytes got screwed up (such as if one of the algorithms wrote into the management area by mistake, which happens sometimes while I am modifying an algorithm until I get it debugged). I wanted the routine to be able to handle that possibility instead of crashing or getting spun into an endless loop or something stupid like that; looking at it now, though, I think it just makes the routine more complicated.


    Quote Originally Posted by Eri523 View Post
    Finally, you mentioned that synchronization may be a bottleneck. In this case you may gain a lot by giving each thread its own memory pool instead of sharing one between all of them, eliminating the need for much of the synchronization. [...]
    I had thought about this, but wasn't sure how to implement it. Right now it's pretty much a static library being called through a native static library from a managed library, usually through static methods within the managed library. If I could implement it, this would probably solve the bottleneck. It could be a problem, though, because I would need to be able to identify which thread an allocation came from so I could make sure the correct thread handles the memory being returned to the pool. As it stands right now, the managed library actually calls the memory request routine in the assembly code when a new instance is first created, to hold the initial value (I only use native arrays throughout, even in the managed code). This was easier than trying to keep track of whether it was managed or native memory and how to return it. If I keep the management bytes with the blocks, I guess I could insert something to identify which thread it came from.

    As for the fragmentation problem, I've got a few ideas to try out, but all of them would make my memory manager look something like the managed garbage collector: using pointers to pointers (handles), so that I just update the table of pointers during compaction instead of having to update each instance individually. So far fragmentation hasn't been an issue, but I need to take care of it before it becomes a problem.

    And I need to figure out how to do all this without slowing it down, lol.

  4. #4
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,583

    Re: A little advice on a memory management routine would be appreciated

    Quote Originally Posted by AKRichard View Post
    As for the virtual address space, I understand the concepts but really havent messed with it too much.
    Given what you described above that quote, I don't think you would gain much here from explicitly controlling memory virtuality either, so you can put that pretty far down on the list of options to consider. This also eliminates the concerns I raised about keeping the block descriptors as part of the pool memory blocks themselves.

    [...] I was trying to make the routine handle the possibility that somehow the management bytes got screwed up (such as if one of the algorithms wrote into the management area by mistake, which happens sometimes when I am modifying an algorithm until I get it debugged). I wanted the routine to be able to handle that possibility instead of it crashing or getting spun in an endless loop or something stupid like that, looking at it though I think it just makes the routine more complicated.
    Are you aware of how the MS runtime handles that? They pad their heap blocks with a number of guard bytes at both ends, filled with a certain value/pattern. Then they verify their integrity at strategic points, like when allocating or freeing blocks. IIRC there is also a _heapchk() or so function that can be called explicitly on demand. Once the actual block descriptors get corrupted, all bets are off anyway, but with the guard bytes you at least have the chance to detect upcoming corruption at an early stage and be alerted. As this adds considerable extra overhead to memory management, MS only does it in debug builds. Of course, there is also the possibility of detecting corruption of the block descriptors by checksumming them, or simply maintaining backup copies, but that would entail quite a lot of overhead.
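    The guard-byte scheme can be sketched in a few lines of C (my own minimal version, not the CRT's actual implementation; 0xFD mirrors the fill value the MS debug heap uses for its "no man's land" bytes):

```c
#include <stdlib.h>
#include <string.h>

#define GUARD 4
#define FILL  0xFD

/* malloc wrapper that pads both ends with a known pattern. */
static void *galloc(size_t n)
{
    unsigned char *p = malloc(n + 2 * GUARD);
    if (!p) return NULL;
    memset(p, FILL, GUARD);               /* front guard */
    memset(p + GUARD + n, FILL, GUARD);   /* rear guard */
    return p + GUARD;
}

/* Returns nonzero if either guard band was overwritten. */
static int gcheck(const void *user, size_t n)
{
    const unsigned char *p = (const unsigned char *)user - GUARD;
    for (size_t i = 0; i < GUARD; i++)
        if (p[i] != FILL || p[GUARD + n + i] != FILL)
            return 1;
    return 0;
}
```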

    I had thought about [using a private memory pool for each thread], but wasnt sure how to implement it. Right now its pretty much a static library being called through a native static library from a managed library usually through static methods within the managed library, I am not even sure how to implement it, but if I could implement it this would probably solve the bottleneck. It could be a problem though because I would need to be able to identify which thread it came from so I could make sure the correct thread handles the memory being returned to the pool. [...]
    The way to get the correct association between a thread and its private memory pool is simply to have the thread object own the pool and make it just that: private. For that purpose, you wouldn't use plain .NET Thread objects, but rather your own class derived from Thread that owns the memory pool, or a class bundling a Thread and a memory pool by composition. (Or, if I misunderstood your design and you're not using managed threads, the respective native equivalent.) That way you have full control over access to the pool, since any outside-world access would need to be "admitted" by your object. The safest way to handle this is not to allow the outside world to access the actual private memory pool at all, and instead provide functions that physically copy blocks into and out of the pool. More elegant, and much more performant, of course, is passing pointers to pool memory outside the thread/pool object and allowing access through those pointers. But of course that's also less safe: the thread/pool object must be able to rely on the "promise" of the outside code not to use the pointer beyond a certain point. Either way you may be able to dramatically reduce synchronization overhead, as well as the probability of sync-related bugs.

  5. #5
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: A little advice on a memory management routine would be appreciated

    Quote Originally Posted by Eri523 View Post
    The issue of the correct association between a thread and its private memory pool is to simply have the thread object own the pool and make it just that: private. [...] Either way you may be able to dramatically reduce synchronization overhead, as well as the probability of sync-related bugs.
    This is the heart of my problem: I'm not sure which direction to go. Since the managed object actually uses this memory routine to obtain the memory for its value, I've been worrying about what happens in a multi-threaded app when a managed object gets transferred from one thread to another, because at that point the managed object is using memory allocated from a different thread. I could request memory from the new thread, copy the object, and release the old memory before the transfer (though that sounds expensive time-wise). I was also thinking of using an array of pointers: the managed object would hold an index into the array that contains the actual pointer to the memory being used. I thought this would make compaction of the memory easier to accomplish (whenever I get around to implementing it), since then all I'd have to do is update the element in the array whenever the pointer changed, instead of worrying about finding the actual object or objects using the pointer. And if I used this method, would it be possible to have a pointer to the freeing routine of the particular thread managing a particular array of pointers (say at element 0, so that when I free the memory it would just look at that element for the correct place to go)? That would also alleviate the need for the management bytes. Isn't this basically what a handle is in a managed object, and how the garbage collector works?
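    The handle-table idea in that paragraph might look like this in C (a sketch with invented names; real handle systems add free-slot recycling and generation counters):

```c
#include <stddef.h>

#define MAX_HANDLES 256

static void *table[MAX_HANDLES];   /* the array of pointers */
static int   next_handle;

/* Objects store this index, never the raw pointer. */
static int handle_new(void *p)
{
    table[next_handle] = p;
    return next_handle++;
}

static void *handle_deref(int h)   /* every access goes through here */
{
    return table[h];
}

/* Compaction moves the memory and updates one table slot; the
 * handle held by the object never changes. */
static void handle_move(int h, void *newp)
{
    table[h] = newp;
}
```

    This is essentially the indirection a managed handle provides: one extra pointer hop, so the compactor only has to patch the table.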

    When I mentioned I wasn't sure how to implement it, I meant I am stuck on how to make the particular assembly code private to a thread; that part would be a lot easier if I could use inline assembly. The whole idea of creating my own memory manager was to speed up my algorithms (which it did), but it seems I am at a point now where every step I take to increase performance introduces a problem or completely breaks in certain situations. Should I be thinking along the lines of having different implementations for different configurations (one for single-threaded use and one for multi-threaded use)?

  6. #6
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: A little advice on a memory management routine would be appreciated

    By the way, this is completely off the subject.

    You're a pretty smart cookie. I'm having a weird problem with variables in a program I'm creating (a version of sudoku), such as the debugger claiming variables are out of scope when they obviously aren't, the debugger returning the info for the wrong objects, and things like that. I am sure it's because of a setting or something I am doing wrong, but I run into these problems once in a while and have never figured it out. I usually get so mad that I either create a whole new solution and rewrite the program, or I go through and comment out entire sections of code and rewrite them.

    If I were to bundle up the project and post it on here, would you take a look at it? The program itself is still in the early stages and there is not much to it. It's in managed C++ (Visual Studio 2010), 32-bit. I am curious whether someone else would be able to reproduce the problem on their machine or not. If you have the time, just let me know and I'll zip it up and post it tomorrow.

    Thanks

  7. #7
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,583

    Re: A little advice on a memory management routine would be appreciated

    Quote Originally Posted by AKRichard View Post
    [...] Since the managed object actually uses this memory routine to obtain the memory it uses for the value I've been worrying about what happens when in a multi-threaded app when a managed object gets transferred from one thread to another because at that point the managed object is using memory allocated from a different thread. [...]
    Actually, memory addressing shouldn't be a problem here at all, since all threads within a single process share the same address space. So, even when the association between managed and native threads changes, all addresses remain perfectly valid.
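(A tiny demonstration of that point: a block addressed by one thread stays valid and addressable from any other thread in the same process, because they all share one address space. The helper name below is made up.)

```cpp
#include <thread>

// Dereference a pointer on a *different* thread than the caller's.
// This works for any valid pointer in the process, no matter which
// thread originally obtained the memory.
int readFromOtherThread(const int* p) {
    int result = 0;
    std::thread t([&result, p] { result = *p; });
    t.join();
    return result;
}
```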

    What could be problematic, though (I don't have any personal experience with this), is calling native routines from managed code and then using native synchronization objects in there, since some of them are required to be released by the same thread that locked them. However, of course I don't know whether you're doing something like that in your code at all, or are planning to in the future.
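(For example, `std::mutex`, like a Win32 CRITICAL_SECTION, must be unlocked by the thread that locked it; unlocking from another thread is undefined behavior. The safe pattern is to keep the lock and unlock within one thread's scope, which an RAII guard enforces automatically. A minimal sketch:)

```cpp
#include <mutex>
#include <thread>

std::mutex m;
int shared = 0;

// Both the lock and the unlock happen on the thread running worker():
// the lock_guard acquires m on construction and releases it when the
// guard goes out of scope, so ownership never crosses a thread boundary.
void worker() {
    std::lock_guard<std::mutex> guard(m);
    ++shared;
}
```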

    [...] Isn't this basically what a handle is in a managed object? and how the garbage collector work?
    Errr, if you want the behavior of managed memory, then why don't you simply use some? More of a rhetorical question though, since, as I pointed out above, native memory handling should be unproblematic.

    When I mentioned I wasn't sure how to implement it, I meant I am stuck on how to make the particular assembly code private to a thread, that part would be a lot easier if I could use inline assembly for it.
    Code isn't ever owned by threads; only objects are, and even they aren't by any law of nature. They are owned because you wrote your code to act that way. Note that, when you have multiple threads handling objects of the same type, it's the same code executed by all the threads; just the data is distinct.

    I'm not quite sure about how thread-local storage works internally, but I'm confident it's no black magic either and, at least on a low level, obeys the same principles pointed out above.
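(In C++11 and later, `thread_local` makes that principle concrete: each thread gets its own copy of the variable while the code touching it stays shared, so "per-thread data" never requires per-thread code. A small sketch:)

```cpp
#include <thread>

// Each thread sees its own independent copy of perThreadCounter.
thread_local int perThreadCounter = 0;

// Shared code, per-thread data: the same function run on two threads
// increments two different counters.
int bump() {
    return ++perThreadCounter;
}
```

A per-thread free list for an allocator works the same way: declare the list head `thread_local`, and each thread allocates from its own list without any locking.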

    Quote Originally Posted by AKRichard View Post
    [...] I'm having a weird problem with variables in a program I'm creating (a version of sudoku), such as the debugger claiming variables are out of scope when they obviously aren't, the debugger returning the info to the wrong objects and things like that. [...]
    Didn't we already discuss this, or am I confusing things and actually had that discussion with someone else? The bottom line, however: I sometimes encounter this sort of problem in my own projects too, and never really found out what causes it or how to get rid of it. If I really need to know the state of objects that suffer from this problem, I simply add plain debug output code to the passages in question. This will always work as long as the object is in scope and sufficiently valid (i.e. not causing uncaught null references or the like).

    If the workarounds you propose really do help, even if just short-term, then the problem seems to be related to some sort of degeneration of the debug information. I doubt those workarounds are really practical in projects of significant size, though.

    So there's no gain in uploading your project: I couldn't do any more about it than you already have...
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  8. #8
    Join Date
    Aug 2009
    Location
    Finally back in Alaska
    Posts
    141

    Re: A little advice on a memory management routine would be appreciated

    I probably did ask you about it before. I usually get pretty upset when I cannot figure out the root cause of a problem and ask about it in one of the forums. The forums haven't helped me solve it yet, lol. I wouldn't even worry about it, but I keep thinking it is something I'm doing wrong, because surely Microsoft wouldn't let a problem like this go.

    Errr, if you want the behavior of managed memory, then why don't you simply use some?
    I'm not looking for that behavior specifically; it just seems like some of the ideas I've had gravitate in that direction. I definitely don't want the complexity of those systems; in fact I've really been trying to keep it simple. You've given me enough to help me decide which way I'm going to go, so now it's off to the net for some reading, and then the fun part: the implementation.

    Thanks for taking the time to help me figure out which direction to try next.
