CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 6 of 6
  1. #1
    Join Date
    Jun 2009
    Posts
    100

    Echo recreation in assembly review.

    I am trying to win a bet with a friend of mine on whos "echo" recreation is more efficient. Executable size also matters but pure raw speed is what is important.

    This really isn't a "help me" kind of question but more of a "any ideas for improvement" kind of question.

    He is making his echo recreation in C and I am making mine in 64-bit assembly using the nasm assembler. We both wanted to know if the GCC compiler makes programming in assembly pointless.

    The code compiles into a "1.9 KB" executable and uses a total of roughly "2 KB" of RAM. That's pretty darn small to me. I did reference some C libraries however I made sure with him that doing so would be acceptable.

    Code:
    extern strcat
    extern puts
    
    segment .text
    	global main
    
    main:
    	;Set up stack
    	push r12
    	push rbp
    	mov	rbp, rsi
    	push rbx
    	mov	ebx, edi
    	sub	rsp, 48
    	
    	;If argc == 1, no arguments
    	cmp	edi, 1
    	je .done
    
    	;Else continue
    .start:
    	lea	rdi, [rsp+16]
    	mov	ecx, 8
    	mov esi, 0
    	mov	[rsp+8], esi
    	mov r12d, 0
    	
    	jmp	.print
    	
    .loop:
    	mov	rsi, [rbp+0+r12*8]
    	lea	rdi, [rsp+8]
    	call strcat
    	
    	lea	rdi, [rsp+8]
    	mov	esi, space
    	call strcat
    
    .print:
    	inc	r12
    	cmp	ebx, r12d
    	jg	.loop
    	
    	;Print out result
    	lea	rdi, [rsp+8]
    	call puts
    	
    .done:
    	;End program
    	mov	eax,1		
    	mov	ebx,0		
    	int	80h	
    	ret
    	
    section .data
    	space db " ", 0

  2. #2
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Echo recreation in assembly review.

    For really efficient code I'd strongly recommend against using strcat. Each time you call it, it will re-determine the length of the destination string by counting its characters starting from the beginning; pretty much a waste of time. Better maintain a pointer to the end of the destination string and then copy the source string there directly.

    You may use strcpy for that copy operation, but then you'd need to call strlen afterwards to determine the new location for the end-of-string pointer to point to. When you implement the copy operation yourself instead, you'll automatically have the updated end-of-string pointer in your destination pointer after the loop has finished. That way you wouldn't just, at least, eliminate the overhead of two function calls per string to be processed, but also the entire internal runtime of strlen.

    BTW, of course the same optimizations may be applied to your friend's C implementation as well. And if the compiler is set up to handle the string functions intrinsically and is really smart in optimization, it may even do the whole thing for him behind the scenes automatically. I can't check that, though, because I don't have GCC.

    BTW2: Of course you owe me a beer when you win the bet...
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  3. #3
    Join Date
    Jun 2009
    Posts
    100

    Re: Echo recreation in assembly review.

    Quote Originally Posted by Eri523 View Post
    For really efficient code I'd strongly recommend against using strcat. Each time you call it, it will re-determine the length of the destination string by counting its characters starting from the beginning; pretty much a waste of time. Better maintain a pointer to the end of the destination string and then copy the source string there directly.

    You may use strcpy for that copy operation, but then you'd need to call strlen afterwards to determine the new location for the end-of-string pointer to point to. When you implement the copy operation yourself instead, you'll automatically have the updated end-of-string pointer in your destination pointer after the loop has finished. That way you wouldn't just, at least, eliminate the overhead of two function calls per string to be processed, but also the entire internal runtime of strlen.

    BTW, of course the same optimizations may be applied to your friend's C implementation as well. And if the compiler is set up to handle the string functions intrinsically and is really smart in optimization, it may even do the whole thing for him behind the scenes automatically. I can't check that, though, because I don't have GCC.

    BTW2: Of course you owe me a beer when you win the bet...
    Do you really think I am going to win? :-D

    My original plan was to just concatenate the string to a buffer defined in ".bss" however I couldn't seem to wrap my head around exactly how to do such a thing without doing the exact thing strcat does. Any ideas?

    Updated code:

    Code:
    extern strcat	
    extern puts		
    
    segment .text
    	global main 
    
    main:
    	;Set up stack
    	mov	rbp, rsi
    	mov	ebx, edi
    	
    	;If argc == 1, no arguments
    	cmp	edi, 1		
    	je .done	;Jump if equal
    
    	;Else continue
    .start:
    	lea	rdi, [rsp+16]	
    	xor r12d, r12d		
    	mov	[rsp+8], r12d	
    	
    	jmp	.print
    	
    .loop:
    	mov	rsi, [rbp+0+r12*8]
    	lea	rdi, [rsp+8]	
    	call strcat
    	
    	lea	rdi, [rsp+8]	
    	mov	rsi, space	
    	call strcat
    
    .print:
    	inc	r12			;Increase I by one
    	
    	;If ebx > 0 goto .loop
    	cmp	ebx, r12d	
    	jg	.loop		
    	
    	;Print out result
    	;puts(rdi)
    	lea	rdi, [rsp+8]	
    	call puts
    	
    .done:
    	;This prevents any cross executable interference.
    	;In "normal" cases Linux will clear the registers after a program exits
    	;however since GCC feels the need to clear them, I should too.
    	xor ebx, ebx
    	xor rdi, rdi
    	xor r12d, r12d
    	xor rsp, rsp
    	xor r12, r12
    	xor rsi, rsi
    	xor rbp, rbp
    	xor rax, rax
    	xor rbx, rbx
    	xor rcx, rcx
    	xor rdx, rdx
    
    	;Uses the standard linux syscalls for exit
    	mov	eax,1		
    	mov	ebx,0		
    	int	80h	
    	ret
    	
    section .data
    	space db " ", 0

  4. #4
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Echo recreation in assembly review.

    Quote Originally Posted by David2010 View Post
    My original plan was to just concatenate the string to a buffer defined in ".bss" however I couldn't seem to wrap my head around exactly how to do such a thing without doing the exact thing strcat does. Any ideas?
    Unless that has changed within the last 13 or so years while I have been doing assembly rather rarely, or has always been different under *nix, the BSS isn't meant for working data. It rather is the equivalent of static const data in C++ (and usually reserved for comparatively large objects). What improvement did you expect from doing your string operations there?

    Updated code:

    [...]
    I don't really see any improvement in there: You're still wastefully calling strcat twice for every string you process.

    Also...

    Code:
    .done:
    	;This prevents any cross executable interference.
    	;In "normal" cases Linux will clear the registers after a program exits
    	;however since GCC feels the need to clear them, I should too.
    
    	; ...
    Unless the OS (or the rules of the bet ) mandates that, why bother? (Doesn't make a significant difference performance-wise anyway, though.)
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

  5. #5
    Join Date
    Jun 2009
    Posts
    100

    Re: Echo recreation in assembly review.

    Quote Originally Posted by Eri523 View Post
    Unless that has changed within the last 13 or so years while I have been doing assembly rather rarely, or has always been different under *nix, the BSS isn't meant for working data. It rather is the equivalent of static const data in C++ (and usually reserved for comparatively large objects). What improvement did you expect from doing your string operations there?

    I don't really see any improvement in there: You're still wastefully calling strcat twice for every string you process.

    Unless the OS (or the rules of the bet ) mandates that, why bother? (Doesn't make a significant difference performance-wise anyway, though.)
    The improvements were more in terms of readability in this case. I made a few minor changes to help performance but I doubt it would shave off any more than a couple milliseconds.

    Using the nasm assembler ".bss" is used for variables, ".text" for code, and ".data" for static variables. A little strange I know but... whatever.

    "My original plan was to just concatenate the string to a buffer defined in ".bss" "

    What I meant was having my string variable defined in ".bss". Sorry for any confusion there.

    I noted that on Arch Linux 64-bit if the program segfaults, the registers are never wiped. This means that during normal program termination, the OS would automatically clear the registers before running another program.

    That said you might assume that I would just clear the registers during the beginning of the program however the OS also sets values to the registers before executing a program so clearing them "could" segfault the running program. That and I would be using the registers anyhow so it kinda defeats the purpose.

    I don't know why it has to be that way but the same thing happens on several other linux distros so it must be a global problem.

    It should never segfault because of how simplistic of a program it is but.... who knows.

    It may seem a little... excessive like the way many game programmer's code saves several times "just in case".
    Last edited by David2010; November 21st, 2011 at 07:09 PM.

  6. #6
    Join Date
    Jun 2010
    Location
    Germany
    Posts
    2,675

    Re: Echo recreation in assembly review.

    Quote Originally Posted by David2010 View Post
    [...] I made a few minor changes to help performance but I doubt it would shave off any more than a couple milliseconds.
    Milliseconds are almost eons from the POV of nowadays' CPUs... Even on the TRS-80, the first "half-way serious" computer I programmed on more than 30 years ago, a single millisecond was equivalent to 1780 clock ticks.

    And that brings me to the following question: How are you going to reliably measure the performance of two programs performing such a short task at all?

    Using the nasm assembler ".bss" is used for variables, [...] and ".data" for static variables. A little strange I know but... whatever.
    Now you made me question my memory about the nature of the BSS, so I looked it up. I hope Wikipedia is neutral enough to be agreed about between a *nix and a Windows developer : http://en.wikipedia.org/wiki/.bss. In essence: I was right about the static while you were right about it being used for working data. The fact that it's uninitialized (except for initially being zeroed out) makes it pretty pointless to make it const.

    I noted that on Arch Linux 64-bit if the program segfaults, the registers are never wiped. This means that during normal program termination, the OS would automatically clear the registers before running another program.

    That said you might assume that I would just clear the registers during the beginning of the program however the OS also sets values to the registers before executing a program so clearing them "could" segfault the running program. That and I would be using the registers anyhow so it kinda defeats the purpose.

    I don't know why it has to be that way but the same thing happens on several other linux distros so it must be a global problem.
    I don't think anyone, including the OS, cares about the register contents at program termination, except perhaps the register that is supposed to hold the exit code to be returned by the program and, in case the OS intends to continue working with them without restoration/reinitialization, the stack pointer and segment registers (in case anything like the latter does exist in x64 at all, what I don't know)? Perhaps, if this is realy a concern for you, you should push all the registers you're going to use in your program onto the stack when you start and pop them back right prior to termination. That way, I think, you'd ultimately let do the OS what is the OS' job.

    It should never segfault because of how simplistic of a program it is but.... who knows.
    Your program uses pointers (as does practically any assembly language program, BTW), so there always is a chance for a segfault. However, of course, you should strive for your program not doing that.

    It may seem a little... excessive like the way many game programmer's code saves several times "just in case".
    Well, ain't bets of that sort somewhat excessive by nature? About 25 years ago, I wrote 11 printed pages of assembly language for a prime number program bet. I used a quite complex variation of the Eratosthenes' Sieve algorithm because the space required for the sieve array to acommodate to the calculation range mandated by the rules of the bet would have even exhausted the hard disks of these days, and OTOH all other algorithms I knew looked too inefficient to me. Eventually, the bet ended in a draw because the other guy's program never reached a runnable state at all and mine bogusly considered a few numbers with relatively high prime factors (like 999991 - the bogus number, not a prime factor - it's 17 * 59 * 997) to be prime for a reason I never found out.
    I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.

    This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured