|
-
November 19th, 2011, 09:24 PM
#1
Echo recreation in assembly review.
I am trying to win a bet with a friend of mine on whos "echo" recreation is more efficient. Executable size also matters but pure raw speed is what is important.
This really isn't a "help me" kind of question but more of a "any ideas for improvement" kind of question.
He is making his echo recreation in C and I am making mine in 64-bit assembly using the nasm assembler. We both wanted to know if the GCC compiler makes programming in assembly pointless.
The code compiles into a "1.9 KB" executable and uses a total of roughly "2 KB" of RAM. That's pretty darn small to me. I did reference some C libraries however I made sure with him that doing so would be acceptable.
Code:
extern strcat
extern puts
segment .text
global main
main:
;Set up stack
push r12
push rbp
mov rbp, rsi
push rbx
mov ebx, edi
sub rsp, 48
;If argc == 1, no arguments
cmp edi, 1
je .done
;Else continue
.start:
lea rdi, [rsp+16]
mov ecx, 8
mov esi, 0
mov [rsp+8], esi
mov r12d, 0
jmp .print
.loop:
mov rsi, [rbp+0+r12*8]
lea rdi, [rsp+8]
call strcat
lea rdi, [rsp+8]
mov esi, space
call strcat
.print:
inc r12
cmp ebx, r12d
jg .loop
;Print out result
lea rdi, [rsp+8]
call puts
.done:
;End program
mov eax,1
mov ebx,0
int 80h
ret
section .data
space db " ", 0
-
November 20th, 2011, 06:46 AM
#2
Re: Echo recreation in assembly review.
For really efficient code I'd strongly recommend against using strcat. Each time you call it, it will re-determine the length of the destination string by counting its characters starting from the beginning; pretty much a waste of time. Better maintain a pointer to the end of the destination string and then copy the source string there directly.
You may use strcpy for that copy operation, but then you'd need to call strlen afterwards to determine the new location for the end-of-string pointer to point to. When you implement the copy operation yourself instead, you'll automatically have the updated end-of-string pointer in your destination pointer after the loop has finished. That way you wouldn't just, at least, eliminate the overhead of two function calls per string to be processed, but also the entire internal runtime of strlen.
BTW, of course the same optimizations may be applied to your friend's C implementation as well. And if the compiler is set up to handle the string functions intrinsically and is really smart in optimization, it may even do the whole thing for him behind the scenes automatically. I can't check that, though, because I don't have GCC.
BTW2: Of course you owe me a beer when you win the bet...
I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.
This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.
-
November 20th, 2011, 07:43 PM
#3
Re: Echo recreation in assembly review.
 Originally Posted by Eri523
For really efficient code I'd strongly recommend against using strcat. Each time you call it, it will re-determine the length of the destination string by counting its characters starting from the beginning; pretty much a waste of time. Better maintain a pointer to the end of the destination string and then copy the source string there directly.
You may use strcpy for that copy operation, but then you'd need to call strlen afterwards to determine the new location for the end-of-string pointer to point to. When you implement the copy operation yourself instead, you'll automatically have the updated end-of-string pointer in your destination pointer after the loop has finished. That way you wouldn't just, at least, eliminate the overhead of two function calls per string to be processed, but also the entire internal runtime of strlen.
BTW, of course the same optimizations may be applied to your friend's C implementation as well. And if the compiler is set up to handle the string functions intrinsically and is really smart in optimization, it may even do the whole thing for him behind the scenes automatically. I can't check that, though, because I don't have GCC.
BTW2: Of course you owe me a beer when you win the bet... 
Do you really think I am going to win? :-D
My original plan was to just concatenate the string to a buffer defined in ".bss" however I couldn't seem to wrap my head around exactly how to do such a thing without doing the exact thing strcat does. Any ideas?
Updated code:
Code:
extern strcat
extern puts
segment .text
global main
main:
;Set up stack
mov rbp, rsi
mov ebx, edi
;If argc == 1, no arguments
cmp edi, 1
je .done ;Jump if equal
;Else continue
.start:
lea rdi, [rsp+16]
xor r12d, r12d
mov [rsp+8], r12d
jmp .print
.loop:
mov rsi, [rbp+0+r12*8]
lea rdi, [rsp+8]
call strcat
lea rdi, [rsp+8]
mov rsi, space
call strcat
.print:
inc r12 ;Increase I by one
;If ebx > 0 goto .loop
cmp ebx, r12d
jg .loop
;Print out result
;puts(rdi)
lea rdi, [rsp+8]
call puts
.done:
;This prevents any cross executable interference.
;In "normal" cases Linux will clear the registers after a program exits
;however since GCC feels the need to clear them, I should too.
xor ebx, ebx
xor rdi, rdi
xor r12d, r12d
xor rsp, rsp
xor r12, r12
xor rsi, rsi
xor rbp, rbp
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
;Uses the standard linux syscalls for exit
mov eax,1
mov ebx,0
int 80h
ret
section .data
space db " ", 0
-
November 20th, 2011, 09:50 PM
#4
Re: Echo recreation in assembly review.
 Originally Posted by David2010
My original plan was to just concatenate the string to a buffer defined in ".bss" however I couldn't seem to wrap my head around exactly how to do such a thing without doing the exact thing strcat does. Any ideas?
Unless that has changed within the last 13 or so years while I have been doing assembly rather rarely, or has always been different under *nix, the BSS isn't meant for working data. It rather is the equivalent of static const data in C++ (and usually reserved for comparatively large objects). What improvement did you expect from doing your string operations there?
I don't really see any improvement in there: You're still wastefully calling strcat twice for every string you process. 
Also...
Code:
.done:
;This prevents any cross executable interference.
;In "normal" cases Linux will clear the registers after a program exits
;however since GCC feels the need to clear them, I should too.
; ...
Unless the OS (or the rules of the bet ) mandates that, why bother? (Doesn't make a significant difference performance-wise anyway, though.)
I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.
This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.
-
November 21st, 2011, 07:01 PM
#5
Re: Echo recreation in assembly review.
 Originally Posted by Eri523
Unless that has changed within the last 13 or so years while I have been doing assembly rather rarely, or has always been different under *nix, the BSS isn't meant for working data. It rather is the equivalent of static const data in C++ (and usually reserved for comparatively large objects). What improvement did you expect from doing your string operations there?
I don't really see any improvement in there: You're still wastefully calling strcat twice for every string you process.
Unless the OS (or the rules of the bet  ) mandates that, why bother? (Doesn't make a significant difference performance-wise anyway, though.)
The improvements were more in terms of readability in this case. I made a few minor changes to help performance but I doubt it would shave off any more than a couple milliseconds.
Using the nasm assembler ".bss" is used for variables, ".text" for code, and ".data" for static variables. A little strange I know but... whatever.
"My original plan was to just concatenate the string to a buffer defined in ".bss" "
What I meant was having my string variable defined in ".bss". Sorry for any confusion there.
I noted that on Arch Linux 64-bit if the program segfaults, the registers are never wiped. This means that during normal program termination, the OS would automatically clear the registers before running another program.
That said you might assume that I would just clear the registers during the beginning of the program however the OS also sets values to the registers before executing a program so clearing them "could" segfault the running program. That and I would be using the registers anyhow so it kinda defeats the purpose.
I don't know why it has to be that way but the same thing happens on several other linux distros so it must be a global problem.
It should never segfault because of how simplistic of a program it is but.... who knows.
It may seem a little... excessive like the way many game programmer's code saves several times "just in case".
Last edited by David2010; November 21st, 2011 at 07:09 PM.
-
November 21st, 2011, 08:14 PM
#6
Re: Echo recreation in assembly review.
 Originally Posted by David2010
[...] I made a few minor changes to help performance but I doubt it would shave off any more than a couple milliseconds.
Milliseconds are almost eons from the POV of nowadays' CPUs... Even on the TRS-80, the first "half-way serious" computer I programmed on more than 30 years ago, a single millisecond was equivalent to 1780 clock ticks.
And that brings me to the following question: How are you going to reliably measure the performance of two programs performing such a short task at all?
Using the nasm assembler ".bss" is used for variables, [...] and ".data" for static variables. A little strange I know but... whatever.
Now you made me question my memory about the nature of the BSS, so I looked it up. I hope Wikipedia is neutral enough to be agreed about between a *nix and a Windows developer : http://en.wikipedia.org/wiki/.bss. In essence: I was right about the static while you were right about it being used for working data. The fact that it's uninitialized (except for initially being zeroed out) makes it pretty pointless to make it const. 
I noted that on Arch Linux 64-bit if the program segfaults, the registers are never wiped. This means that during normal program termination, the OS would automatically clear the registers before running another program.
That said you might assume that I would just clear the registers during the beginning of the program however the OS also sets values to the registers before executing a program so clearing them "could" segfault the running program. That and I would be using the registers anyhow so it kinda defeats the purpose.
I don't know why it has to be that way but the same thing happens on several other linux distros so it must be a global problem.
I don't think anyone, including the OS, cares about the register contents at program termination, except perhaps the register that is supposed to hold the exit code to be returned by the program and, in case the OS intends to continue working with them without restoration/reinitialization, the stack pointer and segment registers (in case anything like the latter does exist in x64 at all, what I don't know)? Perhaps, if this is realy a concern for you, you should push all the registers you're going to use in your program onto the stack when you start and pop them back right prior to termination. That way, I think, you'd ultimately let do the OS what is the OS' job.
It should never segfault because of how simplistic of a program it is but.... who knows.
Your program uses pointers (as does practically any assembly language program, BTW), so there always is a chance for a segfault. However, of course, you should strive for your program not doing that. 
It may seem a little... excessive like the way many game programmer's code saves several times "just in case".
Well, ain't bets of that sort somewhat excessive by nature? About 25 years ago, I wrote 11 printed pages of assembly language for a prime number program bet. I used a quite complex variation of the Eratosthenes' Sieve algorithm because the space required for the sieve array to acommodate to the calculation range mandated by the rules of the bet would have even exhausted the hard disks of these days, and OTOH all other algorithms I knew looked too inefficient to me. Eventually, the bet ended in a draw because the other guy's program never reached a runnable state at all and mine bogusly considered a few numbers with relatively high prime factors (like 999991 - the bogus number, not a prime factor - it's 17 * 59 * 997) to be prime for a reason I never found out.
I was thrown out of college for cheating on the metaphysics exam; I looked into the soul of the boy sitting next to me.
This is a snakeskin jacket! And for me it's a symbol of my individuality, and my belief... in personal freedom.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|