-
May 14th, 2010, 10:10 PM
#16
Re: Data Type Conversion
Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:
Code:
inline void Assembly(){
__asm__ __volatile__(
" movl $1000000, %%ecx ;\n" //put the size of the table in here, don't reference it
" myloop: ;\n" //beginning of my loop
" movb 12(%0), %%ah ;\n" //char 4
" movb 8(%0), %%al ;\n" //char 3
" shl $16, %%eax ;\n" //can't access high bits directly, so shift these there for now
" movb 4(%0), %%ah ;\n" //char 2
" movb (%0), %%al ;\n" //char 1
" movl %%eax, (%1) ;\n" //push it out to the destination
" add $4, %1 ;\n" //move the dst ptr by 4 because we did 4 at a time
" add $16, %0 ;\n" //move the src ptr by 16
" loop myloop ;\n" //loop until ecx is zero
: //No output
: "r" (src), //Let GCC decide what registers to assign these to
"r" (dst) //Let GCC decide what registers to assign these to
: "eax", "ecx" //these two get explicitly clobbered
);
}
It runs fine for small arrays, but once I try doing one over 1000, it starts throwing bus errors.
Last edited by ninja9578; May 14th, 2010 at 10:13 PM.
-
May 14th, 2010, 10:40 PM
#17
Re: Data Type Conversion
 Originally Posted by ninja9578
Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:
Oh god, I hate GCC's representation of inline asm, and AT&T syntax in general. I'm not even sure what that does.
-
May 15th, 2010, 12:03 AM
#18
Re: Data Type Conversion
I kind of do too; it would be much nicer if they used Intel's, but oh well. A few years ago I was doing PPC assembly using AT&T syntax. That was a nightmare. It's not as bad as it looks; the forum software did some weird things with my tabs.
Last edited by ninja9578; May 15th, 2010 at 12:07 AM.
-
May 15th, 2010, 01:36 AM
#19
Re: Data Type Conversion
 Originally Posted by ninja9578
Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:
I thought it worked for me, though I have little experience with this. I updated your code a bit.
Code:
const int TABLE_SIZE_DIV_4 = TABLE_SIZE / 4;
inline void Assembly()
{
__asm__ __volatile__
(
" movl %2, %%ecx \n" //loop count (table size / 4), passed in as operand %2
" myloop: \n" //beginning of my loop
" movb 12(%0), %%ah \n" //char 4
" movb 8(%0), %%al \n" //char 3
" shl $16, %%eax \n" //can't access high bits directly, so shift these there for now
" movb 4(%0), %%ah \n" //char 2
" movb 0(%0), %%al \n" //char 1
" movl %%eax, (%1) \n" //push it out to the destination
" add $4, %1 \n" //move the dst ptr by 4 because we did 4 at a time
" add $16, %0 \n" //move the src ptr by 16
" loop myloop "
: //No output
: "r" (src), //Let GCC decide what registers to assign these to
"r" (dst), //Let GCC decide what registers to assign these to
"r" (TABLE_SIZE_DIV_4) //Let GCC decide what registers to assign these to
: "eax", "ecx" //these two get explicitly clobbered
);
}
and converted to msvc
Code:
const int TABLE_SIZE_DIV_4 = TABLE_SIZE / 4;
inline void Assembly()
{
__asm
{
push edx
push ebx
push ecx
mov edx, offset src
mov ebx, offset dst
mov ecx, TABLE_SIZE_DIV_4
myloop:
mov ah, [edx] + 12
mov al, [edx] + 8
shl eax, 16
mov ah, [edx] + 4
mov al, [edx] + 0
mov [ebx], eax
add ebx, 4
add edx, 16
loop myloop
pop ecx ; restore in reverse order of the pushes
pop ebx
pop edx
}
}
I make no promises this is 100% right. You need to test these for yourself.
Last edited by Joeman; May 15th, 2010 at 02:45 AM.
-
May 15th, 2010, 02:37 AM
#20
Re: Data Type Conversion
I don't know how the performance compares, but I believe this works.
Code:
inline void int_to_char(int *pInts, char *pChars, int arrSize)
{
_asm {
mov esi, pInts
mov edi, pChars
mov ebx, arrSize
xor ecx, ecx
myloop:
mov eax, [esi]
mov byte ptr [edi], al
add esi, 4
inc edi
inc ecx
cmp ecx, ebx
jne myloop
}
}
Last edited by Chris_F; May 15th, 2010 at 02:40 AM.
-
May 15th, 2010, 07:55 AM
#21
Re: Data Type Conversion
The division by four, that's what was causing my bus error. Bloody hell, I can't believe that I missed that.
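The arithmetic behind that fix, as a portable sketch rather than the asm itself: each pass through the loop reads four ints (16 bytes of src) and writes four bytes of dst, so the counter has to be the element count divided by four. With the full element count in ecx, the loop runs four times past the end of both arrays. The function name `pack_low_bytes` and the tail handling are additions for illustration only and are not from the thread.

```cpp
#include <cstddef>

// Hypothetical helper (not from the thread): copies the low byte of each int
// in src to dst, four elements per loop pass like the asm version.
void pack_low_bytes(const int* src, unsigned char* dst, std::size_t count) {
    // One pass consumes 4 ints (16 bytes) and produces 4 bytes, so the
    // number of passes is count / 4, not count.
    std::size_t blocks = count / 4;
    for (std::size_t b = 0; b < blocks; ++b) {
        dst[0] = static_cast<unsigned char>(src[0]);
        dst[1] = static_cast<unsigned char>(src[1]);
        dst[2] = static_cast<unsigned char>(src[2]);
        dst[3] = static_cast<unsigned char>(src[3]);
        src += 4;  // corresponds to "add $16, %0" (4 ints = 16 bytes)
        dst += 4;  // corresponds to "add $4, %1"
    }
    // Leftover elements when count is not a multiple of 4
    // (the asm version has no equivalent of this step).
    for (std::size_t i = 0; i < count % 4; ++i) {
        dst[i] = static_cast<unsigned char>(src[i]);
    }
}
```

Calling `pack_low_bytes(src, dst, TABLE_SIZE)` touches exactly TABLE_SIZE elements of each array, which is the property the original loop count violated.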
Well, I finished my code:
Code:
#include <iostream>
#include <ctime>
const int TABLE_SIZE = 1000000;
const unsigned int LOOPS = 0xFF;
volatile int src[TABLE_SIZE];
volatile unsigned char dst[TABLE_SIZE];
void Simple(){
for(int i = 0; i < TABLE_SIZE; i++)
dst[i] = src[i];
}
inline void Assembly(){
__asm__ __volatile__(
" movl $250000, %%ecx ;\n" //put the size of the table in here, don't reference it
" myloop: ;\n" //beginning of my loop
" movb 12(%0), %%ah ;\n" //char 4
" movb 8(%0), %%al ;\n" //char 3
" shl $16, %%eax ;\n" //can't access high bits directly, so shift these there for now
" movb 4(%0), %%ah ;\n" //char 2
" movb (%0), %%al ;\n" //char 1
" movl %%eax, (%1) ;\n" //push it out to the destination
" add $4, %1 ;\n" //move the dst ptr by 4 because we did 4 at a time
" add $16, %0 ;\n" //move the src ptr by 16
" loop myloop ;\n" //loop until ecx is zero
: //No output
: "r" (src), //Let GCC decide what registers to assign these to
"r" (dst) //Let GCC decide what registers to assign these to
: "eax", "ecx"
);
}
int main (int argc, char * const argv[]) {
clock_t start = clock();
for (unsigned int i = 0; i < LOOPS; ++i)
Simple();
std::cout << clock() - start << std::endl;
start = clock();
for (unsigned int i = 0; i < LOOPS; ++i)
Assembly();
std::cout << clock() - start << std::endl;
return 0;
}
And sorry to the guys betting on the optimizer:
Code:
Ninjas-MacBook-Pro:Release ninja9578$ ./AssemblyChallenge
1059456
366081
Yes, I used maximum optimizations, not the default release build in Xcode, and I ran it in the console, not the dev environment. Looks like I beat the compiler. I know some of you wrote more advanced routines, but you all said they ran either on par with or slightly faster than the simple one; no one posted that theirs ran 3x faster, so I didn't bother benchmarking them.
@Chris_F: Your code looks good, but I'm concerned about the registers that you use. I've never done inline asm with VC++; is the compiler smart enough to realize that you clobbered those registers? You didn't save their state. Also, it won't run as fast as mine for two reasons:
1) You are only doing one integer at a time, whereas I'm doing 4. Registers are 32 bits wide, so use the whole thing; registers are far faster than RAM.
2) You and the compiler both increment a register, do a compare, then a jump. The processor has a built-in instruction that folds the counter update, compare, and jump into one: loop.
Another thing: my code above uses volatile to keep the asm statement as it is. Without it, the optimizer is free to move or drop the statement, which can change the timing. So it's important when writing assembly to benchmark it with and without the volatile keyword. Sometimes the result is faster, sometimes it's slower, sometimes nothing changes.
Last edited by ninja9578; May 15th, 2010 at 08:01 AM.
-
May 15th, 2010, 10:03 AM
#22
Re: Data Type Conversion
What happens when you swap the calls to simple and assembly? Do you notice the assembly getting slower and the simple getting faster?
Get Microsoft Visual C++ Express here or CodeBlocks here.
Get STLFilt here to radically improve error messages when using the STL.
Get these two can't live without C++ libraries, BOOST here and Loki here.
Check your code with the Comeau Compiler and FlexeLint for standards compliance and some subtle errors.
Always use [code] code tags [/code] to make code legible and preserve indentation.
Do not ask for help writing destructive software such as viruses, gamehacks, keyloggers and the suchlike.
-
May 15th, 2010, 10:38 AM
#23
Re: Data Type Conversion
Uh oh. ***? I hate when weird things like that happen. Does someone want to run the thing on Windows and use that magical process query function?
-
May 15th, 2010, 11:01 AM
#24
Re: Data Type Conversion
It's because of the cache. The first function is paying to load the cache; the second is using data that is already loaded. That makes your asm look much faster than the C, but much of that cost is cache loading.
-
May 15th, 2010, 11:18 AM
#25
Re: Data Type Conversion
 Originally Posted by Russco
It's because of the cache. The first function is paying to load the cache; the second is using data that is already loaded. That makes your asm look much faster than the C, but much of that cost is cache loading.
Hmmm... Do you have 400MB cache on your processor?
Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
Convenience and productivity tools for Microsoft Visual Studio:
FeinWindows - replacement windows manager for Visual Studio, and more...
-
May 15th, 2010, 11:23 AM
#26
Re: Data Type Conversion
 Originally Posted by VladimirF
Hmmm... Do you have 400MB cache on your processor? 
Itanium 3???
-
May 15th, 2010, 11:56 AM
#27
Re: Data Type Conversion
 Originally Posted by Chris_F
Itanium 3???
Is it this one? Tukwila (processor)
Then it tops out at a “puny” 24 MiB, nowhere near 400 MB.
-
May 15th, 2010, 12:41 PM
#28
Re: Data Type Conversion
1 million ints = 4 million bytes = 4 MB (well, a touch less).
My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code, and L2/L3 were data caches.
Why would you need 420 MB to store 1 million ints?
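The sizes being argued over depend entirely on the element count, so here is a quick compile-time check of the arithmetic. It is only a sketch: the helper name `working_set_bytes` is made up, and it assumes 4-byte source ints and 1-byte destination chars as in the benchmark. It evaluates the 1,000,000-element table from the posted code and, for comparison, a 100,000,000-element one, which is the scale at which a figure around 400 MB appears.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes touched by one conversion pass, counting the
// 4-byte source ints plus the 1-byte destination chars.
constexpr std::size_t working_set_bytes(std::size_t elements) {
    return elements * (sizeof(std::int32_t) + sizeof(unsigned char));
}

// 1,000,000 elements: 4,000,000 bytes of src + 1,000,000 bytes of dst.
static_assert(working_set_bytes(1000000) == 5000000,
              "about 5 MB, roughly the size of a 4 MB L2 cache");

// 100,000,000 elements: 400,000,000 bytes of src + 100,000,000 bytes of dst.
static_assert(working_set_bytes(100000000) == 500000000,
              "about 500 MB, far beyond any CPU cache");
```

At the smaller size the data is close to fitting in a 4 MB L2 cache; at the larger size no cache level can hold it.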
-
May 15th, 2010, 12:50 PM
#29
Re: Data Type Conversion
 Originally Posted by Russco
1 million ints = 4 million bytes = 4 MB (well, a touch less).
My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code, and L2/L3 were data caches.
Why would you need 420 MB to store 1 million ints?
L1 follows the Harvard model, which means data and code are cached separately. L2 is not split; it holds both. L3 is just a slower and larger L2.
-
May 15th, 2010, 01:01 PM
#30
Re: Data Type Conversion
 Originally Posted by Russco
1 million ints = 4 million bytes = 4 MB (well, a touch less).
My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code, and L2/L3 were data caches.
Why would you need 420 MB to store 1 million ints?
Sorry, this thread became too long. I thought I'd mentioned that I bumped the array size to 100,000,000 to reduce fluctuation in the results (at the time of post #13). Looks like I didn't say it here.
Anyway, even with 100,000,000 ints the first pass through the data takes almost 3 times longer. I don't know why; I think I read something about “hot” vs. “cold” memory. Are there electrical engineers here who can confirm or deny that?
Regardless, in the current code I run through both arrays before each measurement, so that difference is eliminated: each function runs on “hot” memory.
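The warm-up idea can be sketched roughly as follows. This is not the code from the thread: `time_warm_us` and `simple_convert` are hypothetical names, and the harness uses `std::chrono` (C++11) rather than `clock()`. A likely reason for the slow first pass is that the operating system only maps the pages of a large array when they are first touched, so the first function to run pays for page faults as well as cache misses. Touching every element before starting the timer takes that cost out of both measurements.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: touches every element of both buffers once before
// timing, so the measured call does not pay the first-touch cost.
// Assumes src and dst have the same length.
template <typename Fn>
long long time_warm_us(Fn convert, std::vector<int>& src,
                       std::vector<unsigned char>& dst) {
    // Warm-up pass: write to both arrays so their pages are mapped.
    for (std::size_t i = 0; i < src.size(); ++i) {
        src[i] = static_cast<int>(i);
        dst[i] = 0;
    }
    auto start = std::chrono::steady_clock::now();
    convert(src.data(), dst.data(), src.size());
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}

// Plain loop standing in for Simple() from the thread.
void simple_convert(const int* src, unsigned char* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<unsigned char>(src[i]);
    }
}
```

Timing each candidate through the same harness, and in both orders, should show whether a measured gap comes from the conversion code or from whichever function happened to run first.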