Results 16 to 30 of 41
  1. #16
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:

    Code:
    inline void Assembly(){
        __asm__ __volatile__(
            "  movl $1000000, %%ecx    ;\n"    //put the size of the table in here, don't reference it
            "  myloop:                 ;\n"    //beginning of my loop
            "      movb 12(%0), %%ah   ;\n"    //char 4
            "      movb 8(%0), %%al    ;\n"    //char 3
            "      shl $16, %%eax      ;\n"    //can't access high bits directly, so shift these there for now
            "      movb 4(%0), %%ah    ;\n"    //char 2
            "      movb (%0), %%al     ;\n"    //char 1
            "      movl %%eax, (%1)    ;\n"    //push it out to the destination
            "      add $4, %1          ;\n"    //move the dst ptr by 4 because we did 4 at a time
            "      add $16, %0         ;\n"    //move the src ptr by 16
            "  loop myloop             ;\n"    //loop until ecx is zero
            :                                  //No output
            :  "r" (src),                      //Let GCC decide what registers to assign these to
               "r" (dst)                       //Let GCC decide what registers to assign these to
            :  "eax", "ecx"                    //these two get explicitly clobbered
            );
    }
    It runs fine for small arrays, but once I try doing one over 1000, it starts throwing bus errors.
    Last edited by ninja9578; May 14th, 2010 at 10:13 PM.

  2. #17
    Join Date
    Aug 2008
    Posts
    902

    Re: Data Type Conversion

    Quote Originally Posted by ninja9578
    Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:

    Oh god, I hate GCC's representation of inline asm, and AT&T syntax in general. I'm not even sure what that does.

  3. #18
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    I kind of do too; it would be much nicer if they used Intel's, but oh well. A few years ago I was doing PPC assembly using AT&T syntax. That was a nightmare. It's not as bad as it looks; it looks like the forum software did some weird things with my tabs.
    Last edited by ninja9578; May 15th, 2010 at 12:07 AM.

  4. #19
    Join Date
    Jun 2008
    Posts
    592

    Re: Data Type Conversion

    Quote Originally Posted by ninja9578
    Does anyone see the bus error? I hate writing 32-bit assembly on a 64-bit machine:
    I thought it worked for me, though I have little experience with this matter, so I updated your code a tad bit.
    Code:
    const int TABLE_SIZE_DIV_4 = TABLE_SIZE / 4;
    
    inline void Assembly()
    {
        __asm__ __volatile__
        (
            "  movl %2, %%ecx          \n"       //put the size of the table in here, don't reference it
            "  myloop:                 \n"       //beginning of my loop
            "      movb 12(%0), %%ah   \n"       //char 4
            "      movb  8(%0), %%al   \n"       //char 3
            "      shl     $16, %%eax  \n"       //can't access high bits directly, so shift these there for now
            "      movb  4(%0), %%ah   \n"       //char 2
            "      movb  0(%0), %%al   \n"       //char 1
            "      movl  %%eax, (%1)   \n"       //push it out to the destination
            "      add      $4, %1     \n"       //move the dst ptr by 4 because we did 4 at a time
            "      add     $16, %0     \n"       //move the src ptr by 16
            "  loop myloop             \n"
            :                                    //No output
            :  "r" (src),                        //Let GCC decide what registers to assign these to
               "r" (dst),                        //Let GCC decide what registers to assign these to
               "r" (TABLE_SIZE_DIV_4)            //Let GCC decide what registers to assign these to
            :  "eax", "ecx"                      //these two get explicitly clobbered
        );
    }
    And converted to MSVC:
    Code:
    const int TABLE_SIZE_DIV_4 = TABLE_SIZE / 4;
    
    inline void Assembly()
    {
        __asm
        {
            push edx
            push ebx
            push ecx
    
            mov edx, offset src
            mov ebx, offset dst
    
            mov ecx, TABLE_SIZE_DIV_4
            myloop:
                mov ah, [edx] + 12
                mov al, [edx] + 8
                shl eax, 16
                mov ah, [edx] + 4
                mov al, [edx] + 0
                mov [ebx], eax
                add ebx, 4
                add edx, 16
            loop myloop
    
            pop ecx                 ; restore in reverse order of the pushes
            pop ebx
            pop edx
        }
    }
    I make no promises this is 100% right. You need to test these for yourself.
    Last edited by Joeman; May 15th, 2010 at 02:45 AM.
    0100 0111 0110 1111 0110 0100 0010 0000 0110 1001 0111 0011 0010 0000 0110 0110 0110 1111 0111 0010
    0110 0101 0111 0110 0110 0101 0111 0010 0010 0001 0010 0001 0000 0000 0000 0000
    0000 0000 0000 0000

  5. #20
    Join Date
    Aug 2008
    Posts
    902

    Re: Data Type Conversion

    Dunno how the performance compares, but I believe this works.

    Code:
    inline void int_to_char(int *pInts, char *pChars, int arrSize)
    {
    	_asm {
    
    		mov esi, pInts
    		mov edi, pChars
    		mov ebx, arrSize
    		xor ecx, ecx
    	myloop:
    		mov eax, [esi]
    		mov byte ptr [edi], al
    		add esi, 4
    		inc edi
    		inc ecx
    		cmp ecx, ebx
    		jne myloop
    	}
    }
    Last edited by Chris_F; May 15th, 2010 at 02:40 AM.

  6. #21
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    The division by four, that's what was causing my bus error. Bloody hell, I can't believe that I missed that.

    Well, I finished my code:
    Code:
    #include <iostream>
    #include <ctime>
    
    const int TABLE_SIZE = 1000000;
    const unsigned int LOOPS = 0xFF;
    volatile int src[TABLE_SIZE];
    volatile unsigned char dst[TABLE_SIZE];
    
    void Simple(){
        for(int i = 0; i < TABLE_SIZE; i++)
    	   dst[i] = src[i];
    }
    
    inline void Assembly(){
        __asm__ __volatile__(
            "  movl $250000, %%ecx     ;\n"    //put the size of the table in here, don't reference it
            "  myloop:                 ;\n"    //beginning of my loop
            "      movb 12(%0), %%ah   ;\n"    //char 4
            "      movb 8(%0), %%al    ;\n"    //char 3
            "      shl $16, %%eax      ;\n"    //can't access high bits directly, so shift these there for now
            "      movb 4(%0), %%ah    ;\n"    //char 2
            "      movb (%0), %%al     ;\n"    //char 1
            "      movl %%eax, (%1)    ;\n"    //push it out to the destination
            "      add $4, %1          ;\n"    //move the dst ptr by 4 because we did 4 at a time
            "      add $16, %0         ;\n"    //move the src ptr by 16
            "  loop myloop             ;\n"    //loop until ecx is zero
            :                                  //No output
            :  "r" (src),                      //Let GCC decide what registers to assign these to
               "r" (dst)                       //Let GCC decide what registers to assign these to
            :  "eax", "ecx"
            );
    }
    
    
    int main (int argc, char * const argv[]) {
        
        clock_t start = clock();
        for (unsigned int i = 0; i < LOOPS; ++i)
    	   Simple();
        std::cout << clock() - start << std::endl;
    	
        start = clock();
        for (unsigned int i = 0; i < LOOPS; ++i)
    	   Assembly();
        std::cout << clock() - start << std::endl;
        
        return 0;
    }
    And sorry to the guys betting on the optimizer:
    Code:
    Ninjas-MacBook-Pro:Release ninja9578$ ./AssemblyChallenge
    1059456
    366081
    Yes, I used maximum optimizations, not the default release build in Xcode, and I ran it in the console, not the dev environment. Looks like I beat the compiler. I know some of you wrote more advanced routines, but you all said they ran either on par with or slightly faster than the simple one. No one posted that theirs ran 3x faster, so I didn't bother benchmarking them.

    @Chris_F: Your code looks good, but I'm concerned about the registers that you use. I've never done inline asm with VC++; is the compiler smart enough to realize that you clobbered those registers? You didn't push their state. Also, it won't run as fast as mine, for two reasons:

    1) You are only doing one integer at a time, whereas I'm doing 4. Registers are 32 bits wide, so use the whole thing; registers are orders of magnitude faster than RAM.
    2) You and the compiler both increment a register, do a compare, then a jump. The processor has a built-in instruction that does all of that in one go: loop.

    Another thing: my code above uses volatile to keep the assembly as it is. Without it, the optimizer could come in and change things, perhaps making it even faster. So it's important when writing assembly to benchmark it with and without the volatile keyword. Sometimes the compiler makes it faster, sometimes slower, and sometimes it does nothing.
    Last edited by ninja9578; May 15th, 2010 at 08:01 AM.

  7. #22
    Join Date
    Nov 2008
    Location
    England
    Posts
    748

    Re: Data Type Conversion

    What happens when you swap the calls to simple and assembly? Do you notice the assembly getting slower and the simple getting faster?
    Get Microsoft Visual C++ Express here or CodeBlocks here.
    Get STLFilt here to radically improve error messages when using the STL.
    Get these two can't live without C++ libraries, BOOST here and Loki here.
    Check your code with the Comeau Compiler and FlexeLint for standards compliance and some subtle errors.
    Always use [code] code tags [/code] to make code legible and preserve indentation.
    Do not ask for help writing destructive software such as viruses, gamehacks, keyloggers and the suchlike.

  8. #23
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    Uh oh. ***? I hate when weird things like that happen. Does someone want to run the thing on Windows and use that magical process query function?

  9. #24
    Join Date
    Nov 2008
    Location
    England
    Posts
    748

    Re: Data Type Conversion

    It's because of the cache. The first function pays to load the cache; the second uses data that is already loaded. That makes your asm look much faster than the C, but much of that difference is cache loading.

  10. #25
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by Russco
    It's because of the cache. The first function pays to load the cache; the second uses data that is already loaded. That makes your asm look much faster than the C, but much of that difference is cache loading.
    Hmmm... Do you have 400MB cache on your processor?
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  11. #26
    Join Date
    Aug 2008
    Posts
    902

    Re: Data Type Conversion

    Quote Originally Posted by VladimirF
    Hmmm... Do you have 400MB cache on your processor?
    Itanium 3???

  12. #27
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by Chris_F
    Itanium 3???
    Is it this one? Tukwila (processor)
    Then it tops out at a “puny” 24 MiB, nowhere near 400 MB.

  13. #28
    Join Date
    Nov 2008
    Location
    England
    Posts
    748

    Re: Data Type Conversion

    1 million ints = 4 million bytes = 4 MB (well, a touch less).

    My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code and L2/L3 were data caches.

    Why would you need 420 MB to store 1 million ints?
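    To put numbers on the arithmetic above, here is a small sketch that adds up the bytes one pass of the benchmark touches. The helper names and the 4 MiB cache size are assumptions for illustration, and it assumes int is 32 bits. Note that the destination array counts too, so the total is 5,000,000 bytes rather than 4,000,000.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not from the thread: bytes touched by one pass that
// reads `elements` 32-bit ints and writes `elements` bytes.
constexpr std::size_t working_set_bytes(std::size_t elements)
{
    return elements * sizeof(std::int32_t) + elements * sizeof(unsigned char);
}

// True when the whole working set would fit in a cache of the given size.
constexpr bool fits_in_cache(std::size_t elements, std::size_t cache_bytes)
{
    return working_set_bytes(elements) <= cache_bytes;
}

constexpr std::size_t kTableSize = 1000000;              // TABLE_SIZE in the benchmark
constexpr std::size_t kL2Bytes   = 4u * 1024u * 1024u;   // assumed 4 MiB L2 cache
```

    With these numbers, fits_in_cache(kTableSize, kL2Bytes) is false: source plus destination slightly exceed a 4 MiB cache, so even the second function cannot find everything already loaded.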

  14. #29
    Join Date
    Aug 2008
    Posts
    902

    Re: Data Type Conversion

    Quote Originally Posted by Russco
    1 million ints = 4 million bytes = 4 MB (well, a touch less).

    My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code and L2/L3 were data caches.

    Why would you need 420 MB to store 1 million ints?
    L1 follows the Harvard model, which means data and code are cached separately. L2 is unified: it holds both. L3 is just a larger, slower L2.

  15. #30
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by Russco
    1 million ints = 4 million bytes = 4 MB (well, a touch less).

    My CPU has 4 MB of L2 cache. Isn't L2 used for data? I was under the impression L1 was for code and L2/L3 were data caches.

    Why would you need 420 MB to store 1 million ints?
    Sorry, this thread has become too long. I thought I had mentioned that I bumped the array size to 100,000,000 to reduce fluctuation in the results (at the time of post #13). Looks like I didn’t say it here.
    Anyway, even with 100,000,000 ints, the first pass through takes almost 3 times longer. I don’t know why; I think I read something about “hot” vs. “cold” memory. Are there electrical engineers here who can confirm / deny that?
    Regardless, in the current code I run through both arrays before each measurement, so that difference is eliminated: each function runs on “hot” memory.
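    The warm-up approach described above could look roughly like the sketch below. It is not the actual benchmark code from this thread: time_warm and simple_convert are hypothetical names, and std::chrono::steady_clock is swapped in for clock(). One likely (unconfirmed) reason the first pass is slow is that the operating system only maps the pages of a large array on first touch, which the warm-up loop pays for before the timer starts.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: touches both arrays once before timing, so the
// measured call does not pay first-touch costs.
template <typename Fn>
std::chrono::steady_clock::duration time_warm(Fn&& fn,
                                              std::vector<int>& src,
                                              std::vector<unsigned char>& dst)
{
    // Warm-up pass: read every source element and write every destination element.
    unsigned long long sink = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        sink += static_cast<unsigned int>(src[i]);
        dst[i] = 0;
    }
    volatile unsigned long long keep = sink;   // keeps the reads from being optimized away
    (void)keep;

    const auto start = std::chrono::steady_clock::now();
    fn(src, dst);                              // the function under test
    return std::chrono::steady_clock::now() - start;
}

// Same conversion as Simple() in the thread, written against vectors.
void simple_convert(const std::vector<int>& src, std::vector<unsigned char>& dst)
{
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<unsigned char>(src[i]);
}
```

    Timing each candidate through the same time_warm call, for example time_warm(simple_convert, src, dst), means the order in which the functions are measured should matter much less.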
