Results 1 to 15 of 41
  1. #1
    Join Date
    Apr 2008
    Posts
    163

    Data Type Conversion

    Hi,

    How can I convert an integer buffer of size 176x144 into an unsigned char buffer of size 176x144?
    All the data in the integer buffer is in the range [0,255].

    Is there a faster method for this?
    When using a for() loop there is a critical performance issue.

    Is there any library function available for this?

    Rgds
    Dave

  2. #2
    Join Date
    Jul 2002
    Location
    Portsmouth. United Kingdom
    Posts
    2,727

    Re: Data Type Conversion

    I can't see any way that a loop won't be involved somewhere.

    Do you want the result in the same buffer or copied to a different one?
    "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
    Richard P. Feynman
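
    On the question of a library function, here is a minimal sketch, assuming standard C++ and that every value is already in [0,255]. The names ConvertFrame, kWidth and kHeight are illustrative and do not come from the thread. std::transform still loops internally, so this is a baseline to measure against rather than a guaranteed speed-up.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

const std::size_t kWidth = 176;   // frame width from the question
const std::size_t kHeight = 144;  // frame height from the question

// Illustrative helper: narrows each int to one byte.
// Assumes every input value is already in [0,255]; other values wrap modulo 256.
std::vector<unsigned char> ConvertFrame(const std::vector<int>& frame)
{
    std::vector<unsigned char> out(frame.size());
    std::transform(frame.begin(), frame.end(), out.begin(),
                   [](int v) { return static_cast<unsigned char>(v); });
    return out;
}
```

    A frame would be passed as a std::vector<int> of kWidth * kHeight elements; the result has the same element count.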

  3. #3
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    If it's time critical, then you can't use a for loop. For loops take too much memory and processing power.

    pseudocode
    Code:
    allocate your new buffer
    mov ecx, 6336 (176 x 144 / 4) //you'll see why divide by four below
    mov the pointer to your int array in a register
    mov the pointer to your char array in a register
    label:
         mov the first byte of the int array in a lolo register
         mov the first byte of the next int in the hilo register
         mov the first byte of the next next int into the lohi register
         mov the first byte of the next next next int to the hihi register
         write the entire register to the char array
         increment the int array pointer by 16
         increment the char array pointer by 4
    LOOP label
    In this pseudocode you are doing four items at a time. This will be many many times faster than doing it in C. It might be even faster to loop only 3168 times (176 x 144 / 8) and do 8 at a time using 2 registers.

    This is for all you guys who occasionally tell me that an optimizing compiler will always beat an assembly programmer. For small pieces of code a good assembly engineer will always beat the compiler. :P Knowledge of how processor pipelines and caching work is the key.
    Last edited by ninja9578; May 13th, 2010 at 10:27 AM.

  4. #4
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by ninja9578 View Post
    ...This is for all you guys who occasionally tell me that an optimizing compiler will always beat an assembly programmer. For small pieces of code a good assembly engineer will always beat the compiler. :P Knowledge of how processor pipelines and caching work is the key.
    This sounds like a challenge, and I accept!
    Let’s take a measurable buffer size of 1,000,000 integers and transfer them into an unsigned char array.
    I suggest this benchmark (the timing is Windows-specific; you can substitute your OS’s favorite).
    Code:
    #include "stdafx.h"
    
    double PCFreq = 0.0; 
    __int64 CounterStart = 0; 
    
    void StartCounter() 
    { 
    	LARGE_INTEGER li; 
    	if(!QueryPerformanceFrequency(&li)) 
    		std::cout << "QueryPerformanceFrequency failed!\n"; 
    
    	PCFreq = double(li.QuadPart)/1000.0; 
    
    	QueryPerformanceCounter(&li); 
    	CounterStart = li.QuadPart; 
    } 
    double GetCounter() 
    { 
    	LARGE_INTEGER li; 
    	QueryPerformanceCounter(&li); 
    	return double(li.QuadPart-CounterStart)/PCFreq; 
    } 
    
    
    const int TABLE_SIZE = 1000000;
    volatile int src[TABLE_SIZE];
    volatile unsigned char dst[TABLE_SIZE];
    
    void Simple()
    {
    	for(int i = 0; i < TABLE_SIZE; i++)
    		dst[i] = src[i];
    }
    
    void YourFunction()
    {
    }
    
    int main()
    {
    	std::cout << "Simple loop" << std::endl;
    	StartCounter(); 
    	Simple();
    	std::cout << GetCounter() << std::endl << std::endl;
    
    	std::cout << "Your function here" << std::endl;
    	StartCounter(); 
    	YourFunction();
    	std::cout << GetCounter() << std::endl << std::endl;
    
    	return 0;
    }
    And here is what I have in stdafx.h:
    Code:
    #pragma once
    
    #ifndef _WIN32_WINNT		// Allow use of features specific to Windows XP or later.                   
    #define _WIN32_WINNT 0x0501	// Change this to the appropriate value to target other versions of Windows.
    #endif						
    
    #include <windows.h>
    #include <iostream>
    Just add your code to YourFunction() and run, then post your result here.
    Please leave the Simple() function in so that we can eliminate differences in hardware.
    Everybody is welcome to participate. Since Dave stated that this issue is performance-critical, I think that doing it in this thread is appropriate.
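
    For anyone not on Windows, here is a minimal sketch of a portable substitute for StartCounter()/GetCounter(), assuming a C++11 compiler. The StopWatch class is an illustrative name, not part of the benchmark above.

```cpp
#include <chrono>

// Illustrative portable stand-in for StartCounter()/GetCounter().
class StopWatch
{
public:
    StopWatch() : start_(std::chrono::steady_clock::now()) {}

    // Resets the reference point, like StartCounter().
    void Restart() { start_ = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since construction or the last Restart(),
    // like GetCounter().
    double ElapsedMs() const
    {
        std::chrono::duration<double, std::milli> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```

    Usage mirrors the original: construct or Restart() before the call being measured, then print ElapsedMs() afterwards.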

    Vlad
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  5. #5
    Join Date
    Apr 2010
    Posts
    20

    Re: Data Type Conversion

    Consider though...
    [Attached image not preserved]

  6. #6
    Join Date
    Apr 2010
    Posts
    20

    Re: Data Type Conversion

    Time slices here are also critical in determining which function is faster. I've written a function in C++ for YourFunction() defined as follows.

    Also, note that the frequency returned from QueryPerformanceFrequency is the number of ticks per second. Dividing it by 1000.0 gives the number of "performance ticks" per millisecond, so GetCounter() returns milliseconds rather than seconds. I assume that is what you want?

    Code:
    void YourFunction()
    {
    	const int nCount = TABLE_SIZE >> 2;
    
    	int * pSource = (int *)&src[0];
    	int * pDest = (int *)&dst[0];
    
    	for (int i = 0; i < nCount; ++i)
    	{
    		int nTemp = pSource[3];
    		nTemp <<= 8;
    		nTemp |= pSource[2];
    		nTemp <<= 8;
    		nTemp |= pSource[1];
    		nTemp <<= 8;
    		nTemp |= pSource[0];
    
    		*pDest = nTemp;
    
    		pSource += 4;
    		++pDest;
    	}
    }
    And I get different results, sometimes YourFunction() is faster, sometimes Simple() is faster. It happens when I reverse the order of the calls. Which would tell me something.
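
    One way to reduce this order dependence, sketched under the assumption that the first call is paying a one-time cost (for example touching the memory for the first time): run each candidate once untimed, then time several passes and keep the fastest. BestOfMs is a hypothetical helper, not something from the benchmark above.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Illustrative helper: runs fn once untimed (warm-up), then `runs` timed
// passes, and returns the fastest pass in milliseconds.
template <typename Fn>
double BestOfMs(Fn fn, int runs)
{
    fn();  // warm-up pass, not timed

    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r)
    {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

    For example, BestOfMs(Simple, 10) and BestOfMs(YourFunction, 10) could then be compared in either order.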
    Last edited by CppCoder2010; May 13th, 2010 at 08:43 PM.

  7. #7
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: Data Type Conversion

    I'd recommend dropping the volatile qualifiers from the data. It won't do anything except kill possible optimizations.

  8. #8
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by Lindley View Post
    I'd recommend dropping the volatile qualifiers from the data. It won't do anything except kill possible optimizations.
    I was only trying to prevent "optimizing out"...
    But it looks like you are correct, it works fine (loops 1,000,000 times) without it.

  9. #9
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: Data Type Conversion

    Hey, if the compiler can optimize out the loop and still get the data where it needs to be, then mission accomplished. Optimizing out is really only a problem in truly trivial speed tests which probably won't mean much anyway.
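
    If the worry is that the compiler drops the loop because nothing reads dst, one common alternative to volatile is to consume the result after timing. A minimal sketch; Checksum is an illustrative helper name.

```cpp
#include <cstddef>
#include <numeric>

// Illustrative helper: sums the destination bytes so the converted data is
// actually used, which gives the compiler a reason to keep the conversion.
unsigned long long Checksum(const unsigned char* data, std::size_t n)
{
    return std::accumulate(data, data + n, 0ULL);
}
```

    Printing the checksum after each timed call also doubles as a sanity check that every variant produced the same output.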

  10. #10
    Join Date
    Jan 2009
    Posts
    1,689

    Re: Data Type Conversion

    Oh fun

    I will write my function tonight or tomorrow.

  11. #11
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by CppCoder2010 View Post
    ...I get different results, sometimes YourFunction() is faster, sometimes Simple() is faster. It happens when I reverse the order of the calls. Which would tell me something.
    I too have noticed this variation. Looks like it has something to do with the memory being “touched”.
    I’ve fixed it by calling Init() first thing from the main() function:
    Code:
    void Init()
    {
    	for(int i = 0; i < TABLE_SIZE; i++)
    	{
    		src[i] = i & 0xFF;
    		dst[i] = 0;
    	}
    }
    I can then call both functions repeatedly in different order but still get consistent results.
    My first attempt at “4 elements in 1 iteration” looks almost like yours:
    Code:
    void FourInOne()
    {
    	int* p = (int*)dst;
    	for(int i = 0, j = 0; i < TABLE_SIZE; i += 4)
    	{
    		*p++ = src[i] | src[i+1] << 8 | src[i+2] << 16 | src[i+3] << 24;
    	}
    }
    But it only gets minimal benefit over the Simple() function – about 0.5% <edited> I meant - 5% </edited>
    I am working on my “optimized” implementation, but interested to see the ASM results as well.
    Last edited by VladimirF; May 14th, 2010 at 06:15 PM. Reason: Correction: %5, NOT 0.5%!

  12. #12
    Lindley is offline Elite Member Power Poster
    Join Date
    Oct 2007
    Location
    Seattle, WA
    Posts
    10,895

    Re: Data Type Conversion

    If we're assuming that the input integers are already in the proper range [0,255], then I doubt it'll be easy to get much faster than this....
    Code:
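    // src is an int array, so view it as bytes; dst is already unsigned char.
    // Assumes src and dst are declared without volatile, as suggested in post #7.
    const unsigned char *srcptr = reinterpret_cast<const unsigned char*>(src);// assumes little endian; +3 if BE.
    unsigned char *dstptr = dst;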
    for (int i = 0; i < TABLE_SIZE; i++, ++dstptr, srcptr += 4)
    {
        *dstptr = *srcptr;
    }

  13. #13
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Well, this all is pretty sad, actually
    Below are asm listings for four functions:
    1. Simple – assigning one byte at a time, in a loop.
    2. Lindley’s code (see above).
    3. My 4-in-1 code.
    4. My super-secret SSE implementation:
    - looping over 16 elements at a time;
    - load two groups of 4 ints into two XMM registers;
    - pack into one XMM register (16-bit values);
    - repeat for the third and fourth group of 4 ints;
    - pack two XMM registers with 16-bit values into one with 8-bit values.
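
    The steps in item 4 can be sketched with SSE2 intrinsics roughly as follows. This is a reconstruction from the description and the disassembly, not the original source of pack(); PackSse2 is a hypothetical name, and it assumes an x86 target with SSE2 and inputs already in [0,255]. Unaligned loads and stores are used so it works on any buffer.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Illustrative helper: narrows n ints (each assumed to be in [0,255]) to bytes,
// 16 elements per iteration, with a scalar loop for any leftover elements.
void PackSse2(const int* src, unsigned char* dst, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
    {
        // Load four groups of four 32-bit ints.
        const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i + 4));
        const __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i + 8));
        const __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i + 12));

        // 32-bit -> 16-bit with signed saturation (packssdw).
        const __m128i ab = _mm_packs_epi32(a, b);
        const __m128i cd = _mm_packs_epi32(c, d);

        // 16-bit -> 8-bit with unsigned saturation (packuswb), then store 16 bytes.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_packus_epi16(ab, cd));
    }

    // Scalar tail for element counts that are not a multiple of 16.
    for (; i < n; ++i)
        dst[i] = static_cast<unsigned char>(src[i]);
}
```

    Both pack steps saturate, so values outside [0,255] would be clamped rather than wrapped, which differs from the plain cast in the scalar versions.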

    Code:
    	Simple();
    00401436  xor         eax,eax 
    00401438  jmp         main+3B0h (401440h) 
    0040143A  lea         ebx,[ebx] 
    00401440  mov         dl,byte ptr src (6362480h)[eax*4] 
    00401447  mov         byte ptr dst (404380h)[eax],dl 
    0040144D  add         eax,1 
    00401450  cmp         eax,5F5E100h 
    00401455  jl          main+3B0h (401440h)
    Code:
    	Lindley();
    00401555  mov         ecx,offset src (6362480h) 
    0040155A  mov         eax,offset dst (404380h) 
    0040155F  mov         esi,5F5E100h 
    00401564  mov         dl,byte ptr [ecx] 
    00401566  mov         byte ptr [eax],dl 
    00401568  add         eax,1 
    0040156B  add         ecx,4 
    0040156E  sub         esi,1 
    00401571  jne         00401564
    Code:
    void FourInOne()
    {
    	int* p = (int*)dst;
    	for(int i = 0, j = 0; i < TABLE_SIZE; i += 4)
    00401050  xor         eax,eax 
    	{
    		*p++ = src[i] | src[i+1] << 8 | src[i+2] << 16 | src[i+3] << 24;
    00401052  mov         ecx,dword ptr src+0Ch (636248Ch)[eax*4] 
    00401059  shl         ecx,8 
    0040105C  or          ecx,dword ptr src+8 (6362488h)[eax*4] 
    00401063  add         eax,4 
    00401066  shl         ecx,8 
    00401069  or          ecx,dword ptr [eax*4+6362474h] 
    00401070  shl         ecx,8 
    00401073  or          ecx,dword ptr [eax*4+6362470h] 
    0040107A  cmp         eax,5F5E100h 
    0040107F  mov         dword ptr ___@@_PchSym_@00@UxlwvUgvhgDCglIUgvhgDCglIUivovzhvUhgwzucOlyq@+4 (40437Ch)[eax],ecx 
    00401085  jl          FourInOne+2 (401052h) 
    	}
    }
    00401087  ret
    Code:
    void SSE()
    {
    	for(int i = 0, j = 0; i < TABLE_SIZE; i += 16)
    00401000  xor         ecx,ecx 
    00401002  mov         eax,offset src+20h (63624A0h) 
    00401007  jmp         SSE+10h (401010h) 
    00401009  lea         esp,[esp] 
    	{
    		pack(&src[i], &dst[i]);
    00401010  movdqu      xmm1,xmmword ptr [eax-10h] 
    00401015  movdqu      xmm0,xmmword ptr [eax-20h] 
    0040101A  movdqu      xmm2,xmmword ptr [eax+10h] 
    0040101F  packssdw    xmm0,xmm1 
    00401023  movdqu      xmm1,xmmword ptr [eax] 
    00401027  packssdw    xmm1,xmm2 
    0040102B  packuswb    xmm0,xmm1 
    0040102F  movdqa      xmmword ptr dst (404380h)[ecx],xmm0 
    00401037  add         eax,40h 
    0040103A  add         ecx,10h 
    0040103D  cmp         eax,offset ___onexitbegin (1E0DA8A0h) 
    00401042  jl          SSE+10h (401010h) 
    	}
    }
    00401044  ret
    And here are the <sad> results:

    Code:
    Simple loop 88.9159
    
    Lindley     91.1238
    
    4-in-1 loop 85.364
    
    Vlad's SSE  81.8267
    Your mileage may vary, but the ratio should be the same.

    I *REALLY* had bigger hopes for SSE… I guess if there was an instruction to pack 32-bit values directly into 8-bit (bypassing 16-bit), we would get a little better results. Or did I miss such an instruction? Any SSE experts here?

    <edited>
    @ninja9578 - Looking at generated asm, I doubt that you will be able to shave anything off. But – good luck!
    Last edited by VladimirF; May 14th, 2010 at 06:51 PM.

  14. #14
    Join Date
    Aug 2008
    Posts
    902

    Re: Data Type Conversion

    Quote Originally Posted by VladimirF View Post
    I *REALLY* had bigger hopes for SSE… I guess if there was an instruction to pack 32-bit values directly into 8-bit (bypassing 16-bit), we would get a little better results. Or did I miss such an instruction? Any SSE experts here?
    I would guess that no matter how you try to do it, you are going to be limited by memory performance. Also, if the memory isn't aligned properly, SSE is going to run dog slow.

  15. #15
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Data Type Conversion

    Quote Originally Posted by Chris_F View Post
    I would guess that no matter how you try to do it, you are going to be limited by memory performance. Also, if the memory isn't aligned properly, SSE is going to run dog slow.
    You might be right. Reading 400,000,000 bytes and writing 100,000,000 bytes must take some time.
    And I think my arrays are aligned OK; the addresses end with 80h – what else could you wish for?
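
    Both points can be checked with a little arithmetic. A minimal sketch, assuming 4-byte int and the 100,000,000-element run shown in the listings; IsAligned and BytesPerSecond are illustrative helper names.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative helper: true when p sits on an `alignment`-byte boundary.
bool IsAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Illustrative helper: effective throughput in bytes per second for a pass
// that reads one int and writes one unsigned char per element.
double BytesPerSecond(double elements, double elapsedMs)
{
    const double bytesMoved =
        elements * (sizeof(int) + sizeof(unsigned char));
    return bytesMoved / (elapsedMs / 1000.0);
}
```

    An address ending in 80h is a multiple of 16, so IsAligned(dst, 16) holds for the arrays above. Plugging in the SSE timing, BytesPerSecond(100000000.0, 81.8267) comes to roughly 6.1e9 bytes per second.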
