Data Type Conversion

May 15th, 2010, 01:04 PM
VladimirF

1 Attachment(s)

Re: Data Type Conversion

OK, here are new results.
I have used Joeman’s adaptation of ninja9578’s asm. I don’t speak GCC’s asm, but the code looks very similar (to the naked eye).
Anyway, here is machine code, see if there is anything you don’t like about it:

Code:

Asm();
004013F1  push        edx
004013F2  push        ebx
004013F3  push        ecx
004013F4  mov         edx,offset src (6362480h)
004013F9  mov         ebx,offset dst (404380h)
004013FE  mov         ecx,dword ptr ds:[403158h]
00401404  mov         ah,byte ptr [edx+0Ch]
00401407  mov         al,byte ptr [edx+8]
0040140A  shl         eax,10h
0040140D  mov         ah,byte ptr [edx+4]
00401410  mov         al,byte ptr [edx]
00401412  mov         dword ptr [ebx],eax
00401414  add         ebx,4
00401417  add         edx,10h
0040141A  loop        main+374h (401404h)
0040141C  pop         edx
0040141D  pop         ebx
0040141E  pop         ecx
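
For readers who don't speak asm either, here is a rough C++ reading of what the listing above appears to do: each pass takes the low byte of four consecutive 32-bit elements, packs them into one 32-bit word, and stores that word. This is only a sketch. The function names are made up for illustration, the element type is assumed to be a 32-bit integer, and the packed version assumes a little-endian target such as x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Plain version: keep only the low byte of every 32-bit element.
// (Hypothetical name; not taken from the attached solution.)
void narrow_simple(const std::uint32_t* src, std::uint8_t* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(src[i]);
}

// Packed version: mirrors the asm by building one 32-bit word from four
// low bytes and storing it in a single write. Assumes little-endian.
void narrow_packed(const std::uint32_t* src, std::uint8_t* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const std::uint32_t word = (src[i] & 0xFFu)
                                 | ((src[i + 1] & 0xFFu) << 8)
                                 | ((src[i + 2] & 0xFFu) << 16)
                                 | ((src[i + 3] & 0xFFu) << 24);
        std::memcpy(dst + i, &word, sizeof word);  // one 4-byte store
    }
    for (; i < n; ++i)  // leftover elements when n is not a multiple of 4
        dst[i] = static_cast<std::uint8_t>(src[i]);
}
```

Both functions should produce identical output on x86; the second just spells out by hand what the asm does.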

And results:

Code:

Init array        291.013
Simple loop       88.0397
4-in-1 loop       85.2146
ninja9578's Asm   105.585
Chris F's Asm     90.5299
Lindley           90.5767
Vlad's SSE        81.3531

For lucky owners of Visual Studio 2005 (or above) I am attaching zipped solution so that you can verify my findings.
My conclusion: it is VERY HARD (if not impossible) to beat an optimizing compiler with hand-coded asm. I bet it (at least a good one) knows everything about cache, pipeline, prefetch, etc.
So the bottom line (as I see it) is: in a fight for performance, choose the best algorithm and implement it as simply as possible, so the compiler doesn’t get confused and is able to optimize it nicely.
May 15th, 2010, 01:51 PM
Joeman

Re: Data Type Conversion

Visual Studio 2008 Express (EDIT: it wasn't 2010)

Code:

Init array        598.937
Simple loop       250.653
4-in-1 loop       128.359
ninja9578's Asm   116.295
Chris F's Asm     94.1516
Lindley           311.069
Vlad's SSE        140.421

I ran the test at least 4 times to make sure the numbers didn't fluctuate too much.

Seems like some of the assembly versions won by a good margin.
Just to be clear, I ran it in release mode without starting the debugger. I did not make any changes to the build configurations.

I even called the functions in different orders to make sure the order didn't affect the overall timings.
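
The repeat-and-compare routine described here can be sketched as a small timing helper. This is not the harness from the attached solution; `Timing` and `time_runs` are hypothetical names, and the helper simply runs a callable several times and reports the best and the mean wall-clock time in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Best and mean wall-clock time over several runs, in milliseconds.
struct Timing {
    double best_ms;
    double mean_ms;
};

// Runs `work` `runs` times and times each run separately.
// Precondition: runs > 0.
template <class F>
Timing time_runs(F&& work, std::size_t runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = 0.0;
    double total = 0.0;
    for (std::size_t r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = (r == 0) ? ms : std::min(best, ms);
        total += ms;
    }
    return Timing{best, total / static_cast<double>(runs)};
}
```

Reporting the best run alongside the mean makes it easier to tell real differences from run-to-run noise. Changing the order in which the candidates are timed, as done here, is a further check against warm-up effects.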

Overall, I don't recommend placing assembly in C++, simply because it ties the code to a specific processor, compiler, and so on.

If speed is a must and you know which platform and compiler your code is targeting, I suppose this wouldn't be bad, as long as you make sure it really is faster.
May 15th, 2010, 02:17 PM
Joeman

Re: Data Type Conversion

I screwed up and said the last results were 2010. I have now corrected my previous post, and now for Visual Studio 2010 Express:

Code:

Init array        352.502
Simple loop       93.4141
4-in-1 loop       83.0259
ninja9578's Asm   104.253
Chris F's Asm     97.9349
Lindley           96.6014
Vlad's SSE        84.4739

The numbers did fluctuate a good bit though :S so this is the averaged run time :(

Seems like the simple loop is the best choice in this case
May 15th, 2010, 02:41 PM
Chris_F

Re: Data Type Conversion

That definitely seems as if it's getting close to the memory bandwidth limit. I have yet to write an inline asm function that outperforms the compiler at the same task.

I once thought that I could write an SSE memcpy function that would be lightning fast... but its results were identical to the std library's, which I'm pretty sure didn't compile to SSE. ;)

I'm no expert on cache lines and such, but I've found that the way in which you access memory can make a world of difference.
May 15th, 2010, 02:50 PM
VladimirF

Re: Data Type Conversion

Quote:

Originally Posted by Joeman

I screwed up and said the last results were 2010. I have now corrected my previous post, and now for Visual Studio 2010 Express:

You had me going mad for almost half an hour! :)
I am even installing Express to verify your results.
Could it be that you screwed up more than once and ran a Debug build (or simply a non-optimized one) in your first test? Because your numbers are in line with my Debug.

Anyway, I too screwed up (a little).
I can shave another 5% off my SSE implementation by replacing four calls to _mm_loadu_si128 (unaligned) with calls to _mm_load_si128 (aligned). Should have listened more carefully to Chris F in post # 14. For some reason, I thought that having the memory aligned was enough :(
Also, as a benchmark, I call memcpy() between two 400,000,000-byte buffers, and in my test it takes 42ms. THIS must be limited by memory speed.
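
To illustrate the aligned-load point above, here is a minimal SSE2 sketch of the conversion that uses four `_mm_load_si128` calls per iteration. It is not the code from the attached solution. The function name is hypothetical, and it assumes 32-bit source elements, 16-byte-aligned buffers, and an element count that is a multiple of 16. `_mm_load_si128` requires a 16-byte-aligned address; `_mm_loadu_si128` does not.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

#include <cassert>
#include <cstddef>
#include <cstdint>

// Keeps the low byte of each 32-bit element, 16 elements per iteration.
// Assumptions: src and dst are 16-byte aligned, n is a multiple of 16.
void narrow_sse2(const std::uint32_t* src, std::uint8_t* dst, std::size_t n) {
    assert(n % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);

    const __m128i low_byte = _mm_set1_epi32(0xFF);
    for (std::size_t i = 0; i < n; i += 16) {
        const __m128i* p = reinterpret_cast<const __m128i*>(src + i);
        // Aligned loads; masking to 0..255 keeps the saturating packs
        // below from altering any value.
        const __m128i a = _mm_and_si128(_mm_load_si128(p + 0), low_byte);
        const __m128i b = _mm_and_si128(_mm_load_si128(p + 1), low_byte);
        const __m128i c = _mm_and_si128(_mm_load_si128(p + 2), low_byte);
        const __m128i d = _mm_and_si128(_mm_load_si128(p + 3), low_byte);
        const __m128i ab = _mm_packs_epi32(a, b);      // 8 x 16-bit
        const __m128i cd = _mm_packs_epi32(c, d);      // 8 x 16-bit
        const __m128i out = _mm_packus_epi16(ab, cd);  // 16 x 8-bit
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), out);
    }
}
```

Swapping `_mm_load_si128` for `_mm_loadu_si128` gives the unaligned variant to time against it.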
May 15th, 2010, 02:52 PM
VladimirF

Re: Data Type Conversion

Quote:

Originally Posted by Chris_F

I once thought that I could write an SSE memcpy function that would be lightning fast... but its results were identical to the std library's, which I'm pretty sure didn't compile to SSE. ;)

Actually, if you trace into memcpy (and I think std library uses it), you'll see that it tests for alignment and presence of SSE2, and in such case does use SSE.
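
The kind of check described here can be sketched as follows. This is not the CRT's actual memcpy, only an illustration of the idea. `is_aligned16` and `copy_dispatch` are hypothetical names, SSE2 support is assumed to be present rather than detected at run time, and the fast path is taken only when both pointers are 16-byte aligned and the size is a multiple of 16.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Copies `bytes` bytes; returns true if the aligned SSE2 path was used.
bool copy_dispatch(void* dst, const void* src, std::size_t bytes) {
    assert(dst != nullptr && src != nullptr);
    if (is_aligned16(dst) && is_aligned16(src) && bytes % 16 == 0) {
        __m128i* d = static_cast<__m128i*>(dst);
        const __m128i* s = static_cast<const __m128i*>(src);
        for (std::size_t i = 0; i < bytes / 16; ++i)
            _mm_store_si128(d + i, _mm_load_si128(s + i));  // 16 bytes per step
        return true;
    }
    std::memcpy(dst, src, bytes);  // general fallback
    return false;
}
```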
May 15th, 2010, 03:04 PM
Joeman

Re: Data Type Conversion

Quote:

Originally Posted by VladimirF

You had me going mad for almost half an hour! :)

As soon as I realized I had messed up, I corrected it ASAP :D

Quote:

Originally Posted by VladimirF

Could it be that you screwed up more than once and ran a Debug build (or simply a non-optimized one) in your first test? Because your numbers are in line with my Debug.

Well, I ran it in release mode without the debugger, BUT somehow my configurations weren't correct. I think they weren't transferred over from your 2005 solution :blush:. Now that I have checked and fixed them, here are the results.

Visual Studio 2008 Express with proper release configuration

Code:

Init array        383.338
Simple loop       85.3905
4-in-1 loop       82.2526
ninja9578's Asm   105.795
Chris F's Asm     97.2565
Lindley           92.2891
Vlad's SSE        80.3013

EDIT: I hope this is right now :)
May 15th, 2010, 04:59 PM
ninja9578

Re: Data Type Conversion

I think it would run faster if you hard-coded the size of the table. I think that's only fair, seeing that the compiler does that :P

Looks like the compiler beat me here though :-\ Perhaps we should come up with something more complicated for it. We did a big contest in college where we each had to write a piece of software that could solve some type of math problem. I wrote C with inline assembly and it beat all the others pretty easily. The only one that came close was Fortran, believe it or not. One guy used Java... not sure what he was thinking.
May 15th, 2010, 05:16 PM
Lindley

Re: Data Type Conversion

Perhaps a SURF feature extractor? First thing that comes to mind. I have a GPU implementation (proprietary) which runs at 30+ fps, 1280x1024 images, on an NVIDIA 8-series GPU. Might be interesting to see what highly optimized CPU approaches can do. On the other hand, that might be rather ambitious.
May 15th, 2010, 07:43 PM
ninja9578

Re: Data Type Conversion

A GPU will always beat a CPU implementation; it has special hardware for matrix and vector math. I'll bet that in software it would run at about 1 frame every couple of seconds :P
June 4th, 2010, 12:33 PM
VladimirF

Re: Data Type Conversion

I have to say I am a bit disappointed with this thread. No interest in over two weeks! Has anyone seen Dave1024? Hope he is OK. ;)
And I would *REALLY* like to see a GPU implementation!