Performance of Managed vs Unmanaged code

OK, so I haven't had very much experience and am still learning both C++ and C#. I decided to do a test of unmanaged C++ vs C#. I made what I believe to be close to identical code in both languages (but what do I know? lol.) I was hoping some of you could, first of all, tell me if you see anything wrong with my code (bad practice, just plain bad code, whatever), and I hope you can also explain the differences I'm seeing.

C++ Program:

Code:

#include <iostream>
#include <string>
#include <sstream>
#include <windows.h>

using namespace std;

class Point
{
public:
    int x, y;
    Point();
    Point(int x, int y);
    friend Point operator+ (Point obj1, Point obj2);
    string ToString();
};

int main()
{
    long start, stop;
    Point *pointArray = new Point[16000000];

    start = GetTickCount();
    for (int i = 0; i < 16000000; i++)
    {
        pointArray[i] = Point(i, i);
    }
    Point myPoint(1, 1);
    for (int i = 0; i < 16000000; i++)
    {
        pointArray[i] = pointArray[i] + myPoint;
    }
    stop = GetTickCount();

    cout << pointArray[0].ToString();
    cout << stop - start << "ms";
    cin.get();
    return 0;
}

Point::Point()
{
    this->x = 0;
    this->y = 0;
}

Point::Point(int x, int y)
{
    this->x = x;
    this->y = y;
}

Point operator+(Point obj1, Point obj2)
{
    return Point(obj1.x + obj2.x, obj1.y + obj2.y);
}

string Point::ToString()
{
    ostringstream temp;
    temp << "Point: x = " << x << ", y = " << y << endl;
    return temp.str();
}

and here is my C# version:

Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(String[] args)
        {
            Stopwatch time = new Stopwatch();
            Point[] pointArray = new Point[16000000];

            time.Start();
            for (Int32 i = 0; i < 16000000; i++)
            {
                pointArray[i] = new Point(i, i);
            }
            Point myPoint = new Point(1, 1);
            for (Int32 i = 0; i < 16000000; i++)
            {
                pointArray[i] = pointArray[i] + myPoint;
            }
            time.Stop();

            Console.WriteLine(pointArray[0].ToString());
            Console.WriteLine("{0}ms", time.ElapsedMilliseconds);
            Console.ReadLine();
        }
    }

    class Point
    {
        public Int32 x, y;

        public Point()
        {
            x = 0;
            y = 0;
        }

        public Point(Int32 x, Int32 y)
        {
            this.x = x;
            this.y = y;
        }

        public static Point operator +(Point obj1, Point obj2)
        {
            return new Point(obj1.x + obj2.x, obj1.y + obj2.y);
        }

        public override String ToString()
        {
            return String.Format("Point: x = {0}, y = {1}", this.x, this.y);
        }
    }
}

OK, so now that the code is out of the way, here are my observations:

C# Release version takes approximately 2560ms to execute.

C++ Debug version takes approximately 1060ms to execute.

C++ Release version takes approximately 32ms to execute.

Now, I assumed that C# would be slower. Is there anything I could do to make the C# code faster? Also, why is the C++ Release version so much ridiculously faster than both the C# and Debug version? I'm sure there could be some simple things I'm overlooking.

Re: Performance of Managed vs Unmanaged code

Quote:

Also, why is the C++ Release version so much ridiculously faster than both the C# and Debug version?

The difference between debug and release is that the debug version does a lot more for debugging and error-trapping. Also, C++ is faster than C#.

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Skizmo

The difference between debug and release is that the debug version does a lot more for debugging and error-trapping. Also, C++ is faster than C#.

Still, 80x seems a bit much. There's got to be a better explanation than "well C++ is faster than C#."

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Chris_F

Still, 80x seems a bit much. There's got to be a better explanation than "well C++ is faster than C#."

Then post your C# question in the C# forum, and have the persons there answer why the C# code is slow.

Regards,

Paul McKenzie

Re: Performance of Managed vs Unmanaged code

Your C++ operator+ should take its arguments by const reference rather than by value.

You also need to delete[] the pointArray at the end, or better yet, use a std::vector<Point> instead of a dynamic array. In this case, you may want to time the difference between initializing the array to the full size and proceeding as you are, versus simply reserve()ing it to the full size and then push_back()ing each new Point in the first loop. The difference is that the latter should remove a useless default-constructor call.
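A rough sketch of that comparison, in case it helps. It assumes a C++11 compiler; the helper names fill_sized, fill_reserved and time_ms are made up for this illustration, and std::chrono::steady_clock is swapped in for GetTickCount (whose resolution is typically around 10 to 16 ms, which is coarse for runs this short). Whether the second approach actually wins depends on the compiler and its settings, so measure rather than assume.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Point {
    int x, y;
    Point() : x(0), y(0) {}
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// Approach A: construct n default Points, then overwrite each one.
std::vector<Point> fill_sized(std::size_t n) {
    std::vector<Point> v(n);                 // runs Point() n times
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = Point(static_cast<int>(i), static_cast<int>(i));
    }
    return v;
}

// Approach B: reserve capacity only, then construct each Point once.
std::vector<Point> fill_reserved(std::size_t n) {
    std::vector<Point> v;
    v.reserve(n);                            // allocates, constructs nothing
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Point(static_cast<int>(i), static_cast<int>(i)));
    }
    return v;
}

// Hypothetical helper: wall-clock milliseconds taken by one call to f(n).
template <typename F>
long long time_ms(F f, std::size_t n) {
    const auto start = std::chrono::steady_clock::now();
    const auto v = f(n);
    const auto stop = std::chrono::steady_clock::now();
    assert(v.size() == n);                   // sanity check on the result
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling time_ms(fill_sized, 16000000) and time_ms(fill_reserved, 16000000) in a release build gives the two numbers to compare.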

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Paul McKenzie

Then post your C# question in the C# forum, and have the persons there answer why the C# code is slow.

Regards,

Paul McKenzie

My questions have just as much to do with C++ as with C#. If I post this there, they can easily say the same thing to me. So perhaps someone can try helping me out here before I go posting this on every board.

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Chris_F

My questions have just as much to do with C++ as with C#. If I post this there, they can easily say the same thing to me.

No they can't.

You are asking why your C# code is slow. So who else would be best to answer your question?

I know why the C++ code is fast -- if you want me to explain, then I will. If you want someone to explain why C# is slow, go ask the C# experts in the other forum (and there is only one forum here that has C# in its name). They may even recommend (as Lindley did with the C++ version) how to speed up the C# code.

Regards,

Paul McKenzie

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Lindley

Your C++ operator+ should take its arguments by const reference rather than by value.

You also need to delete[] the pointArray at the end, or better yet, use a std::vector<Point> instead of a dynamic array. In this case, you may want to time the difference between initializing the array to the full size and proceeding as you are, versus simply reserve()ing it to the full size and then push_back()ing each new Point in the first loop. The difference is that the latter should remove a useless default-constructor call.

like this?

Code:

Point Point::operator+(const Point &obj1) const { return Point(this->x + obj1.x, this->y + obj1.y); }

Re: Performance of Managed vs Unmanaged code

That's one way to do it. The more usual approach is to implement operator+= as a member function, and then implement operator+ as a non-member, non-friend function in terms of +=.

Also, "this->" is unnecessary but not harmful there.
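For illustration, a minimal sketch of that arrangement. The Point class mirrors the one in the first post; check_addition is a made-up helper that only exercises the two operators. Treat it as one reasonable way to write it, not the only one.

```cpp
#include <cassert>

class Point {
public:
    int x, y;
    Point() : x(0), y(0) {}
    Point(int x_, int y_) : x(x_), y(y_) {}

    // Member: modifies the left-hand operand in place.
    Point& operator+=(const Point& rhs) {
        x += rhs.x;
        y += rhs.y;
        return *this;
    }
};

// Non-member, non-friend: it needs nothing beyond the public operator+=.
inline Point operator+(const Point& lhs, const Point& rhs) {
    Point result = lhs;   // copy the left operand
    result += rhs;        // reuse the member operator
    return result;
}

// Illustrative self-check only.
inline void check_addition() {
    const Point sum = Point(1, 2) + Point(3, 4);
    assert(sum.x == 4 && sum.y == 6);
}
```

Since operator+ only touches the public interface, it does not need to be declared a friend.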

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Chris_F

like this?

Code:

Point Point::operator+(const Point &obj1) const { return Point(this->x + obj1.x, this->y + obj1.y); }

Here is a complete example, however this takes 150 ms on my "slow" PC:

Code:

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <vector>
#include <windows.h>

using namespace std;

class Point
{
public:
    int x, y;
    Point();
    Point(int x, int y);
    friend Point operator+ (const Point& obj1, const Point& obj2);
    Point& operator += (const Point& obj1);
    void SetPoint(int x_, int y_);
    string ToString();
};

struct PointSetter
{
    void operator()(Point& thePoint)
    {
        ++cur;
        thePoint.SetPoint(cur, cur);
    }
    int cur;
    PointSetter(int StartNum=0) : cur(StartNum) { }
};

struct PointAdder
{
    void operator()(Point& thePoint)
    {
        thePoint += m_Point;
    }
    Point m_Point;
    PointAdder() : m_Point(1,1) { }
};

int main()
{
    long start, stop;
    std::vector<Point> pointArray(16000000);
    PointSetter ps;
    PointAdder pa;

    start = GetTickCount();
    for_each(pointArray.begin(), pointArray.end(), ps);
    for_each(pointArray.begin(), pointArray.end(), pa);
    stop = GetTickCount();

    cout << pointArray[0].ToString();
    cout << stop - start << "ms";
    cin.get();
    return 0;
}

Point::Point() : x(0), y(0)
{
}

Point::Point(int x_, int y_) : x(x_), y(y_)
{
}

void Point::SetPoint(int x_, int y_)
{
    x = x_;
    y = y_;
}

Point operator+(const Point& obj1, const Point& obj2)
{
    Point temp = obj1;
    return temp += obj2;
}

Point& Point::operator+= (const Point& obj1)
{
    x += obj1.x;
    y += obj1.y;
    return *this;
}

string Point::ToString()
{
    ostringstream temp;
    temp << "Point: x = " << x << ", y = " << y << endl;
    return temp.str();
}

I added vector, a SetPoint, an operator +=, and used the algorithm functions to set and add the points.

Changing the code to this:

Code:

Point myPoint(1, 1);
for (int i = 0; i < 16000000; ++i)
    pointArray[i].SetPoint(i,i);
for (int i = 0; i < 16000000; ++i)
    pointArray[i] += myPoint;

replacing the algorithm functions didn't improve the time at all (not surprised). Actually, the time increased by 20 or so ms using the hand-coded for loops.

Regards,

Paul McKenzie

Re: Performance of Managed vs Unmanaged code

Thanks for that code, Paul.

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Chris_F

Thanks for that code, Paul.

No problem.

Just remember that the C# gurus over in the C-sharp forum should help you with the speeding up of your C# code.

The issue is that C++ and C# look similar, so the tendency is to lump them as one "language", and therefore the temptation is to do a line-for-line translation from C++ to C# and vice-versa. However, there are constructs in one language that are unknown to the other language, constructs that may optimize things a little better.

For example, the algorithms helped the C++ code to optimize itself a little more -- I have no idea if C# has such things. Similarly, there may be constructs in C# that have no equivalent in C++ that would optimize the C# code.

Regards,

Paul McKenzie

Re: Performance of Managed vs Unmanaged code

1) C++ has a considerably better optimiser than C#.
You can clearly see THAT part in action by the difference in the debug build and the release build.
It wouldn't surprise me at all that the optimising compiler sees the first loop can be skipped and changes it to a single loop setting the items to the value requested.

2) The C# code does a new for each item inserted; the C++ version does not.
Memory allocation is a slow operation.

3) C# will spend part of its time in the garbage collector. C++ doesn't have one. Garbage collection is pretty nifty, but it still takes a noticeable amount of time.

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by OReubens

1) C++ has a considerably better optimiser than C#.
You can clearly see THAT part in action by the difference in the debug build and the release build.
It wouldn't surprise me at all that the optimising compiler sees the first loop can be skipped and changes it to a single loop setting the items to the value requested.

My first guess was that 32ms time was achieved by skipping both loops (after all, only the first point value was ever used). But both loops are present in the optimized code :(
The speed was achieved mostly, I believe, by inlining class member functions.

Quote:

Originally Posted by OReubens

3) C# will spend part of its time in the garbage collector. C++ doesn't have one. Garbage collection is pretty nifty, but it still takes a noticeable amount of time.

I don't think the garbage collector will be involved in the measured time.

<edit> I am afraid I demonstrated my ignorance of C# and now want to retract my last statement (about GC).

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by Chris_F

C++ Release version takes approximately 32ms to execute.

Completely off-topic: what kind of hardware do you test this on?
On my (pretty decent) system, I am getting 78ms - 94ms results.
Is it time for me to upgrade?

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by VladimirF

Completely off-topic: what kind of hardware do you test this on?
On my (pretty decent) system, I am getting 78ms - 94ms results.
Is it time for me to upgrade?

I've got a reasonably fast (at least for its time) Core 2 Duo, and I'm getting 150 ms (VC 2008, release/maximize speed/_SECURE_SCL=0). So I'm in worse shape than you are.

Regards,

Paul McKenzie

Re: Performance of Managed vs Unmanaged code

The C++ optimized version indeed runs both loops. Playing around with the C# code a bit, I was able to get it to run almost as fast as the C++ code by switching to a struct instead of a class and replacing unnecessary new operations with alternatives. It is now about 1.6x slower, which is a lot better than 100x slower.

I have a 4GHz Intel i5, so my times would probably be shorter than most.

Re: Performance of Managed vs Unmanaged code

The reason the C# code is so slow is that Point is a reference type and not a value type. Value-type items are stored contiguously inside the array's allocated memory, while a reference type, as someone else has already pointed out here, needs a separate memory allocation for every array item, so the array then holds, described in C++ terms, pointers to the objects instead of the objects themselves.
Try replacing the "class" keyword with the "struct" keyword; you'll see that the performance becomes very close to that of the C++ code. On my machine, the C++ code took 110ms to run and the C# took 138ms to run (both for release target).
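To picture the difference in C++ terms, here is a hedged sketch (C++11 assumed). make_reference_like mimics an array of a C# class, with one heap allocation per element and the array holding pointers; make_value_like mimics an array of a C# struct. The function names are invented for this illustration, and the analogy is loose, since the .NET allocator and garbage collector do not behave like new and delete.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    int x, y;
    Point() : x(0), y(0) {}
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// Roughly an array of a C# class: the array holds pointers, and every
// element is a separate heap allocation.
std::vector<std::unique_ptr<Point>> make_reference_like(std::size_t n) {
    std::vector<std::unique_ptr<Point>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::unique_ptr<Point>(
            new Point(static_cast<int>(i), static_cast<int>(i))));
    }
    return v;
}

// Roughly an array of a C# struct: one allocation, Points stored back to back.
std::vector<Point> make_value_like(std::size_t n) {
    std::vector<Point> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Point(static_cast<int>(i), static_cast<int>(i)));
    }
    return v;
}

// Pointer layout: each access follows a pointer to wherever that
// Point happened to be allocated.
void add_to_all(std::vector<std::unique_ptr<Point>>& v, const Point& d) {
    for (auto& p : v) {
        assert(p != nullptr);
        p->x += d.x;
        p->y += d.y;
    }
}

// Contiguous layout: a linear walk over one block of memory.
void add_to_all(std::vector<Point>& v, const Point& d) {
    for (auto& p : v) {
        p.x += d.x;
        p.y += d.y;
    }
}
```

Note that the original C# operator+ also allocates a fresh Point on every addition, which this sketch leaves out.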

Thanks,

Jan

Re: Performance of Managed vs Unmanaged code

There was an artificial neural system app in C#.
I have ported it to unmanaged (native) C++.
It ran three times faster, in Release, unoptimized.
Why?

First, managed code uses an intermediate language. The app is compiled to it, and at execution time this intermediate language is interpreted: the machine instructions are produced line by line during execution (of course, this is more complicated, but that is the idea).
This interpretation is completely absent in C++: the compiler and linker produce machine instructions well before execution.

Second, most local and global variables in C# are placed on the heap. That takes more processor time than using the stack in C++.

Third, the C# garbage collector also takes some time to do housekeeping on the app's memory. It has to move some variables between heap tiers (there are four tiers on the heap), compact the memory and update memory block data.

Fourth, I think that it is very Microsoft to sacrifice speed for the convenience and ease of use of C#.

Re: Performance of Managed vs Unmanaged code

Despite the name, C# isn't very closely related to the C family. It has a lot more in common with Java, really.

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by geoyar

There was an artificial neural system app in C#.
I have ported it to unmanaged (native) C++.
It ran three times faster, in Release, unoptimized.
Why?

First, managed code uses an intermediate language. The app is compiled to it, and at execution time this intermediate language is interpreted: the machine instructions are produced line by line during execution (of course, this is more complicated, but that is the idea).
This interpretation is completely absent in C++: the compiler and linker produce machine instructions well before execution.

Second, most local and global variables in C# are placed on the heap. That takes more processor time than using the stack in C++.

Third, the C# garbage collector also takes some time to do housekeeping on the app's memory. It has to move some variables between heap tiers (there are four tiers on the heap), compact the memory and update memory block data.

Fourth, I think that it is very Microsoft to sacrifice speed for the convenience and ease of use of C#.

1° Actually, C# compiles to IL. At runtime, the IL is compiled to native code. Technically, this JIT compiler could do another optimizing phase here, tailor-made to the actual processor and OS the app is running on. I haven't kept up on the details and I doubt this is currently happening, but it may well happen in the future.
This is mainly why .NET code in general is so close to C++ compiled code: it is in fact running native machine code, NOT IL being interpreted each time.

2° Again, this isn't strictly true. C# has stack based variables as well. It's just that you can't have objects, only pointers to allocated objects. A practice you don't HAVE to obey in C++, but which in general is good practice anyway to keep your stack consumption low.

3° THIS is a major cause of performance differences. C++ offers you much tighter control over memory allocation and deallocation. Garbage collection is good, but it's still overhead that will kick in at times you may not want it to.

4° Sorry, but this isn't "Microsoft" at all. C# was designed to be a "RAD" convenience language, something a novice programmer can learn to use fast and effectively, and a language that would make it easier for experienced programmers to design the UI portion of their apps.
As good as frameworks such as MFC and Qt are, designing a whole UI with them tends to be tedious. Doing the same in C# tends to be a lot easier.
Part of that RAD approach is not having to remember to clean up stuff... In 'pure' languages, the cleanup is typically where the bulk of the bugs that make it into release builds end up. Such bugs are simply harder to find than results that aren't what you expect them to be.
The very same concept is present in Java as well, a language Microsoft had nothing to do with (and for which they now probably wish they had never made compilers and tools).

The gist of it... C# runs fast because it is ultimately native machine code once it's running.
Well written C++ will always be faster than well written C#.

I have seen very well written C++ code that ran several times (yes, TIMES, not percentages) faster than an equally well written version in C#. There's just no substitute for having a lot of low-level control when you want it.

C# (and to an extent managed C++) take a LOT of the grind out of writing code that deals with COM and a lot of the other new technologies. If you have a sizable application, it's well worth considering writing the performance-critical parts of your app in C++ and the UI and the interfacing in C#.

Part of being a good C++ programmer is knowing when C++ isn't the best choice of language for the problem at hand.

Re: Performance of Managed vs Unmanaged code

Performance isn't everything. In many cases, C# will have perfectly acceptable performance compared to C++, so it becomes a question of ease of development.

Consider passing data around using Message Queuing (MSMQ). Performance doesn't really matter in this fire-and-forget scenario (after all, there isn't going to be much perf difference between C++ and C# for a client pushing a message onto the queue).

In terms of development, let's look at the two approaches in code.

C++:

Code:

HRESULT SendMulticast(
    WCHAR * wszAddress
    )
{
    // Validate the input string.
    if (wszAddress == NULL)
    {
        return MQ_ERROR_INVALID_PARAMETER;
    }

    // Define the required variables and constants.
    const int NUMBEROFPROPERTIES = 5;      // Number of properties
    DWORD cPropId = 0;                     // Properties counter
    HRESULT hr = MQ_OK;                    // Return code
    HANDLE hQueue = NULL;                  // Queue handle

    // Define an MQMSGPROPS structure.
    MQMSGPROPS msgProps;
    MSGPROPID aMsgPropId[NUMBEROFPROPERTIES];
    MQPROPVARIANT aMsgPropVar[NUMBEROFPROPERTIES];
    HRESULT aMsgStatus[NUMBEROFPROPERTIES];

    // Specify the message properties to be sent.
    aMsgPropId[cPropId] = PROPID_M_LABEL;             // Property ID
    aMsgPropVar[cPropId].vt = VT_LPWSTR;              // Type indicator
    aMsgPropVar[cPropId].pwszVal = L"test message";   // The message's label
    cPropId++;

    // Initialize the MQMSGPROPS structure.
    msgProps.cProp = cPropId;              // Number of message properties
    msgProps.aPropID = aMsgPropId;         // IDs of the message properties
    msgProps.aPropVar = aMsgPropVar;       // Values of the message properties
    msgProps.aStatus = aMsgStatus;         // Error reports

    // Generate a multicast address format name.
    WCHAR * wszFormatName = NULL;
    DWORD dwFormatNameLength = 0;
    dwFormatNameLength = wcslen(wszAddress) + 11;
    wszFormatName = new WCHAR[wcslen(wszAddress) + 11];
    if (wszFormatName == NULL)
    {
        return MQ_ERROR_INSUFFICIENT_RESOURCES;
    }
    memset(wszFormatName, 0, dwFormatNameLength*sizeof(WCHAR));

    // ************************************
    // You must concatenate the string "MULTICAST=" and wszAddress into
    // the wszFormatName buffer.
    // wszFormatName = "MULTICAST=" + wszAddress
    // If the format name is too long for the buffer, return
    // FALSE.
    // ************************************

    // Open the queues using the multicast address format name.
    hr = MQOpenQueue(
        wszFormatName,       // Multicast address format name
        MQ_SEND_ACCESS,      // Access mode
        MQ_DENY_NONE,        // Share mode
        &hQueue              // OUT: queue handle
        );

    // Free the memory that was allocated for the format name string.
    delete [] wszFormatName;

    // Handle any error returned by MQOpenQueue.
    if (FAILED(hr))
    {
        return hr;
    }

    // Send the message to the queues listening to the multicast
    // address.
    hr = MQSendMessage(
        hQueue,              // Queue handle
        &msgProps,           // Message property structure
        MQ_NO_TRANSACTION    // No transaction
        );
    if (FAILED(hr))
    {
        MQCloseQueue(hQueue);
        return hr;
    }

    // Close the queue.
    hr = MQCloseQueue(hQueue);
    return hr;
}

C#:

Code:

// Send the message to the queue
using( var mq = new MessageQueue( "FORMATNAME:MULTICAST=224.1.2.3:8001" ) )
{
    mq.Send( "Test message" );
}

Re: Performance of Managed vs Unmanaged code

Quote:

Originally Posted by OReubens

2° Again, this isn't strictly true. C# has stack based variables as well. It's just that you can't have objects, only pointers to allocated objects. A practice you don't HAVE to obey in C++, but which in general is good practice anyway to keep your stack consumption low.

On the contrary, putting objects on the stack is a very good practice in C++. A well-designed object will keep itself small by using the heap internally as necessary, so stack overflow isn't a problem, and cleanup is automated as well, since you just let the stack unwind and everything is taken care of.

Furthermore, the "use the heap internally" design caveat is in many cases trivial, since you can foist off the actual heap management on STL containers and the like.
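A small sketch of what that looks like in practice. Samples and sum_first_n are hypothetical names chosen for this example; the point is only that the object itself sits on the stack while std::vector owns the heap storage and releases it automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lives on the stack, but keeps its bulk data on the heap through
// std::vector, so sizeof(Samples) stays small however many values it holds.
class Samples {
public:
    explicit Samples(std::size_t n) : values_(n, 0) {}

    std::size_t size() const { return values_.size(); }

    int& operator[](std::size_t i) {
        assert(i < values_.size());
        return values_[i];
    }

private:
    std::vector<int> values_;   // heap storage, freed by vector's destructor
};

long long sum_first_n(std::size_t n) {
    Samples s(n);               // stack object; no new or delete written here
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = static_cast<int>(i);
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += s[i];
    }
    return total;               // s goes out of scope here and frees its heap block
}
```

The same cleanup happens if an exception leaves sum_first_n early, which is the part that is easy to get wrong with a hand-written delete[].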