multicore support in C#

December 10th, 2008, 08:45 PM
ZOverLord

Re: multicore support in C#

Quote:

Originally Posted by Mutant_Fruit

This is an exercise in learning how to multi-thread an application's workflow. It deals with splitting a task up into decent-sized chunks which can then be executed in parallel. It isn't really relevant to the discussion unless you want to use it as an example of how you can achieve great performance by letting Windows automatically manage which core your thread executes on...

I'm not sure what relevance this has at all.

I provided the links and I know what the pages are about. The point of the multi-thread link was to show how the libraries failed to provide a performance improvement. That means that, in some cases, it may be better to write your own methods for core selection than to use libraries.

The second link shows how to create factors. I am sure the person that started this thread can see that.
December 11th, 2008, 04:18 AM
Mutant_Fruit

Re: multicore support in C#

Quote:

Originally Posted by ZOverLord

First, we are NOT talking about threads here, we are talking about launching processes. Secondly, a standard desktop has more than the Windows scheduler. Here is just one example of many. But one example trumps "no other way", so my point has been made.

Unfortunately, that's not a scheduler. So yes, the Windows scheduler is still the only thing that can bounce threads around onto the 'best' core. I did make a confusing statement in my last post. What I was trying to say is that there's no other way to automatically bounce threads between cores so that all threads are spread evenly. Of course, there's a way to set affinity programmatically. C# exposes this, the task manager exposes this (if you care to check it out) and any program can expose this facility.

However, you're still left with the extremely difficult task of finding the best core for each and every thread.
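
For reference, here is a minimal sketch of what "C# exposes this" could look like. It assumes the process-wide `Process.ProcessorAffinity` property is the facility meant; the `AffinitySketch` class and `BuildMask` helper are made-up names for illustration, and pinning to core 0 is an arbitrary choice, not a recommendation.

```csharp
using System;
using System.Diagnostics;

static class AffinitySketch
{
    // Build an affinity bitmask: bit n set means "may run on logical core n".
    // Hypothetical helper, not part of the framework.
    public static long BuildMask(params int[] cores)
    {
        long mask = 0;
        foreach (int core in cores)
        {
            if (core < 0 || core >= 64)
                throw new ArgumentOutOfRangeException(nameof(cores));
            mask |= 1L << core;
        }
        return mask;
    }

    static void Main()
    {
        // Assumption for illustration only: restrict this process to core 0.
        long mask = BuildMask(0);

        using (Process self = Process.GetCurrentProcess())
        {
            // May throw if the platform does not support affinity
            // or if the mask names a core the process may not use.
            self.ProcessorAffinity = new IntPtr(mask);
            Console.WriteLine("Affinity mask: 0x{0:X}", self.ProcessorAffinity.ToInt64());
        }
    }
}
```

Setting the mask is the easy part. The code has no way of knowing whether core 0 is a good choice at the moment it runs, which is the difficulty described above.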

Quote:

If it was 'stupid and braindead' as you say then why is load-balancing part of the next version of Visual Studio 2010? Is Microsoft 'stupid and braindead' as well?

Load balancing != choosing a specific core.

Quote:

from: http://channel9.msdn.com/pdc2008/TL26/
from: http://msdn.microsoft.com/en-us/conc...y/default.aspx

Parallel programming != choosing specific cores.

Quote:

The walkthroughs in this section introduce you to the use of the PerformanceCounter component. The walkthroughs show you how to use both system performance counters and custom performance counters.

from: http://msdn.microsoft.com/en-us/library/d8xb98ke.aspx

Performance counters can give you information such as CPU usage, but if you even attempt to act on it you are essentially implementing a userland scheduler, which runs under the Windows scheduler and, as I said before, will be constrained by the Windows scheduler.
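
A rough sketch of the kind of sampling being discussed, assuming Windows and the standard "Processor" / "% Processor Time" counters with one instance per logical core ("0", "1", ...). The `CoreLoadSketch` class and `LeastBusy` helper are hypothetical names introduced only for this example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class CoreLoadSketch
{
    // Index of the smallest load value, i.e. the "least busy" core in one sample.
    // Hypothetical helper for illustration.
    public static int LeastBusy(float[] loads)
    {
        if (loads == null || loads.Length == 0)
            throw new ArgumentException("need at least one sample", nameof(loads));
        int best = 0;
        for (int i = 1; i < loads.Length; i++)
            if (loads[i] < loads[best]) best = i;
        return best;
    }

    static void Main()
    {
        int cores = Environment.ProcessorCount;
        var counters = new PerformanceCounter[cores];
        for (int i = 0; i < cores; i++)
            counters[i] = new PerformanceCounter("Processor", "% Processor Time", i.ToString());

        // The first NextValue() of a rate counter returns 0, so prime, wait, then sample.
        foreach (var c in counters) c.NextValue();
        Thread.Sleep(1000);

        var loads = new float[cores];
        for (int i = 0; i < cores; i++) loads[i] = counters[i].NextValue();

        Console.WriteLine("Least busy core over the last second: {0}", LeastBusy(loads));
        foreach (var c in counters) c.Dispose();
    }
}
```

Note that the answer describes the previous second. By the time a program acts on it, other processes may have started or stopped, which is why any decision built on these numbers lags behind the OS scheduler.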

Quote:

Microsoft is NOT adding this to Visual Studio 2010 for development machines, it's doing it to help you load-balance on your users system.

Load balancing is good - setting tasks to a specific core is bad unless you know that no other application will be choosing that specific core.

Quote:

There is no standard server or desktop. What would they be? What do they look like?

Not an embedded system. Not one which has 1000 cores in a single box. One which does not use a third party scheduler which overrides cpu affinity set by a program.

Quote:

I provided the links and I know what the pages are about. The point of the multi-thread link was to show how the libraries failed to provide a performance improvement.

I have seen benchmarks where the library was used incorrectly/badly.

http://garuma.wordpress.com/2008/07/...ith-a-p-twist/

The parallel library can improve performance significantly when it's used for a suitable task. Once again, this has nothing to do with setting specific tasks to specific cores.

1) So, setting a specific task to a specific core is still bad.
2) Yes, you can get performance counter information from Windows. No, I really don't believe you can implement a userland scheduler which will outperform the built-in scheduler. However, I have never seen a userland scheduler, nor am I going to implement such a thing to prove my point.
3) Splitting up a large task and letting the OS schedule that work onto the best core is good. It's what people have been doing for years and will continue to do for decades to come.
4) It is still hard to tell from user code which core is good, because you never know what tasks will be starting or stopping at any given instant. A userland scheduler will just be a poor and slow imitation of the Windows scheduler.

And that's me for this thread. I can't make my point any clearer and I have yet to see a good rebuttal for my basic point:

If you force an intensive task to core0 and there's already an intensive task on core0 - you've just cut your performance.
December 11th, 2008, 04:59 AM
ZOverLord

Re: multicore support in C#

The links I provided are based on facts; the original poster can tell the difference between your unsubstantiated statements and facts.

Again, you stated: "A standard desktop only has the windows scheduler. It has no other way to automatically set thread affinity." One would have to agree that forcing a process onto a specific core is another method to "automatically set thread affinity", since the process can be placed on a different core than the one the Windows scheduler might have chosen. But you can't see that.

My links stand as fact, supported by Microsoft as well.

None, as in "zero", of your points are supported by facts, apart from the claim that the libraries can provide performance improvements (which I never said was untrue; I said it is not true in all cases, which does not mean never, lol).

I have asked for links to white papers or links to articles that specifically support your allegations, and so far, you have produced none. Which does not surprise me at all.

Suddenly, you are quoting links about how good the libraries are, yet you call these concepts "stupid" in your other posts, and you refuse to "Own Up" to the fact that Visual Studio 2010 will allow you to do these "Same" stupid things, lol.

You are also still unable to comprehend that the original poster is talking about process load balancing, not thread load balancing. They would also like to exclude some cores from the set that receives processes spawned by their application, while leaving those cores available for other processes.

Your solution is to "fool" the entire operating system into thinking that some cores are not present, which also does not surprise me.

Yes, your method could in fact cause a slowdown of 200 percent or more, because that is what happens when you make entire cores go away for any and all processing on a system.

So, I will leave you with your opinions, as well as many links here, that support the facts.

I will answer any questions I can, if the creator of this thread would like my input, but it is "Fruitless" to banter with your unsubstantiated opinions and beliefs.
December 11th, 2008, 04:02 PM
toraj58

Re: multicore support in C#

The amount of performance gained by the use of a multicore processor depends on the problem being solved and the algorithms used, as well as their implementation in software (Amdahl's law). For so-called "embarrassingly parallel" problems, a dual-core processor with two cores at 2GHz may perform very nearly as fast as a single core of 4GHz. Other problems though may not yield so much speedup. This all assumes however that the software has been designed to take advantage of available parallelism. If it hasn't, there will not be any speedup at all. However, the processor will multitask better since it can run two programs at once, one on each core.
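
As a worked illustration of Amdahl's law (the symbols $p$, $N$ and $S$ are introduced here and are not part of the original post): let $p$ be the fraction of the run time that can be parallelized and $N$ the number of cores.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}

% Embarrassingly parallel work, p = 1, on N = 2 cores:
S(2) = \frac{1}{0 + \tfrac{1}{2}} = 2

% 90% parallel work, p = 0.9, on N = 2 cores:
S(2) = \frac{1}{0.1 + 0.45} = \frac{1}{0.55} \approx 1.82

% Software with no parallelism, p = 0:
S(N) = 1 \quad \text{for every } N
```

The first case matches the "two cores at 2 GHz versus one core at 4 GHz" comparison above, and the last case is the "no speedup at all" situation. Even with unlimited cores, the 90% case cannot exceed $1/(1-p) = 10$.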

In addition to operating system (OS) support, adjustments to existing software are required to maximize utilization of the computing resources provided by multi-core processors. Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications. The situation is improving: for example, the American PC game developer Valve Corporation has stated that it will use multi-core optimizations for the next version of its Source engine, shipped with Half-Life 2: Episode Two, the next installment of its Half-Life series, and Crytek is developing similar technologies for CryEngine 2, which powers its game Crysis. Emergent Game Technologies' Gamebryo engine includes their Floodgate technology, which simplifies multi-core development across game platforms. See Dynamic Acceleration Technology for the Santa Rosa platform for an example of a technique to improve single-thread performance on dual-core processors.

Integrating multiple cores on one chip drives production yields down, and multi-core chips are more difficult to manage thermally than lower-density single-chip designs. Intel has partially countered the first problem by building its quad-core designs from two dual-core dies in a single package, so any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to produce a quad-core. From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence. Finally, raw processing power is not the only constraint on system performance. Two processing cores sharing the same system bus and memory bandwidth limit the real-world performance advantage. If a single core is close to being memory-bandwidth limited, going to dual-core might only give a 30% to 70% improvement. If memory bandwidth is not a problem, a 90% improvement can be expected. It would be possible for an application that used two CPUs to end up running faster on one dual-core if communication between the CPUs was the limiting factor, which would count as more than 100% improvement.

Managing concurrency acquires a central role in developing parallel applications. The basic steps in designing parallel applications are:

Partitioning
The partitioning stage of a design is intended to expose opportunities for parallel execution. Hence, the focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.

Communication
The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks so as to allow computation to proceed. This information flow is specified in the communication phase of a design.

Agglomeration
In the third stage, we move from the abstract toward the concrete. We revisit decisions made in the partitioning and communication phases with a view to obtaining an algorithm that will execute efficiently on some class of parallel computer. In particular, we consider whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase, so as to provide a smaller number of tasks, each of greater size. We also determine whether it is worthwhile to replicate data and/or computation.

Mapping
In the fourth and final stage of the parallel algorithm design process, we specify where each task is to execute. This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling.

On the other hand, on the server side, multicore processors are ideal because they allow many users to connect to a site simultaneously and have independent threads of execution. This allows for Web servers and application servers that have much better throughput.
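
The four design stages above can be sketched in C# roughly as follows. This is only an illustration under assumptions: the `ChunkedSum` class and its `Partition` and `Sum` methods are hypothetical names, summing an array stands in for real work, and `Parallel.For` from the Task Parallel Library is used so that the mapping stage is left to the runtime and OS scheduler rather than done by hand.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class ChunkedSum
{
    // Partitioning + agglomeration: instead of one task per element,
    // split [0, length) into at most `chunks` contiguous ranges.
    public static (int Start, int End)[] Partition(int length, int chunks)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (chunks < 1) throw new ArgumentOutOfRangeException(nameof(chunks));

        int count = Math.Min(chunks, length);   // never create empty chunks
        var ranges = new (int Start, int End)[count];
        int start = 0;
        for (int i = 0; i < count; i++)
        {
            // Spread the remainder over the first (length % count) chunks.
            int size = length / count + (i < length % count ? 1 : 0);
            ranges[i] = (start, start + size);
            start += size;
        }
        return ranges;
    }

    public static long Sum(int[] data, int chunks)
    {
        var ranges = Partition(data.Length, chunks);
        var partial = new long[ranges.Length];

        // Mapping: Parallel.For decides which worker thread runs which chunk;
        // no core is chosen explicitly.
        Parallel.For(0, ranges.Length, i =>
        {
            long s = 0;
            for (int j = ranges[i].Start; j < ranges[i].End; j++) s += data[j];
            partial[i] = s;   // Communication: each task writes only its own slot
        });

        // Combine the per-chunk results on the calling thread.
        long total = 0;
        foreach (long p in partial) total += p;
        return total;
    }

    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(Sum(data, Environment.ProcessorCount));
    }
}
```

Using one chunk per logical core is just a starting point; for uneven workloads, more and smaller chunks usually balance better at the cost of more scheduling overhead.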
