What are people's thoughts on newmat vs. Boost for matrix manipulation?
I've used newmat before and found it very good. I've also heard Boost is good but have never used it. Can Boost do everything newmat can?
I'd probably go with LAPACK for heavy-duty matrix work, since there are highly optimized builds of it custom-tailored for specific CPUs available from several vendors. Or any C++ package that uses LAPACK.
Only tricky bit is that LAPACK, being Fortran-based, assumes column-major matrix ordering.
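To make the column-major point concrete, here is a minimal sketch of the index arithmetic involved. The helper names (`col_major_index`, `to_col_major`) are made up for illustration and are not part of LAPACK, newmat, or Boost; the sketch only shows how a row-major C++ buffer might be rearranged before being handed to a Fortran-style routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of element (row, col) in a column-major
// buffer with `rows` rows. Elements of one column are adjacent in memory.
inline std::size_t col_major_index(std::size_t row, std::size_t col,
                                   std::size_t rows) {
    assert(row < rows);
    return col * rows + row;
}

// Hypothetical helper: copy a row-major rows x cols matrix into a new
// column-major buffer of the same size.
std::vector<double> to_col_major(const std::vector<double>& a,
                                 std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    std::vector<double> out(a.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[col_major_index(r, c, rows)] = a[r * cols + c];
    return out;
}
```

For a 2x3 matrix stored row-major as 1 2 3 4 5 6, the column-major buffer holds 1 4 2 5 3 6. In practice it is often possible to avoid the copy by treating the row-major data as the transpose, but that depends on the routine being called.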
In addition to LAPACK based code, you might want to take a look at various APL based libraries.
APL was (as a language) specifically designed to handle multi-dimensional matrix operations. At the core is the ability to perform minimal (lazy) evaluations. When it was introduced (1970s), an APL based small system [it would sit on a desktop] could outperform even the fastest mainframes.
According to the Wikipedia article, the notion that APL derives its speed from lazy evaluation is a myth: they claim it actually comes from the ability to do optimal instruction scheduling since equations are specified without specific guidelines about what needs to be evaluated first. This also makes it extremely extensible to multi-core and other parallel architectures.
take a look here
http://research.microsoft.com/en-us/...-e5777f48f77b/
Hmm, haven't seen that one before. What, does MS have a compulsive need to compete with OpenCL or something?
There's a current trend to open up graphics cards for general number crunching. This is what nVidia offers for example,
http://www.nvidia.com/object/cuda_learn.html
One needs to be careful when using Wikipedia. ;)
The MCM 800 was built specifically to handle APL. "Micro Computer Machines, Inc." ran into financial issues and "ILC Data Device Corp" received a number of these machines along with detailed engineering documentation as compensation (approx 1977). At the time I was a "Software Technician" with the engineering department and spent a few months dedicated to understanding the machine. [As a side note, these financial troubles were the driving factor of the "name change" to "MCM Computers".]
While different terminologies are used, the core premise was (and still is) that a command can be executed without performing all of the operations it implies, doing only the work needed to satisfy the computational requirements of later commands.
A trivial example is the multiplication of two matrices, followed by a summation of the values in the first row of the result. Even if the intermediate array was [1000][1000], the calculation of 999,000 cells never needed to be performed.
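As a rough C++ sketch of that idea (not how any APL implementation actually works), the product can be represented by an object that computes an element only when it is asked for. The `LazyProduct` class and `sum_first_row` function are hypothetical names invented for this illustration; the `evaluated()` counter is there only to show how few cells get computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical lazy product: holds references to A and B and computes
// element (i, j) of A*B on demand. The operands must outlive this object.
class LazyProduct {
public:
    LazyProduct(const Matrix& a, const Matrix& b) : a_(a), b_(b) {
        assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    }

    std::size_t cols() const { return b_[0].size(); }

    // Compute a single cell of the product; nothing is cached.
    double at(std::size_t i, std::size_t j) const {
        ++evaluated_;
        double sum = 0.0;
        for (std::size_t k = 0; k < b_.size(); ++k)
            sum += a_[i][k] * b_[k][j];
        return sum;
    }

    // Number of cells computed so far (for illustration only).
    std::size_t evaluated() const { return evaluated_; }

private:
    const Matrix& a_;
    const Matrix& b_;
    mutable std::size_t evaluated_ = 0;
};

// Sum of the first row of the product; touches only cols() cells.
double sum_first_row(const LazyProduct& p) {
    double total = 0.0;
    for (std::size_t j = 0; j < p.cols(); ++j)
        total += p.at(0, j);
    return total;
}
```

With two 1000x1000 operands, `sum_first_row` would evaluate 1,000 cells and leave the other 999,000 untouched. The difference from APL as described above is that here the programmer had to build the laziness by hand rather than the language working it out.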
Having the language/platform determine what portions of the calculation were actually necessary, without any direction from the programmer, is still a valuable technique today. When combined with modern parallelization methods (which were not practical 30+ years ago), the effect is amazing to many people.
In closing, my previous recommendation was simply "to check out" the features provided. If the math has already been reduced down to an optimal (or near optimal) pathway, then the other alternatives may provide much better options. But if it is desirable to express the material in the general mathematical terms, and let the computer figure out which calculations (and in what order) are necessary, then APL is still hard to beat.
(I recently used it when working on a photon detector for medical use. The physicists expressed everything as high order matrix calculations, while the engineers wanted/needed discrete analytical results. Since the physicists were constantly revising their equation sets, being able to express them directly and run a new set of simulations was a great help.)
BSGP (Microsoft's offering) makes some good sense because it is NOT tied to a specific graphics card vendor.
So far [IMPO] every alternative listed has its strengths and weaknesses when applied to specific conditions. Attempting to say "X is better than Y" without knowing more about the specifics of coletek's requirements is impossible.
Even the choice to use the GPU for non-graphical calculations may yield slower system performance if the GPU is at saturation doing complex 3-D rendering. The number of cores/processors also comes into play when measuring the items listed for scalability. (Consider what will happen when Keifer is released in a year or so: how will 32 CPUs stack up against a single or dual GPU?)
Unless more detailed information is provided by coletek, I believe the best bet is to look at the information that was presented, and then measure which one of the viable alternatives provides the best fit. Raw performance is often secondary to other criteria, provided it is fast enough to meet the requirements.
First of all, if the GPU is busy doing 3-D rendering, the CPU will most likely be at 100% as well. If there is a multicore CPU, then the cores will be busy with other threads, even if they look idle.
And it's not reasonable to assume that there is a 3D app running while doing complex heavy processing.
Multicore CPUs will be a lot worse at data processing than a single core.
GPUs are really good at processing large amounts of data; it will be a rare case that in this situation a new GPU will lose against a new CPU.
CPUs are really for running the multitasking system, while GPUs are for 3D processing.
There are also cards available that are for physics processing. In the end the best solution would be a new card just for doing data processing, but I don't see that happening.
The question of GPU vs multi-core CPU is largely this: Is the algorithm in question fine-grained parallelizable, or coarse-grained parallelizable? The former favors a GPU, the latter a multi-core setup.
Even though the operating system will interfere with multitasking?
I don't know if you can claim a whole core in a program?
Depends a lot on the motherboard... consider:
If the machine is going to be dedicated to one application, then that is A LOT of serious processing power.
Quote:
Compared to a quad-socket Xeon X7460 (24 cores) at 2.66 GHz, the dual-socket X5570 at 2.93 GHz with HT enabled (two fewer physical CPUs, but 16 virtual cores and 8 physical cores) came in just 3.2% behind at 25,000 (compared to X7460's 25,830). With HT disabled (comparing 8 physical cores to 24 cores) it came in slightly lower at 23,650, about 8.4% behind X7460.
The exact point is that without knowing more about the specific use case, either end of the spectrum (or somewhere in the middle) is possible.
Much depends on the data coupling, and how much can be broken into independent chunks... again, we don't know...
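To illustrate what "independent chunks" can look like, here is a small sketch using a matrix-vector product, where each output row depends only on its own input row. The names `split_range` and `matvec_chunked` are hypothetical, and the chunks are run one after another here; the assumption is that, because no chunk reads what another writes, each could be handed to a separate core without locking.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical helper: split [0, n) into at most `parts` contiguous
// ranges of near-equal size. Empty ranges are never produced.
std::vector<Range> split_range(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<Range> chunks;
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts && begin < n; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}

// y = A * x, computed chunk by chunk. Each chunk writes a disjoint slice
// of y and only reads A and x, so the chunks do not depend on each other.
std::vector<double> matvec_chunked(const Matrix& a,
                                   const std::vector<double>& x,
                                   std::size_t parts) {
    std::vector<double> y(a.size(), 0.0);
    for (const Range& chunk : split_range(a.size(), parts)) {
        for (std::size_t i = chunk.first; i < chunk.second; ++i) {
            assert(a[i].size() == x.size());
            for (std::size_t k = 0; k < x.size(); ++k)
                y[i] += a[i][k] * x[k];
        }
    }
    return y;
}
```

An algorithm with heavy data coupling would not split this cleanly, since each chunk would need results from its neighbours, and that is exactly the information missing about the original poster's workload.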
There is little doubt that a dedicated processor would be a good choice, but once again we know nothing about the actual situation to know if this would or would not be appropriate.
Every point you have raised could easily be true, but there is definitely not enough known to make an across-the-board declaration as to any of the approaches (including the one I introduced) being the "best" choice for a specific (but unknown) condition.
I agree, it's all "if"s.
But there is the fact that the multitasking system will "poison" the core, and that GPUs can process larger chunks at a higher rate. I guess there is a way to "dedicate" a CPU core to one thread, but I still don't see how that can beat a GPU.
Hmm, I was kinda hoping that physics processors would become standard in computers...
CPUs will get more and more cores, but it will take some time until they reach the hundreds that are available on GPUs today.
Physics processors are kind of standard already because most new computers have a programmable GPU.
http://www.ddj.com/hpc-high-performa...ting/207200659
Both the CPU and GPU can be programmed in the same way using fine-grain parallelism. For the CPU, the Intel TBB library can be used, for example.
http://www.threadingbuildingblocks.org/
I know they've got DirectX, I was just kinda hoping that for once everyone could just get behind one standard (OpenCL) rather than throwing multiple competing solutions at the problem yet again. Probably a good thing in the long run, but still slightly vexing.
Anyway, the big bottleneck on GPUs for now is memory bandwidth, not core count. At least that was the case a year ago....I mean, everyone knows moving data between main memory and GPU memory is slow, but even on-card data transfers were a bottleneck in the last GPU program I worked on.
It's a complex world and big companies want to control it. There will never be just one standard. Although it would be a relief for programmers, I don't think one standard would be any good in the long run. Also, standards need competition to stay fit. For example, OpenGL really had become old and tired.
The memory bandwidth problem isn't that much of a problem in practice. You just make sure to pass as little as possible, as seldom as possible. Most applications can be decomposed to achieve this.