Re: Matrix Multiplication
- From: Evgenii Rudnyi <usenet@xxxxxxxxx>
- Date: Tue, 15 Jan 2008 13:08:40 -0800 (PST)
I would like to re-iterate my point. The text that I has written was
not about making benchmarks at all. It was designed for novices who
are interested in programming of computations with matrices. The goal
was actually to show that a naive three loop implementation is not
good in any programming language. To this end I have chosen the
dimension 1000x1000 deliberately to make sure that the matrix does not
fit the processor cache. On the other hand, in my view it is useful to
make a three loop implementation at the beginning anyway, say as a
reference point.
For those who are interested in theory I would recommend the ATLAS
paper where there is a discussion why a compiler cannot do things that
ATLAS does.
So the main point was that it is important to use a good library even
in this seemingly simple case.
http://matrixprogramming.com/MatrixMultiply/
On Jan 15, 9:31 am, Sebastian Hanigk <han...@xxxxxxxxx> wrote:
pixel <p...@xxxxxxxx> writes:
The original poster's naive three-loop implementation shootout is not
a good benchmark. For other strategies I could point you in the
direction of cache-obliviousness[1].
It's as good a benchmark as any
If you want to benchmark the matrix multiplication performance, it is
not a good benchmark because the implemented algorithm is suboptimal
from a theoretical and practical point of view.
- it dismantles your cache-obliviousness claim which, if true, should
produce a big performance gain, not a tiny one.
Careful. Cache-oblivious algorithms will not provide a performance boost
if your existing algorithm and its implementation are already quite
good. Look again at matrix multiplication: Intel's MKL routines provide
already over 90% of theoretical peak performance, one does not expect
large benefits here. The real benefit of a cache-oblivious algorithm is
its portability; instead of fiddling around with cache sizes, optimal
blocking and so on for every new processor and architecture, you
implement a cache-oblivious algorithm once and it's good on every
architecture - but obviously it will have a hard time against
hardware-optimised implementations.
Furthermore, language does influence the performance - change arrays
from static to dynamic, make matrix sizes a power of two, and
reevaluate what's "ridiculous" about the *same* algorithm grinding to
a halt.
I do not negate the influence of the choice of languages, but compared
with the right choice of algorithms it should be a small effect. There
exists a language benchmark (<http://shootout.alioth.debian.org/>), but
beware that they compare compounds of implementation, language and
run-time system against each other.
Sebastian
.
- Follow-Ups:
- Re: Matrix Multiplication
- From: Sebastian Hanigk
- Re: Matrix Multiplication
- From: user923005
- Re: Matrix Multiplication
- References:
- Matrix Multiplication
- From: Evgenii Rudnyi
- Re: Matrix Multiplication
- From: user923005
- Re: Matrix Multiplication
- From: Evgenii Rudnyi
- Re: Matrix Multiplication
- From: b52b
- Re: Matrix Multiplication
- From: Evgenii Rudnyi
- Re: Matrix Multiplication
- From: Gordon Sande
- Re: Matrix Multiplication
- From: Evgenii Rudnyi
- Re: Matrix Multiplication
- From: Gordon Sande
- Re: Matrix Multiplication
- From: Evgenii Rudnyi
- Re: Matrix Multiplication
- From: user923005
- Re: Matrix Multiplication
- From: Gordon Sande
- Re: Matrix Multiplication
- From: user923005
- Re: Matrix Multiplication
- From: pixel
- Re: Matrix Multiplication
- From: Sebastian Hanigk
- Matrix Multiplication
- Prev by Date: Re: Free sending and sharing any whole Electrical Eng Ebooks and solution manual/ Solutions Manual to Modern Control Engineering by Ogata (4th edition)
- Next by Date: Re: Free sending and sharing any whole solution manual Ebook
- Previous by thread: Re: Matrix Multiplication
- Next by thread: Re: Matrix Multiplication
- Index(es):
Relevant Pages
|