Alternatively, this code may be a good candidate for an MMX/SSE/SSE2 rewrite, but that would mean non-portable code, and code that may not work as well across different processors.
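To make the trade-off concrete, here is a rough sketch of what that usually looks like. It is not the code under discussion: `add_u32` is a made-up example kernel (element-wise add of two arrays), and the `HAVE_SSE2` macro is my own name. The idea is to use SSE2 intrinsics only when the compiler says they are available and keep a plain C loop as the portable fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Detect SSE2 at compile time (GCC/Clang define __SSE2__; MSVC uses _M_X64 / _M_IX86_FP). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1      /* hypothetical macro name, for this sketch only */
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] (wraps on overflow). */
void add_u32(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    size_t i = 0;

#if HAVE_SSE2
    /* SSE2 path: four 32-bit integers per iteration, unaligned loads/stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif

    /* Portable path: handles the leftover elements, or everything
       when SSE2 is not available at compile time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The `#if` guard is what keeps it building on non-x86 targets, at the cost of maintaining two code paths. Whether the SIMD path is actually faster depends on the processor and on how the real code accesses memory, so it would need measuring.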
It's all Greek & Latin to me! :P