How to optimize this function ?
Hi, ppl.
I want to optimize the wave function in the main code. I tried SSE instructions but the code become slower. I'm compiling using gcc -O3 -march=i686 -mcpu=i686 -malign-double -funroll-loops -fomit-frame-pointer -o float_conv float_conv.c -lm
Can anybody help me?
Thanx.
#include <stdio.h>
#include <math.h>
#include <unistd.h>
#include <sys/time.h>
float points[] = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1};
float y[] = {3, 3.38382, 3.14269, 2.47603, 1.59766, 0.698843, -0.0755374, -0.637482, -0.954788, -1.04158, -0.944139};
float xx[4] = {1.02938, 7.39821, 5.98217, 3.8127};
float vt;
int main(void){
struct timeval ts,te;
int i,j,k = 500000;
int nx=11;
float somaN = 0;
float val;
printf("Calculation...\n");
gettimeofday(&ts,NULL);
for (j = 0; j < k; j++)
{
for (i=0;i<=nx;i++)
{
//WAVE FUNCTION
val = exp(-xx[2]*points[i]/2.0)*(xx[0]*cos(xx[3]*points[i])+ xx[1]*sin(xx[3]*points[i]));
somaN = somaN+(y[i]-val)*(y[i]-val);
}
val = sqrt(somaN)/(nx+1);
}
gettimeofday(&te,NULL);
printf("Time: %f\n",(double)(te.tv_sec-ts.tv_sec)+0.000001*(double)(te.tv_usec-ts.tv_usec));
return 0;
}
Re: How to optimize this function ?
Have you tried working with double instead of float? I'm not exactly sure that exp, sin, cos and the like have a float overload, so there may be lots of unneccessary conversions.
By the way, you have a bug with your bounds on i. The y[] array has 11 values, yet you loop from 0 to 11 inclusive. y[11] doesn't exist, so you're accessing one element past the end of the array. The loop should be
for (i = 0; i < nx; i++)
Re: How to optimize this function ?
Run it with a profiler, and find out where the time is going. Then attack the code that's taking most time. Any other approach is flailing in the dark.
Re: How to optimize this function ?
Well, Graham in this particular case, there is not much point running it in a profiler. What else is it going to say than that the line doing the calculations is taking the most time?
val = exp(-xx[2]*points[i]/2.0)*(xx[0]*cos(xx[3]*points[i])+ xx[1]*sin(xx[3]*points[i]));
Re: How to optimize this function ?
If you can sacrifice some accuracy, you could use a lookup table for the cosine and sine values, then use something simple like linear interpolation between the points. This will be much faster than actually numerically approximating the functions.
Re: How to optimize this function ?
The following line is computed k * nx times:
Quote:
val = exp(-xx[2]*points[ i ]/2.0)*(xx[0]*cos(xx[3]*points[ i ])+ xx[1]*sin(xx[3]*points[ i ]));
k is very big. But since that line does not depend on k, only on i which is small, then it would be better to place that line outside of the loop on k. This can be done if you have a table of val[] for all the values of i, and in the inner loop, you'll use the precomputed val[ i ] instead of val.
Re: How to optimize this function ?
You can change the exp part calculating once
double res = exp(-xx[2]*0.1/2.0);
so exp(-xx[2]*points[i]/2.0) = intpow(res, i);
Since pow is for double exponent (very slow for integral exponent) you define your own pow function(recursive or whatever)
and MAYBE (i havent check it out) for sin and cos since
cos(xx[3]*points[i]) = cos(xx[3]*0.1*i) = cos(A*i) you can use the :
sin(nA) = sinA*{ (2cosA)^(n-1) - ( n - 2 | 1) * (2cosA)^(n-3) + ( n - 3 | 2 ) *(2cosA) ^(n-5) - ...}
and
cos(nA) = 1/2*{(2cosA)^n - n/1 * (2cosA)^(n-2) + n/2 * ( n - 3 | 1) * (2cosA)^(n-4) - n/3 * ( n - 4 | 2 ) * (2cosA)^(n-6) + ...}
Re: How to optimize this function ?
You do xx[3]*points[i] twice on this calculation, try creating a variable and save this 2nd calculation. You can also put these 2 fors variables in a register(for (register inti=0 ; ...) , if the compiler can put them in a register this will speed up these for loop. Also, Bob Davis gave a good hint, instead of use an array of angles if you can use an array of sin and cos of these angles (if they will not change) you will save the sin and cos calculations. You will not lose acurancy if you get them from a good calculator, the windows calculator for exemple.
Re: How to optimize this function ?
Rabelo the "hard" part in this code is the exp, cos and sin functions, not adds and muls
Re: How to optimize this function ?
Hi Yves.
Quote:
Originally Posted by Yves M
Have you tried working with double instead of float? I'm not exactly sure that exp, sin, cos and the like have a float overload, so there may be lots of unneccessary conversions.
I tried this, with no success.
Quote:
Originally Posted by Yves M
By the way, you have a bug with your bounds on i. The y[] array has 11 values, yet you loop from 0 to 11 inclusive. y[11] doesn't exist, so you're accessing one element past the end of the array. The loop should be
for (i = 0; i < nx; i++)
You're right. I put here a slightly modified version of my code. But the correct version remains slow.
Thanx.
Re: How to optimize this function ?
Hello Bob.
Unfortunatelly, I can't use a table.
Quote:
Originally Posted by Bob Davis
If you can sacrifice some accuracy, you could use a lookup table for the cosine and sine values, then use something simple like linear interpolation between the points. This will be much faster than actually numerically approximating the functions.
Re: How to optimize this function ?
Hi olivthill.
The line must be inside the loop. This code is only to test the evaluation function of a genetic algorithm. Using a profiler I detected that this function is the most called in my program. So, I want to make it faster.
Quote:
Originally Posted by olivthill
The following line is computed k * nx times:
k is very big. But since that line does not depend on k, only on i which is small, then it would be better to place that line outside of the loop on k. This can be done if you have a table of val[] for all the values of i, and in the inner loop, you'll use the precomputed val[ i ] instead of val.
Re: How to optimize this function ?
Sorry, bilm_ks.
I forgot to say that xx and y values change a lot of times during the process. The code I put here is only a small part of my program.
Quote:
Originally Posted by bilm_ks
You can change the exp part calculating once
double res = exp(-xx[2]*0.1/2.0);
so exp(-xx[2]*points[i]/2.0) = intpow(res, i);
Since pow is for double exponent (very slow for integral exponent) you define your own pow function(recursive or whatever)
and MAYBE (i havent check it out) for sin and cos since
cos(xx[3]*points[i]) = cos(xx[3]*0.1*i) = cos(A*i) you can use the :
sin(nA) = sinA*{ (2cosA)^(n-1) - ( n - 2 | 1) * (2cosA)^(n-3) + ( n - 3 | 2 ) *(2cosA) ^(n-5) - ...}
and
cos(nA) = 1/2*{(2cosA)^n - n/1 * (2cosA)^(n-2) + n/2 * ( n - 3 | 1) * (2cosA)^(n-4) - n/3 * ( n - 4 | 2 ) * (2cosA)^(n-6) + ...}
Re: How to optimize this function ?
I think you're right. That's why I tried SSE intrinsic to do the job, but the codes I found only does add, sub, mul and div, not trigs.
Quote:
Originally Posted by bilm_ks
Rabelo the "hard" part in this code is the exp, cos and sin functions, not adds and muls
Re: How to optimize this function ?
Nobody else has another clue ?