fir filter


dave2k
June 22nd, 2008, 02:00 PM
I have code for an FIR filter below. The way it's coded makes it really slow when debugging (VS 8). Can anyone identify where the slow bits are or, even better, suggest quicker code?

Thanks

vector<double> filter( const vector<double>& b, const vector<double>& x )
{
    vector<double> y;
    vector<double> prev_x;

    // Cycle for x
    for( size_t i = 0; i < x.size(); i++ ) {
        double y_n;
        prev_x.push_back( x[i] );

        if( prev_x.size() > b.size() )
            prev_x.erase( prev_x.begin() );

        // Cycle for prev_x
        size_t nt = prev_x.size();
        y_n = 0;
        for( size_t j = 0; j < nt; j++ ) {
            y_n += b[j] * prev_x[nt-j-1];
        }
        y.push_back( y_n );
    }
    return y;
}

By the way, vectors are by no means necessary; arrays are OK too.

laserlight
June 22nd, 2008, 02:41 PM
It may be possible to make some macro-optimisation of the algorithm itself, but barring that it looks like you may be better off working with iterators to the ranges involved instead of maintaining a temporary vector that has elements pushed and erased. Of course, note that without optimisations turned on it may well be normal for std::vector to be slow, and even with optimisations turned on MSVC may include optional bounds checking that would slow things down.
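
As a rough sketch of the "work on the ranges directly" idea: since prev_x only ever holds the most recent samples of x, the inner loop can read them straight from x. The name filter_direct is made up for this illustration, and the code assumes the goal is to produce exactly the same output as the posted filter().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical name, not from the thread. Intended to give the same output
// as the posted filter(), but without the temporary prev_x vector.
std::vector<double> filter_direct(const std::vector<double>& b,
                                  const std::vector<double>& x)
{
    std::vector<double> y;
    y.reserve(x.size());                       // one allocation up front

    for (std::size_t i = 0; i < x.size(); ++i) {
        // Near the start, fewer than b.size() input samples are available.
        const std::size_t nt = std::min(i + 1, b.size());
        double acc = 0.0;
        for (std::size_t j = 0; j < nt; ++j)
            acc += b[j] * x[i - j];            // x[i - j] is prev_x[nt-j-1]
        y.push_back(acc);
    }
    return y;
}
```

This removes the erase() at the front of the vector, which shifts every remaining element on each input sample.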

Hermit
June 22nd, 2008, 03:29 PM
How many taps are you testing with? If the number of taps is sufficiently large, you might get big speed improvements by using "fast convolution" - that is, multiplying in the frequency domain rather than convolving in the time domain.
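
To make "fast convolution" concrete, here is a minimal sketch that transforms the whole signal in one go. The names fft and filter_fft are made up for this illustration, and the plain recursive radix-2 FFT is only there to keep the example self-contained; a real program would more likely use an FFT library and process long signals in blocks (overlap-add, for instance).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Recursive radix-2 FFT (illustrative, not tuned). a.size() must be a power
// of two. With invert == true it computes the unscaled inverse transform.
void fft(std::vector<cplx>& a, bool invert)
{
    const std::size_t n = a.size();
    if (n < 2) return;

    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i]  = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);

    const double pi  = std::acos(-1.0);
    const double ang = (invert ? 2.0 : -2.0) * pi / static_cast<double>(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cplx w = std::polar(1.0, ang * static_cast<double>(k));
        a[k]         = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// Hypothetical name. Returns the first x.size() samples of the linear
// convolution of b and x, like the posted filter() does.
std::vector<double> filter_fft(const std::vector<double>& b,
                               const std::vector<double>& x)
{
    if (b.empty() || x.empty()) return std::vector<double>(x.size(), 0.0);

    // Zero-pad to a power of two that holds the full linear convolution,
    // so the circular convolution computed by the FFT does not wrap around.
    std::size_t n = 1;
    while (n < x.size() + b.size() - 1) n *= 2;

    std::vector<cplx> fx(n), fb(n);            // elements start at 0 + 0i
    for (std::size_t i = 0; i < x.size(); ++i) fx[i] = x[i];
    for (std::size_t i = 0; i < b.size(); ++i) fb[i] = b[i];

    fft(fx, false);
    fft(fb, false);
    for (std::size_t i = 0; i < n; ++i) fx[i] *= fb[i];   // multiply spectra
    fft(fx, true);

    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = fx[i].real() / static_cast<double>(n);     // undo FFT scaling
    return y;
}
```

The results match the time-domain version only up to floating-point rounding, and for a small number of taps the direct loop is usually faster.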

I'll also second laserlight's suggestions. First of all, it's never a good idea to use a debug build to profile performance. Secondly, the way you're using vectors is pretty inefficient.

To insert the newest sample and discard the oldest one, you are actually pushing and erasing elements of the vector, which entails a lot of unnecessary copying (and some unnecessary allocations). You can get the same effect by using a proper circular buffer, where inserting a new sample consists only of a single assignment followed by advancing an index modulo the size of the buffer.
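
A circular buffer along those lines might look like the sketch below. The class name FirFilter and its step() member are invented for this example, and it assumes the filter has at least one tap. The history starts out as zeros, which should give the same output as the original code, where samples that have not arrived yet simply contribute nothing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper class, not from the thread. Assumes taps is non-empty.
class FirFilter {
public:
    explicit FirFilter(const std::vector<double>& taps)
        : b_(taps), hist_(taps.size(), 0.0), head_(0) {}

    // Feed one input sample, get one output sample.
    double step(double sample)
    {
        hist_[head_] = sample;                 // overwrite the oldest sample

        double acc = 0.0;
        std::size_t k = head_;                 // k walks from newest to oldest
        for (std::size_t j = 0; j < b_.size(); ++j) {
            acc += b_[j] * hist_[k];
            k = (k == 0) ? hist_.size() - 1 : k - 1;
        }

        head_ = (head_ + 1) % hist_.size();    // advance the write position
        return acc;
    }

private:
    std::vector<double> b_;     // filter coefficients
    std::vector<double> hist_;  // the last b_.size() input samples
    std::size_t head_;          // slot the next sample will overwrite
};
```

Calling it once per input sample, e.g. y.push_back( fir.step( x[i] ) ), replaces the push_back/erase pair, so no elements are shifted and nothing is allocated inside the loop.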

I would also not return a vector, but instead accept an output iterator as an argument. Although return value optimization makes this a non-issue in simple cases, it's my understanding that a common DSP practice is to allocate all necessary buffers once and to use them repeatedly as more data is available. In order to support this, your algorithm would need to be able to output data to an existing buffer.
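
One way the output-iterator interface could look is sketched here. The name filter_to and its signature are assumptions made for illustration; the loop reads its history directly from x and is meant to produce the same values as the posted filter().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical signature, not from the thread. Writes one output per input
// sample through `out` and returns the iterator one past the last write.
template <typename OutputIt>
OutputIt filter_to(const std::vector<double>& b,
                   const std::vector<double>& x,
                   OutputIt out)
{
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Near the start, fewer than b.size() input samples are available.
        const std::size_t nt = std::min(i + 1, b.size());
        double acc = 0.0;
        for (std::size_t j = 0; j < nt; ++j)
            acc += b[j] * x[i - j];
        *out++ = acc;                          // caller owns the storage
    }
    return out;
}
```

A caller that reuses a buffer can pass y.begin() for a vector already sized to x.size(), or a plain double* into an array; passing std::back_inserter( y ) from <iterator> gives back the old append-to-a-vector behaviour.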