Digital Signal Processing performance improvement help

I created a function which apply weighted moving average and Loglikehood ratio test and Binary Integration to process my XBand Radar signal.

I flashed and ran this function on particle Photon with wifi on and using micro() function to get time before and after execute this function. The result I got is ~12600 microseconds. I know many ppl have better way to optimize this code, so here it is. thanks for all your help.

 int Detection::Detector(volatile float *XBsignal,volatile float *PIRsignal, int lenXBsignal, int lenPIRsignal, float noise_Var, float Vt, float signal_Var,int Fs,float meanXBVal,int *PIRsum)
{
    int Fo = 100;  //default for stage 2, 10 observation
    int Fo1 = 10;  // default stage 1, 10 observation
    int *stage1Detection = NULL;
    stage1Detection = (int*) malloc(Fo*sizeof(int));
    int window = 7;  //moving average window, using Matlab to calculate window
    float *binomialCoeff = NULL;
    int lenbinomialCoeff;
    float h[2] = {0.5,0.5};
    binomialCoeff = (float*) malloc(window*sizeof(float));
    
    
    conv(h,h,2,2,binomialCoeff,&lenbinomialCoeff);
    
    for (int n = 1;n<=window-3;n++)
    {
        conv(binomialCoeff,h,lenbinomialCoeff,2,binomialCoeff,&lenbinomialCoeff);
    }
    
    float lrt =(2*pow(noise_Var,2)*pow(signal_Var,2)/(pow(signal_Var,2) - pow(noise_Var,2)))*log(Vt) - (Fs/Fo)*log(noise_Var/signal_Var);
    
    // Detection::SetValues(XBsignal, lenXBsignal);
    mean = meanXBVal;
    // XFiltered = (float*) malloc(Fs*sizeof(float));
    // Detection::WMA(XBsignal,XFiltered,lenXBsignal,7);
    // do not need mean value anymore free memory
//120 usecond up to here
    int sumXB = 0;
    int sumPIR = 0;
    int stage2Observation = 0;
    int sumXBandstage2 = 0;
    int sumPIRstage2 = 0;
    int k;
    for (int observation = 0;observation < Fo; observation++)
    {
        float sumxobssq = 0;
        // calculate weighted moving average for each element
        for (int i = 0; i < Fs/Fo; i++)
        {
            float WMA_k = 0;
            k  = i + (Fs/Fo)*observation;
            
            if (k < window - int(window/2))
            {
                for (int n = (window/2)-i;n < window;n++)
                {
                    WMA_k = WMA_k+binomialCoeff[n]*(XBsignal[n-window/2 + k]-mean);
                }
                // wsignal[k] = WMA_k;  //should be devide by sum of weight, but sum of weight = 1 in this case
            }
            else if (k < Fs - int(round(window/2)))
            {
                for (int n = 0;n<window;n++)
                {
                    WMA_k = WMA_k+binomialCoeff[n]*(XBsignal[n+k-int(window/2)]-mean);
                }
                // wsignal[k] = WMA_k;  //should be devide by sum of weight, but sum of weight = 1 in this case
            }
            else
            {
                WMA_k = XBsignal[k] - mean;
            }
            //end weighted moving average
                sumxobssq = sumxobssq + pow(WMA_k,2);  // calculate N*R^2 for log likelihood ratio test
        }
        if (sumxobssq > lrt)
        {
            sumXB = sumXB + 1;
        }
        sumPIR = sumPIR + PIRsignal[observation];
        stage2Observation++;
        if (stage2Observation == Fo1)
        {
            if (sumXB > 7)
                sumXBandstage2 =  sumXBandstage2 + 1;
            if (sumPIR > 7)
                sumPIRstage2 = sumPIRstage2 + 1;
            
             stage2Observation = 0;
             sumPIR = 0;
             sumXB = 0;
         }
     }
    
     (*PIRsum) = sumPIRstage2;
     free(stage1Detection);
    //  free(XFiltered);
     stage1Detection = NULL;
    //  XFiltered = NULL;
    return (sumPIRstage2 + sumXBandstage2);
}

Don’t use floats! ( Look into fixed point math )

If you need to do large amounts of DSP you might want to just dump the raw data to a server and have it crunch numbers. Otherwise you need to know how microcontrollers handle mathematical operations.

Basically :
floats = evil
multiplying ints = sometimes evil
dividing ints = almost always evil
multiplying floats = very evil
dividing floats = very very very evil

Thanks @jakeypoo, that help. One of my objective is doing all DSP on this Photon, so I try to optimize as much as possible without going use a server to do all the processes.

I will look into data type and dividing, I understand all math instruction function does not have equally run time. Faster to do multiply than divide.

None of the above quoted code depends on the inputs so I would either calculate the binomial coefficients once in setup() and store them in a global array or more likely just write out the precomputed values, probably as integers as @jakeypoo said. Try scaling by 64 for your current window length the see the integer values.

I would try to find out how much actual precision your RADAR return signal has and use integers as @jakeypoo said. I am not sure if pow(x,2) is faster or slower than x*x but I would guess slower. Similarly log(x) can be approximated in a variety of ways that might be faster than what you are using now.

1 Like

Thanks @bko. I plan to make an Initialized() function to get those common, input independence initialized. Just curious, how do you know 64 is good value to scale? I varied it in Matlab and 64 is exactly what I need to make them integer.

look like it is = (2^window)/2?

1 Like

I guess I would think of this as (1/2)^(window-1), giving you 1/64, but that’s the right idea!