Resampling
Imports¶
import time
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from openseize import producer
from openseize.file_io.edf import Reader
from openseize.demos import paths
from openseize.filtering.fir import Kaiser
from openseize.resampling.resampling import downsample, upsample, resample
Introduction¶
There are three possible operations that can be applied to a signal to change its sampling rate. The signal may be downsampled from a higher to a lower sampling rate, upsampled from a lower to a higher sampling rate, or, more generally, resampled using a ratio of upsampling to downsampling to achieve a non-integer conversion of the signal's sample rate. For example, resampling from 5 KHz to 1500 Hz requires upsampling by 3 and then downsampling by a factor of 10. This tutorial covers the mathematics of resampling and the design of resamplers in Openseize. As with filtering, this topic is deep, and curious readers who want to go further are referred to the references at the end of this tutorial.
Decimation and Expansion Defined¶
At the outset, it is important to clarify a few terms, the first being Decimation and Expansion. In this tutorial we refer to M-fold decimation as the process of picking every Mth sample from a sequence of samples $x[n]$. Mathematically, this is expressed as:
$ y[n] = x[nM] \quad (1) $
As we will see this is distinct from Downsampling. Second, we introduce L-fold expansion as the insertion of L-1 zeros between each pair of samples in $x[n]$. Mathematically this is expressed as:
$ y[n] = \begin{cases} x[\frac{n}{L}] & \text{n is divisible by L} \\ 0 & \text{otherwise} \end{cases} \quad (2) $
Again this process is distinct from Upsampling.
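Before moving on, here is a quick NumPy illustration of Eqns. 1 and 2 on a toy sequence (a sketch of ours, not Openseize code; the `_toy` names are ours):
# toy illustration of M-fold decimation (Eqn. 1) and L-fold expansion (Eqn. 2)
x_toy = np.arange(10)                    # x[n] = 0, 1, ..., 9
M_toy, L_toy = 2, 3
print(x_toy[::M_toy])                    # decimation: [0 2 4 6 8]
expanded_toy = np.zeros(len(x_toy) * L_toy, dtype=x_toy.dtype)
expanded_toy[::L_toy] = x_toy            # expansion: L-1 zeros between samples
print(expanded_toy[:9])                  # [0 0 0 1 0 0 2 0 0]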
Naive Downsampling and Aliasing¶
With these definitions out of the way, we will first examine how the frequency content of a signal changes when the signal is decimated. To do so, we are going to fetch the first channel of our demo data and pick out every 25th element. Since the sample rate of the data is 5 KHz, this should reduce the sampling rate to 200 Hz.
# let's read in the first 4 seconds of data from the demo file sampled at 5 KHz.
# if you're not sure what's going on here, please see the reading_files demo first
fp = paths.locate('recording_001.edf')
reader = Reader(fp)
reader.channels = [0]
fs = 5000
x = reader.read(0, 4*fs) #first 4 seconds of data
To examine the frequencies that are present in this signal, we will use the Fast Fourier Transform.
# since the signal is real we use the real FFT and we build the frequencies
x_f = np.fft.rfft(x).reshape(-1,1)
x_freqs = np.fft.rfftfreq(x.shape[-1], d=1/fs)
So the plan here is to plot the frequency content of x, then decimate x by a factor of 25 and compare the frequency content with and without the decimation.
# plot the frequency content of the full signal and make a zoom to 100 Hz
fig, ax = plt.subplots(figsize=(12, 6))
#make axis with inset and then axarr
axinset = inset_axes(ax, width="50%", height=3, loc=1, borderpad=3)
axarr = [ax, axinset]
#plot and add rectangle for zoomed region
[ax.plot(x_freqs, np.abs(x_f), color='gray') for ax in axarr]
rect = Rectangle((0,0), 100, axarr[0].get_ylim()[-1], color='green', alpha=0.25)
axarr[0].add_patch(rect)
# set xlim of inset, labels, spines
axarr[1].set_xlim([0,100])
axarr[0].set_ylabel('$|X^F(f)|$', fontsize=14)
axarr[0].set_xlabel('Frequency (Hz)', fontsize=14)
[ax.spines.top.set_visible(False) for ax in axarr]
[ax.spines.right.set_visible(False) for ax in axarr]
ax.set_title('Fourier Transform of Undecimated Signal', fontsize=18)
plt.show()
# Now decimate the signal by a factor of 25 reducing the sample rate from 5 KHz to 200 Hz
decimated = x[0, ::25]
decimated_f = np.fft.rfft(decimated)
decimated_freqs = np.fft.rfftfreq(decimated.shape[-1], d=1/fs * 25)
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(x_freqs, np.abs(x_f), label='Undecimated', color='gray')
# note the factor of 25: the decimated signal has 25x fewer samples, so its raw
# FFT amplitude is smaller by the decimation factor
ax.plot(decimated_freqs, 25*np.abs(decimated_f), label='25-fold Decimation', color='tab:green', alpha=0.5)
ax.set_ylabel('$|X^F(f)|$', fontsize=14)
ax.set_xlabel('Frequency (Hz)', fontsize=14)
ax.spines.top.set_visible(False)
ax.spines.right.set_visible(False)
ax.set_xlim([0,100])
ax.set_title('Fourier Transform of Decimated Signal', fontsize=18)
ax.legend(fontsize=14)
plt.show()
What is going on here? The decimated signal doesn't have the same frequency content below 100 Hz as the undecimated data. How can this be? Let's do a little thought experiment. Imagine a spoked wheel spinning at a rate of 600 Hz and a strobe light that allows you to vary the frequency of the light pulses (in an autoshop they call this a strobe gun). Next, you turn the lights out and start varying the frequency of the strobe gun. As you increase the frequency up towards 300 Hz (1/2 the wheel's frequency), the wheel appears to slow down until you reach 300 Hz, and then the wheel appears to stop! If you push the strobe gun a little above 300 Hz, the wheel then appears to rotate backwards. This phenomenon is called aliasing. The frequencies in the range 300 - 600 Hz are masquerading as the lower frequencies you see as counter rotations. Hence the name aliasing.
Mathematically, decimation causes the signal's spectrum to be replicated along the frequency axis. If the bandwidth of these replicates exceeds the new Nyquist frequency = sample rate / (2M), they overlap and distort one another. This is what is occurring in our data. We have frequencies above our new Nyquist of 100 Hz. All the signal content between 100 Hz and the original Nyquist (2.5 kHz) is masquerading as lower frequencies and distorting our signal. Thus, the frequency content in the (0, 100) Hz range no longer matches the frequency content of the original signal over that same range.
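To make aliasing concrete, here is a toy check of our own (not part of the demo data): a pure 130 Hz tone, naively decimated to 200 Hz, shows up at 200 - 130 = 70 Hz.
# toy aliasing check: a 130 Hz tone decimated 25-fold masquerades as 70 Hz
t = np.arange(0, 1, 1/5000)
tone = np.sin(2 * np.pi * 130 * t)
dec_tone = tone[::25]                                # new sample rate = 200 Hz
tone_freqs = np.fft.rfftfreq(len(dec_tone), d=25/5000)
print(tone_freqs[np.argmax(np.abs(np.fft.rfft(dec_tone)))])    # 70.0, not 130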
This discussion hints at how to solve the problem. If we lowpass filter the data so that no content remains above 100 Hz BEFORE we decimate, then there are no high frequencies left to masquerade as low frequencies. Let's try this out. We will filter out the signal above 100 Hz, then decimate, and see if the frequency content of the result matches the frequency content of the original signal.
Antialiasing Filters¶
The two-step process of filtering then decimating is called Downsampling. The filter that precedes the M-fold decimation is called the antialias filter. It is typically designed so that the cutoff sits at the new Nyquist frequency; in this case that is 100 Hz. The transition band is also kept narrow, ~10% of the cutoff frequency. Let's build a Kaiser filter to do just that and then decimate.
# Build a kaiser with the cutoff at the nyquist and set the transition width at 10% the nyquist.
kaiser = Kaiser(fpass=90, fstop=110, gpass=0.1, gstop=40, fs=5000)
#filter and decimate our signal by a factor of 25 from 5 kHz to 200 Hz
filtered = kaiser(x, chunksize=10e6, axis=-1)
antialiased = filtered[0, ::25]
# Now lets again compare the frequency content of our signals
antialiased_f = np.fft.rfft(antialiased)
antialiased_freqs = np.fft.rfftfreq(antialiased.shape[-1], d=1/fs * 25)
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(x_freqs, np.abs(x_f), label='original', linewidth=2, color='gray')
ax.plot(decimated_freqs, 25*np.abs(decimated_f), label='Decimated (aliased)', color='green', alpha=0.5)
ax.plot(antialiased_freqs, 25*np.abs(antialiased_f), label='Downsampled (antialiased)', color='k',
linestyle=(0, [1,2]))
ax.set_xlim([0,100])
ax.set_ylabel('$|X^F(f)|$', fontsize=14)
ax.set_xlabel('Frequency (Hz)', fontsize=14)
ax.spines.top.set_visible(False)
ax.spines.right.set_visible(False)
ax.legend()
plt.show()
Mathematics of Downsampling¶
In the preceding section, we found that if we lowpass filter the original 5 KHz data to 100 Hz (1/2 the desired sampling rate of 200 Hz) before decimating, the resultant signal is no longer aliased. However, this is computationally wasteful because we are filtering all the data and then throwing out 24 of every 25 samples during the decimation step. Do we need to compute all the filtered values before decimating?
Convolution with FIRs¶
Applying a FIR filter $h[k]$ to a data sequence $x[n]$ is a convolution expressed as:
$ y[n] = \sum \limits_{i=1}^{K} h[i] x[n-i] \quad (3) $
For each sample n, we must perform K multiplications since the filter is K elements long. Similarly, M-fold downsampling can be expressed as:
$ y[n] = \sum \limits_{i=1}^{K} h[i] x[nM-i] \quad (4) $
Which by the commutativity of convolution can be written as:
$ y[n] = \sum \limits_{i=1}^{K} h[nM-i] x[i] \quad (5) $
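As an aside, Eqn. 3 (and hence Eqns. 4 and 5) is an ordinary full convolution, the same quantity that np.convolve computes; a toy check on made-up data (the `_toy` names are ours):
# toy check that Eqn. 3 matches numpy's convolution
h_toy = np.array([0.25, 0.5, 0.25])                   # a K = 3 tap filter
x_toy = np.random.default_rng(0).standard_normal(50)
y_toy = np.convolve(x_toy, h_toy)                     # y[n] = sum_i h[i] x[n-i]
assert np.isclose(y_toy[5], np.sum(h_toy * x_toy[5:2:-1]))   # y[5] = h0*x5 + h1*x4 + h2*x3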
The Polyphase Decomposition¶
Notice in Eqn. 5 that we need only every Mth filter value. So for FIR filters we can potentially reduce our computations to ~ K / M multiplications for each sample n. To see how we can achieve this savings, let's substitute the following into Eqn. 4 (see B. Porat, Ch. 12):
$ i = i^{\prime} M + m \quad \text{where} \quad 0 \leq m \leq M-1 $
giving:$ \begin{align*} y[n] &= \sum \limits_{i=1}^{K} h[i] x[nM-i] = \sum \limits_{m=0}^{M-1} \sum \limits_{i^{\prime}} h[i^{\prime}M + m]x[nM-i^{\prime}M-m] \\ &= \sum \limits_{m=0}^{M-1} \sum \limits_{i^{\prime}} h[i^{\prime}M + m]x[M(n-i^{\prime})-m] \quad (6) \end{align*} $
A Worked Example¶
The effect of this substitution may not be obvious, so let's work through a small example. Say we have a signal that we would like to downsample by a factor of 3 using a filter h that is 100 taps long.
For the m=0 terms, the filter values are:
$ h[3i^{\prime}] = \{h[0],\ h[3],\ h[6], ... h[99]\} $
and the signal points are:
$ x[3(n-i^{\prime})] = \{x[3n],\ x[3n-3],\ x[3n-6], ... x[3n-99]\} $
So the convolutions for the m=0 subsequences are
$ \sum \limits_{i^{\prime}} h[3i^{\prime}]x[3(n-i^{\prime})] \quad \text{for} \ m=0 $
Similarly for m=1, the convolutions are:
$ \sum \limits_{i^{\prime}} h[3i^{\prime} + 1]x[3(n-i^{\prime})-1] \quad \text{for} \ m=1 $
and for the m=2 subsequences, the convolutions are:
$ \sum \limits_{i^{\prime}} h[3i^{\prime} + 2]x[3(n-i^{\prime})-2] \quad \text{for} \ m=2 $
This substitution partitions the filter and the data into distinct subsequences, each 1/3 the length of the original filter and data sequence x[n]. Thus each point in each data subsequence is multiplied by only 1/3 of the filter's coefficients! These filter subsequences, one for each m, are called the polyphase components.
Using our partitioning, the filtered values can be written as a sum over the convolutions of the subsequences:
$ \sum \limits_{i^{\prime}} h[3i^{\prime}]x[3(n-i^{\prime})] + \sum \limits_{i^{\prime}} h[3i^{\prime} + 1]x[3(n-i^{\prime})-1] + \sum \limits_{i^{\prime}} h[3i^{\prime} + 2]x[3(n-i^{\prime})-2] $
Just to reiterate: ordinary convolution requires K multiplications for each datapoint since the filter is K elements long. The polyphase decomposition approach requires only ~ K/M multiplications per datapoint plus M sums. Polyphase decomposition picks out the correct filter and data samples to allow us to decimate BEFORE filtering. This is an M-fold computational savings!
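To see the decomposition in action, here is a minimal NumPy sketch of our own (Openseize's actual implementation additionally handles data too large for memory) that checks Eqn. 6 against filter-then-decimate; the function and `_test` names are ours:
# polyphase decimation per Eqn. 6: convolve each (h_m, x_m) pair, then sum
def polyphase_decimate(x, h, M):
    n_out = (len(x) + len(h) - 1 + M - 1) // M    # length of np.convolve(x, h)[::M]
    y = np.zeros(n_out)
    for m in range(M):
        hm = h[m::M]                              # polyphase component h[i'M + m]
        # matching data subsequence x[kM - m]; a leading zero stands in for x[-m]
        xm = x[0::M] if m == 0 else np.concatenate(([0.0], x[M - m::M]))
        conv = np.convolve(hm, xm)[:n_out]
        y[:len(conv)] += conv
    return y

rng = np.random.default_rng(0)
x_test, h_test = rng.standard_normal(300), rng.standard_normal(100)
naive = np.convolve(x_test, h_test)[::3]          # filter everything, then decimate
print(np.allclose(naive, polyphase_decimate(x_test, h_test, 3)))    # True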
Downsampling in Openseize¶
Openseize implements polyphase decomposition for data sequences x[n] that are too large to fit into memory. We now explore how to apply this functionality in the context of downsampling.
As usual, we will start by asking for help to understand the downsample function's call signature.
help(downsample)
Help on function downsample in module openseize.resampling.resampling:

downsample(data: Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]], M: int, fs: float, chunksize: int, axis: int = -1, **kwargs) -> Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]]
    Downsamples an array or producer of arrays using polyphase
    decomposition by an integer decimation factor.

    Args:
        data:
            The producer or ndarray data to be downsampled.
        M:
            The decimation factor describing which Mth samples of data
            survive decimation. (E.g. M=10 -> every 10th sample survives)
        fs:
            The sampling rate of data in Hz.
        chunksize:
            The number of samples to hold in memory during downsampling.
            This method will require ~ 3 times chunksize in memory.
        axis:
            The axis of data along which downsampling will occur.
        kwargs:
            Any valid keyword for a Kaiser lowpass filter. The default
            values for this antialiasing filter are:

            - fstop: int
                The stop band edge frequency. Defaults to cutoff +
                cutoff / 10 where cutoff = fs // (2 * M).
            - fpass: int
                The pass band edge frequency. Must be less than fstop.
                Defaults to cutoff - cutoff / 10 where cutoff = fs // (2 * M).
            - gpass: int
                The pass band attenuation in dB. Defaults to a max loss
                in the passband of 0.1 dB = ~1.1% amplitude loss.
            - gstop: int
                The max attenuation in the stop band in dB. Defaults to
                40 dB or 99% amplitude attenuation.

    Returns:
        An array or producer of arrays of downsampled data depending on
        input data parameters datatype.

    References:
        1. Porat, B. (1997). A Course In Digital Signal Processing. John
           Wiley & Sons. Chapter 12 "Multirate Signal Processing"
        2. Polyphase implementation: scipy.signal.resample_poly
So we need to provide a producer or array of data, the downsample factor M (the M-fold decimation amount), the sampling rate, the number of samples to hold in memory during downsampling, and the sample axis. Let's build a producer for our demo data and call this function, timing how long it takes to run.
# the reader was built above but only channel 0 was read. Let's downsample all the channels
reader.channels = [0,1,2,3]
pro = producer(reader, chunksize=5e6, axis=-1)
print(pro.shape)
(4, 18875000)
# now let's downsample the entire recording (~1 hour) 25-fold.
t0 = time.perf_counter()
polyphased = downsample(pro, M=25, chunksize=5e6, fs=5000)
elapsed = time.perf_counter() - t0
print('Producer with shape {} was downsampled in {} secs'.format(pro.shape, elapsed))
Producer with shape (4, 18875000) was downsampled in 0.00017939899771590717 secs
Important: remember, algorithms in Openseize are delayed execution, meaning they don't run until you actually ask for their output. Let's ask for the output and time the downsample execution...
t0 = time.perf_counter()
polyresult = polyphased.to_array()
elapsed = time.perf_counter() - t0
print('Producer with shape {} was downsampled in {} secs'.format(pro.shape, elapsed))
Producer with shape (4, 18875000) was downsampled in 3.7424030090005544 secs
Great, so we see that downsampling 4 channels of 1-hour data took almost 4 secs. Let's compare the first 4 secs of this polyphase downsampled data with the 4 secs of downsampled data computed by filtering then decimating in the aliasing section of this demo.
# get the first 4 secs of openseize's polyphase result and its Fourier representation
pp = polyresult[0, 0:int(4*fs/25)]
pp_f = np.fft.rfft(pp)
pp_freqs = np.fft.rfftfreq(pp.shape[-1], d=1/fs * 25)
fig, ax = plt.subplots(figsize=(14, 6))
# Plot the original (not downsampled) FFT of the first 4 secs of data
ax.plot(x_freqs, np.abs(x_f), label='original', linewidth=2, color='gray')
# Plot the FFT of the first 4 seconds of polyphase downsampled data
ax.plot(pp_freqs, 25*np.abs(pp_f), label='Polyphase Downsampled', color='green', alpha=0.5)
# Plot the FFT of the filtered and decimated (antialiased from above) first 4 secs of data
ax.plot(antialiased_freqs, 25*np.abs(antialiased_f), label='Standard Downsampled', color='k',
linestyle=(0, [1,5]))
ax.set_xlim([0,100])
ax.set_ylabel('$|X^F(f)|$', fontsize=14)
ax.spines.top.set_visible(False)
ax.spines.right.set_visible(False)
ax.legend()
ax.annotate('',(98.5, 4000), xytext=(98.5, 13000), arrowprops=dict(facecolor='k'))
ax.set_title('Fourier Transform of Polyphase Downsampled Data', fontsize=18)
plt.show()
The frequency content of the original, polyphase, and standard downsampled data are all very close. Notice that around 100 Hz (see arrow), the standard and polyphase downsampling deviate slightly from the original frequency content in x[n]. This is due to the Kaiser filter used in both algorithms: its transfer function $H^F(\theta)$ has a finite transition width around the 100 Hz cutoff, causing a slight deviation there.
Mathematics of Upsampling¶
As a reminder, expansion is the insertion of L-1 zeros between each pair of samples in x[n]:
$ y[n] = \begin{cases} x[\frac{n}{L}] & \text{n is divisible by L} \\ 0 & \text{otherwise} \end{cases} $
The result of this operation is pretty useless until we replace the zeros with interpolated values. How do we interpolate the signal at each zero? Interpolation is carried out by lowpass filtering the expanded sequence, which we will write as $x_e[n]$. Expansion followed by interpolation is called Upsampling. L-fold expansion followed by filtering is expressed as:
$ y[n] = \sum \limits_{k=0}^{K} h[k] \, x_e[n-k] = \sum \limits_{i} x[i] \, h[n-Li] \quad (7) $
In the expanded sequence, only the samples at indices divisible by L are non-zero, so most of the multiplications in Eqn. 7 are with zeros. Thus, as with decimation, we have the potential for an L-fold computational savings. This is equivalent to saying we can filter before we expand the sequence x[n], saving us from filtering L-1 zeros for every non-zero data sample. As with decimation, there is a clever substitution that helps us achieve this savings. In this case we substitute n with:
$ n = n^{\prime}L + L - 1- l \quad \text{where} \quad 0 \leq l \leq L-1 \quad (8) $
giving for each l:$ y[n^{\prime}L + L - 1- l] = \sum \limits_{i=0}^K x[i] h[(n^{\prime}-i)L + L - 1 - l] \quad (9) $
As with decimation, the filter is broken into L distinct subsequences: each is shifted by a different l, and only every Lth filter coefficient appears in it. This gives us the K / L computational savings. Each output sample y[n] corresponds to exactly one pair $(n^{\prime}, l)$, so interleaving the $y[n^{\prime}L + L - 1- l]$ terms across l-values assembles the full output y[n].
Importantly, upsampling does not lead to aliasing. While expansion produces replicates (images) of the spectrum in the Fourier domain, these replicates are always spaced far enough apart along the frequency axis so as not to interfere with each other; the interpolation filter removes them.
Since this upsampling polyphase decomposition is so similar to downsampling, we won't work through a symbolic example; if you set the upsample factor to L=3, you will find the argument follows closely what we did for downsampling. A short numerical check is sketched below, and then we turn to Openseize's upsample function.
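This is a minimal NumPy sketch of our own (not Openseize's implementation): convolving x with each of the L filter phases h[r::L] and interleaving the results reproduces expand-then-filter. Here the phase index r plays the role of L - 1 - l in Eqn. 9, and the `_test` names are ours:
# polyphase upsampling check: per-phase convolution vs expand-then-filter
rng = np.random.default_rng(1)
L_test = 3
x_test, h_test = rng.standard_normal(200), rng.standard_normal(90)
# naive route: insert L-1 zeros between samples, then filter the long sequence
expanded = np.zeros(len(x_test) * L_test)
expanded[::L_test] = x_test
naive = np.convolve(expanded, h_test)
# polyphase route: phase r of the filter produces the output samples y[kL + r]
poly = np.zeros(len(naive))
for r in range(L_test):
    conv = np.convolve(x_test, h_test[r::L_test])
    idx = np.arange(len(conv)) * L_test + r
    keep = idx < len(poly)
    poly[idx[keep]] = conv[keep]
print(np.allclose(naive, poly))    # True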
Upsampling in Openseize¶
As usual start by asking for help
help(upsample)
Help on function upsample in module openseize.resampling.resampling:

upsample(data: Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]], L: int, fs: float, chunksize: int, axis: int = -1, **kwargs) -> Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]]
    Upsamples an array or producer of arrays using polyphase
    decomposition by an integer expansion factor.

    Args:
        data:
            The producer or ndarray data to be downsampled.
        L:
            The expansion factor. L-1 interpolated values will be
            inserted between consecutive samples in data along axis.
        fs:
            The sampling rate of data in Hz.
        chunksize:
            The number of samples to hold in memory during upsampling.
            This method will require ~ 3 times chunksize in memory.
        axis:
            The axis of data along which upsampling will occur.
        kwargs:
            Any valid keyword for a Kaiser lowpass filter. The default
            values for this interpolation filter are:

            - fstop: int
                The stop band edge frequency. Defaults to cutoff +
                cutoff / 10 where cutoff = fs // (2 * L).
            - fpass: int
                The pass band edge frequency. Must be less than fstop.
                Defaults to cutoff - cutoff / 10 where cutoff = fs // (2 * L).
            - gpass: int
                The pass band attenuation in dB. Defaults to a max loss
                in the passband of 0.1 dB = ~1.1% amplitude loss.
            - gstop: int
                The max attenuation in the stop band in dB. Defaults to
                40 dB or 99% amplitude attenuation.

    Returns:
        An array or producer of arrays of upsampled data depending on
        the input data parameters datatype.

    References:
        1. Porat, B. (1997). A Course In Digital Signal Processing. John
           Wiley & Sons. Chapter 12 "Multirate Signal Processing"
        2. Polyphase implementation: scipy.signal.resample_poly
# let's take our 4 seconds of data x and upsample it L=3 fold
# notice we are supplying the array this time instead of a producer, for variety
# also notice the chunksize is larger than the array size along the last axis;
# this means 'use the entire array at once'
t0 = time.perf_counter()
upsampled_x = upsample(x, L=3, fs=fs, chunksize=10e6, axis=-1)
elapsed = time.perf_counter() - t0
print('Array with shape {} was upsampled in {} secs'.format(x.shape, elapsed))
Array with shape (1, 20000) was upsampled in 0.010576804001175333 secs
# and let's make a plot to convince ourselves that the upsampling is working correctly
fig, ax = plt.subplots(figsize=(14, 8))
times = np.arange(0, 4, 1/fs)
up_times = np.arange(0, 4, 1/(fs*3))
axinset = inset_axes(ax, width="50%", height=2, loc=1, borderpad=2)
axarr = [ax, axinset]
# plot the original and upsampled to each axis and remove spines
[ax.plot(up_times, upsampled_x[0], marker='o', linestyle='--', color='green', alpha=0.5,
label='Upsampled') for ax in axarr]
[ax.plot(times, x[0], marker='o', color='gray', label='original') for ax in axarr]
[ax.spines.top.set_visible(False) for ax in axarr]
[ax.spines.right.set_visible(False) for ax in axarr]
# add rectangle to indicate zoomed region
rect = Rectangle((.045, -20), .005, axarr[0].get_ylim()[-1], color='tab:blue', alpha=0.25)
axarr[0].add_patch(rect)
#set axis limits
axarr[0].set_xlim([-0.001,0.1])
axarr[0].set_ylim([-50, 120])
axarr[1].set_xlim([0.045,0.05])
axarr[1].set_ylim([-12,30])
#set labels and plot
axarr[0].set_ylabel(r'Voltage $(\ \mu V \ )$', fontsize=14)
axarr[0].set_xlabel('time (s)', fontsize=14)
axarr[0].legend(loc='upper left', fontsize=14)
plt.show()
So we can see that between each pair of original data samples we now have two interpolated points, giving us a 3-fold upsample (i.e. original samples are now spaced 3 samples apart). One last point about upsampling: it does not increase the information content of the signal; the interpolated samples are just that, interpolated. So be careful drawing conclusions that rely on samples that were interpolated.
Resampling¶
So far, we have only considered downsampling by an integer factor M or upsampling by an integer factor L. What if we want to resample data by a non-integer rational factor? For example, let's say our data is sampled at 5 KHz but we would like to resample the data to 2 KHz. We can't simply downsample this data because 5 divided by 2 is not an integer. So what can we do? Notice that if we upsampled our 5 KHz data by 2 and then downsampled by 5, we would get 2 KHz. This is the basis for resampling: we form a rational upsample-to-downsample ratio L / M.
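Finding the reduced ratio is just a greatest-common-divisor computation; a quick sketch for this 5 KHz to 2 KHz example (variable names are ours):
# reduce the ratio new_rate / old_rate to lowest terms to get L and M
from math import gcd
old_rate, new_rate = 5000, 2000
g = gcd(new_rate, old_rate)
print(new_rate // g, old_rate // g)    # 2 5 -> upsample by L=2, downsample by M=5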
Polyphase decomposition resampling can be accomplished by combining the downsample and upsample polyphase implementations discussed earlier in this demo. Briefly, we make the following substitutions:
$ \begin{align*} n &= n^{\prime}L + l \quad \text{where} \quad 0 \leq l \leq L-1 \quad (10)\\ i &= i^{\prime} M + m \quad \text{where} \quad 0 \leq m \leq M-1 \quad (11) \end{align*} $
Notice the similarity with the downsample and upsample substitutions (the one preceding Eqn. 6 and in Eqn. 8). Substituting these into our standard FIR convolution (Eqn. 3), we obtain for each l:
$ y[n^{\prime}L+l] = \sum \limits_{m=0}^{M-1} \sum \limits_{i^{\prime}} x[i^{\prime}M + m] \ h[(n^{\prime}-i^{\prime})ML + lM - mL] \quad (12) $
Although the effect of this substitution is not immediately apparent, you can see that we are picking out M distinct subsequences of the data x[n] and that the filter is being broken into distinct subsequences of its MLth elements, shifted by l and m. It's a little complicated, and the curious reader is encouraged to see all the details and simplification of this substitution in B. Porat's A Course in Digital Signal Processing, Ch. 12.
Openseize implements this polyphase decomposition for resampling data, so let's now turn to how to use this functionality. Again, we will ask for help to see the call signature of resample.
help(resample)
Help on function resample in module openseize.resampling.resampling:

resample(data: Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]], L: int, M: int, fs: float, chunksize: int, axis: int = -1, **kwargs) -> Union[openseize.core.producer.Producer, numpy.ndarray[Any, numpy.dtype[numpy.float64]]]
    Resamples an array or producer of arrays using polyphase
    decomposition by a rational factor L / M.

    Args:
        data:
            The producer or ndarray of data to be resampled.
        L:
            The expansion factor. L-1 interpolated values will be
            inserted between consecutive samples in data along axis.
        M:
            The decimation factor describing which Mth samples of data
            survive decimation. (E.g. M=10 -> every 10th sample survives)
        fs:
            The sampling rate of data in Hz.
        chunksize:
            The number of samples to hold in memory during upsampling.
            This method will require ~ 3 times chunksize in memory.
        axis:
            The axis of data along which resampling will occur.
        kwargs:
            Any valid keyword for a Kaiser lowpass filter. The default
            values for combined antialiasing & interpolation filter are:

            - fstop: int
                The stop band edge frequency. Defaults to cutoff +
                cutoff / 10 where cutoff = fs / (2 * max(L,M)).
            - fpass: int
                The pass band edge frequency. Must be less than fstop.
                Defaults to cutoff - cutoff / 10 where cutoff = fs / (2 * max(L,M)).
            - gpass: int
                The pass band attenuation in dB. Defaults to a max loss
                in the passband of 0.1 dB = ~1.1% amplitude loss.
            - gstop: int
                The max attenuation in the stop band in dB. Defaults to
                40 dB or 99% amplitude attenuation.

    Note:
        The resampling factor L/M may be reducible. If so, L and M are
        reduced prior to resampling.

    Returns:
        An array or producer of arrays of resampled data depending on
        input data parameters datatype.

    References:
        1. Porat, B. (1997). A Course In Digital Signal Processing. John
           Wiley & Sons. Chapter 12 "Multirate Signal Processing"
        2. Polyphase implementation: scipy.signal.resample_poly
To demonstrate how to use this resampling function, we will get all 4 channels of our reader's data for the entire duration of the recording (~1 hour). The data was initially sampled at 5 KHz and we will resample it to 256 Hz.
Since 5000 is not evenly divisible by 256, we will need to upsample by some amount L and then downsample by some amount M to achieve this. We could upsample by L=256 and then downsample by M=5000, but this is wasteful because 256/5000 can be reduced by a factor of 8 to L / M = 32 / 625. At this point, you may be thinking: wow, upsampling the data 32 times is going to make the data very large and require lots of computation. However, the polyphase decomposition method reduces the problem to computing filtered values of the L x M = 32 x 625 subsequences of the data and simply summing the results. Let's try this out on the demo data.
# create a producer of all 4 channels of the readers data
reader.channels = [0,1,2,3]
pro = producer(reader, chunksize=5e6, axis=-1)
print(pro.shape)
(4, 18875000)
# Build a resampling producer and get all its values timing how long it takes
rpro = resample(pro, L=32, M=625, fs=5000, chunksize=5e6, axis=-1)
t0 = time.perf_counter()
op_poly = rpro.to_array()
elapsed = time.perf_counter() - t0
samples = pro.shape[-1]
print('Producer with shape {} = {:.2f} hours was resampled in {} secs'.format(pro.shape,
samples/(3600*5000), elapsed))
Producer with shape (4, 18875000) = 1.05 hours was resampled in 3.7136126769983093 secs
That's pretty fast, especially compared to the non-polyphase route. In that case we would expand the data and then apply an interpolation filter. The expanded sequence that the interpolating filter would run over would be 32x longer (and filled mostly with zeros) than the original data sequence x[n]. We would then need to run an antialiasing filter over this upsampled sequence, measuring ~32 hours in duration, and finally decimate. This would require far more time to compute!
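As a rough back-of-the-envelope count (our numbers, assuming a hypothetical filter of K = 2001 taps and ignoring edge effects):
# rough multiplication counts for resampling N samples by L/M with a K-tap filter
N, up, down, K = 18_875_000, 32, 625, 2001           # K is a made-up filter length
naive_mults = N * up * K                             # filter every expanded sample
poly_mults = (N * up // down) * (K // up)            # ~K/L nonzero taps per kept output
print(f'{naive_mults:.1e} vs {poly_mults:.1e}: ~{naive_mults // poly_mults}x fewer')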
Let's now make some plots to see if this resampled data, computed iteratively, matches Scipy's polyphase resample, which requires all the data at once in an array.
# Compute the resampled data using scipy
# The polyphase filter needs a cutoff at 5000/(2*625), i.e. the Nyquist divided by max(L, M) since M > L
fcut = 5000/(2*max(32, 625))
fpass = fcut - fcut/10
fstop = fcut + fcut/10
filt = Kaiser(fpass, fstop, gpass=0.1, gstop=40, fs=5000)
# give the filter to scipy polyphase and time
t0 = time.perf_counter()
signals = pro.to_array()
sp_poly = sp.signal.resample_poly(signals, up=32, down=625, axis=-1, window=filt.coeffs)
elapsed = time.perf_counter() - t0
samples = pro.shape[-1]
print('Scipy Array with shape {} = {:.2f} hours was resampled in {} secs'.format(pro.shape,
samples/(3600*5000), elapsed))
Scipy Array with shape (4, 18875000) = 1.05 hours was resampled in 2.993395033001434 secs
# let's plot the first second of the original data, scipy resampled data and openseize resampled data
fs = 5000
new_fs = 256
# get the 1st second of each dataset, accounting for the different sample rates
original = signals[0, :fs]
op_result = op_poly[0, :new_fs]
sp_result = sp_poly[0, :new_fs]
times = np.arange(0, 1, 1/fs)
retimes = np.arange(0, 1, 1/new_fs)
# plot original, scipy and openseize results
fig, ax = plt.subplots(figsize=(14,6))
ax.plot(times, original, color='gray', alpha=0.5, label='original')
ax.plot(retimes, sp_result, color='green', alpha=0.5, label='scipy polyphase', linewidth=2)
ax.plot(retimes, op_result, color='k', linestyle=(0, [1,5]), label='openseize polyphase')
ax.set_ylabel(r'Voltage $(\ \mu V \ )$', fontsize=14)
ax.set_xlabel('time (s)', fontsize=14)
ax.spines.top.set_visible(False)
ax.spines.right.set_visible(False)
ax.legend(fontsize=14)
plt.show()
# let's check that the shape of the resampled data matches what we expect
# we expect samples to be reduced by L/M = 256/5000
expected_len = int(len(original) * 256/5000)
msg = 'Does the resampled length = {} match the expected length = {}? {}'
print(msg.format(len(op_result), expected_len, len(op_result) == expected_len))
Does the resampled length = 256 match the expected length = 256? True
# Finally, let's check that openseize's resampled result matches scipy's result for all channels and samples
msg = 'Does Openseize resampled match Scipy resampled for all {} channels and all {} samples? {}'
print(msg.format(*sp_poly.shape, np.allclose(sp_poly, op_poly)))
Does Openseize resampled match Scipy resampled for all 4 channels and all 966400 samples? True
Resource Recovery¶
# In this tutorial we have used a reader instance to read and resample data. This
# reader maintains an open link to a file and continues to use resources until it is
# explicitly closed. Let's close the reader
reader.close()
References¶
Porat, B. (1997). A Course In Digital Signal Processing. John Wiley & Sons. Chapter 12.
This book is a fantastic resource for anyone looking to understand resampling and many other signal processing tasks. It covers the material with just the right amount of proof and a heavy dose of practical application.
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272.
Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020).