Other Formats
Imports¶
from openseize.file_io import bases
from openseize.file_io import edf
from openseize.demos import paths
Introduction¶
Openseize's ability to scale DSP operations from small to very large EEG datasets starts with the iterative reading of files stored to disk. In the previous tutorial we looked at how to use Openseize's EDF Reader and Writer. This format is supported by a large number of data acquisition and software vendors (vendor list). If you are an experimentalist looking to use Openseize, we would encourage you to check the list in the link above and your hardware/software manuals to see if your system can export EDF.
While the EDF format is heavily used, it is by no means exclusive. Indeed, there is a zoo of different EEG data formats. With no agreed upon standard, readers of these formats may or may not support iterative reading from an opened file. This tutorial provides guidance for users and developers who wish to use Openseize with non-EDF file types that may or may not support iterative file reading.
Rather than exhaustively covering each of the EEG data formats, this tutorial will highlight 3 general strategies for using Openseize on non-EDF data. They are:
- Array-like file types stored to disk
- File to file conversion
- Extending Openseize's Readers
Array-like file types stored to disk¶
Array-like file types include:
- Matlab's saved matrix files (*.mat)
- Hierarchical Data Format files (*.hdf5)
- Numpy's saved ndarray files (*.npy)
- and many more
Since Openseize can produce from ndarrays (including numpy memmaps), if you can convert your data to an array type, Openseize can work with it. For example, if you have data stored as *.mat files, you have two options depending on the file's size.
- For small mat files use scipy.io.loadmat to load the data to an ndarray.
- For large mat files use the h5py library to create a memory mapped array.
For the large mat file option, you'll need to convert the memory mapped array to a numpy memmap or, better still, build an HDF5 reader.
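For the small-file case, a minimal sketch of option 1 may help. The file name `example.mat` and the variable name `data` below are hypothetical stand-ins for whatever your acquisition system exports; `loadmat` returns a dict keyed by the variable names saved to the file.

```python
import numpy as np
from scipy.io import loadmat, savemat

# build a small stand-in .mat file (3 channels x 4 samples) so this
# sketch is self-contained; normally your software exports this file
arr = np.arange(12.0).reshape(3, 4)
savemat('example.mat', {'data': arr})

# loadmat returns a dict of variable name -> ndarray
loaded = loadmat('example.mat')['data']
print(loaded.shape)  # the ndarray is now ready for Openseize's producer
```

Note that `loadmat` only handles *.mat files up to version 7.2; version 7.3 files are actually HDF5 files under the hood, which is why the large-file option uses h5py.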
Note that many Matlab-based EEG analysis systems use built-in data storage and saving mechanisms. If the analysis system can't export to EDF, you can generally load the files in the Matlab console, locate the data field, and export it as a *.mat file.
File to file conversion¶
While converting files from their original type to EDF sounds promising, a brief survey of source code that performs this conversion reveals a problem: most (if not all) conversion tools load the data into memory before the file is written to the EDF format. For small files you can use MNE-Python to load the data into their raw (*.fif) format and then export to EDF. MNE offers conversion from many file types, but again be aware that if your files are large, this may overrun your computer's memory.
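The memory problem above is avoidable if a converter processes the file in chunks rather than all at once. The sketch below is not a real EDF converter; it only illustrates the chunk-wise pattern using numpy memmaps and made-up file names, so that no more than one chunk is resident in memory at a time.

```python
import numpy as np

channels, samples, chunksize = 4, 1000, 256

# stand-in for a large on-disk source file (hypothetical name)
src = np.memmap('source.dat', dtype='float64', mode='w+',
                shape=(channels, samples))
src[:] = np.random.random((channels, samples))

# destination file for the "converted" data (hypothetical name)
dst = np.memmap('converted.dat', dtype='float64', mode='w+',
                shape=(channels, samples))

# copy chunk-by-chunk; only chunksize samples are in memory at a time
for start in range(0, samples, chunksize):
    stop = min(start + chunksize, samples)
    dst[:, start:stop] = src[:, start:stop]
dst.flush()
```

A real converter would additionally translate the header metadata, but the iteration pattern is the essential part that most conversion tools skip.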
Extending Openseize's Readers¶
Extending Openseize's Readers is by far the best option for reading non-EDF files. Openseize's developers felt so strongly about this that they developed a protocol for writing new concrete readers. The Reader protocol is a collection of related methods in the abstract Reader base class. This collection ensures that all concrete readers have a common set of methods that the producer can call upon to produce from a reader object.
What are these common methods? In another tab, open the reference docs for the abstract base Reader. If you expand the source-code tab you should see these methods: __init__, channels (getter), channels (setter), shape, read, __enter__, __exit__, and close. The methods without the @abc.abstractmethod decorator are ones you get for free whenever you inherit from the abstract Reader base class. These methods handle initialization of concrete Reader instances and context management.
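The enforcement mechanism here is Python's abc module: a subclass that fails to supply an abstract method simply cannot be instantiated. A stripped-down sketch of this pattern (not Openseize's actual base class) shows why the protocol is guaranteed:

```python
import abc

class Reader(abc.ABC):
    """A toy abstract reader declaring the methods every concrete
    reader must supply."""

    def __init__(self, path):
        # concrete subclasses inherit this initialization for free
        self.path = path

    @property
    @abc.abstractmethod
    def channels(self):
        """Return the channels this reader will read."""

    @property
    @abc.abstractmethod
    def shape(self):
        """Return the shape of the underlying data."""

    @abc.abstractmethod
    def read(self, start, stop):
        """Return samples between start and stop indices."""

# a subclass that omits the abstract methods is refused at instantiation
class Incomplete(Reader):
    pass

try:
    Incomplete('some_path')
except TypeError as e:
    print('instantiation refused:', e)
```

This is why any object that inherits the real base Reader and instantiates successfully is guaranteed to have the methods a producer needs.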
Importantly, the methods marked with @abc.abstractmethod are methods your new concrete Reader must supply because all readers must have them. If you build a concrete reader that supplies the channels, shape and read methods, Openseize can produce from it! As an example, we are going to pseudocode what a concrete HDF5 reader might look like. It uses h5py, which is not included with openseize, and it assumes a specific file layout, so don't expect it to run; but it will give you a good idea of how to develop your own concrete reader.
# Before reading, it's a good idea to know something about HDF5 and h5py. Also,
# as you look this over, keep an eye on the EDF reader to see its similarities
# with this new reader.
import h5py  # not included with openseize

class HDF5Reader(bases.Reader):  # inherit from the base Reader class
    """A prototype (non-functional) reader of Hierarchical Data Format files.

    Attrs:
        name (str): The name of the dataset to read from in the HDF5.
        header (dict): A dictionary of metadata for the named dataset. This
                       reader assumes (probably wrongly) that the metadata
                       is stored to the file itself instead of the dataset
                       inside the file. A real (functional) HDF5 reader
                       should determine where the metadata is stored.
        channels (list): A list of channels that this reader will read during
                         a read method call.
    """

    # below we override the base's __init__ because for HDF5s we need a path
    # and the name of a dataset in the file
    def __init__(self, path, name):
        """Initialize this Reader with a path to an HDF5 & the name of a
        dataset in the file."""

        # notice that the base class Reader uses python's standard open;
        # to open the hdf5 we need h5py instead
        self._fobj = h5py.File(path, 'r')  # open the file

        # name is unique to this reader -- it refers to a dataset in the HDF5
        self.name = name

        # now we need to get the header of the file. For an HDF5 file this
        # could be stored as an attr of the file or it could be stored as an
        # attr of a dataset in the file. Here we assume it's an attr of the
        # file.
        self.header = dict(self._fobj.attrs)

        # The channels are probably stored to the header. We extend the init
        # to include a secret list of '_channels'. These are kept secret so
        # they can never be deleted from the reader, and when we set new
        # channels we can check them before setting the secret '_channels'
        # that this reader actually uses.
        self._channels = self.header['channels']

    @property
    def channels(self):
        """Returns the channels this reader is set to read per read method
        call."""

        # This property gets the secret '_channels' we set in the __init__;
        # these '_channels' were initialized to be all the header's channels.
        return self._channels

    @channels.setter
    def channels(self, values):
        """Changes which channels this reader will read per read method call."""

        # When we change which channels this reader reads, we validate the
        # values the client entered and then set the secret channels. This
        # delay gives us time to validate values before changing this reader.
        # (this validate method is missing in our prototype)
        self.validate(values)
        self._channels = values

    @property
    def shape(self):
        """Returns a shape tuple of the dataset in the HDF5 file."""

        return self._fobj[self.name].shape  # shape is a property of hdf5 datasets

    def read(self, start, stop):
        """Read values from this HDF5's channels between start and stop
        indices."""

        # This is where all the business happens. This required method is what
        # producers rely on!
        dset = self._fobj[self.name]

        # below we assume the channel axis is 0; a true functional HDF5
        # reader would check!
        return dset[self.channels, start:stop]
The HDF5 Reader fulfills the obligation (by inheritance) to supply channels, shape and read methods. For free we get context management:
with HDF5Reader(path, name) as infile:
    # read first 50 samples
    result = infile.read(0, 50)
There is a lot missing from our HDF5 reader, like flexible handling of metadata storage locations and channel checking, but the basic idea is there. If you supply the required channels, shape and read methods, Openseize can produce from the file, bringing all of Openseize's iterative goodness to your analyses.
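To see the whole protocol work end-to-end in code you can actually run, here is a hypothetical, purely in-memory reader. It is an illustration of the channels/shape/read trio plus context management, not openseize's API, and it skips file handling entirely by wrapping an ndarray.

```python
import numpy as np

class ArrayReader:
    """A toy reader over an in-memory array with shape (channels, samples)."""

    def __init__(self, data):
        self._data = np.asarray(data)
        # initially read all channels
        self._channels = list(range(self._data.shape[0]))

    @property
    def channels(self):
        return self._channels

    @channels.setter
    def channels(self, values):
        self._channels = list(values)

    @property
    def shape(self):
        return self._data.shape

    def read(self, start, stop):
        # return only the set channels between start and stop samples
        return self._data[self.channels, start:stop]

    def close(self):
        self._data = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

with ArrayReader(np.arange(20).reshape(4, 5)) as infile:
    infile.channels = [0, 2]       # read only channels 0 and 2
    result = infile.read(0, 3)     # first 3 samples
print(result.shape)  # (2, 3)
```

Every piece here mirrors a method the abstract base Reader requires, which is exactly why a producer can treat files, arrays, and anything in between interchangeably.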
Reader Development Roadmap¶
Recognizing that many Openseize users are looking for a batteries-included solution for analyzing their data, the maintainers of Openseize are working to include more Reader types. Our priority is to design readers for the BioSemi (*.bdf), General Data Format (*.gdf) and BrainVision formats this year (2023). However, we are open to changing this schedule if users request a specific format in the github issues.