EDF Format
Imports¶
import numpy as np
from openseize.file_io.bases import Reader, Header, Writer
from openseize.file_io import edf, annotations
from openseize import demos
from openseize import producer
Introduction¶
Openseize currently provides tools for reading and writing European Data Format (EDF) binary files. The details of this file specification can be found here: https://www.edfplus.info/specs/edf.html
This demo will describe how to open, read, and produce data from an EDF file using the EDF Reader class, and how to write data to an EDF file using the EDF Writer class. Additionally, this demo will cover how to read comma-separated value (CSV) and tab-separated value (TSV) annotation text files and use the resulting annotations to mask produced EEG numpy arrays.
Reading EDF Files¶
Our goal is to use a Reader object to read from an EDF file. Let's first take a look at the documentation for the Reader class to see what we'll need in order to create one.
help(edf.Reader)
Help on class Reader in module openseize.file_io.edf: class Reader(openseize.file_io.bases.Reader) | Reader(path: Union[str, pathlib.Path]) -> None | | A reader of European Data Format (EDF/EDF+) files. | | This reader supports reading EEG data and metadata from an EDF file with | and without context management (see Introduction). If opened outside | of context management, you should close this Reader's instance manually | by calling the 'close' method to recover open file resources when you | finish processing a file. | | Attributes: | header (dict): | A dictionary representation of the EDFs header. | shape (tuple): | A (channels, samples) shape tuple. | channels (Sequence): | The channels to be returned from the 'read' method call. | | Examples: | >>> from openseize.demos import paths | >>> filepath = paths.locate('recording_001.edf') | >>> from openseize.io.edf import Reader | >>> # open a reader using context management and reading 120 samples | >>> # from all 4 channels | >>> with Reader(filepath) as infile: | >>> x = infile.read(start=0, stop=120) | >>> print(x.shape) | ... (4, 120) | | Method resolution order: | Reader | openseize.file_io.bases.Reader | abc.ABC | openseize.core.mixins.ViewInstance | builtins.object | | Methods defined here: | | __init__(self, path: Union[str, pathlib.Path]) -> None | Extends the Reader ABC with a header attribute. | | read(self, start: int, stop: Optional[int] = None, padvalue: float = nan) -> numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]] | Reads samples from this EDF from this Reader's channels. | | Args: | start: | The start sample index to read. | stop: | The stop sample index to read (exclusive). If None, samples | will be read until the end of file. | padvalue: | Value to pad to channels that run out of samples to return. | Only applicable if sample rates of channels differ. | | Returns: | A float64 array of shape len(chs) x (stop-start) samples. 
| | ---------------------------------------------------------------------- | Readonly properties defined here: | | shape | Returns a 2-tuple containing the number of channels and | number of samples in this EDF. | | ---------------------------------------------------------------------- | Data descriptors defined here: | | channels | Returns the channels that this Reader will read. | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __abstractmethods__ = frozenset() | | ---------------------------------------------------------------------- | Methods inherited from openseize.file_io.bases.Reader: | | __enter__(self) | Return reader instance as target variable of this context. | | __exit__(self, exc_type, exc_value, traceback) | On context exit, close this reader's file object and propogate | errors by returning None. | | close(self) | Close this reader instance's opened file object. | | ---------------------------------------------------------------------- | Data descriptors inherited from openseize.file_io.bases.Reader: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) | | ---------------------------------------------------------------------- | Methods inherited from openseize.core.mixins.ViewInstance: | | __repr__(self) | Returns the __init__'s signature as the echo representation. | | Returns: str | | __str__(self) | Returns this instances print representation.
The only parameter required to instantiate the Reader is a path to an EDF file, so we need to pick a file to read from. For these demos, we have stored demo data in a remote Zenodo repository. The demos module we imported has access to the files in this repo; we can see what's available by inspecting the available attribute.
demos.paths.available
---Available demo data files & location--- ------------------------------------------ annotations_001.txt '/home/matt/python...nnotations_001.txt' recording_001.edf '/home/matt/python.../recording_001.edf' 5872_Left_group A.txt '/home/matt/python...2_Left_group A.txt' split0.edf '/home/matt/python...os/data/split0.edf' 5872_Left_group A-D.edf '/home/matt/python...Left_group A-D.edf' irregular_write_test.edf '/home/matt/python...lar_write_test.edf' write_test.edf '/home/matt/python...ata/write_test.edf' CW0259_SWDs.npy '/home/matt/python...ta/CW0259_SWDs.npy' split1.edf '/home/matt/python...os/data/split1.edf'
If the file is currently on your system, you'll see a local location after that file's name. If not, you'll see a link to the Zenodo repo. Regardless of its location, we can get access to a file by calling the locate method. If the file hasn't already been found on your local machine, it will be downloaded to the demos data folder. This may take a few minutes, but will occur only once.
# Get access to the file's path locally, downloading if needed
filepath = demos.paths.locate('recording_001.edf')
# We can see the file's location on our local machine now that it has downloaded.
demos.paths.available
---Available demo data files & location--- ------------------------------------------ annotations_001.txt '/home/matt/python...nnotations_001.txt' recording_001.edf '/home/matt/python.../recording_001.edf' 5872_Left_group A.txt '/home/matt/python...2_Left_group A.txt' split0.edf '/home/matt/python...os/data/split0.edf' 5872_Left_group A-D.edf '/home/matt/python...Left_group A-D.edf' irregular_write_test.edf '/home/matt/python...lar_write_test.edf' write_test.edf '/home/matt/python...ata/write_test.edf' CW0259_SWDs.npy '/home/matt/python...ta/CW0259_SWDs.npy' split1.edf '/home/matt/python...os/data/split1.edf'
Now that we have our demo file path, we can pass it in to create our Reader object.
reader = edf.Reader(filepath)
Properties and Attributes¶
To view the attributes and properties of this reader we can print the reader instance.
# Print out the reader object to see its attributes
print(reader)
Reader Object ---Attributes & Properties--- {'path': PosixPath('/home/matt/python/nri/openseize/src/openseize/demos/data/recording_001.edf'), 'header': {'version': '0', 'patient': 'PIN-42 M 11-MAR-1952 Animal', 'recording': 'Startdate 15-AUG-2020 X X X', 'start_date': '15.08.20', 'start_time': '09.59.15', 'header_bytes': 1536, 'reserved_0': 'EDF+C', 'num_records': 3775, 'record_duration': 1.0, 'num_signals': 5, 'names': ['EEG EEG_1_SA-B', 'EEG EEG_2_SA-B', 'EEG EEG_3_SA-B', 'EEG EEG_4_SA-B', 'EDF Annotations'], 'transducers': ['8401 HS:15279', '8401 HS:15279', '8401 HS:15279', '8401 HS:15279', ''], 'physical_dim': ['uV', 'uV', 'uV', 'uV', ''], 'physical_min': [-8144.31, -8144.31, -8144.31, -8144.31, -1.0], 'physical_max': [8144.319, 8144.319, 8144.319, 8144.319, 1.0], 'digital_min': [-8192.0, -8192.0, -8192.0, -8192.0, -32768.0], 'digital_max': [8192.0, 8192.0, 8192.0, 8192.0, 32767.0], 'prefiltering': ['none', 'none', 'none', 'none', ''], 'samples_per_record': [5000, 5000, 5000, 5000, 1024], 'reserved_1': ['', '', '', '', '']}, 'channels': [0, 1, 2, 3], 'shape': (4, 18875000)} Type help(Reader) for full documentation
The reader contains three attributes: a path to the open file, a dictionary containing the EDF's header information, and the shape of the data. The data is represented as a 2-D numpy array with channels along the 0th axis and samples along the 1st axis.
The header dictionary contains all of the information stored in the header section of the EDF file. Details on the exact meaning of each of these fields can be found here: https://www.edfplus.info/specs/edf.html. To ease access to the header data, the header is a dict instance that has been extended to support '.' dot notation attribute access.
# Fetch the names of the channels using '.' dot notation
print(reader.header.names)
['EEG EEG_1_SA-B', 'EEG EEG_2_SA-B', 'EEG EEG_3_SA-B', 'EEG EEG_4_SA-B', 'EDF Annotations']
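Under the hood, this style of access can be sketched with a small dict subclass. This is illustrative only; Openseize's actual Header class is more elaborate, and the field names below are just examples.

```python
# Illustrative only: a minimal dict subclass with '.' attribute access,
# sketching the idea behind Openseize's extended header dict.
class DotDict(dict):
    """A dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real attributes and methods of dict still work as usual.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

header = DotDict(names=['EEG_1', 'EEG_2'], num_signals=2)
print(header.names)        # identical to header['names']
print(header.num_signals)
```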
With the open reader instance, we can call the read method to read EDF data. To understand the parameters of this method, let's ask for help.
help(reader.read)
Help on method read in module openseize.file_io.edf: read(start: int, stop: Optional[int] = None, padvalue: float = nan) -> numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]] method of openseize.file_io.edf.Reader instance Reads samples from this EDF from this Reader's channels. Args: start: The start sample index to read. stop: The stop sample index to read (exclusive). If None, samples will be read until the end of file. padvalue: Value to pad to channels that run out of samples to return. Only applicable if sample rates of channels differ. Returns: A float64 array of shape len(chs) x (stop-start) samples.
The Reader's read method reads from a start sample to a stop sample within the file. If the stop sample is not given, the reader will read to the end of the file.
# read samples 0 to 5 for all 4 channels
reader.read(0, 5)
array([[-19.87908032, 7.95793213, 19.88808032, 18.89390131, 18.89390131], [-86.4890744 , 51.70180884, 63.63195703, 88.48643243, 63.63195703], [-85.49489539, 44.74255573, 29.82987048, 79.53882129, 52.69598785], [ 62.63777802, 95.44568555, 77.55046326, 36.7891236 , 109.36419177]])
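Since read works in sample indices rather than seconds, a time window is converted to indices using the sampling rate. A quick sketch, assuming 5000 Hz (this file's EEG channels store 5000 samples per 1-second record, per the header above):

```python
# Sketch only: convert a time window in seconds to the sample indices that
# Reader.read expects. The 5000 Hz rate is taken from this demo file's header
# (5000 samples_per_record with a 1 second record_duration).
fs = 5000
start_sec, stop_sec = 10.0, 12.5
start, stop = int(start_sec * fs), int(stop_sec * fs)
# segment = reader.read(start, stop)   # would return a (channels, 12500) array
print(start, stop)
```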
In addition to reading specific samples, the read method supports reading only a selection of channels.
#set the reader to read only channels 0 and 2
reader.channels = [0, 2]
# read samples 0 to 5 for channels 0 and 2
print(reader.read(0, 5))
#set the reader to read all channels again
reader.channels = reader.header.channels
print('Next read will read channels = ', reader.channels)
[[-19.87908032 7.95793213 19.88808032 18.89390131 18.89390131] [-85.49489539 44.74255573 29.82987048 79.53882129 52.69598785]] Next read will read channels = [0, 1, 2, 3]
The EDF file specification allows signals sampled at different rates to be stored in the same file. In this case, one signal will have fewer samples than the other signals in the file. In order to return non-ragged numpy arrays, the Reader appends the value of the padvalue parameter to shorter signals so that all signals have the same length. This padvalue defaults to np.nan but may take on any value useful for your analysis.
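The padding behavior can be sketched conceptually in plain numpy. This mirrors the idea only; it is not Openseize's actual implementation.

```python
import numpy as np

# Conceptual sketch of padvalue: signals of unequal length are padded out to
# the longest length so a rectangular (channels, samples) array can be
# returned. Not Openseize's actual implementation.
signals = [np.arange(5, dtype=float), np.arange(3, dtype=float)]
length = max(len(s) for s in signals)
padded = np.full((len(signals), length), np.nan)   # padvalue = NaN
for row, sig in zip(padded, signals):
    row[:len(sig)] = sig
print(padded)
```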
File Resources and Context Management¶
We have seen how to create a Reader instance and use its read method to extract data from an EDF file. However, the file is still open and holding resources that you need to recover. To do this, you can call the Reader instance's close method.
reader.close()
To address this potential resource leak, openseize supports opening files using context management. What does this mean? In python you open a text file using a piece of code that looks like this:
with open('somefile.txt', 'r') as infile:
    ...  # process the file here
When opened this way, the file is automatically closed at the end of the "with" context. EDF Readers support opening EDF files under the context management protocol too. Here's how to open the file using it.
# Open Reader as Context Manager and read data from within context
with edf.Reader(filepath) as cmreader:
    data = cmreader.read(0)
    print(data[:5])

# Attempt to read from Reader after context has closed
try:
    cmreader.read(0)
except ValueError as err:
    print("\nValueError:", err)
[[-1.98790803e+01 7.95793213e+00 1.98880803e+01 ... 4.50000000e-03 4.50000000e-03 4.50000000e-03] [-8.64890744e+01 5.17018088e+01 6.36319570e+01 ... 4.50000000e-03 4.50000000e-03 4.50000000e-03] [-8.54948954e+01 4.47425557e+01 2.98298705e+01 ... 4.50000000e-03 4.50000000e-03 4.50000000e-03] [ 6.26377780e+01 9.54456855e+01 7.75504633e+01 ... 4.50000000e-03 4.50000000e-03 4.50000000e-03]] ValueError: seek of closed file
This method of opening files inside a specific context and performing operations on the data is the preferred way to work with files in Openseize since the resources are automatically recovered at the end of the context.
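If the context management protocol itself is new to you, the pattern the readers and writers follow can be sketched with a minimal class. The names here are made up for illustration; Openseize's readers and writers implement the same two methods in their base classes.

```python
# Minimal sketch of the context manager protocol: __enter__ returns the
# resource and __exit__ releases it, even if an error occurs in the block.
class ManagedResource:
    def __init__(self):
        self.closed = False

    def __enter__(self):
        # the object bound by 'as' in a with-statement
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # always runs on context exit; returning None propagates errors
        self.close()

    def close(self):
        self.closed = True

with ManagedResource() as res:
    print(res.closed)    # False inside the context
print(res.closed)        # True after the context exits
```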
Writing EDF Files¶
In addition to an EDF file Reader, Openseize provides an EDF file Writer. One use case for this Writer is to split an EDF whose channels correspond to multiple subjects into multiple EDFs, each containing the channels for only one subject. For example, if your EDF contains 3 subjects with 4 channels each, there is a total of 12 signals in the EDF. The Writer can then be used to write 3 files, each containing 4 channels. Let's examine how to use this Writer. We'll again start by asking for help.
help(edf.Writer)
Help on class Writer in module openseize.file_io.edf: class Writer(openseize.file_io.bases.Writer) | Writer(path: Union[str, pathlib.Path]) -> None | | A writer of European Data Format (EDF/EDF+) files. | | This Writer is a context manager for writing EEG data and metadata to an | EDF binary file. Unlike Readers it must be opened under the context | management protocol. Importantly, this writer does not currently support | writing annotations to an EDF file. | | Attributes: | path (Path): | A python path instance to target file to write data to. | | Examples: | >>> from openseize.demos import paths | >>> filepath = paths.locate('recording_001.edf') | >>> # Create a reader that will read only channels [0, 1] | >>> # and write out these channels to a new file | >>> writepath = paths.data_dir.joinpath('subset_001.edf') | >>> with Reader(filepath) as reader: | >>> with Writer(writepath) as writer: | >>> writer.write(reader.header, reader, channels=[0, 1]) | | Method resolution order: | Writer | openseize.file_io.bases.Writer | abc.ABC | openseize.core.mixins.ViewInstance | builtins.object | | Methods defined here: | | __init__(self, path: Union[str, pathlib.Path]) -> None | Initialize this Writer. See base class for futher details. | | write(self, header: openseize.file_io.edf.Header, data: Union[numpy.ndarray, openseize.file_io.edf.Reader], channels: Sequence[int], verbose: bool = True) -> None | Write header metadata and data for channel in channels to this | Writer's file instance. | | Args: | header: | A mapping of EDF compliant fields and values. | data: | An array with shape (channels, samples) or Reader instance. | channels: | Channel indices to write to this Writer's open file. | verbose: | An option to print progress of write. | | Raises: | ValueErrror: An error occurs if samples to be written is not | divisible by the number of records in the Header | instance. 
| | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __abstractmethods__ = frozenset() | | ---------------------------------------------------------------------- | Methods inherited from openseize.file_io.bases.Writer: | | __enter__(self) | Return instance as target variable of this context. | | __exit__(self, exc_type, exc_value, traceback) | Close this instances file object & propagate any error by | returning None. | | ---------------------------------------------------------------------- | Data descriptors inherited from openseize.file_io.bases.Writer: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) | | ---------------------------------------------------------------------- | Methods inherited from openseize.core.mixins.ViewInstance: | | __repr__(self) | Returns the __init__'s signature as the echo representation. | | Returns: str | | __str__(self) | Returns this instances print representation.
To construct a Writer instance, you need to provide a file path where the writer will write the new EDF file. The write method is what you will call to write data to that path. Let's examine this method by asking for its documentation.
help(edf.Writer.write)
Help on function write in module openseize.file_io.edf: write(self, header: openseize.file_io.edf.Header, data: Union[numpy.ndarray, openseize.file_io.edf.Reader], channels: Sequence[int], verbose: bool = True) -> None Write header metadata and data for channel in channels to this Writer's file instance. Args: header: A mapping of EDF compliant fields and values. data: An array with shape (channels, samples) or Reader instance. channels: Channel indices to write to this Writer's open file. verbose: An option to print progress of write. Raises: ValueErrror: An error occurs if samples to be written is not divisible by the number of records in the Header instance.
To write an EDF compliant file, the write method needs an EDF Header instance with all of the required fields and values expected of the EDF file type. An enumeration of the required fields and values can be found by examining the header printed above or by reading the EDF file specification here: https://www.edfplus.info/specs/edf.html
In addition to an EDF compliant Header instance, the write method needs data. This data may be an in-memory numpy array or a Reader instance from which data will be fetched.
Lastly, the write method takes a list of channel indices. These channel indices are used to filter both the Header instance and the data. For example, if you provide a Header containing metadata for 4 signals and an array containing 4 signals, you can request to write out a subset of the signals, say channel indices [0, 2]. This allows a multichannel EDF to be split into multiple EDFs. Importantly, both the new data and the new Header will contain only the data and metadata for the 2 channels written. Let's demonstrate these ideas with an example.
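Conceptually, channel filtering amounts to subsetting the header's per-signal lists and the data's rows by the same indices. A plain numpy sketch with made-up names (the real write method additionally handles the EDF record layout on disk):

```python
import numpy as np

# Conceptual sketch of channel filtering during a write: the header's
# per-signal lists and the data's rows are subset by the same channel
# indices. Names and values here are made up for illustration.
channels = [0, 2]
names = ['ch0', 'ch1', 'ch2', 'ch3']         # a per-signal header field
data = np.arange(12).reshape(4, 3)            # 4 channels x 3 samples
sub_names = [names[c] for c in channels]      # header metadata for kept channels
sub_data = data[channels]                     # data rows for kept channels
print(sub_names, sub_data.shape)
```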
# Create path for new EDF file to save to
save_path = demos.paths.data_dir.joinpath('subset_001.edf')
# Create an EDF writer pointing to this path
writer = edf.Writer(save_path)
print(writer)
Writer Object ---Attributes & Properties--- {'path': PosixPath('/home/matt/python/nri/openseize/src/openseize/demos/data/subset_001.edf'), 'mode': 'wb'} Type help(Writer) for full documentation
The writer knows where it will write data, and the write method can now be called to perform the writing. We will write channels 0 and 2 from the 'recording_001.edf' file we used earlier. Since this file has a header, we will reuse it. The write method will select metadata from the header corresponding to channels 0 and 2 and will write only the data records for those channels. Remember to open the reader as a context manager so the file resources are automatically recovered.
#locate the path to the recording
fp = demos.paths.locate('recording_001.edf')
#open the reader as context manager
with edf.Reader(fp) as reader:
    # open the writer as a context manager
    with edf.Writer(save_path) as writer:
        # write channels 0 and 2 from the header and reader's data
        writer.write(reader.header, reader, channels=[0, 2])
Writing data: 100.0% complete
Notice here that we opened both the Reader and Writer as context managers. Just like reader instances, writer instances maintain an open file that uses your machine's resources. By opening both the reader and writer as context managers, these file resources are closed when the reading and writing is finished.
Now let's reopen the 'subset_001.edf' file we just wrote and make sure the header and data look correct.
with edf.Reader(save_path) as reader:
    # let's print the reader's header -- it should only have metadata for channels 0 and 2
    print('---EDF SUBSET HEADER---')
    print(reader.header)
    # let's print the first 5 samples and check these against the full data
    print('---EDF SUBSET DATA---')
    print(reader.read(0, 5))
---EDF SUBSET HEADER--- {'version': '0', 'patient': 'PIN-42 M 11-MAR-1952 Animal', 'recording': 'Startdate 15-AUG-2020 X X X', 'start_date': '15.08.20', 'start_time': '09.59.15', 'header_bytes': 768, 'reserved_0': 'EDF+C', 'num_records': 3775, 'record_duration': 1.0, 'num_signals': 2, 'names': ['EEG EEG_1_SA-B', 'EEG EEG_3_SA-B'], 'transducers': ['8401 HS:15279', '8401 HS:15279'], 'physical_dim': ['uV', 'uV'], 'physical_min': [-8144.31, -8144.31], 'physical_max': [8144.319, 8144.319], 'digital_min': [-8192.0, -8192.0], 'digital_max': [8192.0, 8192.0], 'prefiltering': ['none', 'none'], 'samples_per_record': [5000, 5000], 'reserved_1': ['', '']} {'Accessible Properties': ['annotated', 'annotation', 'channels', 'offsets', 'record_map', 'samples', 'slopes']} ---EDF SUBSET DATA--- [[-19.87908032 7.95793213 19.88808032 18.89390131 18.89390131] [-85.49489539 44.74255573 29.82987048 79.53882129 52.69598785]]
Both the header and the data appear to contain only the metadata and data for channels 0 and 2. Now let's verify this by comparing all of the data against the original 'recording_001.edf' demo file.
# fp is still the filepath to recording_001.edf
with edf.Reader(fp) as reader:
    # read all 4 channels from the file
    all_data = reader.read(0)

# save_path is where subset_001.edf resides
with edf.Reader(save_path) as reader:
    # read the 2 channels from the subset file
    two_ch_data = reader.read(0)
print("Do the arrays match? -> ", np.allclose(all_data[[0,2], :], two_ch_data))
Do the arrays match? -> True
EDF Annotations¶
In addition to EDF file readers, Openseize provides annotation file readers. Typically, annotation files are comma-separated or tab-separated value text files that contain time-stamps and labels of important events that occurred during an EEG recording session. Here we will show how to open a Pinnacle format annotation TSV text file. Let's start by looking at the documentation for this annotation reader.
help(annotations.Pinnacle)
Help on class Pinnacle in module openseize.file_io.annotations: class Pinnacle(openseize.file_io.bases.Annotations) | Pinnacle(path: Union[str, pathlib.Path], **kwargs) -> None | | A reader of Pinnacle Technologies© annotation csv files. | | This annotation reader's 'read' method reads annotations into a list of | Annotation dataclass instances. Each Annotation dataclass has the | following attributes: | | - label: A string label given to an annotation. | - time: Time, relative to recording start, in secs of an annotation. | - duration: The duration in seconds of an annotation. | - channel: The channel(s) an annotation was detected on. | | Attributes: | path: | Python path instance to Pinnacle© file. | kwargs: | Any valid kwarg for csv.DictReader initializer. | | Examples: | >>> # read the annotations from the demo annotation file | >>> from openseize.demos import paths | >>> filepath = paths.locate('annotations_001.txt') | >>> from openseize.io.annotations import Pinnacle | >>> # read the 'rest' and 'exploring' annotations | >>> with Pinnacle(filepath, start=6) as pinnacle: | >>> annotations = pinnacle.read(labels=['rest', 'exploring']) | >>> # get the first annotation and print it | >>> print(annotations[0]) | >>> # print the first annotations duration | >>> print(annotations[0].duration) | | Method resolution order: | Pinnacle | openseize.file_io.bases.Annotations | abc.ABC | builtins.object | | Methods defined here: | | channel(self, row: Dict[str, str]) -> Union[int, str] | Extracts the annotation channel for a row in this file. | | duration(self, row: Dict[str, str]) -> float | Measures the duration of an annotation for a row in this file. | | label(self, row: Dict[str, str]) -> str | Extracts the annotation label for a row in this file. | | open(self, path: Union[str, pathlib.Path], start: int = 0, delimiter: str = '\t', **kwargs) -> Tuple[IO[str], Iterable[dict]] | Opens a Pinnacle formatted CSV annotation file. 
| | Called by 'Annotations.__init__' to initialize this Pinnacle | context manager. | | Args: | path: | A annotation file path location. | start: | The row number of the column headers in the file. | delimiter: | The string character seperating columns in the file. | **kwargs: | Any valid keyword argument for CSV.DictReader builtin. | | Returns: | A tuple (file_obj, DictReader) where file_obj is the open file | instance and DictReader is the builtin csv DictReader. | | time(self, row: Dict[str, str]) -> float | Extracts the annotation time of a row of this file. | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __abstractmethods__ = frozenset() | | ---------------------------------------------------------------------- | Methods inherited from openseize.file_io.bases.Annotations: | | __enter__(self) | Return this instance as target variable of this context. | | __exit__(self, exc_type, exc_value, traceback) | Closes this instance's file obj. & propagate errors by returning | None. | | __init__(self, path: Union[str, pathlib.Path], **kwargs) -> None | Initialize this Annotations reader. | | Args: | path: | A path location to an annotation file. | **kwargs: | Any valid kwarg for a subclasses 'open' method. | | read(self, labels: Optional[Sequence[str]] = None) -> List[openseize.file_io.bases.Annotation] | Reads annotations with labels to a list of Annotation instances. | | Args: | labels: | A sequence of annotation string labels for which Annotation | instances will be returned. If None, return all. | | Returns: | A list of Annotation dataclass instances (see Annotation). | | ---------------------------------------------------------------------- | Data descriptors inherited from openseize.file_io.bases.Annotations: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined)
To construct an annotation reader, you will need to provide a path to an annotation file. This path is given to the open method (see above). Additionally, you may need to pass in a start line, which gives the row number of the column headers in the file. Let's fetch the demo file "annotations_001.txt" if it is not on your system already and display the file's contents.
#determine the local path using the locate method and download if necessary
annotations_path = demos.paths.locate('annotations_001.txt')
# let's take a look at the file
with open(annotations_path, 'r') as infile:
    for idx, row in enumerate(infile):
        print(idx, row)
0 Experiment ID Experiment 1 Animal ID Animal 2 Researcher Test 3 Directory path 4 5 6 Number Start Time End Time Time From Start Channel Annotation 7 0 08/15/20 09:59:15.215 08/15/20 09:59:15.215 0.0000 ALL Started Recording 8 1 08/15/20 10:00:00.000 08/15/20 10:00:00.000 44.7850 ALL Qi_start 9 2 08/15/20 10:00:25.000 08/15/20 10:00:30.000 69.7850 ALL grooming 10 3 08/15/20 10:00:45.000 08/15/20 10:00:50.000 89.7850 ALL grooming 11 4 08/15/20 10:02:15.000 08/15/20 10:02:20.000 179.7850 ALL grooming 12 5 08/15/20 10:04:36.000 08/15/20 10:04:41.000 320.7850 ALL exploring 13 6 08/15/20 10:05:50.000 08/15/20 10:05:55.000 394.7850 ALL exploring 14 7 08/15/20 10:08:50.000 08/15/20 10:08:55.000 574.7850 ALL rest 15 8 08/15/20 10:10:14.000 08/15/20 10:10:19.000 658.7850 ALL exploring 16 9 08/15/20 10:17:10.000 08/15/20 10:17:15.000 1074.7850 ALL rest 17 10 08/15/20 10:35:49.000 08/15/20 10:35:54.000 2193.7850 ALL rest 18 11 08/15/20 10:40:00.000 08/15/20 10:40:00.000 2444.7850 ALL Qi_stop 19 12 08/15/20 11:02:09.879 08/15/20 11:02:09.879 3774.6640 ALL Stopped Recording
With this path we can now construct an Annotations reader instance. Just as with Readers and Writers, an instance can (and most of the time should) be constructed as a context manager. Below we construct the annotations reader with start=6 since that is the row containing the column headers of the file. Note that this initialization argument is passed to the open method, which can accept any argument that python's builtin csv.DictReader accepts.
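For intuition, skipping the preamble rows and parsing the remainder with a delimiter looks roughly like the stdlib sketch below. The rows are made up for illustration, and this is not Openseize's actual code.

```python
import csv
import io

# Sketch of what a Pinnacle-style open roughly does: discard the preamble
# rows, then hand the rest to csv.DictReader with a tab delimiter.
# The file contents below are made up for illustration.
text = (
    "Experiment ID\tExperiment\n"
    "Animal ID\tAnimal\n"
    "Number\tStart Time\tAnnotation\n"
    "0\t09:59:15\tgrooming\n"
)
infile = io.StringIO(text)
start = 2                          # row index of the column headers
for _ in range(start):
    next(infile)                   # skip the preamble rows
rows = list(csv.DictReader(infile, delimiter='\t'))
print(rows[0]['Annotation'])
```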
# open the annotations file and read all of the annotations using the 'read' method
with annotations.Pinnacle(annotations_path, start=6) as reader:
    # call read to get the annotations as a sequence of Annotation instances (described in a moment)
    annotes = reader.read()

# print the sequence of annotation instances
for instance in annotes:
    print(instance)
Annotation(label='Started Recording', time=0.0, duration=0.0, channel='ALL') Annotation(label='Qi_start', time=44.785, duration=0.0, channel='ALL') Annotation(label='grooming', time=69.785, duration=5.0, channel='ALL') Annotation(label='grooming', time=89.785, duration=5.0, channel='ALL') Annotation(label='grooming', time=179.785, duration=5.0, channel='ALL') Annotation(label='exploring', time=320.785, duration=5.0, channel='ALL') Annotation(label='exploring', time=394.785, duration=5.0, channel='ALL') Annotation(label='rest', time=574.785, duration=5.0, channel='ALL') Annotation(label='exploring', time=658.785, duration=5.0, channel='ALL') Annotation(label='rest', time=1074.785, duration=5.0, channel='ALL') Annotation(label='rest', time=2193.785, duration=5.0, channel='ALL') Annotation(label='Qi_stop', time=2444.785, duration=0.0, channel='ALL') Annotation(label='Stopped Recording', time=3774.664, duration=0.0, channel='ALL')
You can see that we have fetched all of the annotations from the displayed file and stored each annotation to an Annotation instance. What is this instance? An Annotation object is a python dataclass. If you haven't seen this before, you can think of it as a simple container with '.' dot notation access to the container's contents. Let's examine the instance at index 3.
# fetch the annotation at index 3 and display it
item = annotes[3]
print(item)

# access the item's time from recording start
print('This annotation occurred at {} s relative to the start time'.format(item.time))
Annotation(label='grooming', time=89.785, duration=5.0, channel='ALL') This annotation occurred at 89.785 s relative to the start time
The key pieces of information are given to us in a single Annotation instance:
- label - a piece of text describing the annotation
- time - the point in time (in seconds), measured from the beginning of the recording, at which the annotation occurs
- duration - the length (in seconds) of the annotation from its start time
- channel - the channel(s) in the EEG recording that the annotation applies to
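If dataclasses are new to you, the idea can be sketched with a hypothetical SimpleAnnotation mirroring the fields above (the real Annotation dataclass lives in openseize.file_io.bases):

```python
from dataclasses import dataclass
from typing import Union

# A hypothetical dataclass with the same four fields as Openseize's
# Annotation; shown only to illustrate how dataclasses behave.
@dataclass
class SimpleAnnotation:
    label: str
    time: float
    duration: float
    channel: Union[int, str]

note = SimpleAnnotation(label='grooming', time=89.785, duration=5.0, channel='ALL')
print(note)          # a readable repr is generated automatically
print(note.time)     # '.' dot access to each field
```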
In the preceding example, we read all of the annotations from the Pinnacle formatted file, but the 'read' method can accept a sequence of labels to selectively read only some of the annotations. Let's show how this works on this demo annotation file.
# read only the annotations with labels matching either 'rest' or 'exploring'
with annotations.Pinnacle(annotations_path, start=6) as reader:
    subset_annotes = reader.read(labels=['rest', 'exploring'])

for annote in subset_annotes:
    print(annote)
Annotation(label='exploring', time=320.785, duration=5.0, channel='ALL') Annotation(label='exploring', time=394.785, duration=5.0, channel='ALL') Annotation(label='rest', time=574.785, duration=5.0, channel='ALL') Annotation(label='exploring', time=658.785, duration=5.0, channel='ALL') Annotation(label='rest', time=1074.785, duration=5.0, channel='ALL') Annotation(label='rest', time=2193.785, duration=5.0, channel='ALL')
Producing from EDF Files with Annotations¶
Two important components of an Annotation instance are the time and duration attributes. These attributes allow for selective filtering of EEG data returned from either a Reader or a producer. To do this, the annotation dataclass instances are converted into a boolean mask that picks out samples of data to keep or discard. Here we will demonstrate how to construct a boolean mask from a list of annotation instances and use that mask to filter a producer's yielded numpy arrays. Further details can be found in the producer demo.
The annotations module provides a function, as_mask, for generating a mask automatically from a sequence of annotation instances.
help(annotations.as_mask)
Help on function as_mask in module openseize.file_io.annotations: as_mask(annotations: Sequence[openseize.file_io.bases.Annotation], size: int, fs: float, include: bool = True) -> numpy.ndarray[typing.Any, numpy.dtype[numpy.bool_]] Creates a boolean mask from a sequence of annotation dataclass instances.. Producers of EEG data may recieve an optional boolean array mask. This function creates a boolean mask from a sequence of annotations and is therefore useful for filtering EEG data by annotation label during processing. Args: annotations: A sequence of annotation dataclass instances to convert to a mask. size: The length of the boolean array to return. fs: The sampling rate in Hz of the digital system. include: Boolean determining if annotations should be set to True or False in the returned array. True means all values are False in the returned array except for samples where the annotations are located. Returns: A 1-D boolean array of length size. Examples: >>> # read the annotations from the demo annotation file >>> from openseize.demos import paths >>> filepath = paths.locate('annotations_001.txt') >>> from openseize.io.annotations import Pinnacle >>> # read the 'rest' anotations >>> with Pinnacle(filepath, start=6) as pinnacle: >>> annotations = pinnacle.read(labels=['rest']) >>> # create a mask measuring 3700 secs at 5000 Hz >>> mask = as_mask(annotations, size=3700*5000, fs=5000) >>> # measure the total time in secs of 'rest' annotation >>> print(np.count_nonzero(mask) / 5000) 15.0
To construct a mask, as_mask needs a sequence of annotation dataclass instances, the size of the mask along the sample axis, the sampling rate to convert the annotation times to samples, and a boolean "include" parameter which determines if the annotations should be kept (True) or discarded (False) from the EEG data. Here, as an example, we create such a mask.
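Conceptually, as_mask turns each annotation's (time, duration) pair into a run of True samples. Here is a toy numpy sketch of an 'include' mask, with a made-up sampling rate and event list; it illustrates the idea only, not Openseize's implementation.

```python
import numpy as np

# Toy sketch of building an 'include' mask from (start_sec, duration_sec)
# pairs; values here are made up for illustration.
fs = 10                               # toy sampling rate in Hz
size = 100                            # mask length in samples
events = [(1.0, 2.0), (6.5, 1.0)]     # (start_sec, duration_sec) per event
mask = np.zeros(size, dtype=bool)
for start_sec, dur_sec in events:
    a = int(start_sec * fs)           # first sample of the event
    b = a + int(dur_sec * fs)         # one past the last sample
    mask[a:b] = True
print(np.count_nonzero(mask))         # total samples kept: (2.0 + 1.0) * 10
```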
# read the annotations from the demo file
with annotations.Pinnacle(annotations_path, start=6) as reader:
    subset_annotes = reader.read(labels=['rest', 'exploring'])

# build the mask; fp is still the filepath to recording_001.edf; size and fs can be fetched from the reader
with edf.Reader(fp) as reader:
    size = reader.shape[-1]
    fs = reader.header.samples_per_record[0]
    mask = annotations.as_mask(subset_annotes, size, fs, include=True)

# print the first 10 values of the mask
print(mask[:10])

# The first True values should occur at 320.785 secs * 5000 Hz since fs=5000 and the first
# annotation (see above) occurs at 320.785 secs. Let's confirm this by printing 10 samples
# around this sample.
start = int(320.785 * 5000)
print(mask[start-5: start+5])

# lastly, let's print the total number of samples we will keep
expected = len(subset_annotes) * 5 * 5000  # each annote is 5 secs @ 5 kHz
actual = np.count_nonzero(mask)
print('Expected number of samples to keep is {} \nActual number kept is {}'.format(expected, actual))
[False False False False False False False False False False] [False False False False False True True True True True] Expected number of samples to keep is 150000 Actual number kept is 150000
# Build a producer with this mask and show that it has the expected shape
with edf.Reader(filepath) as reader:
    # read from sample 0 so the data aligns with the mask built above
    masked_producer = producer(reader.read(start=0), chunksize=500, axis=-1, mask=mask)
    print("Producer Shape (w/ mask applied):", masked_producer.shape)
Producer Shape (w/ mask applied): (4, 150000)
As we can see, the producer maintains four channels with 150,000 samples each, exactly as we anticipated from our mask.
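For intuition, the effect of the mask on each produced chunk can be sketched in plain numpy: a boolean mask along the sample axis keeps only the True columns of a (channels, samples) array. The values below are made up for illustration.

```python
import numpy as np

# Sketch of masking along the sample axis: only columns where the mask is
# True survive, for every channel at once.
chunk = np.arange(12).reshape(2, 6)   # 2 channels x 6 samples
mask = np.array([True, False, True, True, False, False])
kept = chunk[:, mask]
print(kept.shape)                     # 3 of the 6 samples remain per channel
```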