H5py-like API to access BLISS scan data

Scan data is saved in NeXus-compliant HDF5 files. When reading these files during acquisition, read failures occur frequently, in which case the file needs to be closed and opened again. To avoid having to deal with this issue, blissdata provides an h5py-like API which can be used to read scan data during and after the experiment without any changes in the reader code.

In the future the h5py-like API will also support fetching data from memory (Redis or Lima) when possible.

In the examples below we will use the following function to process scan data:

def process_scan_data(nxentry):
    # Detectors and motors from which to read data
    datasets = zip(
        nxentry["instrument/samy/value"],
        nxentry["instrument/diode1/data"],
        nxentry["instrument/eiger1/data"]
    )

    # Loop over all points of the scan
    for y, I0, image in datasets:
        print("samy", y)
        print("iodet", I0)
        print("eiger1", image)

HDF5 files during the experiment

from blissdata.h5api import dynamic_hdf5

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

with dynamic_hdf5.File(filename, lima_names=["eiger1"]) as root:
    for scan in root:  # loops indefinitely
        print("\nScan", scan)
        process_scan_data(root[scan])

The lima_names argument is the only deviation from the h5py API and can be omitted when the dataset is closed (i.e. nothing is being written to it anymore). Other non-h5py arguments that can be provided are listed below (see the example after this list):

  • retry_period: period in seconds between retries of failed HDF5 read operations
  • retry_timeout: total time in seconds during which failed HDF5 read operations are retried, after which a RetryTimeoutException is raised
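
For instance, both arguments can be combined with lima_names. This is a minimal sketch; the retry values are arbitrary and should be adapted to your acquisition time scale.

from blissdata.h5api import dynamic_hdf5

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

# Retry failed HDF5 reads every 0.5 seconds and raise RetryTimeoutException
# when a read keeps failing for more than 10 minutes (arbitrary values).
with dynamic_hdf5.File(
    filename,
    lima_names=["eiger1"],
    retry_period=0.5,
    retry_timeout=600,
) as root:
    for scan in root:
        process_scan_data(root[scan])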

HDF5 files after the experiment

When the dataset is closed, no writer will access it anymore. You can use blissdata.h5api.dynamic_hdf5 as shown above, or you can use h5py directly without changing the code (apart from dropping the lima_names argument):

import h5py

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

with h5py.File(filename) as root:
    for scan in root:  # loops over all scans in the file
        print("\nScan", scan)
        process_scan_data(root[scan])

Alternatively, you can do this (same code, different import):

from blissdata.h5api import static_hdf5

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

with static_hdf5.File(filename) as root:
    for scan in root:  # loops over all scans in the file
        print("\nScan", scan)
        process_scan_data(root[scan])

Static vs. Dynamic

The static and dynamic APIs are identical and mimic the read-only part of the h5py API. They mainly provide the classes Group(Mapping) and Dataset(Sequence). The values of a Group are of type Group or Dataset. Note that a Mapping mainly provides item access and iteration, while a Sequence mainly provides slicing and iteration.
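
To illustrate the Mapping and Sequence behaviour, here is a sketch assuming a scan entry named "1.1" and the detectors used in the examples above:

from blissdata.h5api import static_hdf5

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

with static_hdf5.File(filename) as root:
    scan = root["1.1"]                       # Group: Mapping-style item access
    for name in scan["instrument"]:          # Mapping-style iteration over child names
        print(name)
    diode = scan["instrument/diode1/data"]   # Dataset
    first_points = diode[0:10]               # Sequence-style slicing
    for value in diode:                      # Sequence-style iteration over points
        print(value)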

Although the APIs are the same, they behave differently:

|                     | Static HDF5                            | Dynamic HDF5                                                                          |
|---------------------|----------------------------------------|---------------------------------------------------------------------------------------|
| group[name]         | Return immediately                     | Block until the key is present or the scan is marked as “FINISHED”                     |
| dataset[idx]        | Return immediately                     | Block until the entire slice is available or until the scan is marked as “FINISHED”    |
| for name in group   | Stops when all names are yielded       | Stops when the scan is marked as “PREPARED”                                             |
| for data in dataset | Stops when all data points are yielded | Stops when the scan is marked as “FINISHED”                                             |

The only exception to this is the top-level Group in the dynamic HDF5 API. As no scan is associated with the top-level group, the loop for name in group never exits and group[name] blocks until the key is present (potentially forever).
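
If you only want to follow a limited number of scans, you can break out of the endless top-level loop yourself. This is a sketch; the stop condition is arbitrary.

from blissdata.h5api import dynamic_hdf5

filename = "/tmp/scans/inhouse/id002211/id00/20221101/sample/sample_0001/sample_0001.h5"

max_scans = 5  # arbitrary stop condition for this sketch

with dynamic_hdf5.File(filename, lima_names=["eiger1"]) as root:
    for number, scan in enumerate(root, start=1):  # would otherwise loop indefinitely
        print("\nScan", scan)
        process_scan_data(root[scan])
        if number == max_scans:
            break  # leave the endless iteration over the top-level group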