Data policy and data format

The ESRF data policy allows users to access their data and electronic logbook at https://data.esrf.fr. The data is written to Nexus-compliant HDF5 files in a specific directory structure.

Directory structure

Under the ESRF data policy, the data directory is determined from the proposal name, sample name and dataset name. For a user experiment at a given beamline, the data directory is:

/data/visitor/proposal/beamline/sample/sample_dataset/
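As a rough illustration of how the names combine into this path, the layout can be assembled as in the sketch below. The helper `dataset_directory` is hypothetical and not part of BLISS; it only mirrors the visitor layout shown above.

```python
from pathlib import PurePosixPath


def dataset_directory(proposal, beamline, sample, dataset):
    """Build the visitor data directory for one dataset (illustrative only)."""
    # The last path component repeats the sample name as a prefix.
    root = PurePosixPath("/data/visitor")
    return root / proposal / beamline / sample / f"{sample}_{dataset}"


print(dataset_directory("hg123", "idxy", "treated_cells_1_4", "area1"))
# /data/visitor/hg123/idxy/treated_cells_1_4/treated_cells_1_4_area1
```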

Proposal

Change the proposal name with the command newproposal():

DEMO  [1]: newproposal("hg123")
Proposal set to 'hg123'
Data path: /data/visitor/hg123/idxy/sample/sample_0001

The root directory is determined from the proposal name (this mapping can be configured):

user experiment: /data/visitor/proposal
inhouse (starts with the beamline name): /data/idxy/inhouse/proposal/
test (starts with test, temp or tmp): /data/idxy/tmp/proposal
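The three rules above can be sketched as a small function. This is only an illustration of the default mapping as listed here, not the BLISS implementation; the name `root_directory` is hypothetical, and the order in which the prefixes are checked is an assumption.

```python
def root_directory(proposal, beamline):
    """Return the root directory for a proposal (sketch of the default rules)."""
    # Test proposals: name starts with "test", "temp" or "tmp".
    if proposal.startswith(("test", "temp", "tmp")):
        return f"/data/{beamline}/tmp/{proposal}"
    # In-house proposals: name starts with the beamline name.
    if proposal.startswith(beamline):
        return f"/data/{beamline}/inhouse/{proposal}"
    # Everything else is treated as a user experiment.
    return f"/data/visitor/{proposal}"


print(root_directory("hg123", "idxy"))     # /data/visitor/hg123
print(root_directory("idxy2002", "idxy"))  # /data/idxy/inhouse/idxy2002
print(root_directory("tmp01", "idxy"))     # /data/idxy/tmp/tmp01
```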

When no proposal name is specified, the default is the beamline proposal of the current month (idxy2002 in this example, for beamline idxy in February 2020).

DEMO  [3]: newproposal()
Proposal set to 'idxy2002'
Data path: /data/idxy/inhouse/idxy2002/idxy/sample/sample_0001
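The default name in this example looks like the beamline name followed by a two-digit year and month. Assuming that pattern holds in general (it is inferred from the single example above), it could be reproduced like this; `default_proposal` is a hypothetical helper.

```python
from datetime import date


def default_proposal(beamline, day):
    """Beamline name followed by two-digit year and month (assumed pattern)."""
    return f"{beamline}{day:%y%m}"


print(default_proposal("idxy", date(2020, 2, 14)))  # idxy2002
```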

The default beamline proposal is also accessible through https://data.esrf.fr and offers an electronic logbook with the exception of test proposals.

At the end of an experiment you can use the function endproposal() to ensure nobody accidentally adds data to the proposal directory from BLISS. Like newproposal() called without a proposal name, it switches data saving to the default beamline proposal of the month.

DEMO  [3]: endproposal()
Proposal set to 'idxy2002'
Data path: /data/idxy/inhouse/idxy2002/idxy/sample/sample_0001

Sample

Change the sample name with the function newsample():

DEMO  [4]: newsample("treated_cells_1_4")
Sample set to 'treated_cells_1_4'
Data path: /data/visitor/hg123/idxy/treated_cells_1_4/treated_cells_1_4_0001

Dataset

Change the dataset name with the function newdataset():

DEMO  [5]: newdataset("area1")
Dataset set to 'area1'
Data path: /data/visitor/hg123/idxy/treated_cells_1_4/treated_cells_1_4_area1

When no dataset name is specified, datasets are numbered (unique within a sample):

DEMO  [6]: newdataset()
Dataset set to '0005'
Data path: /data/visitor/hg123/idxy/treated_cells_1_4/treated_cells_1_4_0005
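How BLISS picks the next number is not described here. One plausible scheme, shown purely as an assumption, is to take the highest numeric dataset name already used for the sample and add one; `next_dataset_name` is a hypothetical helper.

```python
def next_dataset_name(existing):
    """Return the next free four-digit dataset name for one sample."""
    # Named datasets such as "area1" do not take part in the numbering.
    numbers = [int(name) for name in existing if name.isdigit()]
    return f"{max(numbers, default=0) + 1:04d}"


print(next_dataset_name(["0001", "0002", "0003", "0004", "area1"]))  # 0005
```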

Data structure

Within the directory of a dataset you can find the main Nexus-compliant HDF5 file:

/data/visitor/proposal/beamline/sample/sample_dataset/sample_dataset.h5

All scan data belonging to the dataset are accessible through this one file, although some detectors (for example all 2D detectors) actually save their data in separate files.

Data writing

A TANGO device referred to as the Nexus writer saves all data produced by BLISS. Data saving can be monitored through this device:

[Figure: Nexus writer]

Accessing data

The ESRF Data Analysis Unit provides two main tools to visualize generic Nexus data:

silx view
pymca

A description of how to access HDF5 files while they are being modified by a writer can be found here.

Silx view

silx view sample_dataset.h5

Silx view can be launched directly from a BLISS shell via the silx_view() function. It then opens the last scan’s data file.

Warning

Do not use silx versions older than 0.12.0.

BLISS locks the dataset file sample_dataset.h5 during scans. If other applications have locked the file (which is the default behaviour when opening HDF5 files) BLISS cannot write to it and scans will be prevented from starting. Tools like silx and pymca do not lock the file and are able to visualize its content without disturbing ongoing scans.
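For your own Python scripts, one way to avoid taking a lock is to disable HDF5 file locking through the `HDF5_USE_FILE_LOCKING` environment variable before the HDF5 library is used. The sketch below assumes h5py is installed and built against HDF5 1.10 or newer; the function name `list_scans` is hypothetical. Reading a file that is still being written can fail transiently, so treat this as a starting point only.

```python
import os

# Must be set before the HDF5 library opens any file.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def list_scans(filename):
    """Return the names of the top-level entries (one per scan)."""
    import h5py  # imported lazily, after the environment variable is set

    with h5py.File(filename, mode="r") as nxroot:
        return list(nxroot.keys())
```

A call such as `list_scans("sample_dataset.h5")` would then return the entry names without blocking the writer.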

PyMca

pymca sample_dataset.h5

PyMca provides easy access to positioner and detector data for plotting.

PyMca can be launched directly from a BLISS shell via the pymca() function. It then opens the last scan’s data file.

[Figure: Nexus pymca]

Nexus format

The dataset file sample_dataset.h5 contains one entry for each scan. Using the Silx viewer it looks like this:

[Figure: Nexus scans]

A scan entry has two subsections by default: instrument (for positioner and detector data/metadata) and measurement (links to all data under instrument). Additional subsections can appear when defining plots, scan notes, sample description and application definitions.

[Figure: Nexus scan]

The instrument section contains groups of data and metadata for detectors and positioners:

[Figure: Nexus scan]

The measurement group, together with the positioner group under instrument, holds links to all data and to the motor positions at the start of a scan:

[Figure: Nexus scan]
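The layout of a scan entry can also be inspected from Python. Since h5py groups behave much like dictionaries, the sketch below is written against the mapping interface and exercised with a nested dict standing in for an opened file. The function name and all group and dataset names in the stand-in are made up for illustration.

```python
def summarize_scan(entry):
    """Summarize one scan entry: instrument groups and measurement names."""
    return {
        "instrument": sorted(entry["instrument"].keys()),
        "measurement": sorted(entry["measurement"].keys()),
    }


# Stand-in for an h5py group; every name below is hypothetical.
fake_entry = {
    "instrument": {"diffcam": {"image": None}, "positioners": {"sy": 1.5}},
    "measurement": {"diffcam_image": None, "sy": 1.5},
}
print(summarize_scan(fake_entry))
```

With a real file, the same function could be given one of the top-level groups of sample_dataset.h5.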

Custom metadata

BLISS provides metadata for detectors, but if something is missing there are two mechanisms to add custom metadata. The first is through user notes:

DEMO [1]: s = loopscan(10,0.1,run=False)
DEMO [2]: s.add_comment("This is a comment")
DEMO [3]: s.add_comment("This is another comment")
DEMO [4]: s.add_comment("And another one")
DEMO [5]: s.run()

The second mechanism is for more structured metadata:

DEMO [1]: from bliss.scanning.scan_meta import get_user_scan_meta
DEMO [2]: scan_meta = get_user_scan_meta()
DEMO [3]: scan_meta.instrument.set("diffcam_distance", {"diffcam": {"distance": 10.2, "distance@units": "mm"}})

This adds an HDF5 dataset /<scan>/instrument/diffcam/distance with a units attribute to the NXdetector group diffcam. The string "diffcam_distance" is not part of the metadata, it is just an identifier to allow removing the metadata for the next scan.

DEMO [4]: scan_meta.instrument.remove("diffcam_distance")
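The `name@attribute` key convention used in the dictionary can be illustrated with a short sketch that separates dataset values from attributes. This is not the Nexus writer's code; `split_metadata` is a hypothetical helper that only demonstrates how such keys can be interpreted.

```python
def split_metadata(fields):
    """Separate dataset values from 'name@attribute' entries."""
    datasets, attributes = {}, {}
    for key, value in fields.items():
        name, sep, attr = key.partition("@")
        if sep:
            # "distance@units" becomes attribute "units" of dataset "distance".
            attributes.setdefault(name, {})[attr] = value
        else:
            datasets[name] = value
    return datasets, attributes


datasets, attributes = split_metadata({"distance": 10.2, "distance@units": "mm"})
print(datasets)    # {'distance': 10.2}
print(attributes)  # {'distance': {'units': 'mm'}}
```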

Instead of a static dictionary, you can provide a function that returns a dictionary. If it returns None or an empty dictionary, it is ignored. For example, the sample-detector distance could be derived from a motor position, and you may want it only for scans that include the detector:

DEMO [1]: from bliss.scanning.scan_meta import get_user_scan_meta
DEMO [2]: scan_meta = get_user_scan_meta()
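DEMO [3]: def diffcam_distance(scan):
            # Only add the distance when the diffcam image channel is part of the scan
            channels = scan.scan_info.get("channels", dict())
            diffcam_used = f"{diffcam.name}:image" in channels
            if diffcam_used:
                return {diffcam.name: {"distance": sy.position+10.2, "distance@units": sy.unit}}
            # Otherwise None is returned implicitly and the entry is ignored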

DEMO [4]: scan_meta.instrument.set("diffcam_distance", diffcam_distance)

These commands need to be added to the session’s setup file for them to be persistent.
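To make the difference between static and function-based entries concrete, the following sketch evaluates a set of registered entries for one scan and drops empty results. It is a simplified model under stated assumptions, not the BLISS implementation: `resolve_metadata` is hypothetical, and the scan is represented by a plain dict rather than a BLISS scan object.

```python
def resolve_metadata(registered, scan):
    """Evaluate registered entries for one scan, skipping empty results."""
    result = {}
    for entry in registered.values():
        value = entry(scan) if callable(entry) else entry
        if not value:  # None or {} is ignored
            continue
        for group, fields in value.items():
            result.setdefault(group, {}).update(fields)
    return result


def only_with_diffcam(scan):
    # Returns None implicitly when the detector is not part of the scan.
    if "diffcam:image" in scan["channels"]:
        return {"diffcam": {"distance": 10.2, "distance@units": "mm"}}


registered = {"diffcam_distance": only_with_diffcam}
print(resolve_metadata(registered, {"channels": ["diffcam:image"]}))
print(resolve_metadata(registered, {"channels": ["diode"]}))  # {}
```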

The examples above use the instrument category, which adds metadata to the /<scan>/instrument HDF5 group. You can add your own categories. For example, to add the /<scan>/sample/chemical_formula HDF5 dataset at the end of the scan:

DEMO [1]: from bliss.scanning.scan_meta import get_user_scan_meta, META_TIMING
DEMO [2]: scan_meta = get_user_scan_meta()
DEMO [3]: scan_meta.add_categories({"sample"})
DEMO [4]: scan_meta.sample.timing = META_TIMING.END  # Default: META_TIMING.START
DEMO [5]: scan_meta.sample.set("chemical_formula", {"chemical_formula": "Fe2O3"})

Note that categories can have a timing for metadata to be generated at different stages of the scan: start, prepared and end.
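As a simplified picture of per-category timing, the sketch below selects which categories are due at a given stage of a scan. The `Timing` enum is a hypothetical stand-in for META_TIMING and `categories_due` is an illustrative helper; neither reflects how BLISS implements this internally.

```python
from enum import Enum


class Timing(Enum):  # hypothetical stand-in for META_TIMING
    START = "start"
    PREPARED = "prepared"
    END = "end"


def categories_due(category_timing, stage):
    """Names of the categories whose metadata is generated at `stage`."""
    return sorted(name for name, timing in category_timing.items() if timing is stage)


category_timing = {"instrument": Timing.START, "sample": Timing.END}
print(categories_due(category_timing, Timing.START))  # ['instrument']
print(categories_due(category_timing, Timing.END))    # ['sample']
```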

When adding metadata this way, consult the Nexus standard first to know whether the information you want to add already has a designated location and name.