Data policy
A data policy in BLISS determines the data structure (file format and directory structure) and the registration of data/metadata with external services. BLISS currently has two data policies:
- The ESRF data policy allows users to access their data and electronic logbook at https://data.esrf.fr. The data is written in Nexus-compliant HDF5 files in a specific directory structure.
Note
The ESRF data policy requires configuration. The Nexus writer requires configuration.
- The basic data policy does not impose a data directory structure or register data with any external service. Data can (but does not have to) be written in Nexus-compliant HDF5 files. The basic data policy is the default policy for BLISS.
Note
The Nexus writer requires configuration
Adding new data policies is described here.
ESRF data policy
Directory structure
The directory in which files are saved is derived from the proposal, collection and dataset names provided by the user. The functions described below change these three names and notify the data policy services.
Change proposal
DEMO [1]: newproposal("hg123")
Proposal set to 'hg123'
Data path: /data/visitor/hg123/id00/sample/sample_0001
When no proposal name is given, the default proposal is the inhouse proposal {beamline}{yymm}. For example, at ID21 in January 2020 the default proposal name is id212001.
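As an illustration of this naming convention only (plain Python, not a BLISS call; default_proposal_name is a hypothetical helper):
from datetime import date

def default_proposal_name(beamline: str, when: date) -> str:
    # Build the default inhouse proposal name following the {beamline}{yymm} convention.
    return f"{beamline}{when.strftime('%y%m')}"

print(default_proposal_name("id21", date(2020, 1, 15)))  # -> id212001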
The data root directory is derived from the proposal name:
- no name given: /data/{beamline}/inhouse/
- name starts with the beamline name: /data/{beamline}/inhouse/
- name starts with test*, tmp* or temp*: /data/{beamline}/tmp/
- all other names: /data/visitor/
These root paths can be configured; the values above are the defaults.
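The mapping from proposal name to root directory can be sketched as follows (an illustration assuming the default root paths above, not the BLISS implementation; proposal_root is a hypothetical helper):
def proposal_root(proposal_name: str, beamline: str) -> str:
    # Default mapping from proposal name to data root directory.
    if not proposal_name:
        return f"/data/{beamline}/inhouse/"
    name = proposal_name.lower()
    if name.startswith(beamline.lower()):
        return f"/data/{beamline}/inhouse/"
    if name.startswith(("test", "tmp", "temp")):
        return f"/data/{beamline}/tmp/"
    return "/data/visitor/"

print(proposal_root("hg123", "id00"))     # -> /data/visitor/
print(proposal_root("id002001", "id00"))  # -> /data/id00/inhouse/
print(proposal_root("tmp_align", "id00")) # -> /data/id00/tmp/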
Change collection
A collection is a group of datasets that share some characteristics:
DEMO [2]: newcollection("sample1")
Dataset collection set to 'sample1'
Data path: /data/visitor/hg123/id00/sample1/sample1_0001
When no collection name is given, the default name “sample” is used. Note that you can always come back to an existing collection to add more datasets.
When the datasets in a collection share the same sample, you can use newsample instead:
DEMO [2]: newsample("sample1")
Dataset collection set to 'sample1'
Data path: /data/visitor/hg123/id00/sample1/sample1_0001
Change dataset
Named datasets
DEMO [3]: newdataset("area1")
Dataset set to 'area1'
Data path: /data/visitor/hg123/id00/sample1/sample1_area1
When the dataset already exists, the name is automatically incremented (“area1_0002”, “area1_0003”, …). Note that you can never come back to a dataset once you have changed to another one.
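The increment rule can be expressed roughly like this (a sketch of the behaviour only; next_dataset_name is a hypothetical helper, not part of BLISS):
def next_dataset_name(requested: str, existing: set) -> str:
    # Keep the requested name if it is free, otherwise append an incrementing 4-digit suffix.
    if requested not in existing:
        return requested
    index = 2
    while f"{requested}_{index:04d}" in existing:
        index += 1
    return f"{requested}_{index:04d}"

print(next_dataset_name("area1", {"area1"}))                # -> area1_0002
print(next_dataset_name("area1", {"area1", "area1_0002"}))  # -> area1_0003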
Unnamed datasets
DEMO [4]: newdataset()
Dataset set to '0002'
Data path: /data/visitor/hg123/id00/sample1/sample1_0002
The dataset is named automatically “0001”, “0002”, … The dataset number is independent for each sample. Note that you can never come back to a dataset once you have changed to another one.
Dataset registration
The data and metadata of datasets are registered with the ESRF data policy services when configured in the beamline configuration. The command icat_info provides feedback on dataset registration of the current proposal:
DEMO_SESSION [1]: SCAN_SAVING.icat_info()
ICAT proposal time slot:
proposal ID002109
beamline ID00
startDate 2021-09-01T12:47:58.948+02:00
id *********
title ID002109
url https://data.esrf.fr/investigation/*********/datasets
Datasets: 3 unconfirmed, 110 confirmed
Unconfirmed datasets:
Name Time since end
sample_0060 0:00:44.645211
sample_0061 0:00:41.392353
sample_0062 0:00:17.115062
The first section provides information on the current time slot assigned to the current proposal. The second section shows the list of datasets which have been sent to ICAT but have not yet been registered.
Manual dataset registration
Dataset registration with ICAT happens automatically, but in case of failure datasets can be registered manually.
Register all unconfirmed datasets with ICAT:
DEMO_SESSION [2]: SCAN_SAVING.icat_register_datasets()
Register a specific dataset with ICAT:
DEMO_SESSION [3]: SCAN_SAVING.icat_register_dataset("sample_0008")
Policy state
To get an overview of the current state of the data policy:
DEMO [5]: SCAN_SAVING
Out [5]: Parameters (default) -
.user_name = 'denolf'
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%Y%m%d'
.scan_number_format = '%04d'
.dataset_number_format = '%04d'
.session = 'demo'
.date = '20200208'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'ESRF'
.template = '{proposal_name}/{beamline}/{sample_name}/{sample_name}_{dataset_name}'
.beamline = 'id00'
.proposal_name = 'hg123'
.proposal_type = 'inhouse'
.base_path = '/data/visitor'
.sample_name = 'sample1'
.dataset_name = '0001'
.data_filename = '{sample_name}_{dataset_name}'
.images_path_relative = True
.creation_date = '2020-02-08-12:09'
.last_accessed = '2020-02-08-12:12'
--------- --------- -------------------------------------------------------------------
exists filename /data/visitor/hg123/id00/sample1/sample1_0001/sample1_0001.h5
exists directory /data/visitor/hg123/id00/sample1/sample1_0001
Metadata RUNNING Dataset is running
--------- --------- -------------------------------------------------------------------
Basic data policy
This data policy requires the user to use the SCAN_SAVING object directly to define where the data will be saved. The data location is completely determined by specifying base_path, template and data_filename:
DEMO [1]: SCAN_SAVING.base_path = "/tmp/data"
DEMO [2]: SCAN_SAVING.writer = "nexus"
DEMO [3]: SCAN_SAVING.template = "{date}/{session}/{mysubdir}"
DEMO [4]: SCAN_SAVING.date_format = "%y%b"
DEMO [5]: SCAN_SAVING.add("mysubdir", "sample1")
DEMO [6]: SCAN_SAVING.data_filename = "scan{scan_number}"
DEMO [7]: SCAN_SAVING.filename
Out [7]: '/tmp/data/20Feb/demo/sample1/scan{scan_number}.h5'
Note that each attribute can be a template string, which is filled in with the values of other attributes of the SCAN_SAVING object.
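The substitution behaves like standard Python string formatting; the following sketch reproduces the path above with plain str.format (an illustration of the mechanism only, the actual expansion is done internally by SCAN_SAVING):
# Attribute values as set in the session above.
attributes = {"date": "20Feb", "session": "demo", "mysubdir": "sample1"}
base_path = "/tmp/data"
template = "{date}/{session}/{mysubdir}"
data_filename = "scan{scan_number}"  # left unresolved until scan time

directory = "/".join([base_path, template.format(**attributes)])
print(f"{directory}/{data_filename}.h5")
# -> /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5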
Policy state
To get an overview of the current state of the data policy:
DEMO [8]: SCAN_SAVING
Out [8]: Parameters (default) -
.base_path = '/tmp/data'
.data_filename = 'scan{scan_number}'
.user_name = 'denolf'
.template = '{date}/{session}/{mysubdir}'
.images_path_relative = True
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%y%b'
.scan_number_format = '%04d'
.mysubdir = 'sample1'
.session = 'demo'
.date = '20Feb'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'None'
.creation_date = '2020-02-08-12:04'
.last_accessed = '2020-02-08-12:05'
-------- --------- -----------------------------------------------------------------
exists filename /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5
exists directory /tmp/data/20Feb/demo/sample1
-------- --------- -----------------------------------------------------------------