Data policy
A data policy in BLISS determines the data structure (file format and directory structure) and the registration of data/metadata with external services. BLISS currently has two data policies:
- The ESRF data policy allows users to access their data and electronic logbook at https://data.esrf.fr. The data is written in Nexus-compliant HDF5 files in a specific directory structure.
Note
The ESRF data policy requires configuration. The Nexus writer requires configuration.
- The basic data policy does not impose a data directory structure or register data with any external service. Data can (but does not have to) be written in Nexus-compliant HDF5 files. The basic data policy is the default policy for BLISS.
Note
The Nexus writer requires configuration
Adding new data policies is described here.
ESRF data policy
Directory structure
The directory in which files are saved is derived from the proposal, collection and dataset names provided by the user. The functions described below change these three names and notify the data policy services.
Change proposal
DEMO [1]: newproposal("hg123")
Proposal set to 'hg123'
Data path: /data/visitor/hg123/id00/sample/sample_0001
When no proposal name is given, the default proposal is the inhouse proposal {beamline}{yymm}. For example, at ID21 in January 2020 the default proposal name is id212001.
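As an illustration of this naming convention only (plain Python, not a BLISS call; default_proposal_name is a hypothetical helper):
from datetime import date

def default_proposal_name(beamline: str, when: date) -> str:
    # Build the default inhouse proposal name following the {beamline}{yymm} convention.
    return f"{beamline}{when.strftime('%y%m')}"

print(default_proposal_name("id21", date(2020, 1, 15)))  # -> id212001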
The data root directory is derived from the proposal name:
- no name given: /data/{beamline}/inhouse/
- name starts with the beamline name: /data/{beamline}/inhouse/
- name starts with test*, tmp* or temp*: /data/{beamline}/tmp/
- all other names: /data/visitor/
These root paths can be configured; the values above are the defaults.
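The mapping from proposal name to root directory can be sketched as follows (an illustration assuming the default root paths above, not the BLISS implementation; proposal_root is a hypothetical helper):
def proposal_root(proposal_name: str, beamline: str) -> str:
    # Default mapping from proposal name to data root directory.
    if not proposal_name:
        return f"/data/{beamline}/inhouse/"
    name = proposal_name.lower()
    if name.startswith(beamline.lower()):
        return f"/data/{beamline}/inhouse/"
    if name.startswith(("test", "tmp", "temp")):
        return f"/data/{beamline}/tmp/"
    return "/data/visitor/"

print(proposal_root("hg123", "id00"))     # -> /data/visitor/
print(proposal_root("id002001", "id00"))  # -> /data/id00/inhouse/
print(proposal_root("tmp_align", "id00")) # -> /data/id00/tmp/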
Change collection
A collection is a group of datasets that share some characteristics:
DEMO [2]: newcollection("sample1")
Dataset collection set to 'sample1'
Data path: /data/visitor/hg123/id00/sample1/sample1_0001
When no collection name is given, the default name “sample” is used. Note that you can always come back to an existing collection to add more datasets.
When the datasets in a collection share the same sample, you can use newsample instead:
DEMO [2]: newsample("sample1")
Dataset collection set to 'sample1'
Data path: /data/visitor/hg123/id00/sample1/sample1_0001
Change dataset
Named datasets
DEMO [3]: newdataset("area1")
Dataset set to 'area1'
Data path: /data/visitor/hg123/id00/sample1/sample1_area1
When the dataset already exists, the name is automatically incremented (“area1_0002”, “area1_0003”, …). Note that you can never come back to a dataset once you have changed to another one.
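The increment rule can be expressed roughly like this (a sketch of the behaviour only; next_dataset_name is a hypothetical helper, not part of BLISS):
def next_dataset_name(requested: str, existing: set) -> str:
    # Keep the requested name if it is free, otherwise append an incrementing 4-digit suffix.
    if requested not in existing:
        return requested
    index = 2
    while f"{requested}_{index:04d}" in existing:
        index += 1
    return f"{requested}_{index:04d}"

print(next_dataset_name("area1", {"area1"}))                # -> area1_0002
print(next_dataset_name("area1", {"area1", "area1_0002"}))  # -> area1_0003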
Unnamed datasets
DEMO [4]: newdataset()
Dataset set to '0002'
Data path: /data/visitor/hg123/id00/sample1/sample1_0002
The dataset is named automatically “0001”, “0002”, … The dataset number is independent for each sample. Note that you can never come back to a dataset once you have changed to another one.
Dataset registration
The data and metadata of datasets are registered with the ESRF data policy services when configured in the beamline configuration. The command icat_info provides feedback on dataset registration of the current proposal:
DEMO_SESSION [1]: SCAN_SAVING.icat_info()
ICAT proposal time slot:
proposal ID002109
beamline ID00
startDate 2021-09-01T12:47:58.948+02:00
id *********
title ID002109
url https://data.esrf.fr/investigation/*********/datasets
Datasets: 3 unconfirmed, 110 confirmed
Unconfirmed datasets:
Name Time since end
sample_0060 0:00:44.645211
sample_0061 0:00:41.392353
sample_0062 0:00:17.115062
The first section provides information on the current time slot assigned to the current proposal. The second section shows the list of datasets which have been sent to ICAT but have not yet been registered.
Manual dataset registration
Dataset registration with ICAT happens automatically, but in case of failure datasets can be registered manually.
Register all unconfirmed datasets with ICAT:
DEMO_SESSION [2]: SCAN_SAVING.icat_register_datasets()
Register a specific dataset with ICAT:
DEMO_SESSION [3]: SCAN_SAVING.icat_register_dataset("sample_0008")
Policy state
To get an overview of the current state of the data policy:
DEMO [5]: SCAN_SAVING
Out [5]: Parameters (default) -
.user_name = 'denolf'
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%Y%m%d'
.scan_number_format = '%04d'
.dataset_number_format = '%04d'
.session = 'demo'
.date = '20200208'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'ESRF'
.template = '{proposal_name}/{beamline}/{sample_name}/{sample_name}_{dataset_name}'
.beamline = 'id00'
.proposal_name = 'hg123'
.proposal_type = 'inhouse'
.base_path = '/data/visitor'
.sample_name = 'sample1'
.dataset_name = '0001'
.data_filename = '{sample_name}_{dataset_name}'
.images_path_relative = True
.creation_date = '2020-02-08-12:09'
.last_accessed = '2020-02-08-12:12'
--------- --------- -------------------------------------------------------------------
exists filename /data/visitor/hg123/id00/sample1/sample1_0001/sample1_0001.h5
exists directory /data/visitor/hg123/id00/sample1/sample1_0001
Metadata RUNNING Dataset is running
--------- --------- -------------------------------------------------------------------
Basic data policy
This data policy requires the user to use the SCAN_SAVING object directly to define where the data will be saved. The data location is completely determined by specifying base_path, template and data_filename:
DEMO [1]: SCAN_SAVING.base_path = "/tmp/data"
DEMO [2]: SCAN_SAVING.writer = "nexus"
DEMO [3]: SCAN_SAVING.template = "{date}/{session}/{mysubdir}"
DEMO [4]: SCAN_SAVING.date_format = "%y%b"
DEMO [5]: SCAN_SAVING.add("mysubdir", "sample1")
DEMO [6]: SCAN_SAVING.data_filename = "scan{scan_number}"
DEMO [7]: SCAN_SAVING.filename
Out [7]: '/tmp/data/20Feb/demo/sample1/scan{scan_number}.h5'
Note that each attribute can be a template string, which is filled in with the values of other attributes of the SCAN_SAVING object.
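The substitution behaves like standard Python string formatting; the following sketch reproduces the path above with plain str.format (an illustration of the mechanism only, the actual expansion is done internally by SCAN_SAVING):
# Attribute values as set in the session above.
attributes = {"date": "20Feb", "session": "demo", "mysubdir": "sample1"}
base_path = "/tmp/data"
template = "{date}/{session}/{mysubdir}"
data_filename = "scan{scan_number}"  # left unresolved until scan time

directory = "/".join([base_path, template.format(**attributes)])
print(f"{directory}/{data_filename}.h5")
# -> /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5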
Policy state
To get an overview of the current state of the data policy:
DEMO [8]: SCAN_SAVING
Out [8]: Parameters (default) -
.base_path = '/tmp/data'
.data_filename = 'scan{scan_number}'
.user_name = 'denolf'
.template = '{date}/{session}/{mysubdir}'
.images_path_relative = True
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%y%b'
.scan_number_format = '%04d'
.mysubdir = 'sample1'
.session = 'demo'
.date = '20Feb'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'None'
.creation_date = '2020-02-08-12:04'
.last_accessed = '2020-02-08-12:05'
-------- --------- -----------------------------------------------------------------
exists filename /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5
exists directory /tmp/data/20Feb/demo/sample1
-------- --------- -----------------------------------------------------------------