
Data policy

The data policy can be enabled and configured in BLISS by adding a dedicated scan_saving section to the BLISS configuration, either in the __init__.yml file at the root of the beamline configuration:

scan_saving:
    class: MyScanSaving
    ... # policy dependent configuration

or in a session configuration (this is particularly useful when the same Beacon configuration is shared by multiple endstations):

- class: Session
  name: my_session
  scan_saving:
    class: MyScanSaving
    ... # policy dependent configuration

BLISS currently provides BasicScanSaving and ESRFScanSaving. Adding new data policies is described here.

Basic data policy

The Basic data policy does not require any configuration. Optionally, it can be selected explicitly in the scan saving configuration:

scan_saving:
    class: BasicScanSaving

ESRF data policy

A minimal configuration requires enabling ESRFScanSaving in the scan saving configuration and configuring the communication with the external ESRF data policy services (commonly referred to as ICAT).

In addition, the data directories and the dataset metadata sent to the data policy services can be configured.

Enable policy

The minimal scan saving configuration for the ESRF data policy:

scan_saving:
    class: ESRFScanSaving
    beamline: id00

Configure services

For BLISS to communicate with the ESRF data policy services, add the following configuration to the __init__.yml file at the root of the beamline configuration:

icat_servers:
    metadata_urls: [URL1, URL2]
    elogbook_url: URL3
    elogbook_token: elogbook-00000000-0000-0000-0000-000000000000
    elogbook_timeout: 0.1  # optional
    feedback_timeout: 0.1  # optional
    queue_timeout: 1  # optional
    disable: False # optional

When disable is True, all e-logbook messages are lost, but dataset metadata are kept in Redis until the services are enabled again or until you switch to a different proposal.

All timeouts are optional:

  • elogbook_timeout: time to wait for elogbook message confirmation
  • feedback_timeout: time to wait for retrieving ICAT feedback on the current proposal
  • queue_timeout: time to wait for connection to metadata_urls
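Before restarting a session, it can help to sanity-check the icat_servers section. The following is a minimal sketch, not part of BLISS: check_icat_servers is a hypothetical helper that works on a section already loaded into a Python dict, and it assumes that the keys not marked as optional above are required and that the timeouts should be positive numbers.

```python
# Hypothetical helper (not part of BLISS): sanity-check an "icat_servers"
# section that has already been loaded into a Python dict.
REQUIRED_KEYS = ("metadata_urls", "elogbook_url", "elogbook_token")  # assumption
OPTIONAL_TIMEOUTS = ("elogbook_timeout", "feedback_timeout", "queue_timeout")


def check_icat_servers(section: dict) -> list:
    """Return a list of problems found in the section (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in section:
            problems.append(f"missing required key: {key}")
    urls = section.get("metadata_urls")
    if urls is not None and (not isinstance(urls, list) or not urls):
        problems.append("metadata_urls should be a non-empty list")
    for key in OPTIONAL_TIMEOUTS:
        value = section.get(key)
        if value is None:
            continue  # optional key: the BLISS default applies
        # Assumption: a timeout must be a positive int or float (not a bool).
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{key} should be a positive number")
    if not isinstance(section.get("disable", False), bool):
        problems.append("disable should be True or False")
    return problems


example = {
    "metadata_urls": ["URL1", "URL2"],
    "elogbook_url": "URL3",
    "elogbook_token": "elogbook-00000000-0000-0000-0000-000000000000",
    "elogbook_timeout": 0.1,
    "feedback_timeout": 0.1,
    "queue_timeout": 1,
    "disable": False,
}
print(check_icat_servers(example))  # []
```

The helper returns a list of messages instead of raising, so every problem in the section is reported at once.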

Data diagram

Root directories

The ESRF data policy allows configuring the root directory based on proposal type:

scan_saving:
    class: ESRFScanSaving
    beamline: id00
    tmp_data_root: /data/{beamline}/tmp
    visitor_data_root: /data/visitor
    inhouse_data_root: /data/{beamline}/inhouse

Multiple mount points

Multiple mount points can be configured for each proposal type (visitor, inhouse and tmp). For example, two mount points for in-house proposals:

scan_saving:
    ...
    inhouse_data_root:
        nfs: /data/{beamline}/inhouse
        lsb: /lsbram/{beamline}/inhouse

The active mount point can be selected in BLISS:

DEMO [1]: SCAN_SAVING.mount_point = "lsb"

By default, SCAN_SAVING.mount_point is "", which selects the first mount point in the configuration.
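The way a root directory is chosen from the YAML examples above can be sketched in a few lines of Python. This is a hypothetical illustration, not the BLISS implementation: resolve_data_root is an invented name, it assumes a root is either a single template string or a mapping of mount-point name to template, and it assumes the mapping keeps the order of the YAML file.

```python
# Hypothetical helper (not the BLISS implementation): resolve a configured
# data root for the active mount point.
def resolve_data_root(root, mount_point: str, beamline: str) -> str:
    """root is either one template string or a {mount point name: template} dict."""
    if isinstance(root, str):
        # Assumption: a single path behaves like one unnamed mount point.
        template = root
    elif mount_point == "":
        # The default "" selects the first mount point (dicts keep insertion order).
        template = next(iter(root.values()))
    else:
        template = root[mount_point]
    # Fill in placeholders such as {beamline}.
    return template.format(beamline=beamline)


inhouse_data_root = {
    "nfs": "/data/{beamline}/inhouse",
    "lsb": "/lsbram/{beamline}/inhouse",
}
print(resolve_data_root(inhouse_data_root, "", "id00"))     # /data/id00/inhouse
print(resolve_data_root(inhouse_data_root, "lsb", "id00"))  # /lsbram/id00/inhouse
```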

The ICAT services also need access to the data, but they might not have access to all mount points. Therefore, a separate mount point for ICAT can be provided as well:

scan_saving:
    ...
    inhouse_data_root:
        nfs: /data/{beamline}/inhouse
        lsb: /lsbram/{beamline}/inhouse
    icat_inhouse_data_root: /data/{beamline}/inhouse

or, equivalently, with one ICAT path per mount point:

scan_saving:
    ...
    inhouse_data_root:
        nfs: /data/{beamline}/inhouse
        lsb: /lsbram/{beamline}/inhouse
    icat_inhouse_data_root:
        nfs: /data/{beamline}/inhouse
        lsb: /data/{beamline}/inhouse
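Why the two ICAT configurations above are equivalent can be checked with a small sketch. resolve_icat_root is a hypothetical helper, not a BLISS function; it assumes an empty mount point selects the first key of the data root, and that the ICAT root is either one path for every mount point or a mapping keyed by mount-point name.

```python
# Hypothetical helper (not part of BLISS): the ICAT-visible root for the
# active mount point.
def resolve_icat_root(data_root: dict, icat_root, mount_point: str, beamline: str) -> str:
    # "" selects the first mount point of the data root, as for the data itself.
    key = mount_point or next(iter(data_root))
    if isinstance(icat_root, str):
        template = icat_root  # one ICAT-visible path for every mount point
    else:
        template = icat_root[key]  # one ICAT-visible path per mount point
    return template.format(beamline=beamline)


inhouse_data_root = {"nfs": "/data/{beamline}/inhouse", "lsb": "/lsbram/{beamline}/inhouse"}
single = "/data/{beamline}/inhouse"
per_mount_point = {"nfs": "/data/{beamline}/inhouse", "lsb": "/data/{beamline}/inhouse"}

# Both forms give the same ICAT path whichever mount point is active.
for mount_point in ("", "nfs", "lsb"):
    assert resolve_icat_root(inhouse_data_root, single, mount_point, "id00") == \
        resolve_icat_root(inhouse_data_root, per_mount_point, mount_point, "id00")
print(resolve_icat_root(inhouse_data_root, per_mount_point, "lsb", "id00"))  # /data/id00/inhouse
```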

Directory structure

Legacy directory structures can be enabled with:

scan_saving:
    ...
    directory_structure_version: 1

The directory structure versions are:

  1. {base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}/{collection_name}/{collection_name}_{dataset_name}
  2. {base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}/raw/{collection_name}/{collection_name}_{dataset_name}
  3. {base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}/RAW_DATA/{collection_name}/{collection_name}_{dataset_name}
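The three templates can be compared by filling them in with the same values. The sketch below copies the templates from the list above; dataset_directory and all example values (proposal hg123, session 20240101, collection sample1) are hypothetical.

```python
# Path templates copied from the list above, keyed by directory_structure_version.
DIRECTORY_TEMPLATES = {
    1: "{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}"
       "/{collection_name}/{collection_name}_{dataset_name}",
    2: "{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}"
       "/raw/{collection_name}/{collection_name}_{dataset_name}",
    3: "{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}"
       "/RAW_DATA/{collection_name}/{collection_name}_{dataset_name}",
}


def dataset_directory(version: int, **fields: str) -> str:
    """Fill in the template of the given directory structure version."""
    return DIRECTORY_TEMPLATES[version].format(**fields)


# Hypothetical example values.
fields = dict(
    base_path="/data/visitor",
    proposal_dirname="hg123",
    beamline="id00",
    proposal_session_name="20240101",
    collection_name="sample1",
    dataset_name="0001",
)
for version in (1, 2, 3):
    print(version, dataset_directory(version, **fields))
```

Only the level between the proposal session and the collection differs: nothing, raw or RAW_DATA.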

Proposal session

One parameter affects which proposal session is selected when switching proposals:

scan_saving:
    ...
    newproposal_now_offset: 24  # Default: 0 hours

When newproposal is used, BLISS selects the proposal session that started before now() + newproposal_now_offset hours. With the default offset of 0, BLISS therefore never selects a proposal session that starts in the future: it takes the most recent one that has already started. If you set newproposal_now_offset = 24 and a session starts within the next 24 hours, BLISS selects it.
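The selection rule can be illustrated with a small Python sketch. select_proposal_session is hypothetical, not a BLISS function; it assumes the session start times are already known and that, among the sessions starting before now() + offset, the most recent start wins.

```python
from datetime import datetime, timedelta


def select_proposal_session(start_times, now, newproposal_now_offset=0):
    """Return the most recent session start <= now + offset hours, or None."""
    limit = now + timedelta(hours=newproposal_now_offset)
    started = [start for start in start_times if start <= limit]
    return max(started) if started else None


now = datetime(2024, 5, 1, 12, 0)
sessions = [datetime(2024, 4, 20, 8, 0), datetime(2024, 5, 2, 8, 0)]
print(select_proposal_session(sessions, now))      # 2024-04-20 08:00:00
print(select_proposal_session(sessions, now, 24))  # 2024-05-02 08:00:00
```

With the default offset, the session starting the next morning is ignored; with an offset of 24 hours it becomes eligible and, being the most recent, is selected.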

Dataset name

When you call newdataset(), BLISS should never allow you to reuse an existing dataset directory.

When no dataset name is provided, or the dataset name is a number (for example newdataset(10) or the equivalent newdataset("0010")), the dataset name is a number zero-padded to four digits, incremented by one until an unused name is found. As a result, the default dataset names are "0001", "0002", etc.

When you call newdataset("myname") with a name, the dataset directory might already exist, in which case a suffix is added. As a result, the dataset names are "myname", "myname_0002", "myname_0003", etc. The enforce_dataset_suffix Beacon parameter can be used to always add the suffix, so that the dataset names are "myname_0001", "myname_0002", "myname_0003", etc. To enable it, add this configuration:

scan_saving:
    ...
    enforce_dataset_suffix: true  # Default: false
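The naming rules above can be sketched as a pure function. next_dataset_name is hypothetical and only models the documented behaviour; it assumes the set of already used dataset names is known, whereas BLISS itself checks the existing directories.

```python
def next_dataset_name(name=None, existing=(), enforce_dataset_suffix=False):
    """Return the first dataset name that is not in `existing`."""
    used = set(existing)
    # Numbered datasets: no name, an int, or a digit string such as "0010".
    if name is None or isinstance(name, int) or name.isdigit():
        number = 1 if name is None else int(name)
        while f"{number:04d}" in used:
            number += 1  # add +1 until the name has not been used
        return f"{number:04d}"
    # Named datasets: the bare name comes first unless a suffix is enforced.
    if not enforce_dataset_suffix and name not in used:
        return name
    suffix = 1 if enforce_dataset_suffix else 2
    while f"{name}_{suffix:04d}" in used:
        suffix += 1
    return f"{name}_{suffix:04d}"


print(next_dataset_name(None, {"0001", "0002"}))                         # 0003
print(next_dataset_name("myname", {"myname"}))                           # myname_0002
print(next_dataset_name("myname", set(), enforce_dataset_suffix=True))   # myname_0001
```

Without the parameter, the bare name counts as the first dataset, which is why the first suffix added is _0002.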

Dataset metadata

Under the ESRF data policy, scans are grouped into datasets. Each dataset has metadata, which are sent to one of the data policy services and are intended for searching datasets in the data portal.

The ICAT database stores this metadata under a predefined set of database fields, which need to be mapped to properties from BLISS objects. This can be configured in the session configuration:

- class: Session
  name: my_session
  config-objects:
    ...
  icat-metadata:
    definitions: "https://gitlab.esrf.fr/icat/hdf5-master-config/-/raw/master/hdf5_cfg.xml"  # optional
    default:
      secondary_slit: $secondary_slits
      sample.positioners: [$sy, $sz]
      variables: $sx
      optics.positioners: [$robx, $roby]
      detector05: $lima_simulator
      detector06: $beamviewer
      detector07: $fluo_diode.counter
      detector08: $diode1
      detector09: $diode2
      attenuator02: $att2
    techniques:
      TOMO:
        detector01: $tomocam
      XRD:
        detector02: $diffcam
        attenuator01: $att1
      FLUO:
        detector03: $mca1  # metadata group provided by `HasMetadataForDataset.dataset_metadata()` of controller `mca1`
        detector04.name: $mca2.name  # metadata field provided by the `name` attribute of controller `mca2`

The ICAT database fields are retrieved from the definitions URL when it is specified, and from BLISS otherwise.

All BLISS controllers in the session that implement the HasMetadataForDataset protocol will be used when gathering dataset metadata. There are several reasons why you would want to specify a controller explicitly under icat-metadata:

  • the controller is not part of the session (i.e. not listed under config-objects)
  • the controller does not have default metadata groups (HasMetadataForDataset.dataset_metadata_groups() == list())
  • you want to change the default metadata groups (e.g. ["secondary_slit"] instead of ["slits"])
  • you want to select specific controller attributes as metadata instead of what HasMetadataForDataset.dataset_metadata() returns
  • the controller only needs to be included for a specific technique

The metadata of a dataset are a combination of metadata from

  • the controllers under default
  • optionally, the controllers under one or more techniques (see SCAN_SAVING.dataset.techniques on how to select techniques)
  • the controllers in the BLISS session that are not specified explicitly under icat-metadata and with default metadata groups (HasMetadataForDataset.dataset_metadata_groups() != list())
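How the three sources listed above combine can be sketched in plain Python. metadata_sources is a hypothetical model, not BLISS code: it works on a dict shaped like the icat-metadata section, a list of selected technique names, and a hypothetical mapping of session controller names to their default metadata groups. It only lists which controller feeds which key; it does not gather any metadata.

```python
def _controller_names(value):
    """Controller names referenced by "$sy", ["$sy", "$sz"] or "$mca2.name"."""
    refs = value if isinstance(value, list) else [value]
    return {ref.lstrip("$").split(".")[0] for ref in refs}


def metadata_sources(icat_metadata, techniques, session_groups):
    """Return (icat_key, source) pairs that contribute to a dataset's metadata."""
    default = icat_metadata.get("default", {})
    all_techniques = icat_metadata.get("techniques", {})

    # 1. Controllers listed under "default" are always used.
    sources = list(default.items())
    # 2. Controllers listed under each selected technique.
    for technique in techniques:
        sources.extend(all_techniques.get(technique, {}).items())

    # Any controller named anywhere under icat-metadata counts as explicit.
    explicit = set()
    for section in [default, *all_techniques.values()]:
        for value in section.values():
            explicit |= _controller_names(value)

    # 3. Other session controllers, only if they declare default groups.
    for name, groups in session_groups.items():
        if name not in explicit:
            sources.extend((group, f"${name}") for group in groups)
    return sources


icat_metadata = {
    "default": {"secondary_slit": "$secondary_slits", "sample.positioners": ["$sy", "$sz"]},
    "techniques": {
        "TOMO": {"detector01": "$tomocam"},
        "FLUO": {"detector03": "$mca1", "detector04.name": "$mca2.name"},
    },
}
# Hypothetical session controllers and their default metadata groups.
session_groups = {"sy": ["sample.positioners"], "slits1": ["slits"], "robz": []}
for key, source in metadata_sources(icat_metadata, ["FLUO"], session_groups):
    print(key, source)
```

Here sy is skipped because it is listed explicitly, robz is skipped because it has no default groups, and tomocam is skipped because TOMO is not selected.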

The allowed keys can be listed with demo_session.icat_metadata.available_icat_groups (for controllers) or demo_session.icat_metadata.available_icat_fields (for controller attributes). A key can be shortened to its trailing part only when that part is unique (for example, secondary_slit can be used instead of the full key instrument.secondary_slit).

DEMO_SESSION [1]: print(demo_session.icat_metadata.available_icat_groups)

    ['SAXS',
    'MX',
    'EM',
    'PTYCHO',
    'PTYCHO.Axis1',
    'PTYCHO.Axis2',
    'FLUO',
    'FLUO.measurement',
    'TOMO',
    'MRT',
    'HOLO',
    'WAXS',
    'sample',
    'sample.notes',
    'sample.positioners',
    'sample.patient',
    'sample.environment',
    'sample.environment.sensors',
    'instrument',
    'instrument.variables',
    'instrument.positioners',
    'instrument.monochromator',
    'instrument.monochromator.crystal',
    'instrument.source',
    'instrument.primary_slit',
    'instrument.secondary_slit',
    'instrument.slits',
    'instrument.xraylens01',
    'instrument.xraylens02',
    'instrument.xraylens03',
    'instrument.xraylens04',
    'instrument.xraylens05',
    'instrument.xraylens06',
    'instrument.xraylens07',
    'instrument.xraylens08',
    'instrument.xraylens09',
    'instrument.xraylens10',
    'instrument.attenuator01',
    'instrument.attenuator01.positioners',
    'instrument.attenuator02',
    'instrument.attenuator02.positioners',
    'instrument.attenuator03',
    'instrument.attenuator03.positioners',
    'instrument.attenuator04',
    'instrument.attenuator04.positioners',
    'instrument.attenuator05',
    'instrument.attenuator05.positioners',
    'instrument.attenuator06',
    'instrument.attenuator06.positioners',
    'instrument.attenuator07',
    'instrument.attenuator07.positioners',
    'instrument.attenuator08',
    'instrument.attenuator08.positioners',
    'instrument.attenuator09',
    'instrument.attenuator09.positioners',
    'instrument.attenuator10',
    'instrument.attenuator10.positioners',
    'instrument.attenuator11',
    'instrument.attenuator11.positioners',
    'instrument.attenuator12',
    'instrument.attenuator12.positioners',
    'instrument.attenuator13',
    'instrument.attenuator13.positioners',
    'instrument.attenuator14',
    'instrument.attenuator14.positioners',
    'instrument.attenuator15',
    'instrument.attenuator15.positioners',
    'instrument.insertion_device',
    'instrument.insertion_device.gap',
    'instrument.insertion_device.taper',
    'instrument.optics',
    'instrument.optics.positioners',
    'instrument.environment',
    'instrument.environment.sensors',
    'instrument.detector01',
    'instrument.detector01.positioners',
    'instrument.detector01.rois',
    'instrument.detector02',
    'instrument.detector02.positioners',
    'instrument.detector02.rois',
    'instrument.detector03',
    'instrument.detector03.positioners',
    'instrument.detector03.rois',
    'instrument.detector04',
    'instrument.detector04.positioners',
    'instrument.detector04.rois',
    'instrument.detector05',
    'instrument.detector05.positioners',
    'instrument.detector05.rois',
    'instrument.detector06',
    'instrument.detector06.positioners',
    'instrument.detector06.rois',
    'instrument.detector07',
    'instrument.detector07.positioners',
    'instrument.detector07.rois',
    'instrument.detector08',
    'instrument.detector08.positioners',
    'instrument.detector08.rois',
    'instrument.detector09',
    'instrument.detector09.positioners',
    'instrument.detector09.rois',
    'instrument.detector10',
    'instrument.detector10.positioners',
    'instrument.detector10.rois',
    'notes']
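The uniqueness rule for shortened keys can be illustrated with a sketch. resolve_icat_key is hypothetical and is not the BLISS lookup; it assumes a short key matches a full key when it equals it or forms its trailing dot-separated part, and that anything other than exactly one match is an error.

```python
def resolve_icat_key(short_key, available):
    """Return the unique full key equal to short_key or ending with "." + short_key."""
    matches = [k for k in available if k == short_key or k.endswith("." + short_key)]
    if len(matches) != 1:
        raise ValueError(f"{short_key!r} matches {len(matches)} keys: {matches}")
    return matches[0]


# A few keys taken from the listing above.
available = [
    "sample.positioners",
    "instrument.positioners",
    "instrument.secondary_slit",
    "instrument.slits",
    "instrument.optics.positioners",
]
print(resolve_icat_key("secondary_slit", available))      # instrument.secondary_slit
print(resolve_icat_key("optics.positioners", available))  # instrument.optics.positioners
```

A bare "positioners" would raise an error here because it ends three different keys, which is why the configuration example uses sample.positioners and optics.positioners.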