We are starting with HDF5 because our initial test case is the NWB format for neurophysiology data. This is a challenging first test case because the data is heterogeneous and large, and the specification is itself written in an idiosyncratic format.

HDF5 has three primary types of objects:

  • Groups - containers that hold datasets and other groups

  • Datasets - contain the raw values in the file

  • Attributes - metadata about groups or datasets.
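As a minimal sketch of the three object types, the following uses the h5py library (an assumption; nothing above prescribes a particular library) to build a small in-memory file. The names "acquisition", "voltage", "unit" and "description" are made up for illustration and are not taken from NWB.

```python
import h5py
import numpy as np

# In-memory file: the "core" driver with backing_store=False writes nothing to disk.
with h5py.File("objects_demo.h5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("acquisition")                         # a Group
    dset = group.create_dataset("voltage", data=np.arange(6.0))   # a Dataset
    group.attrs["description"] = "raw recordings"                 # Attribute on a group
    dset.attrs["unit"] = "volts"                                  # Attribute on a dataset

    # Walk the hierarchy and record what kind of object lives at each path.
    kinds = {}

    def record(name, obj):
        kinds[name] = type(obj).__name__

    f.visititems(record)

    unit = dset.attrs["unit"]
    values = dset[:].tolist()
```

Here `kinds` maps each path in the file to either "Group" or "Dataset", while the attributes are reached through `.attrs` on whichever object they describe.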

Datasets have additional properties:

  • Datatypes: binary representation of the data

  • Dataspaces: layout of individual data elements

  • Properties: additional information about the representation of the dataset, e.g., chunked or contiguous
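The dataset properties listed above can be inspected roughly as follows. This is a sketch that again assumes h5py; the dataset names, shape and chunk size are arbitrary illustrative choices.

```python
import h5py
import numpy as np

with h5py.File("dataset_demo.h5", "w", driver="core", backing_store=False) as f:
    # Explicitly chunked dataset of 16-bit integers.
    dset = f.create_dataset("samples", shape=(100, 4), dtype="int16", chunks=(10, 4))
    info = {
        "datatype": str(dset.dtype),  # binary representation of each element
        "dataspace": dset.shape,      # extent of the array of elements
        "chunks": dset.chunks,        # storage property: chunk shape
    }

    # Without chunks (or features that require them) h5py stores data contiguously,
    # which it reports as chunks = None.
    small = f.create_dataset("small", data=np.zeros(3))
    small_chunks = small.chunks
```

In h5py the dataspace is exposed mainly through `shape`, and the chunked-versus-contiguous property through `chunks`.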

These map naturally onto triplets: each group or dataset is a subject, and each of its attributes supplies a property (the attribute name) and an object (the attribute value).
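One way to picture this mapping is the sketch below. It uses a plain dictionary as a stand-in for an HDF5 file (path of a group or dataset mapped to its attributes), so the function name `attribute_triples`, the paths and the attribute values are all hypothetical rather than part of any existing API.

```python
from typing import Any, Dict, Iterator, Tuple

Triple = Tuple[str, str, Any]  # (subject, property, object)


def attribute_triples(nodes: Dict[str, Dict[str, Any]]) -> Iterator[Triple]:
    """Yield one triple per attribute of every group or dataset."""
    for path, attrs in nodes.items():
        for name, value in attrs.items():
            # subject = path of the group/dataset, property = attribute name,
            # object = attribute value
            yield (path, name, value)


# Hypothetical file contents: HDF5 path -> attributes on that object.
nodes = {
    "/acquisition": {"description": "raw recordings"},
    "/acquisition/voltage": {"unit": "volts", "rate": 30000.0},
}

triples = list(attribute_triples(nodes))
```

Using the object's path as the subject is only one possible choice; it has the convenience that paths are unique within a file.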