HDF5

openPMD supports writing to and reading from HDF5 files (.h5). To use them, the installed copy of openPMD must have been built with the HDF5 backend enabled, which is done with the CMake option -DopenPMD_USE_HDF5=ON. For further information, see the installation guide, the build dependencies and the build options.
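
You can also check at runtime whether an installed copy includes the HDF5 backend. The Python sketch below makes two assumptions. First, openPMD-api was installed with its Python bindings (module openpmd_api). Second, that module exposes its compiled-in backends as a dictionary named variants. If the module cannot be imported, the function returns False instead of raising an error.

```python
def hdf5_backend_available() -> bool:
    """Return True if openPMD-api's Python bindings import and report HDF5 support."""
    try:
        # Present only if openPMD-api was installed with Python bindings
        import openpmd_api as io
    except ImportError:
        return False
    # Assumption: the bindings expose compiled-in backends as a dict, io.variants
    variants = getattr(io, "variants", {})
    return bool(variants.get("hdf5", False))


if __name__ == "__main__":
    print("HDF5 backend available:", hdf5_backend_available())
```

If this prints False although openpmd_api imports, rebuild with -DopenPMD_USE_HDF5=ON as described above.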

I/O Method

HDF5 internally writes either serially, via POSIX on Unix systems, or in parallel to a single logical file via MPI-I/O.

Backend-Specific Controls

The following environment variables control HDF5 I/O behavior at runtime.

Environment variable      Default  Description
------------------------  -------  -----------
OPENPMD_HDF5_INDEPENDENT  ON       Sets the MPI-parallel transfer mode to collective (OFF) or independent (ON).
OPENPMD_HDF5_ALIGNMENT    1        Tuning parameter for parallel I/O: choose an alignment that is a multiple of the disk block size.
OPENPMD_HDF5_CHUNKS       auto     Default for H5Pset_chunk: "auto" (heuristic) or "none" (no chunking).
H5_COLL_API_SANITY_CHECK  unset    Debug: set to 1 to perform an MPI_Barrier inside each metadata operation.
HDF5_USE_FILE_LOCKING     TRUE     Work-around: set to FALSE on HPC or network file systems where opening files for reading hangs.
OMPI_MCA_io               unset    Work-around: set to ^ompio to disable OpenMPI's own I/O implementation (OMPIO) in older releases.

OPENPMD_HDF5_INDEPENDENT: by default, we implement MPI-parallel data storeChunk (write) and loadChunk (read) calls as non-collective (independent) MPI operations. Attribute writes are always collective in parallel HDF5. We chose independent transfers as the default for ease of use. Be advised, however, that this can cause performance penalties; how large they are depends heavily on the use case. For independent parallel I/O, consider using a modern version of the MPICH implementation (in particular, prefer ROMIO over OpenMPI's ompio implementation). For more details, refer to the HDF5 manual entry for the function H5Pset_dxpl_mpio.
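
You can switch to collective transfers by exporting OPENPMD_HDF5_INDEPENDENT=OFF in the job script. From Python, you can set the variable before the Series is opened. The sketch below assumes the backend reads the variable when the Series is set up, not on every write. The helper names are hypothetical.

```python
import os


def set_hdf5_transfer_mode(collective: bool) -> None:
    """Select the MPI-parallel transfer mode for openPMD's HDF5 backend.

    Call this before the openPMD Series is created (assumption: the backend
    reads the variable during setup).
    """
    # OFF selects collective transfers, ON (the default) selects independent ones
    os.environ["OPENPMD_HDF5_INDEPENDENT"] = "OFF" if collective else "ON"


def hdf5_transfer_mode() -> str:
    """Report which mode the current environment requests (default: independent)."""
    # This helper treats only the exact string "OFF" as collective; that mirrors
    # the table above, not necessarily how the backend parses the value
    value = os.environ.get("OPENPMD_HDF5_INDEPENDENT", "ON")
    return "collective" if value == "OFF" else "independent"
```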

OPENPMD_HDF5_ALIGNMENT: this sets the alignment in bytes for writes via the H5Pset_alignment function. According to the HDF5 documentation: "For MPI IO and other parallel systems, choose an alignment which is a multiple of the disk block size." For Lustre file systems, the NERSC documentation advises setting this to the Lustre stripe size. ORNL Summit GPFS users are advised to set the alignment to 16777216 (16 MiB).
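
The arithmetic behind "a multiple of the disk block size" is a ceiling division. The following sketch rounds a desired alignment up to the next block multiple and exports the result. The 1 MiB Lustre stripe size is a hypothetical placeholder. Query the real value for your output directory, e.g. with lfs getstripe.

```python
import os

MiB = 1024 * 1024  # bytes


def round_up_to_multiple(value: int, block_size: int) -> int:
    """Return the smallest multiple of block_size that is >= value (bytes)."""
    if value < 1 or block_size < 1:
        raise ValueError("value and block_size must be positive")
    return -(-value // block_size) * block_size  # ceiling division, then scale


# ORNL Summit GPFS recommendation from the text: 16 MiB
SUMMIT_ALIGNMENT = 16 * MiB  # 16777216 bytes

# Hypothetical Lustre stripe size; the recommendation is to align to the real one
LUSTRE_STRIPE_SIZE = 1 * MiB


def configure_alignment(alignment: int) -> None:
    """Export the alignment for openPMD's HDF5 backend (before opening the Series)."""
    os.environ["OPENPMD_HDF5_ALIGNMENT"] = str(alignment)
```

On Summit this would be configure_alignment(SUMMIT_ALIGNMENT). On Lustre it would be configure_alignment(LUSTRE_STRIPE_SIZE) once the placeholder holds the actual stripe size.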

OPENPMD_HDF5_CHUNKS: this sets the defaults for data chunking via H5Pset_chunk. Chunking generally improves performance and only needs to be disabled in corner cases. One example is a program that relies heavily on independent parallel I/O and declares data records non-collectively.
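
Only the two values listed in the table are documented, so a small guard can catch typos before a run starts. This sketch uses hypothetical helper names and sets the variable for the current process. As with the other controls, set it before the openPMD Series is created.

```python
import os

CHUNK_MODES = ("auto", "none")  # the two values documented in the table above


def set_default_chunking(mode: str) -> None:
    """Set OPENPMD_HDF5_CHUNKS, rejecting values not documented for the backend."""
    if mode not in CHUNK_MODES:
        raise ValueError(f"OPENPMD_HDF5_CHUNKS must be one of {CHUNK_MODES}, got {mode!r}")
    os.environ["OPENPMD_HDF5_CHUNKS"] = mode


def default_chunking() -> str:
    """Value requested by the environment, falling back to the documented default."""
    return os.environ.get("OPENPMD_HDF5_CHUNKS", "auto")
```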

H5_COLL_API_SANITY_CHECK: this is an HDF5 control option for debugging the logic of parallel I/O API calls. Running a parallel program with this option enabled can help spot bugs such as collective MPI calls that are not issued by all participating MPI ranks. Do not use it in production: it slows down parallel I/O operations.
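
To keep the check out of production runs, you can enable it only in the environment of a dedicated debug launch. The sketch below copies the caller's environment rather than modifying it. The launcher command is only an example, and writer.py is a hypothetical script.

```python
import os
import subprocess
from typing import Dict, List, Optional


def sanity_check_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of an environment with HDF5's collective-API sanity check switched on."""
    env = dict(os.environ if base is None else base)
    env["H5_COLL_API_SANITY_CHECK"] = "1"
    return env


def run_debug_job(cmd: List[str]) -> int:
    """Launch an MPI job with the check enabled; the caller's environment is untouched."""
    return subprocess.run(cmd, env=sanity_check_env(), check=False).returncode
```

Example: run_debug_job(["mpiexec", "-n", "4", "python3", "writer.py"]).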

HDF5_USE_FILE_LOCKING: this is a control option of HDF5 1.10.1 and newer. Setting it to FALSE disables HDF5's internal file locking operations (see the HDF5 1.10.1 release notes). This locking mechanism mainly ensures that a file that is still being written cannot (yet) be opened by a reader or by another writer. On some HPC and Jupyter systems, parallel or network file systems like GPFS are mounted in a way that interferes with this internal HDF5 access consistency check. As a result, read-only operations like h5ls some_file.h5, or opening an openPMD Series, can hang indefinitely. If you are sure that the file was written completely and closed by the writer, you can set this environment variable to FALSE to work around the problem. This is the case, for example, when the simulation that created the HDF5 outputs has finished. You should also report the problem to your system support, so they can fix the file system mount options or disable locking by default in the provided HDF5 installation.
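
Because the work-around is only safe for finished files, you can scope it to a single read-only command. The following sketch runs h5ls, mentioned above, in a child process with locking disabled and leaves the rest of the session unchanged. It assumes h5ls is on the PATH. The helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Optional


def env_without_file_locking(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of an environment with HDF5 file locking disabled."""
    env = dict(os.environ if base is None else base)
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


def list_finished_file(path: str) -> str:
    """Run h5ls on a file that is known to be completely written and closed."""
    result = subprocess.run(
        ["h5ls", path],
        env=env_without_file_locking(),
        capture_output=True,
        text=True,
        check=True,  # raise if h5ls reports an error
    )
    return result.stdout
```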

OMPI_MCA_io: this is an OpenMPI control variable. Starting with OpenMPI 2.x, OpenMPI ships its own MPI-I/O backend, OMPIO. This backend is known to cause problems in older releases that might still be in use on some systems. Specifically, we found and reported a silent data corruption issue that was fixed only in OpenMPI versions 3.0.4, 3.1.4, 4.0.1 and newer. OMPIO also has problems with writes larger than 2GB, which were only fixed in OpenMPI versions 3.0.5, 3.1.5, 4.0.3 and newer. Running export OMPI_MCA_io=^ompio before mpiexec/mpirun/srun/jsrun disables OMPIO, so that OpenMPI falls back to its older ROMIO MPI-I/O backend.
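
A job script could encode these version cut-offs in a small check that decides whether to export OMPI_MCA_io=^ompio. This sketch only knows the versions named in this section. It treats all 2.x releases as affected, releases before 2.x as unaffected (no OMPIO), and any release series newer than 4.0 as fixed. Obtaining the version tuple, for instance from ompi_info, is not shown.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]

# First patch release per release series that contains each fix, as listed above
SILENT_CORRUPTION_FIXED: Dict[Tuple[int, int], int] = {(3, 0): 4, (3, 1): 4, (4, 0): 1}
LARGE_WRITE_FIXED: Dict[Tuple[int, int], int] = {(3, 0): 5, (3, 1): 5, (4, 0): 3}


def _affected(version: Version, fixed_in: Dict[Tuple[int, int], int]) -> bool:
    major, minor, patch = version
    if major < 2:
        return False  # OMPIO only exists from OpenMPI 2.x on
    series = (major, minor)
    if series > max(fixed_in):
        return False  # release series newer than the last one listed ("and newer")
    if series not in fixed_in:
        return True  # e.g. 2.x: no fixed release listed
    return patch < fixed_in[series]


def ompio_issues(version: Version) -> List[str]:
    """Known OMPIO problems for an OpenMPI version, per the list above."""
    issues = []
    if _affected(version, SILENT_CORRUPTION_FIXED):
        issues.append("silent data corruption")
    if _affected(version, LARGE_WRITE_FIXED):
        issues.append("writes larger than 2GB")
    return issues


def launcher_env_hint(version: Version) -> str:
    """Shell line to run before mpiexec/srun/... if OMPIO should be avoided."""
    return "export OMPI_MCA_io=^ompio" if ompio_issues(version) else ""
```

For example, ompio_issues((3, 0, 4)) reports only the large-write problem, because 3.0.4 fixed the silent corruption but not the 2GB issue.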

Selected References

  • GitHub issue #554

  • Axel Huebl, Rene Widera, Felix Schmitt, Alexander Matthes, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, and Michael Bussmann. On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective, ISC High Performance 2017: High Performance Computing, pp. 15-29, 2017. arXiv:1706.00522, DOI:10.1007/978-3-319-67630-2_2