The openPMD API provides utilities to quickly configure and run MPI-parallel benchmarks in a flexible fashion. The starting point for configuring and running benchmarks is the class template MPIBenchmark<DatasetFillerProvider>.

#include "openPMD/benchmark/mpi/MPIBenchmark.hpp"

An object of this class template lets you preconfigure a number of benchmark runs to execute. Each run is specified by:

  • The compression configuration, consisting itself of the compression string and the compression level.

  • The backend to use, specified by the filename extension (e.g. “h5”, “bp”, “json”, …).

  • The type of data to write, specified by the openPMD datatype.

  • The number of ranks to use, not greater than the MPI size. An overloaded version of addConfiguration() exists that picks the MPI size.

  • The number n of iterations. The benchmark will effectively be repeated n times.

The benchmark object itself is configured globally (i.e. via its constructor) by:

  • The base path to use. This will be extended with the chosen backend’s filename extension. Benchmarks might overwrite each other’s files.

  • The total extent of the dataset across all MPI ranks.

  • The BlockSlicer, i.e. an object telling each rank which portion of the dataset to write to and read from. Most users will be content with the provided OneDimensionalBlockSlicer implementation, which simply divides the dataset into hyperslabs along one dimension (dimension 0 by default). This implementation can also deal with extents that are not divisible by the MPI size.

  • A DatasetFillerProvider. DatasetFiller<T> is an abstract class template whose job is to create the write data of type T for one run of the benchmark. Since one benchmark object can be configured with several datatypes, a DatasetFillerProvider is needed to create such objects. DatasetFillerProvider is a template parameter of the benchmark class template and should be a templated functor whose operator()<T>() returns a shared_ptr<DatasetFiller<T>> (or a value that can be dynamically cast to it). For users seeking to run the benchmark with only one datatype, the class template SimpleDatasetFillerProvider<DF> will lift a DatasetFiller<T> to a DatasetFillerProvider whose operator()<T'>() will only return successfully if T and T' are the same type.

  • The MPI Communicator.

The class template RandomDatasetFiller<Distr, T> (where by default T = typename Distr::result_type) provides an implementation of DatasetFiller<T> that lifts a random distribution to a DatasetFiller. The general interface of a DatasetFiller<T> is kept simple, but an implementation should make sure that every call to DatasetFiller<T>::produceData() takes roughly the same amount of time, so that the time needed for producing data can be subtracted from the benchmark results.

The configured benchmarks are run one after another by calling the method MPIBenchmark<...>::runBenchmark<Clock>(int rootThread). The Clock template parameter should meet the requirements of a trivial clock. Although every rank will return an MPIBenchmarkReport, only the report of the previously specified root rank will be populated with data, i.e. all ranks’ data will be collected into one report.

Example Usage

#include <openPMD/openPMD.hpp>
#include <openPMD/benchmark/mpi/MPIBenchmark.hpp>
#include <openPMD/benchmark/mpi/RandomDatasetFiller.hpp>
#include <openPMD/benchmark/mpi/OneDimensionalBlockSlicer.hpp>

#if openPMD_HAVE_MPI
#   include <mpi.h>

#include <iostream>
#include <string>
#include <vector>

inline void
print_help( std::string const program_name )
{
    std::cout << "Usage: " << program_name << "\n";
    std::cout << "Run a simple parallel write and read benchmark.\n\n";
    std::cout << "Options:\n";
    std::cout << "    -w, --weak    run a weak scaling (default: strong scaling)\n";
    std::cout << "    -h, --help    display this help and exit\n";
    std::cout << "    -v, --version output version information and exit\n";
    std::cout << "\n";
    std::cout << "Examples:\n";
    std::cout << "    " << program_name << " --weak  # for a weak-scaling\n";
    std::cout << "    " << program_name << "  # for a strong scaling\n";
}

inline void
print_version( std::string const program_name )
{
    std::cout << program_name << " (openPMD-api) "
              << openPMD::getVersion() << "\n";
    std::cout << "Copyright 2017-2021 openPMD contributors\n";
    std::cout << "Authors: Franz Poeschel, Axel Huebl et al.\n";
    std::cout << "License: LGPLv3+\n";
    std::cout << "This is free software: you are free to change and redistribute it.\n"
                 "There is NO WARRANTY, to the extent permitted by law.\n";
}

int main(
    int argc,
    char *argv[] )
{
    using namespace std;
    MPI_Init( &argc, &argv );

    // CLI parsing
    std::vector< std::string > str_argv;
    for( int i = 0; i < argc; ++i ) str_argv.emplace_back( argv[i] );
    bool weak_scaling = false;

    for (int c = 1; c < int(argc); c++) {
        if (std::string("--help") == argv[c] || std::string("-h") == argv[c]) {
            print_help(argv[0]);
            return 0;
        }
        if (std::string("--version") == argv[c] || std::string("-v") == argv[c]) {
            print_version(argv[0]);
            return 0;
        }
        if (std::string("--weak") == argv[c] || std::string("-w") == argv[c]) {
            weak_scaling = true;
        }
    }

    if (argc > 2) {
        std::cerr << "Too many arguments! See: " << argv[0] << " --help\n";
        return 1;
    }
    // For simplicity, use only one datatype in this benchmark.
    // Note that a single Benchmark object can be used to configure
    // multiple different benchmark runs with different datatypes,
    // given that you provide it with an appropriate DatasetFillerProvider
    // (template parameter of the Benchmark class).
    using type = uint64_t;
    openPMD::Datatype dt = openPMD::determineDatatype<type>();

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    const unsigned scale_up = weak_scaling ? unsigned( size ) : 1u;

    // Total (in this case 3D) dataset across all MPI ranks.
    // Will be the same for all configured benchmarks.
    openPMD::Extent total{
        100 * scale_up,
        100,
        1000 };

    // The blockslicer assigns to each rank its part of the dataset. The rank will
    // write to and read from that part. OneDimensionalBlockSlicer is a simple
    // implementation of the BlockSlicer abstract class that will divide the
    // dataset into hyperslabs along one given dimension.
    // If you wish to partition your dataset in a different manner, you can
    // replace this with your own implementation of BlockSlicer.
    auto blockSlicer = std::make_shared<openPMD::OneDimensionalBlockSlicer>(0);

    // Set up the DatasetFiller. The benchmarks will later inquire the DatasetFiller
    // to get data for writing.
    std::uniform_int_distribution< type > distr( 0, 200000000 );
    openPMD::RandomDatasetFiller< decltype(distr) > df{distr};

    // The Benchmark class will in principle allow a user to configure
    // runs that write and read different datatypes.
    // For this, the class is templated with a type called DatasetFillerProvider.
    // This class serves as a factory for DatasetFillers for concrete types and
    // should have a templated operator()<T>() returning a value
    // that can be dynamically cast to a std::shared_ptr<openPMD::DatasetFiller<T>>.
    // The openPMD API provides only one implementation of a DatasetFillerProvider,
    // namely the SimpleDatasetFillerProvider being used in this example.
    // Its purpose is to lift a DatasetFiller for a concrete type (df in this example)
    // to a DatasetFillerProvider whose operator()<T>() will fail at runtime if T does
    // not match the type of the underlying DatasetFiller.
    // Use this implementation if you only wish to run the benchmark for one Datatype,
    // otherwise provide your own implementation of DatasetFillerProvider.
    openPMD::SimpleDatasetFillerProvider<decltype(df)> dfp{df};

    // Create the benchmark object. The file name (first argument) will be extended
    // with the backends' file extensions.
    openPMD::MPIBenchmark< decltype(dfp) > benchmark{
        "../benchmarks/benchmark",
        total,
        std::shared_ptr< openPMD::BlockSlicer >{blockSlicer},
        dfp };

    // Add benchmark runs to be executed. This will only store the configuration and not
    // run the benchmark yet. Each run is configured by:
    // * The compression scheme to use (first two parameters). The first parameter chooses
    //   the compression scheme, the second parameter is the compression level.
    // * The backend (by file extension).
    // * The datatype to use for this run.
    // * The number of iterations. Effectively, the benchmark will be repeated for this many
    //   times.
#if openPMD_HAVE_ADIOS2
    benchmark.addConfiguration("", 0, "bp", dt, 10);
#endif
#if openPMD_HAVE_HDF5
    benchmark.addConfiguration("", 0, "h5", dt, 10);
#endif

    // Execute all previously configured benchmarks. Will return a MPIBenchmarkReport object
    // with write and read times for each configured run.
    // Note that results will be collected into the root rank's report object; the other
    // ranks' reports will be empty. The root rank is specified by the first parameter of
    // runBenchmark, the default being 0.
    auto res =
        benchmark.runBenchmark< std::chrono::high_resolution_clock >();

    if( rank == 0 )
    {
        for( auto it = res.durations.begin();
             it != res.durations.end();
             it++ )
        {
            auto time = it->second;
            std::cout << "on rank " << std::get<res.RANK>(it->first)
                      << "\t with backend "
                      << std::get<res.BACKEND>(it->first)
                      << "\twrite time: "
                      << std::chrono::duration_cast<std::chrono::milliseconds>(
                             time.first )
                             .count() << "\tread time: "
                      << std::chrono::duration_cast<std::chrono::milliseconds>(
                             time.second )
                             .count() << std::endl;
        }
    }

    MPI_Finalize();
}
#else

int main(void)
{
    return 0;
}
#endif