Benchmarks

Parallel benchmark 8

Built on the helper functions in the benchmark utilities, this benchmark executes a simple parallel read-write test.

In particular, this test case writes and reads a 3D array of type uint64_t, sliced 1D along the first dimension.

openPMD::Extent total{
    100 * scale_up, // sliced along this axis over MPI ranks
    100,
    1000
};

The benchmark writes 10 iterations, so the strong-scaling case always produces about 3/4 GiB of data (100 x 100 x 1000 cells of 8 bytes each, times 10 iterations). In the weak-scaling case, the data volume scales as \(N \cdot 3/4 \, \mathrm{GiB}\), with \(N\) the number of participating MPI ranks.
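The data volume quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the benchmark: the function names totalBytes and toGiB are hypothetical, and it assumes the extent shown above with one uint64_t per cell.

```cpp
#include <cstdint>

// Hypothetical helper: bytes written for a given scale_up factor.
// Assumes the extent {100 * scale_up, 100, 1000} from the text and
// one 8-byte uint64_t per cell.
constexpr std::uint64_t totalBytes(
    std::uint64_t scale_up, std::uint64_t iterations = 10)
{
    constexpr std::uint64_t cellsPerUnit = 100ull * 100ull * 1000ull;
    return cellsPerUnit * scale_up * sizeof(std::uint64_t) * iterations;
}

// Hypothetical helper: convert a byte count to GiB (2^30 bytes).
constexpr double toGiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

With scale_up = 1, totalBytes returns 800000000 bytes, which is roughly 0.745 GiB.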

By default, the benchmark runs in strong-scaling mode unless the -w/--weak option is passed as a command-line argument to the executable.
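To illustrate how a 1D slicing along the first dimension can work in both modes, here is a sketch under stated assumptions. It assumes that scale_up is 1 for strong scaling and equal to the number of ranks for weak scaling (consistent with the data volumes above, but not confirmed by the text), and that rows are split as evenly as possible. The names Slice, firstAxisLength and sliceForRank are hypothetical; the benchmark's actual decomposition may differ.

```cpp
#include <cstdint>

// Hypothetical description of one rank's share of the first axis.
struct Slice
{
    std::uint64_t offset;
    std::uint64_t extent;
};

// Assumption: scale_up is 1 (strong scaling) or the rank count (weak).
constexpr std::uint64_t firstAxisLength(bool weak, std::uint64_t ranks)
{
    return 100 * (weak ? ranks : 1);
}

// Split `length` rows over `ranks` ranks; the first (length % ranks)
// ranks receive one extra row so that every row is assigned once.
constexpr Slice sliceForRank(
    std::uint64_t length, std::uint64_t rank, std::uint64_t ranks)
{
    std::uint64_t const base = length / ranks;
    std::uint64_t const rest = length % ranks;
    std::uint64_t const extent = base + (rank < rest ? 1 : 0);
    std::uint64_t const offset = rank * base + (rank < rest ? rank : rest);
    return {offset, extent};
}
```

In this sketch, strong scaling shrinks each rank's slice as ranks are added, while weak scaling keeps it at 100 rows per rank.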

Parallel benchmarks 8a & 8b

The following examples show parallel reading and writing of domain-decomposed data with MPI.

The Message Passing Interface (MPI) is an open communication standard for scientific computing. MPI is used on clusters, e.g. large-scale supercomputers, to communicate between nodes and provides parallel I/O primitives.

Writing

Source: examples/8a_benchmark_write_parallel.cpp

This benchmark writes a few meshes and particles, either 1D, 2D or 3D.

The meshes are viewed as a grid of mini blocks. As an example, we assume the mini block dimensions are [16, 32, 32].

Next, we define the grid in units of mini blocks, say [32, 32, 16]. The actual mesh size is then [16x32, 32x32, 32x16].
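The relation between mini block, grid and mesh size is a per-dimension product. A minimal sketch, with the hypothetical names Extent3 and meshExtent (not part of the benchmark source):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Extent3 = std::array<std::uint64_t, 3>;

// Hypothetical helper: mesh size = mini block size times grid size,
// taken separately in each dimension.
constexpr Extent3 meshExtent(Extent3 const &minBlock, Extent3 const &grid)
{
    Extent3 mesh{};
    for (std::size_t d = 0; d < 3; ++d)
    {
        mesh[d] = minBlock[d] * grid[d];
    }
    return mesh;
}
```

For the example values, meshExtent({16, 32, 32}, {32, 32, 16}) yields {512, 1024, 512}.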

Here is a sample input file (“w.input”):

dim=3
balanced=true
ratio=1
steps=10
minBlock=16 32 32
grid=32 32 16

With this input file, the benchmark creates an openPMD file containing the above mesh with:

3D data
balanced load
particle to mesh ratio = 1
10 iteration steps

Note: All generated files are group based, i.e., one file per iteration.

To run:

./8a_benchmark_write_parallel w.input

The generated files are then: ../samples/8a_parallel_3Db_*

Optional input parameter: pack

Often a processor writes out a few small blocks. In the example above, if the writer uses 1024 processors, each processor handles 16 blocks of the 32x32x16 grid. These 16 blocks per processor can be packed as a selection of 1x1x16 blocks, or 2x2x4, etc. The pack parameter specifies this selection, e.g., "pack=1 1 16" or "pack=2 2 4". If the parameter is not specified, a default is applied. This parameter is not expected to affect write performance, but it will likely make a difference for certain read patterns if the underlying storage uses subfiles.
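The block count per processor and the consistency of a pack selection can be checked as follows. This is only a sketch: the names product, blocksPerRank and packFits are hypothetical, and the rule that each pack dimension must divide the grid evenly is an assumption for illustration, not a documented requirement of the benchmark.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Extent3 = std::array<std::uint64_t, 3>;

constexpr std::uint64_t product(Extent3 const &e)
{
    return e[0] * e[1] * e[2];
}

// Number of mini blocks each processor handles, e.g. 16 for a
// 32x32x16 grid on 1024 processors.
constexpr std::uint64_t blocksPerRank(Extent3 const &grid, std::uint64_t ranks)
{
    return product(grid) / ranks;
}

// Assumed validity rule: the pack selection tiles the grid evenly and
// contains exactly the blocks of one processor.
constexpr bool packFits(
    Extent3 const &grid, Extent3 const &pack, std::uint64_t ranks)
{
    if (ranks == 0 || product(grid) % ranks != 0)
    {
        return false;
    }
    for (std::size_t d = 0; d < 3; ++d)
    {
        if (pack[d] == 0 || grid[d] % pack[d] != 0)
        {
            return false;
        }
    }
    return product(pack) == blocksPerRank(grid, ranks);
}
```

Both selections named in the text, 1x1x16 and 2x2x4, pass this check for the 32x32x16 grid on 1024 processors.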

Optional input parameter: encoding

The supported iteration encodings are f(ile), g(roup) and v(ariable). By default, variable encoding is used.

Reading

Source: examples/8b_benchmark_read_parallel.cpp

This benchmark reads the files written by 8a.

The options are a file prefix and a read pattern.

For example, if the files are named following the pattern /path/8a_parallel_3Db_%07T.bp, the input can simply be: /path/8a_parallel_3Db <options>

Otherwise, please use the full name of the file.
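In the %07T pattern, the iteration index is written as a zero-padded, seven-digit number. The sketch below shows how a prefix maps to a concrete file name; the function iterationFileName is hypothetical and only illustrates the naming scheme, it is not how the benchmark resolves files internally.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: expand "<prefix>_%07T<extension>" for one
// iteration index, e.g. iteration 5 becomes "..._0000005.bp".
std::string iterationFileName(
    std::string const &prefix,
    unsigned long long iteration,
    std::string const &extension = ".bp")
{
    char digits[32];
    std::snprintf(digits, sizeof(digits), "%07llu", iteration);
    return prefix + "_" + digits + extension;
}
```

For instance, iterationFileName("/path/8a_parallel_3Db", 5) gives "/path/8a_parallel_3Db_0000005.bp".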

While openPMD-api supports more than one file type, this benchmark is intended to read just one type. By default, an ADIOS2 file is assumed. If this is not the desired file type, one can give a hint with an environment variable, e.g. export OPENPMD_BENCHMARK_USE_BACKEND=HDF5
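A minimal sketch of how such an environment hint could be turned into a file extension is given below. The functions backendExtension and backendExtensionFromEnv are hypothetical, and the fallback to ".bp" for unset or unrecognized values is an assumption based on the ADIOS2 default stated above; the benchmark's own handling may differ.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: map the value of the environment hint to a file
// extension. Assumption: anything other than "HDF5" falls back to the
// ADIOS2 default (".bp").
std::string backendExtension(char const *envValue)
{
    if (envValue != nullptr && std::string(envValue) == "HDF5")
    {
        return ".h5";
    }
    return ".bp";
}

// Reads the variable named in the text from the process environment.
std::string backendExtensionFromEnv()
{
    return backendExtension(std::getenv("OPENPMD_BENCHMARK_USE_BACKEND"));
}
```

Keeping the mapping in a function that takes the value as an argument makes it easy to test without modifying the environment.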

The read options are intended to measure the overall processing time in the following categories:

Metadata only (option = m)
Data retrieval (after the metadata is loaded)

The data retrieval is further divided into:

Slice the “rho” mesh
(options = sx/sy/sz, depending on the direction, e.g. sx implies x=0)
Slice the 3D magnetic field (e.g. find values for “Bx”, “By” and “Bz”)
(options = fx/fy/fz, depending on the direction, e.g. fx implies x=0)

So here are the options one can use to read a file:

m
sx
sy
sz
fx
fy
fz
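The seven options listed above follow a simple scheme: "m" alone, or a kind letter (s or f) followed by an axis letter (x, y or z). The following sketch decodes an option string accordingly. The types ReadKind and ReadPattern and the function parseReadPattern are hypothetical and not taken from the benchmark source; the sketch requires C++17.

```cpp
#include <optional>
#include <string_view>

enum class ReadKind
{
    MetadataOnly, // option "m"
    SliceRho,     // options "sx", "sy", "sz"
    SliceField    // options "fx", "fy", "fz"
};

struct ReadPattern
{
    ReadKind kind;
    int axis; // 0 = x, 1 = y, 2 = z; -1 for metadata only
};

// Hypothetical parser: returns std::nullopt for unknown options.
std::optional<ReadPattern> parseReadPattern(std::string_view opt)
{
    if (opt == "m")
    {
        return ReadPattern{ReadKind::MetadataOnly, -1};
    }
    if (opt.size() != 2 || (opt[0] != 's' && opt[0] != 'f'))
    {
        return std::nullopt;
    }
    if (opt[1] < 'x' || opt[1] > 'z')
    {
        return std::nullopt;
    }
    ReadKind const kind =
        (opt[0] == 's') ? ReadKind::SliceRho : ReadKind::SliceField;
    return ReadPattern{kind, opt[1] - 'x'};
}
```

Returning an empty optional for unrecognized input lets the caller decide how to report a bad option.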

For example, to read only the metadata of the files generated by the write command above:

./8b_benchmark_read_parallel ../samples/8a_parallel_3Db m

More complicated writing options (applies to ADIOS BP)

ADIOS BP files use subfiles to store the data from each rank. A hint on how the data should be divided among the ranks can be given on the command line. The options are, in order:

grid of minimal blocks|balance|particle2mesh ratio
minimal blocks
use multiple blocks
number of timesteps
dimensions
hint on workload arrangement

Example: mpirun -n 4 ./8a_benchmark_write_parallel 400801 16016 1 5 3 4004002

Here, 4 ranks are used to write a 3D mesh. The minimal block is [16,16,16] and the grid of minimal blocks is [8,4,4], so the actual mesh is [16x8, 16x4, 16x4]. The number of timesteps is 5.

The hint asks each rank to work on a [16x2, 16x4, 16x4] block. Such blocks cover the mesh exactly with 4 ranks, so the hint is applied.
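Whether a hint covers the mesh exactly can be checked by counting how many hint-sized tiles fit into the grid. The sketch below works in units of minimal blocks, so the hint [16x2, 16x4, 16x4] becomes [2,4,4] on the [8,4,4] grid. The function hintCoversMesh is hypothetical, and the acceptance rule (tile count equals rank count) is an assumption drawn from the example, not from the benchmark source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Extent3 = std::array<std::uint64_t, 3>;

// Hypothetical check: the hint tiles the grid without remainder and the
// number of tiles equals the number of ranks.
constexpr bool hintCoversMesh(
    Extent3 const &grid, Extent3 const &hint, std::uint64_t ranks)
{
    std::uint64_t tiles = 1;
    for (std::size_t d = 0; d < 3; ++d)
    {
        if (hint[d] == 0 || grid[d] % hint[d] != 0)
        {
            return false;
        }
        tiles *= grid[d] / hint[d];
    }
    return tiles == ranks;
}
```

For the example, the grid [8,4,4] holds (8/2) x (4/4) x (4/4) = 4 tiles of size [2,4,4], matching the 4 ranks.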

Benchmark Utilities

Further benchmarks are found in utilities.