Command Line Tools

openPMD-api installs command line tools alongside the main library. These terminal-focused tools help to quickly explore, manage or manipulate openPMD data series.

openpmd-ls

List information about an openPMD data series.

The syntax of the command line tool is printed via:

openpmd-ls --help

With some pip-based Python installations, you might have to run this as a module:

python3 -m openpmd_api.ls --help
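
For example, to list the contents of a file-based series (simData_%T.h5 stands in for your own data):

openpmd-ls simData_%T.h5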

openpmd-pipe

Redirect openPMD data from any source to any sink.

Any Python-enabled openPMD-api installation with enabled CLI tools comes with a command-line tool named openpmd-pipe. Naming and use are inspired by the piping concept known from UNIX shells.

With some pip-based Python installations, you might have to run this as a module:

python3 -m openpmd_api.pipe --help

The fundamental idea is to redirect data from an openPMD data source to another openPMD data sink. This concept becomes useful through the openPMD-api’s ability to use different backends in different configurations; openpmd-pipe can hence be understood as a translation from one I/O configuration to another.

Note

openpmd-pipe is (currently) optimized for streaming workflows in order to minimize the number of back-and-forth communications between writer and reader. All data load operations are issued in a single flush() per iteration. Data is loaded directly into backend-provided buffers of the writer (if supported by the writer), and again only one flush() per iteration is used to put the data to disk. This means that the peak memory usage will be roughly equivalent to the data size of a single iteration.

The reader Series is configured by the parameters --infile and --inconfig which are both forwarded to the filepath and options parameters of the Series constructor. The writer Series is likewise controlled by --outfile and --outconfig.
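
For instance, both Series can be configured from TOML (or JSON) files via the @-syntax also used in the examples further below (the file names here are placeholders):

$ openpmd-pipe --infile simData_%T.bp --inconfig @readerConfig.toml \
    --outfile simData_%T.h5 --outconfig @writerConfig.toml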

Use of MPI is controlled by the --mpi and --no-mpi switches. If left unspecified, MPI will be used automatically if the MPI size is greater than 1.
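
A parallel run might look like the following sketch (the launcher and rank count depend on your MPI installation):

$ mpiexec -n 8 openpmd-pipe --infile simData_%T.bp --outfile simData_%T.h5
# equivalent, with MPI requested explicitly:
$ mpiexec -n 8 openpmd-pipe --mpi --infile simData_%T.bp --outfile simData_%T.h5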

Note

Required parameters are --infile and --outfile. For further options, refer to the output of openpmd-pipe --help.

When using MPI, each dataset will be sliced into roughly equally sized hyperslabs along the dimension with the highest item count, in order to distribute the load across worker ranks. For example, a dataset of shape [512, 256] read by four ranks would be split along the first dimension into four hyperslabs of shape [128, 256] each.

If you are interested in further chunk distribution strategies (e.g. node-aware distribution, chunking-aware distribution) that are used/tested on development branches, feel free to contact us, e.g. on GitHub.

The remainder of this page discusses a select number of use cases and examples for the openpmd-pipe tool.

Conversion between backends

Converting from ADIOS2 to HDF5:

$ openpmd-pipe --infile simData_%T.bp --outfile simData_%T.h5

Converting from the ADIOS2 BP3 engine to the (newer) ADIOS2 BP5 engine:

$ openpmd-pipe --infile simData_%T.bp --outfile simData_%T.bp5

# or e.g. via inline TOML specification (also possible: JSON)
$ openpmd-pipe --infile simData_%T.bp --outfile output_folder/simData_%T.bp \
     --outconfig 'adios2.engine.type = "bp5"'
# the config can also be read from a file, e.g. --outconfig @cfg.toml
#                                          or   --outconfig @cfg.json

Converting between iteration encodings

Converting to group-based iteration encoding:

$ openpmd-pipe --infile simData_%T.h5 --outfile simData.h5

Converting to variable-based iteration encoding (not yet feature-complete):

# e.g. specified via inline JSON
$ openpmd-pipe --infile simData_%T.bp --outfile simData.bp \
    --outconfig '{"iteration_encoding": "variable_based"}'

Capturing a stream

Since the openPMD-api also supports streaming/staging I/O transports from ADIOS2, openpmd-pipe can be used to capture a stream in order to write it to disk. In the ADIOS2 SST engine, a stream can have any number of readers. This makes it possible to intercept a stream in a data processing pipeline.

$ cat << EOF > streamParams.toml
[adios2.engine.parameters]
DataTransport = "fabric"
OpenTimeoutSecs = 600
EOF

$ openpmd-pipe --infile streamContactFile.sst --inconfig @streamParams.toml \
    --outfile capturedStreamData_%06T.bp

# Just loading and discarding streaming data, e.g. for performance benchmarking:
$ openpmd-pipe --infile streamContactFile.sst --inconfig @streamParams.toml \
    --outfile null.bp --outconfig 'adios2.engine.type = "nullcore"'

Defragmenting a file

Due to the file layout of ADIOS2, simulation codes (especially mesh-refinement-enabled ones) can create output files that are strongly fragmented. Since only one load_chunk() and one store_chunk() call is issued per MPI rank, per dataset and per iteration, the file is implicitly defragmented by the backend when passed through openpmd-pipe:

$ openpmd-pipe --infile strongly_fragmented_%T.bp --outfile defragmented_%T.bp

Post-hoc compression

The openPMD-api can compress data directly when it is originally created. If, however, data has already been written without compression, openpmd-pipe can add compression after the fact:

$ cat << EOF > compression_cfg.json
{
  "adios2": {
    "dataset": {
      "operators": [
        {
          "type": "blosc",
          "parameters": {
            "clevel": 1,
            "doshuffle": "BLOSC_BITSHUFFLE"
          }
        }
      ]
    }
  }
}
EOF

$ openpmd-pipe --infile not_compressed_%T.bp --outfile compressed_%T.bp \
    --outconfig @compression_cfg.json

Starting point for custom transformation and analysis

openpmd-pipe is a Python script that can serve as a basis for custom extensions, e.g. for adding, modifying, transforming or reducing data. The typical use case would be as a building block in a domain-specific data processing pipeline.
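
As a minimal sketch of such an extension, the following standalone Python script copies a series while scaling a single mesh record; the file names, the mesh name rho and the scaling factor are illustrative assumptions, not part of openpmd-pipe:

import openpmd_api as io

in_series = io.Series("simData_%T.bp", io.Access.read_only)
out_series = io.Series("processed_%T.bp", io.Access.create)

for in_it in in_series.read_iterations():
    out_it = out_series.write_iterations()[in_it.iteration_index]

    # load the full "rho" mesh of this iteration
    rho_in = in_it.meshes["rho"][io.Mesh_Record_Component.SCALAR]
    data = rho_in.load_chunk()
    in_series.flush()          # perform the actual read

    data *= 2.0                # hypothetical transformation

    # write the transformed data to the output series
    rho_out = out_it.meshes["rho"][io.Mesh_Record_Component.SCALAR]
    rho_out.reset_dataset(io.Dataset(data.dtype, data.shape))
    rho_out.store_chunk(data)

    out_it.close()             # flush and close this output iteration

in_series.close()
out_series.close()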