First Write

Step-by-step: how to write scientific data with openPMD-api?

Include / Import

After successful installation, you can start using openPMD-api as follows:

C++14

#include <openPMD/openPMD.hpp>

// example: data handling
#include <numeric>  // std::iota
#include <vector>   // std::vector

namespace io = openPMD;

Python

import openpmd_api as io

# example: data handling
import numpy as np

Open

Write into a new openPMD series in myOutput/data_<00...N>.h5. Besides .h5 (HDF5), other file formats are supported: .bp (ADIOS1/ADIOS2) and .json (JSON).

C++14

auto series = io::Series(
    "myOutput/data_%05T.h5",
    io::Access::CREATE);

Python

series = io.Series(
    "myOutput/data_%05T.h5",
    io.Access.create)
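
To make the file name pattern above concrete, here is a minimal pure-Python sketch of how a placeholder such as %05T is assumed to expand into one file name per iteration (zero-padded to five digits). The helper expand_iteration_pattern is hypothetical and not part of openPMD-api; it only mimics the naming scheme.

```python
import re


def expand_iteration_pattern(pattern, iteration):
    """Hypothetical helper (not an openPMD-api function): fill in %T / %0<N>T."""
    def substitute(match):
        width = match.group(1)
        # "%05T" -> zero-pad to 5 digits; plain "%T" -> no padding
        if width:
            return str(iteration).zfill(int(width))
        return str(iteration)

    return re.sub(r"%(?:0(\d+))?T", substitute, pattern)


name = expand_iteration_pattern("myOutput/data_%05T.h5", 42)
print(name)  # myOutput/data_00042.h5
```

With this naming scheme, every iteration ends up in its own file inside the myOutput/ directory.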

Iteration

Data in a series is grouped into iterations, each identified by an arbitrary, positive integer number <N>:

C++14

auto i = series.iterations[42];

Python

i = series.iterations[42]

Attributes

Everything in openPMD can be extended and user-annotated. Let us try this by writing some metadata:

C++14

series.setAuthor(
    "Axel Huebl <axelhuebl@lbl.gov>");
series.setMachine(
    "Hall Probe 5000, Model 3");
series.setAttribute(
    "dinner", "Pizza and Coke");
i.setAttribute(
    "vacuum", true);

Python

series.author = \
    "Axel Huebl <axelhuebl@lbl.gov>"
series.machine = "Hall Probe 5000, Model 3"
series.set_attribute(
    "dinner", "Pizza and Coke")
i.set_attribute(
    "vacuum", True)

Data

Let’s prepare some data that we want to write. For example, a magnetic field slice \(\vec B(i, j)\) in two spatial dimensions with three components \((B_x, B_y, B_z)^\intercal\) of which the \(B_y\) component shall be constant for all \((i, j)\) indices.

C++14

std::vector<float> x_data(
    150 * 300);
std::iota(
    x_data.begin(),
    x_data.end(),
    0.);

float y_data = 4.f;

std::vector<float> z_data(x_data);
for( auto& c : z_data )
    c -= 8000.f;

Python

x_data = np.arange(
    150 * 300,
    dtype=np.float32
).reshape(150, 300)



y_data = 4.

z_data = x_data.copy() - 8000.
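
The C++ snippet keeps the slice in a flat buffer of 150 * 300 values, while the Python snippet reshapes it to (150, 300). The following sketch, using only the standard library, shows how the two views relate under the row-major (C order) layout that NumPy's reshape uses by default. The helper flat_index is a hypothetical name introduced for illustration.

```python
ROWS, COLS = 150, 300

# Flat buffer as in the C++ snippet: values 0, 1, 2, ...
x_flat = [float(n) for n in range(ROWS * COLS)]


def flat_index(i, j, cols=COLS):
    """Row-major (C order) offset of element (i, j) in the flat buffer."""
    return i * cols + j


i, j = 2, 5
x_ij = x_flat[flat_index(i, j)]  # same value as x_data[2, 5] in the NumPy version
z_ij = x_ij - 8000.0             # the z component is shifted by -8000
print(x_ij, z_ij)  # 605.0 -7395.0
```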

Record

An openPMD record can be either structured (mesh) or unstructured (particles). We prepared a vector field in 2D above, which is a mesh:

C++14

// record
auto B = i.meshes["B"];

// record components
auto B_x = B["x"];
auto B_y = B["y"];
auto B_z = B["z"];

auto dataset = io::Dataset(
    io::determineDatatype<float>(),
    {150, 300});
B_x.resetDataset(dataset);
B_y.resetDataset(dataset);
B_z.resetDataset(dataset);

Python

# record
B = i.meshes["B"]

# record components
B_x = B["x"]
B_y = B["y"]
B_z = B["z"]

dataset = io.Dataset(
    x_data.dtype,
    x_data.shape)
B_x.reset_dataset(dataset)
B_y.reset_dataset(dataset)
B_z.reset_dataset(dataset)

Units

Let’s describe this magnetic field \(\vec B\) in more detail. Independent of the absolute unit system, a magnetic field has the physical dimension of [mass (M)\(^{1}\) \(\cdot\) electric current (I)\(^{-1}\) \(\cdot\) time (T)\(^{-2}\)].

Ouch, our magnetic field was measured in cgs units! Quick, let’s also store the conversion factor \(10^{-4}\) from Gauss (cgs) to Tesla (SI).

C++14

// unit system agnostic dimension
B.setUnitDimension({
    {io::UnitDimension::M,  1},
    {io::UnitDimension::I, -1},
    {io::UnitDimension::T, -2}
});

// conversion to SI
B_x.setUnitSI(1.e-4);
B_y.setUnitSI(1.e-4);
B_z.setUnitSI(1.e-4);

Python

# unit system agnostic dimension
B.unit_dimension = {
    io.Unit_Dimension.M:  1,
    io.Unit_Dimension.I: -1,
    io.Unit_Dimension.T: -2
}

# conversion to SI
B_x.unit_SI = 1.e-4
B_y.unit_SI = 1.e-4
B_z.unit_SI = 1.e-4

Tip

Annotating the physical dimension (unitDimension) of a record allows us to read data sets with arbitrary names and understand their purpose simply by dimensional analysis. Following the International System of Quantities (ISQ), the dimensional base quantities in openPMD are length (L), mass (M), time (T), electric current (I), thermodynamic temperature (theta), amount of substance (N), and luminous intensity (J). The factor to SI (unitSI), on the other hand, allows us to convert values between absolute unit systems.
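
As a small illustration of the tip, the sketch below represents a physical dimension as a tuple of powers over the seven base quantities, in the order listed above, and applies a unitSI-style factor to convert values. The names BASE, dimension_vector and to_si are hypothetical and chosen for this example only; the sample values are made up.

```python
import math

# The seven ISQ base quantities, in the order listed in the tip
BASE = ("L", "M", "T", "I", "theta", "N", "J")


def dimension_vector(powers):
    """Turn a {quantity: power} mapping into a 7-tuple; missing entries are 0."""
    return tuple(powers.get(q, 0) for q in BASE)


def to_si(values, unit_si):
    """Scale raw stored values by the conversion factor to SI."""
    return [v * unit_si for v in values]


# Magnetic field: M^1 * I^-1 * T^-2
B_DIMENSION = dimension_vector({"M": 1, "I": -1, "T": -2})

# A record with an unfamiliar name can still be recognized by its dimension
unknown = dimension_vector({"T": -2, "M": 1, "I": -1})
is_magnetic_field = unknown == B_DIMENSION

gauss = [5000.0, 10000.0, 0.5]   # made-up sample values in cgs
tesla = to_si(gauss, 1.0e-4)     # same values expressed in SI
print(B_DIMENSION, is_magnetic_field)
```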

Register Chunk

We can write record components partially and in parallel, or all at once. Writing very small pieces of data one by one is a performance killer for I/O. Therefore, we first register all data to be written and then flush it out collectively.

C++14

B_x.storeChunk(
    io::shareRaw(x_data),
    {0, 0}, {150, 300});
B_z.storeChunk(
    io::shareRaw(z_data),
    {0, 0}, {150, 300});

B_y.makeConstant(y_data);

Python

B_x.store_chunk(x_data)


B_z.store_chunk(z_data)



B_y.make_constant(y_data)

Attention

After registering a data chunk such as x_data and z_data, it MUST NOT be modified or deleted until the flush() step is performed!
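
The reason for this rule can be sketched with a toy model of deferred writing. The class DeferredWriter below is purely hypothetical and is not how openPMD-api is implemented; it only assumes, as described above, that registering a chunk keeps a reference to the buffer and that the data is read when flush() runs.

```python
class DeferredWriter:
    """Toy stand-in for a backend that writes only on flush() (not openPMD-api)."""

    def __init__(self):
        self._pending = []  # references to registered buffers, no copies
        self.written = []   # what "reached the file"

    def store_chunk(self, buffer):
        # Only remember the buffer; nothing is read or copied yet
        self._pending.append(buffer)

    def flush(self):
        # The buffer contents are read here, at flush time
        for buffer in self._pending:
            self.written.append(list(buffer))
        self._pending.clear()


writer = DeferredWriter()
data = [1.0, 2.0, 3.0]
writer.store_chunk(data)
data[0] = -99.0        # modified before flush: the change leaks into the output
writer.flush()
print(writer.written)  # [[-99.0, 2.0, 3.0]]
data[0] = 1.0          # after flush, the buffer can be reused safely
```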

Flush Chunk

We now flush the registered data chunks to the I/O backend. Flushing several chunks at once can increase I/O performance significantly. After that, the variables x_data and z_data can be used again.

C++14

series.flush();

Python

series.flush()

Close

Finally, the Series is fully closed when its destructor is called. Any data or attributes registered since the last .flush() are written at that point.

C++14

// destruct series object,
// e.g. when out-of-scope

Python

del series