First Write

Step-by-step: how to write scientific data with openPMD-api?

Include / Import

After successful installation, you can start using openPMD-api as follows:

C++11

#include <openPMD/openPMD.hpp>

// example: data handling
#include <numeric>  // std::iota
#include <vector>   // std::vector

namespace api = openPMD;

Python

import openpmd_api as api

# example: data handling
import numpy as np

Open

Write into a new openPMD series in myOutput/data_<00...N>.h5. Besides .h5 (HDF5), further file formats are supported: .bp (ADIOS1) or .json (JSON).

C++11

auto series = api::Series(
    "myOutput/data_%05T.h5",
    api::AccessType::CREATE);

Python

series = api.Series(
    "myOutput/data_%05T.h5",
    api.Access_Type.create)
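
The backend is selected from the file ending. As a minimal sketch (the same call as above, only the ending changes), the series could be written as JSON instead of HDF5:

# same call as above; the .json ending selects the JSON backend
series_json = api.Series(
    "myOutput/data_%05T.json",
    api.Access_Type.create)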

Iteration

Grouping by an arbitrary, positive integer number <N> in a series. With the file-based layout chosen above, the %05T placeholder expands to the zero-padded iteration number, so iteration 42 below ends up in myOutput/data_00042.h5:

C++11

auto i = series.iterations[42];

Python

i = series.iterations[42]

Attributes

Everything in openPMD can be extended and user-annotated. Let us try this by writing some metadata:

C++11

series.setAuthor(
    "Axel Huebl <a.huebl@hzdr.de>");
series.setMachine(
    "Hall Probe 5000, Model 3");
series.setAttribute(
    "dinner", "Pizza and Coke");
i.setAttribute(
    "vacuum", true);

Python

series.set_author(
    "Axel Huebl <a.huebl@hzdr.de>")
series.set_machine(
    "Hall Probe 5000, Model 3")
series.set_attribute(
    "dinner", "Pizza and Coke")
i.set_attribute(
    "vacuum", True)

Data

Let’s prepare some data that we want to write. For example, a magnetic field \(\vec B(i, j)\) slice in two dimensions with three components \((B_x, B_y, B_z)^\intercal\) of which the \(B_y\) component shall be constant for all \((i, j)\) indices.

C++11

std::vector<float> x_data(
    150 * 300);
std::iota(
    x_data.begin(),
    x_data.end(),
    0.);

float y_data = 4.f;

std::vector<float> z_data(x_data);
for( auto& c : z_data )
    c -= 8000.f;

Python

x_data = np.arange(
    150 * 300,
    dtype=np.float64
).reshape(150, 300)



y_data = 4.

z_data = x_data.copy() - 8000.

Record

An openPMD record can be either structured (mesh) or unstructured (particles). We prepared a vector field in 2D above, which is a mesh:

C++11

// record
auto B = i.meshes["B"];

// record components
auto B_x = B["x"];
auto B_y = B["y"];
auto B_z = B["z"];

auto dataset = api::Dataset(
    api::determineDatatype<float>(),
    {150, 300});
B_x.resetDataset(dataset);
B_y.resetDataset(dataset);
B_z.resetDataset(dataset);

Python

# record
B = i.meshes["B"]

# record components
B_x = B["x"]
B_y = B["y"]
B_z = B["z"]

dataset = api.Dataset(
    x_data.dtype,
    x_data.shape)
B_x.reset_dataset(dataset)
B_y.reset_dataset(dataset)
B_z.reset_dataset(dataset)

Units

Ouch, our measured magnetic field data is in Gauss! Quick, let’s store the conversion factor to SI (Tesla).

C++11

// conversion to SI
B_x.setUnitSI(1.e-4);
B_y.setUnitSI(1.e-4);
B_z.setUnitSI(1.e-4);

// unit system agnostic dimension
B.setUnitDimension({
    {api::UnitDimension::M,  1},
    {api::UnitDimension::I, -1},
    {api::UnitDimension::T, -2}
});

Python

# conversion to SI
B_x.set_unit_SI(1.e-4)
B_y.set_unit_SI(1.e-4)
B_z.set_unit_SI(1.e-4)

# unit system agnostic dimension
B.set_unit_dimension({
    api.Unit_Dimension.M:  1,
    api.Unit_Dimension.I: -1,
    api.Unit_Dimension.T: -2
})

Tip

Annotating the dimensionality of a record allows us to read data sets with arbitrary names and understand their purpose simply by dimensional analysis.
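
As a worked example for the magnetic field above, the SI unit Tesla decomposes as \(1\,\mathrm{T} = 1\,\mathrm{kg}\cdot\mathrm{A}^{-1}\cdot\mathrm{s}^{-2}\): mass to the power \(1\), electric current to the power \(-1\) and time to the power \(-2\), which is exactly the \(M\), \(I\), \(T\) exponent map we set above.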

Register Chunk

Record components can be written partially and in parallel, or all at once. Writing very small pieces of data one by one is a performance killer for I/O. Therefore, we first register all data to be written and then flush it out collectively (a sketch of a partial write follows the Python example below).

C++11

B_x.storeChunk(
    api::shareRaw(x_data),
    {0, 0}, {150, 300});
B_z.storeChunk(
    api::shareRaw(z_data),
    {0, 0}, {150, 300});

B_y.makeConstant(y_data);

Python

B_x.store_chunk(x_data)


B_z.store_chunk(z_data)



B_y.make_constant(y_data)
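
To write a record component partially, an offset and extent can be passed along with each data chunk. A minimal Python sketch, assuming store_chunk accepts explicit offset and extent lists; the split into two halves is purely illustrative and would replace the single B_x call above:

# illustrative: register the 150 x 300 component in two halves
# of 75 rows each, instead of the single store_chunk call above
upper_half = x_data[:75, :].copy()   # rows 0..74
lower_half = x_data[75:, :].copy()   # rows 75..149

B_x.store_chunk(upper_half, [0, 0], [75, 300])
B_x.store_chunk(lower_half, [75, 0], [75, 300])
# like any registered chunk, upper_half and lower_half must stay
# untouched until the next series.flush()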

Attention

After registering a data chunk such as x_data and z_data, it MUST NOT be modified or deleted until the flush() step is performed!

Flush Chunk

We now flush the registered data chunks to the I/O backend. Flushing several chunks at once can increase I/O performance significantly. After that, the variables x_data and z_data can be used again.

C++11

series.flush();

Python

series.flush()

Close

Finally, the Series is fully closed (and any data or attributes registered since the last .flush() are written) when its destructor is called.

C++11

// destruct series object,
// e.g. when out-of-scope

Python

del series
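
If the write succeeded, the file-based layout chosen at the beginning should have produced one file per iteration. A quick, hypothetical check from Python (the expected name for iteration 42 follows from the %05T pattern):

# hypothetical check: expect e.g. myOutput/data_00042.h5
import os
print(sorted(os.listdir("myOutput")))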