First Read
Step-by-step: how to read openPMD data?
We use the example files from openPMD-example-datasets (example-3d.tar.gz).
Include / Import
After successful installation, you can start using openPMD-api as follows:
C++17
#include <openPMD/openPMD.hpp>
// example: data handling & print
#include <vector> // std::vector
#include <iostream> // std::cout
#include <memory> // std::shared_ptr
namespace io = openPMD;
Python
import openpmd_api as io
# example: data handling
import numpy as np
Open
Open an existing openPMD series in data<N>.h5.
File formats other than .h5 (HDF5) are also supported: .bp (ADIOS2) or .json (JSON).
C++17
auto series = io::Series(
"data_%T.h5",
io::Access::READ_ONLY);
Python
series = io.Series(
"data_%T.h5",
io.Access.read_only)
Tip
Replace the file ending .h5 with the wildcard .%E to let openPMD autodetect the ending from the file system.
Use the wildcard %T to match filename-encoded iterations.
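To make the %T wildcard more concrete, here is a minimal sketch of how such a pattern could be matched against file names to recover iteration numbers. It is not how openPMD-api implements the lookup; the helpers iteration_regex and find_iterations and the file names are hypothetical and use only the Python standard library.

```python
import re


def iteration_regex(pattern):
    """Turn a file name pattern containing %T into a regex (illustrative helper)."""
    prefix, suffix = pattern.split("%T")
    # %T stands for the digits that encode the iteration number
    return re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix) + r"$")


def find_iterations(pattern, file_names):
    """Return the sorted iteration numbers encoded in the matching file names."""
    regex = iteration_regex(pattern)
    found = []
    for name in file_names:
        match = regex.match(name)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


# hypothetical directory listing
files = ["data_00000100.h5", "data_00000200.h5", "notes.txt", "data_00000300.bp"]
print(find_iterations("data_%T.h5", files))  # [100, 200]
```

The .bp file is skipped here because the pattern fixes the ending to .h5.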
Tip
More detailed options can be passed via JSON or TOML as a further constructor parameter.
Try {"defer_iteration_parsing": true} to speed up the first access.
(Remember to explicitly open iterations with it.open() in that case.)
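As a small sketch, the option string from the tip can be built with the standard json module instead of being typed by hand. The commented lines show how it would presumably be passed on; they assume that the Python Series constructor accepts the option string as its third argument and are not executed here.

```python
import json

# build the option string from a regular Python dictionary
options = json.dumps({"defer_iteration_parsing": True})
print(options)  # {"defer_iteration_parsing": true}

# assumed usage with openPMD-api (not executed in this sketch):
#   series = io.Series("data_%T.h5", io.Access.read_only, options)
#   it = series.iterations[100]
#   it.open()  # needed because iteration parsing was deferred
```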
Iteration
An iteration groups data by an arbitrary, positive integer number <N> in a series.
Let’s take iteration 100:
C++17
auto i = series.iterations[100];
Python
i = series.iterations[100]
Attributes
openPMD defines a kernel of meta attributes and can always be extended with more. Let’s see what we’ve got:
C++17
std::cout << "openPMD version: "
          << series.openPMD() << "\n";
if( series.containsAttribute("author") )
    std::cout << "Author: "
              << series.author() << "\n";
Python
print("openPMD version: ",
      series.openPMD)
if series.contains_attribute("author"):
    print("Author: ",
          series.author)
Record
An openPMD record can be either structured (mesh) or unstructured (particles). Let’s read an electric field:
C++17
// record
auto E = i.meshes["E"];
// record components
auto E_x = E["x"];
Python
# record
E = i.meshes["E"]
# record components
E_x = E["x"]
Tip
You can check whether an entry exists via i.meshes.contains("E") (C++) or "E" in i.meshes (Python).
Units
Even without understanding the name “E” we can check the dimensionality of a record to understand its purpose.
C++17
// unit system agnostic dimension
auto E_unitDim = E.unitDimension();
// ...
// io::UnitDimension::M
// conversion to SI
double x_unit = E_x.unitSI();
Python
# unit system agnostic dimension
E_unitDim = E.unit_dimension
# ...
# io.Unit_Dimension.M
# conversion to SI
x_unit = E_x.unit_SI
Note
This example is not yet written :-)
In the future, units will be converted automatically to a selected unit system (not yet implemented).
For now, please multiply your read data (x_data) with x_unit to convert to SI; otherwise the raw, potentially awkwardly scaled data is used.
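The two unit attributes can be illustrated with plain Python. The sketch below assumes the unit dimension is given as seven powers of the SI base quantities in the order used by the openPMD standard (length, mass, time, electric current, temperature, amount of substance, luminous intensity). The helpers format_dimension and to_si and all numbers are made up for illustration.

```python
# SI base units in the order of the seven unit dimension entries: L, M, T, I, theta, N, J
BASE_UNITS = ("m", "kg", "s", "A", "K", "mol", "cd")


def format_dimension(powers):
    """Render seven base-unit powers as a readable unit string (illustrative helper)."""
    parts = []
    for unit, power in zip(BASE_UNITS, powers):
        if power == 0:
            continue
        parts.append(unit if power == 1 else f"{unit}^{power:g}")
    return " ".join(parts) or "dimensionless"


def to_si(raw_values, unit_si):
    """Scale raw stored values to SI by multiplying with the conversion factor."""
    return [value * unit_si for value in raw_values]


# an electric field is V/m = kg m s^-3 A^-1
E_unit_dim = (1.0, 1.0, -3.0, -1.0, 0.0, 0.0, 0.0)
print(format_dimension(E_unit_dim))  # m kg s^-3 A^-1

raw = [0.5, -1.0, 2.0]  # made-up raw values
x_unit = 2.0e9          # made-up conversion factor
print(to_si(raw, x_unit))
```

With real data, the multiplication has to wait until the chunk has been flushed, since the array contents are not valid before that.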
Register Chunk
We can load record components partially and in parallel, or all at once. Reading small pieces of data one by one is a performance killer for I/O. Therefore, we first register all data to be loaded and then flush it collectively.
C++17
// loadChunk allocates the buffer here;
// alternatively, pass a pre-allocated one
std::shared_ptr< double > x_data =
    E_x.loadChunk< double >();
Python
# returns an allocated but
# invalid numpy array
x_data = E_x.load_chunk()
Attention
After registering a data chunk such as x_data for loading, it MUST NOT be modified or deleted until the flush() step is performed!
You must not access x_data yet!
One can also request to load a slice of data:
C++17
// start from the full extent and keep
// a single layer along the last axis
io::Extent slice_extent = E_x.getExtent();
slice_extent.at(2) = 1;
std::shared_ptr< double > x_slice_data =
    E_x.loadChunk< double >(
        io::Offset{0, 0, 4}, slice_extent);
Python
# we support slice syntax, too
x_slice_data = E_x[:, :, 4]
Don’t forget that we still need to flush().
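The C++ request above pairs an offset with an extent, while the Python request uses slice syntax. The following sketch shows one way the two notations correspond. The helper slice_to_chunk and the shape are hypothetical, the code is not taken from openPMD-api, and it handles only contiguous slices and non-negative integer indices.

```python
def slice_to_chunk(index, shape):
    """Translate integer and slice indices into an (offset, extent) pair."""
    offset, extent = [], []
    for idx, length in zip(index, shape):
        if isinstance(idx, slice):
            start, stop, step = idx.indices(length)
            if step != 1:
                raise ValueError("only contiguous slices are handled in this sketch")
            offset.append(start)
            extent.append(max(stop - start, 0))
        else:
            # a single index selects one layer along this axis
            offset.append(idx)
            extent.append(1)
    return offset, extent


shape = (26, 26, 201)  # hypothetical shape of E_x
offset, extent = slice_to_chunk((slice(None), slice(None), 4), shape)
print(offset, extent)  # [0, 0, 4] [26, 26, 1]
```

The result matches the C++ request: offset {0, 0, 4} and the full extent with its last entry set to 1.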
Flush Chunk
We now flush the registered data chunks and fill them with actual data from the I/O backend.
Flushing several chunks at once can increase I/O performance significantly.
Only after that can the variables x_data and x_slice_data be read, manipulated and/or deleted.
C++17
series.flush();
Python
series.flush()
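The register-then-flush pattern can be modeled with a toy class. This is only an illustration of the idea under simplifying assumptions (one-dimensional data, a Python list as "storage"). ToyComponent is hypothetical and does not reflect how openPMD-api works internally.

```python
class ToyComponent:
    """Stand-in for a record component backed by slow storage (illustrative only)."""

    def __init__(self, stored_values):
        self._stored = list(stored_values)
        self._pending = []  # requests that are registered but not yet served
        self.reads = 0      # number of simulated I/O passes

    def load_chunk(self, offset, extent):
        # register the request and hand out a placeholder buffer
        buffer = [None] * extent
        self._pending.append((offset, extent, buffer))
        return buffer

    def flush(self):
        # serve all registered requests in a single pass over the storage
        if self._pending:
            self.reads += 1
        for offset, extent, buffer in self._pending:
            buffer[:] = self._stored[offset:offset + extent]
        self._pending.clear()


component = ToyComponent(range(10))
head = component.load_chunk(0, 3)
tail = component.load_chunk(7, 3)
print(head)  # [None, None, None]: placeholders only, nothing read yet
component.flush()
print(head, tail, component.reads)  # [0, 1, 2] [7, 8, 9] 1
```

Both buffers are filled by one flush, which is why the buffers handed out earlier must stay alive and untouched until then.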
Data
We can now work with the newly loaded data in x_data (or x_slice_data):
C++17
auto extent = E_x.getExtent();
std::cout << "First values in E_x "
             "of shape: ";
for( auto const& dim : extent )
    std::cout << dim << ", ";
std::cout << "\n";
// x_data is a flat buffer: the first
// values run along the last axis
for( size_t col = 0;
     col < extent[2] && col < 5;
     ++col )
    std::cout << x_data.get()[col]
              << ", ";
std::cout << "\n";
Python
extent = E_x.shape
print(
"First values in E_x "
"of shape: ",
extent)
print(x_data[0, 0, :5])
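The C++ snippet indexes a flat buffer, while the Python snippet uses a three-dimensional index. Assuming the loaded buffer is contiguous in row-major (C) order, the correspondence between the two can be sketched as follows. The helper flat_index and the shape are hypothetical.

```python
def flat_index(index, shape):
    """Row-major (C order) position of a multi-dimensional index."""
    position = 0
    for idx, length in zip(index, shape):
        position = position * length + idx
    return position


shape = (26, 26, 201)  # hypothetical shape of E_x

# the first five flat positions correspond to x_data[0, 0, 0] ... x_data[0, 0, 4]
print([flat_index((0, 0, k), shape) for k in range(5)])  # [0, 1, 2, 3, 4]

# one step along the middle axis skips a full run of the last axis
print(flat_index((0, 1, 0), shape))  # 201
```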
Close
Finally, the Series is closed when its destructor is called or when close() is called explicitly.
Make sure all data loads have been flush()ed at this point; otherwise flush() will be called once more implicitly.
C++17
series.close();
Python
series.close()