Dask
The Python bindings of openPMD-api provide direct methods to load data into the parallel Dask data analysis ecosystem.
How to Install
Among other package managers, PyPI ships the latest Dask packages:
python3 -m pip install -U dask
python3 -m pip install -U pyarrow
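A quick import check can confirm that the installation succeeded (this assumes the packages above were installed into the currently active Python environment):

```shell
python3 -c "import dask; print(dask.__version__)"
python3 -c "import pyarrow; print(pyarrow.__version__)"
```

Each command prints the installed version number, or fails with an ImportError if the package is missing.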
How to Use
The central Python API calls for converting to Dask datatypes are the ParticleSpecies.to_dask and Record_Component.to_dask_array methods.
import openpmd_api as io
import dask

s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
electrons = s.iterations[400].particles["electrons"]
# The default scheduler is local/threaded. We can also use local
# "processes" or, for multi-node runs, "distributed", among others.
dask.config.set(scheduler='processes')
df = electrons.to_dask()
type(df) # ...
E = s.iterations[400].meshes["E"]
E_x = E["x"]
darr_x = E_x.to_dask_array()
type(darr_x) # ...
# note: no series.flush() needed
Example
A detailed example script for particle and field analysis is documented as 11_particle_dataframe.py in our examples.
See a video of openPMD on Dask in action in pull request #963 (part of openPMD-api v0.14.0 and later).