Pandas
The Python bindings of openPMD-api provide direct methods to load data into the Pandas data analysis ecosystem.
Pandas computes on the CPU; for GPU-accelerated data analysis, see RAPIDS.
How to Install
Pandas is available from many package managers; PyPI ships the latest packages:
python3 -m pip install -U pandas
Dataframes
The central Python API call to convert openPMD particles to a Pandas dataframe is the ParticleSpecies.to_df method.
import openpmd_api as io
s = io.Series("samples/git-sample/data%T.h5", io.Access.read_only)
electrons = s.iterations[400].particles["electrons"]
df = electrons.to_df()
type(df) # pd.DataFrame
print(df)
# note: no series.flush() needed
One can also combine all iterations in a single dataframe like this:
df = s.to_df("electrons")
# like before but with a new column "iteration" and all particles
print(df)
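As a minimal sketch of working with such a combined dataframe, the "iteration" column can be used to select or group particles. The small dataframe below is a hand-written stand-in for the result of s.to_df("electrons"); the column names "momentum_z" and "weighting" are assumptions for illustration and may differ in a real data series.

```python
import pandas as pd

# Stand-in for s.to_df("electrons"); only the "iteration" column is taken
# from the text above, the other column names are assumed for illustration.
df = pd.DataFrame(
    {
        "iteration": [100, 100, 200, 200],
        "momentum_z": [1.0, 3.0, 2.0, 6.0],
        "weighting": [1.0, 1.0, 2.0, 2.0],
    }
)

# select the particles of a single iteration
df_200 = df[df["iteration"] == 200]

# per-iteration mean over all particles of that iteration
mean_pz = df.groupby("iteration")["momentum_z"].mean()
print(mean_pz)
```

Both operations are plain Pandas calls, so they apply to any dataframe that carries an "iteration" column.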
openPMD to ASCII
Once converted to a Pandas dataframe, openPMD data can easily be exported to text. We generally do not recommend this because ASCII files are slower to process, use significantly more space on disk, and have less precision than the binary data usually stored in openPMD data series. Nonetheless, in some cases, especially for small, human-readable data sets, this can be helpful.
The central Pandas call for this is DataFrame.to_csv.
# creates an electrons.csv file
df.to_csv("electrons.csv", sep=",", header=True)
openPMD as SQL Database
Once converted to a Pandas dataframe, one can also query and process openPMD data with SQL syntax, as provided by many databases.
One project that provides such syntax is pandasql:
python3 -m pip install -U pandasql
Alternatively, one can export the dataframe into an SQL database.
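The export into an SQL database could be sketched as follows, using DataFrame.to_sql together with the sqlite3 module from the Python standard library. The in-memory database, the table name "electrons", and the column names of the stand-in dataframe are assumptions chosen for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in for an openPMD particle dataframe; column names are assumed.
df = pd.DataFrame(
    {
        "momentum_z": [1.0, 3.0, 2.0, 6.0],
        "weighting": [1.0, 1.0, 2.0, 2.0],
    }
)

# in-memory SQLite database; pass a file name instead to persist it
con = sqlite3.connect(":memory:")

# write the dataframe into a table named "electrons"
df.to_sql("electrons", con, index=False)

# query the table with SQL and get the result back as a dataframe
fast = pd.read_sql_query(
    "SELECT * FROM electrons WHERE momentum_z > 2.5", con
)
con.close()

print(fast)
```

Other database backends generally require an SQLAlchemy connection instead of the sqlite3 connection used here.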
Example
A detailed example script for particle and field analysis is documented as 11_particle_dataframe.py in our examples.