Overview¶
pywgrib2_xr uses wgrib2 as its underlying engine. It provides the following functionality:
A low-level interface to the wgrib2 C-API. This is the interface to use for writing GRIB2 files.
A GRIB2 backend for xarray, similar to cfgrib. Decoded messages are assembled into xarray datasets.
Remapping of the resulting datasets to a different grid or to an arbitrary set of locations, using the Fortran library iplib, which is part of the wgrib2 distribution.
Low-level interface¶
Suppose the task is to write a GRIB2 file containing surface temperature and wind extracted from a large GRIB2 file. This can be achieved with a single wgrib2 call:
wgrib2 nam.t12z.awak3d18.tm00.grib2 -match \
':(TMP:2 m above ground|[U|V]GRD:10 m above ground):' \
-grib /tmp/subset.grib2
The equivalent Python code is:
import pywgrib2_xr as pywgrib2
in_file = 'nam.t12z.awak3d18.tm00.grib2'
out_file = '/tmp/subset.grib2'
match_str = ':(TMP:2 m above ground|[U|V]GRD:10 m above ground):'
pywgrib2.wgrib(in_file, '-inv', '/dev/null', '-match', match_str, '-grib', out_file)
pywgrib2.free_files(in_file, out_file)
We can check that the output file contains the requested data. The wgrib2 call:
wgrib2 /tmp/subset.grib2
translates to Python as:
with pywgrib2.MemoryBuffer() as buf:
    pywgrib2.wgrib(out_file, '-inv', buf)
    buf.get('s')
The class MemoryBuffer used in this example is a wrapper around the wgrib2 C-API functions int wgrib2_get_mem_buffer(unsigned char *my_buffer, size_t size, int n) and int wgrib2_set_mem_buffer(const unsigned char *my_buffer, size_t size, int n), which transfer data between the calling program and the API “superfunction” int wgrib2(int argc, const char **argv). The argument 's' means buf.get() should return a string.
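As a small usage sketch building on the objects defined above, the in-memory inventory can be decoded and split into one record per message (splitlines() and the extra print are illustrative additions, not part of the original example):
# Capture the inventory in memory and split the decoded string into records.
with pywgrib2.MemoryBuffer() as buf:
    pywgrib2.wgrib(out_file, '-inv', buf)
    inventory = buf.get('s').splitlines()   # 's' -> return the buffer as str
pywgrib2.free_files(out_file)
print(inventory[0])                         # first inventory record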
Section Writing GRIB2 files shows an example of storing a calculated weather element in a GRIB2 file.
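For a flavour of what that involves, here is a minimal sketch that stays within the superfunction wrapper shown above and standard wgrib2 options (-match, -rpn, -grib_out); the file names are illustrative and the output metadata (units) is not adjusted:
# Sketch: convert 2 m temperature from kelvin to degrees Celsius with the
# wgrib2 RPN calculator and write the modified field to a new GRIB2 file.
in_file = 'nam.t12z.awak3d18.tm00.grib2'
out_file = '/tmp/tmp2m_degc.grib2'
pywgrib2.wgrib(in_file, '-inv', '/dev/null',
               '-match', ':TMP:2 m above ground:',
               '-rpn', '273.15:-',        # push 273.15, subtract: K -> degC
               '-grib_out', out_file)     # write the decoded (modified) data
pywgrib2.free_files(in_file, out_file)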
Xarray backend¶
Conversion of GRIB2 messages to an xarray dataset is a task that cannot have a fully satisfactory solution. Each GRIB2 message is a self-contained unit, essentially a 2-dimensional array plus attributes, while an xarray dataset is a collection of multidimensional arrays that may share some dimensions.
The first difficulty is element, or variable, names. Model output usually contains many geopotential height variables: pressure or sigma levels, cloud base/top, tropopause, etc. All have the same name, HGT (NCEP mnemonic), and can only be distinguished by attributes. A dataset can have only one variable named HGT. cfgrib solves the issue by forcing different height types into separate datasets. This leads to a proliferation of datasets for each model output. pywgrib2_xr uses attributes to define variable names. Thus variables HGT.isobaric and HGT.cloud_ceiling can both coexist in the same dataset.
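For illustration (assuming a dataset ds, opened with pywgrib2_xr from a file carrying geopotential height on both level types), the two variables are simply addressed by their dotted names:
# Both height variables live in the same dataset and are selected by name.
hgt_isobaric = ds['HGT.isobaric']       # heights on pressure levels
hgt_ceiling = ds['HGT.cloud_ceiling']   # cloud ceiling height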
The next issue is dimension handling. Different variables can have different time and vertical dimensions: temperature is provided for all forecast times, while accumulated precipitation is not available at the initial (analysis) time; wind and vorticity do not have the same number of vertical levels. cfgrib needs to put those elements into separate datasets.
The third issue stems from the way GRIB2 messages are distributed. MSC Datamart makes one file per message, while NCEP files group messages by reference and forecast times. Each file can contain hundreds of messages. xarray has the concept of a datastore, which essentially means a single file storing the data. A dataset is built from a datastore. Those datasets can then be combined; there are two combining functions: xarray.combine_by_coords() and xarray.combine_nested(). The latter, the more flexible of the two, requires the user to partition input files into nested lists. This is not simple and may not even be possible when some of the files are missing. Consider the following example:
import xarray as xr
from pywgrib2_xr.utils import localpath
f_10_0_00 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012500_P000.grib2')
f_10_3_00 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012500_P003.grib2')
f_7_0_00 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012500_P000.grib2')
f_7_3_00 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012500_P003.grib2')
ds = xr.open_mfdataset([[f_10_0_00, f_10_3_00], [f_7_0_00, f_7_3_00]],
                       engine='cfgrib', combine='nested',
                       concat_dim=['level', 'step'])
ds
<xarray.Dataset>
Dimensions: (level: 2, step: 2, x: 247, y: 200)
Coordinates:
time datetime64[ns] 2020-01-25
* step (step) timedelta64[ns] 00:00:00 03:00:00
isobaricInhPa (level) int64 1000 700
latitude (y, x) float64 dask.array<chunksize=(200, 247), meta=np.ndarray>
longitude (y, x) float64 dask.array<chunksize=(200, 247), meta=np.ndarray>
valid_time (step) datetime64[ns] 2020-01-25 2020-01-25T03:00:00
Dimensions without coordinates: level, x, y
Data variables:
t (step, level, y, x) float32 dask.array<chunksize=(1, 1, 200, 247), meta=np.ndarray>
Attributes:
GRIB_edition: 2
GRIB_centre: cwao
GRIB_centreDescription: Canadian Meteorological Service - Montreal
GRIB_subCentre: 0
Conventions: CF-1.7
institution: Canadian Meteorological Service - Montreal
history: 2020-10-29T19:45:43 GRIB to CDM+CF via cfgrib-0....
This works as expected. But if a file is missing, the above code will fail:
ds = xr.open_mfdataset([[f_10_0_00, f_10_3_00], [f_7_0_00]], engine='cfgrib',
                       combine='nested', concat_dim=['level', 'step'])
The supplied objects do not form a hypercube because sub-lists do not have consistent lengths along dimension0
pywgrib2_xr attempts to solve this problem with the concept of a template, borrowed from AWIPS1. The template defines the logical dataset structure. A logical dataset contains messages sharing the horizontal grid and the reference time, coded in Section 1 of a GRIB2 message. Vertical and time coordinates may differ. Conceptually this means that the logical dataset corresponds to one model run.
To build a template, one has to have a complete set of files for one model run. In most cases an archive will contain such a set. Once the template is built, logical datasets can be concatenated over the reference time dimension. Missing files will result in the respective chunks being set to NaN.
What this means, however, is that pywgrib2_xr cannot simply be implemented as another backend for xarray. It does attempt to have the same interface: the function pywgrib2_xr.open_dataset() is ported from the backends module of xarray. There is no need for open_mfdataset(); the logic of combining input files is included in open_dataset():
from datetime import timedelta
import pywgrib2_xr as pywgrib2
from pywgrib2_xr.utils import localpath
f_10_0_00 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012500_P000.grib2')
f_10_3_00 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012500_P003.grib2')
f_7_0_00 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012500_P000.grib2')
f_7_3_00 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012500_P003.grib2')
f_10_0_12 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012512_P000.grib2')
f_10_3_12 = localpath('CMC_glb_TMP_ISBL_1000_ps30km_2020012512_P003.grib2')
f_7_0_12 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012512_P000.grib2')
f_7_3_12 = localpath('CMC_glb_TMP_ISBL_700_ps30km_2020012512_P003.grib2')
f_12Zrun = [f_10_0_12, f_10_3_12, f_7_0_12, f_7_3_12] # complete set for template
f_all = [f_10_0_00, f_10_3_00, f_7_0_00] + f_12Zrun # simulate missing file
tmpl = pywgrib2.make_template(f_12Zrun, vertlevels='isobaric')
ds = pywgrib2.open_dataset(f_all, tmpl)
ds
<xarray.Dataset>
Dimensions: (isobaric1: 2, reftime: 2, time1: 2, x: 247, y: 200)
Coordinates:
longitude (y, x) float64 dask.array<chunksize=(200, 247), meta=np.ndarray>
latitude (y, x) float64 dask.array<chunksize=(200, 247), meta=np.ndarray>
* x (x) float64 -2.61e+06 -2.58e+06 ... 4.74e+06 4.77e+06
* y (y) float64 -5.97e+06 -5.94e+06 ... -3.022e+04 -216.1
* isobaric1 (isobaric1) int64 100000 70000
* time1 (time1) timedelta64[ns] 00:00:00 03:00:00
* reftime (reftime) datetime64[ns] 2020-01-25 2020-01-25T12:00:00
polar_stereographic int64 ...
Data variables:
TMP.isobaric (reftime, time1, isobaric1, y, x) float32 dask.array<chunksize=(1, 2, 2, 200, 247), meta=np.ndarray>
Attributes:
Projection: polar_stereographic
Originating centre: 54 - Canadian Meteorological Service - Montreal (...
Originating subcentre: 0
History: Created by pywgrib2_xr-0.2.1
ds['TMP.isobaric'].sel({'reftime': '2020-01-25T00:00:00',
                        'time1': timedelta(hours=3),
                        'isobaric1': 70000}).values
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
The decoder handles the following grids:
Equidistant Cylindrical, also known as latitude-longitude
Rotated latitude-longitude
Mercator
Polar Stereographic
Lambert Conformal Conic
Gaussian
Space View
Creation of a dataset from GRIB2 files is a three-stage process:
Create an inventory for each input file.
Create a template.
Read all input files.
The first step was done implicitly in the above example. When the same GRIB2 files are processed multiple times, it saves time to write each inventory to a disk file, as sketched below. The whole process is described in the User Guide.
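The inventory helpers used in this sketch (make_inventory and save_inventory) and their exact signatures are assumptions here; consult the User Guide for the actual API:
# Assumed-API sketch: build and save inventories up front so that repeated
# calls to open_dataset() do not have to scan the GRIB2 files again.
for path in f_all:
    inv = pywgrib2.make_inventory(path)   # scan the GRIB2 file once
    pywgrib2.save_inventory(inv, path)    # store the inventory for reuse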
Remapping¶
pywgrib2_xr comes with the interpolation library iplib, which allows remapping a dataset to a different grid or to a set of arbitrary points. Remapping is implemented as methods of the xarray dataset accessor Wgrib2DatasetAccessor, registered under the attribute wgrib2.
The next example shows how to remap a dataset to a set of locations.
lons = [-77.03, -150.02, -78.62]
lats = [38.85, 61.17, 43.57]
ids = ['KDCA', 'PANC', 'CYYZ']
sites = pywgrib2.Point(lons, lats, ('airport', ids, {}))
ds2 = ds.wgrib2.location(sites)
tmp = ds2['TMP.isobaric'].compute()
tmp.sel(airport='CYYZ')
<xarray.DataArray 'TMP.isobaric' (reftime: 2, time1: 2, isobaric1: 2)>
array([[[279.4096 , 268.39636],
[280.62656, nan]],
[[275.98114, 266.69205],
[275.85538, 264.9516 ]]], dtype=float32)
Coordinates:
points int64 0
longitude float64 -78.62
latitude float64 43.57
airport <U4 'CYYZ'
* reftime (reftime) datetime64[ns] 2020-01-25 2020-01-25T12:00:00
* time1 (time1) timedelta64[ns] 00:00:00 03:00:00
* isobaric1 (isobaric1) int64 100000 70000
Attributes:
short_name: TMP
long_name: Temperature
units: K
grid_mapping: points
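Since the result of the remapping is an ordinary xarray dataset, the interpolated values can be post-processed with the usual tools; for example, building on ds2 created above (the conversion to a pandas table is just an illustration):
# One row per (airport, reftime, time1, isobaric1) combination.
df = ds2['TMP.isobaric'].to_dataframe()
print(df.head())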
iplib supports all grids handled by the decoder with the exception of Space View.
CF support¶
pywgrib2_xr does not follow CF conventions at this time. Standard names are set only for coordinate variables, not for data variables. Also, composite units are as provided by the wgrib2 code. For example, speed units are m/s, not m s-1 as mandated here.
import cf_xarray
ds.cf.describe()
Axes:
X: ['x']
Y: ['y']
Z: ['isobaric1']
T: []
Coordinates:
longitude: ['longitude']
latitude: ['latitude']
vertical: ['isobaric1']
time: []
Cell Measures:
area: unsupported
volume: unsupported
Standard Names:
forecast_period: ['time1']
projection_x_coordinate: ['x']
projection_y_coordinate: ['y']
reference_time: ['reftime']
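If CF-style unit strings are needed downstream, they can be patched after the fact. A minimal post-processing sketch (not part of pywgrib2_xr; the mapping is illustrative, not exhaustive):
# Rewrite wgrib2-style unit strings into the CF-preferred spelling.
CF_UNITS = {'m/s': 'm s-1', 'kg/m^2': 'kg m-2'}   # illustrative mapping only
for var in ds.data_vars.values():
    units = var.attrs.get('units')
    if units in CF_UNITS:
        var.attrs['units'] = CF_UNITS[units]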