Inventory¶
wgrib2 has many options to list content of a GRIB2. Those are identified by
a tag inv
in the second column of wgrib2 help screen. The default short
listing is a sequence of elements separated by a colon:
wgrib2 hrrrAK.grib2
1:0:d=2020021000:VIS:surface:24 hour fcst:
2:754142:d=2020021000:LCDC:low cloud layer:24 hour fcst:
3:1212817:d=2020021000:MCDC:middle cloud layer:24 hour fcst:
4:1569231:d=2020021000:HCDC:high cloud layer:24 hour fcst:
5:1782227:d=2020021000:TCDC:entire atmosphere:24 hour fcst:
6:2164301:d=2020021000:HGT:cloud base:24 hour fcst:
7:3174056:d=2020021000:HGT:cloud ceiling:24 hour fcst:
8:4445291:d=2020021000:HGT:cloud top:24 hour fcst:
An extended listing is obtained by adding option -Match_inv
:
wgrib2 hrrAK.grib2 -Match_inv
1:0:D=20200210000000:VIS:surface:24 hour fcst::VIS:n=1:npts=1193781:var0_2_1_7_19_0:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=0 dec_scale=2 bin_scale=0 nbits=10:vt=2020021100:
2:754142:D=20200210000000:LCDC:low cloud layer:24 hour fcst::LCDC:n=2:npts=1193781:var0_2_1_7_6_3:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=0 dec_scale=0 bin_scale=-3 nbits=11:vt=2020021100:
3:1212817:D=20200210000000:MCDC:middle cloud layer:24 hour fcst::MCDC:n=3:npts=1193781:var0_2_1_7_6_4:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=0 dec_scale=0 bin_scale=-3 nbits=11:vt=2020021100:
4:1569231:D=20200210000000:HCDC:high cloud layer:24 hour fcst::HCDC:n=4:npts=1193781:var0_2_1_7_6_5:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=0 dec_scale=0 bin_scale=-3 nbits=12:vt=2020021100:
5:1782227:D=20200210000000:TCDC:entire atmosphere:24 hour fcst::TCDC:n=5:npts=1193781:var0_2_1_7_6_1:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=0 dec_scale=0 bin_scale=-3 nbits=10:vt=2020021100:
6:2164301:D=20200210000000:HGT:cloud base:24 hour fcst::HGT:n=6:npts=1193781:var0_2_1_7_3_5:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=16.8566 dec_scale=0 bin_scale=-3 nbits=18:vt=2020021100:
7:3174056:D=20200210000000:HGT:cloud ceiling:24 hour fcst::HGT:n=7:npts=1193781:var0_2_1_7_3_5:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=30.9375 dec_scale=0 bin_scale=-3 nbits=19:vt=2020021100:
8:4445291:D=20200210000000:HGT:cloud top:24 hour fcst::HGT:n=8:npts=1193781:var0_2_1_7_3_5:pdt=0:d=2020021000:start_FT=20200211000000:end_FT=20200211000000:scaling ref=31.1188 dec_scale=0 bin_scale=-3 nbits=19:vt=2020021100:
This type of listing is used to select messages, either implicitly, as shown in Overview, or explicitly, by saving it to an inventory file:
import pywgrib2_xr as pywgrib2
from pywgrib2_xr.utils import remotepath
in_file = remotepath('nam.t12z.awak3d18.tm00.grib2')
out_file = '/tmp/subset2.grib2'
inv_file = '/tmp/nam.t12z.awak3d18.tm00.inv'
pywgrib2.wgrib(in_file, '-inv', inv_file, '-Match_inv')
match_str = ':(TMP:2 m above ground|[U|V]GRD:10 m above ground):'
pywgrib2.wgrib(in_file, '-i_file', inv_file, '-inv', '/dev/null',
'-match', match_str, '-grib', out_file)
pywgrib2.free_files(in_file, out_file)
When the GRIB2 files are accessed frequently it makes sense to create permanent inventory files. Decoding of message metadata is done only once. Subsequent message selection is done by reading inventory file, which is much faster.
The above inventories can be useful with the low-level interface is used.
pywgrib2_xr uses option -pyinv
which is a new feature in wgrib2 v3.0.0:
wgrib2 nam.t12z.awak3d18.tm00.grib2 -d 1 -pyinv
1:0:PRMSL:mean sea level:18 hour fcst:pyinv={'discipline':0,'centre':'7 - US National Weather Service - NCEP (WMC)','subcentre':'0','mastertab':2,'localtab':1,'reftime':'2018-02-20T12:00:00','npts':235025,'nx':553,'ny':425,'gdtnum':20,'gdtmpl':[6,0,0,0,0,0,0,553,425,30000000,187000000,56,60000000,225000000,11250000,11250000,0,64],'long_name':'Pressure Reduced to MSL','units':'Pa','pdt':0,'parmcat':3,'parmnum':1,'start_ft':'2018-02-21T06:00:00','end_ft':'2018-02-21T06:00:00','bot_level_code':101,'bot_level_value':0,'top_level_code':255}
The inventory is a list of MetaData
, each element corresponding
to a line like the above. MetaData
contains also path to the GRIB2 file. This warrants
some careful consideration. When GRIB2 files are accessed via NFS, the mount points do
not have to be the same. Therefore the file
attribute has to be filled when the
inventory is read from a file.
Inventory is created by a call to make_inventory()
,
It can be saved to a file by save_inventory()
.
To retrieve previously saved inventory, use load_inventory()
.
Another function load_or_make_inventory()
combines
functionality of the three functions in one call. If an inventory exists, it is
retrieved from a file, otherwise created, and optionally saved.
inv = pywgrib2.make_inventory(in_file)
pywgrib2.save_inventory(inv, in_file)
Note that the second argument to save_inventory()
is path to the GRIB2 file.
The inventory file name is created by the function. In this example the inventory is
written to the same location as the GRIB2 file, the file name is that of GRIB2 file
with the suffix .binv
(Blosc-compressed inventory). If the GRIB2 file resides
on a read only medium, the inventory must be saved somewhere else.
save_inventory()
accepts argument directory
to specify location of
the output file.
To handle archives with a tree structure, when GRIB2 file names are not unique, inventory file name is a hash of the path to the GRIB2 file. To illustrate the concept, assume that the archive is grouped by date.
ls -l /archive/*
/archive/20200912:
total 38572
-rw-r--r-- 1 root root 39496058 Sep 12 22:25 nam.t00z.afwaca00.tm00.grib2
/archive/20200913:
total 39456
-rw-r--r-- 1 root root 40401894 Sep 13 22:29 nam.t00z.afwaca00.tm00.grib2
One can create private inventory in /tmp/nam
and access it as follows:
import glob
import pywgrib2_xr as pywgrib2
gribfiles = glob.glob('/archive/*/*.grib2')
for file in gribfiles:
pywgrib2.save_inventory(pywgrib2.make_inventory(file), file, directory='/tmp/nam')
# Retrieve saved inventory for 13 of September
inv = pywgrib2.load_inventory('/archive/20200913/nam.t00z.afwaca00.tm00.grib2',
directory='/tmp/nam')
str(inv).split('\n')[:3]
The inventory files are blosc-compressed pickles. Compression saves both space (~8 ratio) and time (~15% on write):
ls /tmp/nam
19764fbe34d658b7e1ec1126d31d2737.binv 388fbfdf6e3f866d1b1f182bc795aab4.binv
pywgrib2_xr provides script pywgrib2 that can list content of an inventory file.
Internally, inventory is saves a a class with the following attributes:
* file: str
* offset: str
* varname: str
* level_str: str
* time_str: str
* discipline: int
* centre: str
* subcentre: str
* mastertab: int
* localtab: int
* long_name: str
* units: str
* pdt: int
* parmcat: int
* parmnum: int
* bot_level_code: int
* bot_level_value: float
* top_level_code: int
* top_level_value: Optional[float]
* reftime: datetime
* start_ft: datetime
* end_ft: datetime
* npts: int
* nx: int
* ny: int
* gdtnum: int
* gdtmpl: List[int]
Note that reftime
, start_ft
and end_ft
are decoded to
datetime.datetime
. Also, bot_level_value
and top_level_value
are scaled.
The latter might be missing if top_level_code
is set to 255, i.e. for single
surface. As a shortcut, bot_level_code
and bot_level_value
can be accessed
as level_code
and level_value
.