Links

TempestExtremes @ GitHub

Overview

TempestExtremes is a growing collection of detection and characterization algorithms for large climate datasets, leveraging C++ for rapid throughput and a command line interface that maximizes flexibility of each kernel. The tracking kernels in this package have already been used for tracking and characterizing tropical cyclones (TCs), extratropical cyclones (ETCs), monsoonal depressions, atmospheric blocks, atmospheric rivers, and mesoscale convective systems (MCSs). By considering multiple extremes within the same framework, we can study the joint characteristics of extremes while minimizing the total data burden.


User Guide

For questions related to the usage of TempestExtremes (including feature requests) please contact Paul Ullrich (paullrich@ucdavis.edu).

Last updated September 16, 2020.

Variable Expressions
DetectNodes *
StitchNodes
NodeFileEditor
NodeFileFilter
NodeFileCompose
DetectBlobs *
StitchBlobs
BlobStats
GenerateConnectivityFile
Climatology *
FourierFilter

* Executables marked with an asterisk can be compiled and run in parallel using MPI.


Variable Expressions

In many TempestExtremes executables, wherever a variable is specified (typically through --var or the var argument of subcommands), a functional expression or dimension index can be used in place of an actual variable name. These are referred to as variable expressions, and are indicated as <variable> in the executable's argument type specification.

Variable expressions refer to NetCDF spatial variables, namely variables whose last dimension (for unstructured data) or last two dimensions (for structured data) are the horizontal spatial dimensions. The record dimension (typically time), if used for this particular variable, must be the first dimension of the variable. Unstructured grid support is enabled through specification of the --in_connect argument, which identifies the connectivity between grid cells. If --in_connect is not specified, a structured grid is assumed using latitude and longitude coordinates found in the input data. Examples of unstructured spatial data variables are Z850(ncol), Z850(time,ncol), Z850(level,ncol), surf_type(time,flavor,ncol). Examples of structured spatial data variables are Z850(lon,lat), Z850(time,lon,lat), Z(level,lon,lat), surf_type(time,flavor,lon,lat).

Indexed Variables: If a variable has auxiliary dimensions (dimensions that are not the record or horizontal spatial dimensions), the specific index to use in a given calculation must be specified explicitly via a parenthetical 0-based index. For example, if the flavor dimension in variable surf_type(time,flavor,lon,lat) has 16 possible values, then the variable expression surf_type(0) refers to the hyperslab surf_type(time,0,lon,lat). For multiple auxiliary dimensions, the unstructured variable UCD(flavor,level,ncol) accessed via UCD(2,3) refers to the 1D hyperslab UCD(2,3,ncol).
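The indexing rule above can be illustrated with a small Python sketch. This is not TempestExtremes code: the nested lists stand in for NetCDF variables, and the helper select_aux is a hypothetical name introduced only for this illustration.

```python
# Hypothetical stand-in for surf_type(time,flavor,lon,lat):
# 2 times, 16 flavors, 4 longitudes, 3 latitudes, stored as nested lists.
NTIME, NFLAVOR, NLON, NLAT = 2, 16, 4, 3
surf_type = [[[[(t, f, i, j) for j in range(NLAT)]
               for i in range(NLON)]
              for f in range(NFLAVOR)]
             for t in range(NTIME)]

# Hypothetical stand-in for UCD(flavor,level,ncol): 5 flavors, 6 levels, 10 cells.
ucd = [[[(f, l, c) for c in range(10)] for l in range(6)] for f in range(5)]


def select_aux(data, has_record, aux_indices):
    """Fix each auxiliary dimension at the given 0-based index."""
    if has_record:
        # Keep the record (time) dimension and index within each record.
        return [select_aux(rec, False, aux_indices) for rec in data]
    for index in aux_indices:
        data = data[index]
    return data


# surf_type(0) -> surf_type(time,0,lon,lat)
slab = select_aux(surf_type, True, [0])
print(len(slab), len(slab[0]), len(slab[0][0]))  # 2 4 3

# UCD(2,3) -> UCD(2,3,ncol)
column = select_aux(ucd, False, [2, 3])
print(len(column))  # 10
```

Only the auxiliary dimensions are fixed; the record and horizontal dimensions are left free.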

Functional Operations: Variable expressions can also include functional operations which are calculated internally during processing. Functional operations are prefixed with an underscore and evaluated pointwise. Currently supported operations are as follows.

_VECMAG(var1,var2)

The vector magnitude of two fields, equal to sqrt(var1^2 + var2^2).

_ABS(var1)

The absolute value of the variable var1.

_SIGN(var1)

The sign function of var1, equal to 1 if the value is positive, -1 if the value is negative, and 0 if the value is zero.

_ALLPOS(var1,var2,...)

Equal to 1 if all variables in parentheses are greater than zero (0 otherwise).

_SUM(var1,var2,...)

The sum of all variable fields in parentheses. Floating point values may be used in place of variables (e.g. _SUM(PSL,10000.0)).

_AVG(var1,var2,...)

The average of all variable fields in parentheses. Floating point values may be used in place of variables (e.g. _AVG(PSL,1.0)).

_DIFF(var1,var2)

The difference var1 - var2. Floating point values may be used in place of variables (e.g. _DIFF(PSL,100000.0)).

_PROD(var1,var2,...)

The product of all variable fields in parentheses. Floating point values may be used in place of variables (e.g. _PROD(PSL,2.0)).

_DIV(var1,var2)

The quotient var1 / var2. Division by zero is not checked. Floating point values may be used in place of variables (e.g. _DIV(PSL,2.0)).

_MIN(var1,var2,...)

The minimum of all variable fields in parentheses. Floating point values may be used in place of variables (e.g. _MIN(PSL,2.0)).

_MAX(var1,var2,...)

The maximum of all variable fields in parentheses. Floating point values may be used in place of variables (e.g. _MAX(PSL,2.0)).

_SQRT(var1)

The square root of the variable var1.

_COND(criteria,iftrue,iffalse)

Conditional operator. If criteria is greater than zero then return iftrue. If criteria is less than or equal to zero then return iffalse.

_LAT()

The latitude at each point (in degrees).

_F()

The Coriolis parameter at each point, equal to 2 x Omega x sin(latitude), where Omega = 7.2921e-5 rad/s.

_LAPLACIAN{npts,dist}(var)

The discrete Laplacian of the field var, evaluated using npts equiangular points at a distance of dist degrees great-circle distance. For example, _LAPLACIAN{16,5.0}(IVT) calculates the approximate Laplacian using 16 radial points at a distance of 5.0 degrees great-circle distance from each grid point.

_CURL{npts,dist}(vec_east,vec_north)

The discrete curl of the vector field (vec_east,vec_north), evaluated using npts equiangular points at a distance of dist degrees great-circle distance.

_DIVERGENCE{npts,dist}(vec_east,vec_north)

The discrete divergence of the vector field (vec_east,vec_north), evaluated using npts equiangular points at a distance of dist degrees great-circle distance.

_GRADMAG{npts,dist}(var)

The discrete magnitude of the gradient of the field var, evaluated using npts equiangular points at a distance of dist degrees great-circle distance.

_VECDOTGRAD{npts,dist}(vec_east,vec_north,gradvar)

The discrete vector dot gradient, i.e., (vec_east,vec_north) dot grad (gradvar), evaluated using npts equiangular points at a distance of dist degrees great-circle distance.

_MEAN{dist}(var)

The discrete mean of all points within distance dist of each point. Warning: this can be slow if used improperly.
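As a rough illustration of what "evaluated pointwise" means for a few of the operations above, the following Python sketch applies them to short lists standing in for fields. The function names are hypothetical and mirror the descriptions only; this is not how TempestExtremes implements them internally.

```python
import math

OMEGA = 7.2921e-5  # rotation rate quoted in the _F() description


def vecmag(u, v):
    """Pointwise sqrt(u^2 + v^2), as described for _VECMAG."""
    return [math.hypot(a, b) for a, b in zip(u, v)]


def sign(x):
    """Pointwise sign: 1 if positive, -1 if negative, 0 if zero (_SIGN)."""
    return [(a > 0) - (a < 0) for a in x]


def cond(criteria, iftrue, iffalse):
    """Pointwise conditional: iftrue where criteria > 0, else iffalse (_COND)."""
    return [t if c > 0 else f for c, t, f in zip(criteria, iftrue, iffalse)]


def coriolis(lat_deg):
    """Pointwise 2 * Omega * sin(latitude), latitude in degrees (_F)."""
    return [2.0 * OMEGA * math.sin(math.radians(lat)) for lat in lat_deg]


u = [3.0, 0.0, -6.0]
v = [4.0, 0.0, 8.0]
lat = [0.0, 30.0, 90.0]

print(vecmag(u, v))     # [5.0, 0.0, 10.0]
print(sign(u))          # [1, 0, -1]
print(cond(u, v, lat))  # [4.0, 30.0, 90.0]
print(coriolis(lat))
```

Because each function returns a field of the same shape as its inputs, the results can be nested in the same way as variable expressions, for example a conditional applied to a vector magnitude.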


DetectNodes [MPI]

Overview

DetectNodes is used for the detection of nodal features. This executable is analogous to the "map" step in the "MapReduce" framework. Candidate points are selected based on information at a single time slice. Typically DetectNodes is followed by StitchNodes to connect candidate points in time.

Arguments

  --in_data <string> ""

A list of input data files in NetCDF format, separated by semicolons.

  --in_data_list <string> ""

A text file containing the --in_data argument for a sequence of processing operations (one per line). When parallelization is enabled, the list of data files is distributed as equally as possible among available processors.

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --out <string> ""

The output nodefile to write from the detection procedure. Used when --in_data is specified.

  --out_file_list <string> ""

A text file containing an equal number of lines to --in_data_list specifying the output nodefiles from each input datafile.

  --searchbymin <variable> "" (default PSL)

The input variable to use for initially selecting candidate points (defined as local minima). Exactly one of --searchbymin or --searchbymax must be specified.

  --searchbymax <variable> ""

The input variable to use for initially selecting candidate points (defined as local maxima). Exactly one of --searchbymin or --searchbymax must be specified.

  --minlon <double> [0.0]

The minimum longitude for candidate points.

  --maxlon <double> [0.0]

The maximum longitude for candidate points. As longitude is a periodic dimension, when --regional is not specified --minlon may be larger than --maxlon. If --maxlon and --minlon are equal then these arguments are ignored.

  --minlat <double> [0.0]

The minimum latitude for candidate points.

  --maxlat <double> [0.0]

The maximum latitude for candidate points. If --maxlat and --minlat are equal then these arguments are ignored.

  --minabslat <double> [0.0]

The minimum absolute value of the latitude for candidate points. This argument has no effect if set to zero.

  --mergedist <double> [0.0] (degrees)

DetectNodes merges candidate points with a distance (in degrees great-circle distance) shorter than the specified value. Among two candidates within the merge distance, only the candidate with the lowest value of the --searchbymin field or highest value of the --searchbymax field is retained.
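The merging rule can be sketched in Python as follows, for the --searchbymin case. This is an illustration under stated assumptions rather than the actual implementation: candidates are given as hypothetical (lon, lat, value) tuples, a candidate is dropped when a strictly lower candidate lies within the merge distance, and the handling of ties is not described in this guide.

```python
import math


def gc_dist_deg(lon1, lat1, lon2, lat2):
    """Great-circle distance in degrees between two lon/lat points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def merge_candidates(cands, mergedist):
    """Keep a candidate unless a lower-valued one lies within mergedist."""
    kept = []
    for k, (lon, lat, val) in enumerate(cands):
        beaten = any(
            oval < val and gc_dist_deg(lon, lat, olon, olat) < mergedist
            for m, (olon, olat, oval) in enumerate(cands)
            if m != k
        )
        if not beaten:
            kept.append((lon, lat, val))
    return kept


# Two nearby pressure minima and one distant minimum (made-up values).
cands = [(100.0, 20.0, 99500.0), (101.0, 20.5, 99200.0), (140.0, 15.0, 100100.0)]
print(merge_candidates(cands, 2.0))
```

With a merge distance of 2.0 degrees, the first candidate is absorbed by its deeper neighbor, while the distant third candidate is unaffected.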

  --closedcontourcmd <string> "<cmd1>;<cmd2>;..."

Eliminate candidates if they do not have a closed contour. The closed contour is determined by breadth-first search: if any paths exist from the candidate point (or nearby minima/maxima if minmaxdist is specified) that reach the specified distance before achieving the specified delta then we say no closed contour is present. Closed contour commands are separated by a semicolon. Each closed contour command takes the form "var,delta,dist,minmaxdist". These arguments are as follows.

var <variable> is the name of the variable used for the contour search.

delta <double> is the amount by which the field must change from the pivot value. If positive (negative) the field must increase (decrease) by this value along the contour.

dist <double> is the great-circle distance (in degrees) from the pivot within which the closedcontour criteria must be satisfied.

minmaxdist <double> is the great-circle distance away from the candidate to search for the minima/maxima. If delta is positive (negative), the pivot is a local minimum (maximum).
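The breadth-first search described for the closed contour criterion can be sketched in Python as below. This is a simplified illustration, not the TempestExtremes implementation: it assumes a small non-periodic latitude-longitude grid stored as field[j][i], four-neighbor connectivity (no --diag_connect), a pivot that has already been located (minmaxdist is not handled), and that the delta test is applied before the distance test at each visited cell. All function and variable names are hypothetical.

```python
import math
from collections import deque


def gc_dist_deg(lon1, lat1, lon2, lat2):
    """Great-circle distance in degrees between two lon/lat points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def has_closed_contour(field, lons, lats, pivot, delta, dist):
    """Return True if every path from pivot achieves delta within dist."""
    pj, pi = pivot
    ref = field[pj][pi]
    seen = {pivot}
    queue = deque([pivot])
    while queue:
        j, i = queue.popleft()
        change = field[j][i] - ref
        # This path has achieved the required change; do not expand further.
        if (delta > 0 and change >= delta) or (delta < 0 and change <= delta):
            continue
        # The path reached the search radius without achieving delta.
        if gc_dist_deg(lons[pi], lats[pj], lons[i], lats[j]) > dist:
            return False
        for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nj, ni = j + dj, i + di
            if 0 <= nj < len(lats) and 0 <= ni < len(lons) and (nj, ni) not in seen:
                seen.add((nj, ni))
                queue.append((nj, ni))
    return True


lons = [0.0, 1.0, 2.0, 3.0, 4.0]
lats = [-2.0, -1.0, 0.0, 1.0, 2.0]

# A low at the grid center fully enclosed by higher values (made-up data).
closed = [[1003.0] * 5 for _ in range(5)]
for j in (1, 2, 3):
    for i in (1, 2, 3):
        closed[j][i] = 1001.0
closed[2][2] = 1000.0

# The same low with a channel of low values leading out to the east.
leaky = [row[:] for row in closed]
leaky[2][3] = 1000.5
leaky[2][4] = 1000.5

print(has_closed_contour(closed, lons, lats, (2, 2), 2.0, 1.5))  # True
print(has_closed_contour(leaky, lons, lats, (2, 2), 2.0, 1.5))   # False
```

In the second field the search escapes along the channel and exceeds the 1.5 degree radius before the field has risen by 2.0, so the candidate would be eliminated by --closedcontourcmd (and retained by --noclosedcontourcmd).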

  --noclosedcontourcmd <string> "<cmd1>;<cmd2>;..."

As --closedcontourcmd, except it eliminates candidates if a closed contour is present.

  --thresholdcmd <string> "<cmd1>;<cmd2>;..."

Eliminate candidates that do not satisfy a threshold criterion (there must exist a point within a given distance of the candidate that satisfies a given equality or inequality). Search is performed by breadth-first search over the grid. Threshold commands are separated by a semicolon. Each threshold command takes the form "var,op,value,dist". These arguments are as follows.

var <variable> is the name of the variable used for the thresholding.

op <string> is the operator that must be satisfied for threshold (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

dist <double> is the great-circle distance away from the candidate to search for a point that satisfies the threshold.

  --outputcmd <string> "<cmd1>;<cmd2>;..."

Include additional columns in the output file. Each output command takes the form "var,op,dist". These arguments are as follows.

var <variable> is the name of the variable used for output.

op <string> is the operator that is applied over all points within the specified distance of the candidate (options include max, min, avg, maxdist, mindist).

dist <double> is the great-circle distance away from the candidate wherein the operator is applied.

  --timestride <integer> [1]

[Deprecated] Only examine discrete times at the given stride. Consider --timefilter instead.

  --timefilter <string> ""

A regular expression used to match only those time values to be retained. Several predefined shorthand values are also available, as follows.

3hr: retain times every 3 hours (equivalent to "....-..-.. (00|03|06|09|12|15|18|21):00:00").

6hr: retain times every 6 hours (equivalent to "....-..-.. (00|06|12|18):00:00").

daily: retain one time per day (equivalent to "....-..-.. 00:00:00").
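The effect of these patterns can be tried out with Python's re module. The sketch below is only an illustration: it assumes time values are rendered as "YYYY-MM-DD HH:MM:SS" strings and that the pattern must match the whole string, neither of which is stated explicitly in this guide. The function name filter_times and the sample times are made up.

```python
import re

# Shorthand names and the regular expressions listed above.
SHORTHANDS = {
    "3hr": r"....-..-.. (00|03|06|09|12|15|18|21):00:00",
    "6hr": r"....-..-.. (00|06|12|18):00:00",
    "daily": r"....-..-.. 00:00:00",
}


def filter_times(times, timefilter):
    """Return the time strings retained by a shorthand or explicit regex."""
    pattern = re.compile(SHORTHANDS.get(timefilter, timefilter))
    return [t for t in times if pattern.fullmatch(t)]


times = [
    "2005-08-28 00:00:00",
    "2005-08-28 03:00:00",
    "2005-08-28 06:00:00",
    "2005-08-28 07:30:00",
]

print(filter_times(times, "6hr"))
# ['2005-08-28 00:00:00', '2005-08-28 06:00:00']
print(filter_times(times, r"....-..-.. 0[0-5]:00:00"))
# ['2005-08-28 00:00:00', '2005-08-28 03:00:00']
```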

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --out_header <bool>

If present, output a header at the beginning of the output file indicating the columns of the file.

  --verbosity <integer> [0]

Set the verbosity level of execution.

Examples

  DetectNodes --in_data "$uvfile;$tpfile;$hfile" --out $outf --searchbymin PRMSL_L101 --mergedist 2.0 \
      --closedcontourcmd "PRMSL_L101,200.,4,0;TMP_L100,-0.4,8.0,1.1" \
      --outputcmd "PRMSL_L101,max,0;_VECMAG(U_GRD_L100,V_GRD_L100),max,4;HGT_L1,max,0"

An example configuration for detecting tropical cyclones in CFSR data.


StitchNodes

Overview

StitchNodes is used to connect nodal features together in time, producing paths associated with singular features. Additional filtering of the output of DetectNodes can be applied based on the temporal features of these paths.

Arguments

  --in <string> ""

The input nodefile (typically output from DetectNodes).

  --out <string> ""

The output file containing the filtered list of candidates in plain text format.

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --in_fmt <string> ""

A comma-separated list of names of the auxiliary columns within the input nodefile (namely, the list must not include the indexing columns or time columns).

  --format <string> "" [deprecated]

A comma-separated list describing the auxiliary columns of the input file that includes the indexing columns (typically either "i,.." or "i,j,..") but does not include the time columns.

  --range <double> [0.0]

The maximum distance between candidates along a path (in great-circle degrees).

  --mintime <string> "1"

The minimum length of a path either in terms of number of discrete times or as a duration, e.g. "24h". Note that the duration of a path is computed as the difference between the final time and initial time, so a "24h" duration corresponds to 5 time steps in 6-hourly data (i.e. hours 0, 6, 12, 18, and 24).
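The difference between the two forms of --mintime can be seen in a short Python sketch. It is illustrative only: the helper meets_mintime is hypothetical, and it handles just an integer count or an hour suffix ("h"), since other duration units are not described here.

```python
from datetime import datetime, timedelta


def meets_mintime(times, mintime):
    """Check a path against a count ("5") or an hour duration ("24h")."""
    if mintime.endswith("h"):
        required = timedelta(hours=float(mintime[:-1]))
        # Duration is the difference between the final and initial times.
        return times[-1] - times[0] >= required
    return len(times) >= int(mintime)


start = datetime(2005, 8, 28, 0)
four_steps = [start + timedelta(hours=6 * k) for k in range(4)]  # spans 18 h
five_steps = [start + timedelta(hours=6 * k) for k in range(5)]  # spans 24 h

print(meets_mintime(four_steps, "24h"))  # False
print(meets_mintime(five_steps, "24h"))  # True
print(meets_mintime(four_steps, "4"))    # True
```

Four 6-hourly points satisfy a count of 4 but span only 18 hours, so they fall short of "24h".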

  --min_endpoint_dist <double> [0.0]

The minimum great-circle distance between the first candidate on a path and the last candidate (in degrees).

  --min_path_dist <double> [0.0]

The minimum accumulated great-circle distance between nodes in a path (in degrees).

  --maxgap <integer> [0]

The number of allowed missing points between spatially proximal candidate nodes while still considering them part of the same path.

  --thresholdcmd <string> "<cmd1>;<cmd2>;..."

Filter paths based on the number of times where a particular threshold is satisfied. Threshold commands are separated by a semicolon. Each threshold command takes the form "col,op,value,count". These arguments are as follows.

col <variable> is the name of the column to use for thresholding, as specified in --format.

op <string> is the operator that must be satisfied for threshold (options include >,>=,<,<=,=,!=,|>=,|<=).

value <double> is the value on the right-hand-side of the comparison.

count <string> is either the minimum number of time slices where the threshold must be satisfied or the instruction "all", "first", or "last". Here "all" is used to indicate the threshold must be satisfied at all points along the path, "first" is used to indicate the threshold must be satisfied only at the first point along the path, and "last" is used to indicate the threshold must be satisfied only at the last point along the path.
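A minimal Python sketch of how the count argument might be interpreted is given below. It is an illustration under assumptions, not the StitchNodes source: path_passes is a hypothetical helper operating on the values of one column along a single path, and the |>= and |<= operators are omitted because their meaning is not defined in this guide.

```python
import operator

# The six basic comparison operators listed above.
OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
}


def path_passes(values, op, value, count):
    """Apply one "col,op,value,count" command to a path's column values."""
    hits = [OPS[op](v, value) for v in values]
    if count == "all":
        return all(hits)
    if count == "first":
        return hits[0]
    if count == "last":
        return hits[-1]
    # Otherwise count is the minimum number of time slices that must pass.
    return sum(hits) >= int(count)


wind = [12.0, 18.5, 21.0, 16.0]  # made-up column values along one path

print(path_passes(wind, ">=", 17.0, "2"))     # True
print(path_passes(wind, ">=", 17.0, "all"))   # False
print(path_passes(wind, "<", 15.0, "first"))  # True
```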

  --timestride <integer> [1]

The frequency of times that should be considered. (e.g. if set to 1, every time step is stitched; if set to 2, every other time step is stitched; etc.)

  --out_file_format <string> "gfdl" (gfdl|csv|csvnohead)

The format of the output nodefile. Options are gfdl (standard GFDL format), csv (comma-separated values with header), and csvnohead (comma-separated values with no header).


NodeFileEditor

Overview

NodeFileEditor enables the modification of nodefiles through the calculation of auxiliary quantities or filtering of existing quantities.

Arguments

  --in_nodefile <string> ""

The filename of the input nodefile (typically output from DetectNodes or StitchNodes).

  --in_nodefile_type <string> ["SN"] [DN|SN]

The type of nodefile indicated by --in_nodefile: "DN" if from DetectNodes or "SN" if from StitchNodes.

  --in_data <string> ""

The input data file (if needed for supplemental calculations).

  --in_data_list <string> ""

A list of input data files (if needed for supplemental calculations).

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --in_fmt <string> ""

A comma-separated list of names of the columns within the input nodefile.

  --out_fmt <string> ""

A comma-separated list of names of the columns within the output nodefile.

  --out_nodefile <string> ""

The filename of the output nodefile.

  --out_nodefile_format <string> [gfdl|csv|csvnohead]

The format of the output nodefile. Options are gfdl (standard GFDL format), csv (comma-separated values with header), and csvnohead (comma-separated values with no header).

  --timefilter <string> ""

A regular expression used to match only those time values to be retained. Several predefined shorthand values are also available, as follows.

3hr: retain times every 3 hours (equivalent to "....-..-.. (00|03|06|09|12|15|18|21):00:00").

6hr: retain times every 6 hours (equivalent to "....-..-.. (00|06|12|18):00:00").

daily: retain one time per day (equivalent to "....-..-.. 00:00:00").

  --colfilter <string> "<cmd1>;<cmd2>;..."

Filter lines in the nodefile based on a prescribed threshold. Each filter command takes the form "col,op,value". These arguments are as follows.

col <string> is the name of the column to be used, as specified in --in_fmt.

op <string> is the comparison operator applied between the column value and value (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

  --calculate <string> "<cmd1>;<cmd2>;..."

Perform functional operations on rows of the nodefile. Commands are specified in the form output=function(arguments), separated by semi-colons and evaluated from left to right. The output refers to a new column value associated with each node. Available commands are as follows.

eval_ace(<u variable name>, <v variable name>, <radius>) Evaluate the (instantaneous) cyclone energy in terms of instantaneous zonal velocity (u), meridional velocity (v), and a prescribed radius. The u and v variables refer to variable names from the input data file(s).

eval_acepsl(<psl variable name>, <radius>) Evaluate the (instantaneous) cyclone energy in terms of instantaneous sea level pressure and a prescribed radius. The psl variable refers to a variable name from the input data file(s). The radius is specified in great-circle degrees and may be a fixed value or refer to a column header.

eval_ike(<u variable name>, <v variable name>, <radius>) Evaluate the (instantaneous) kinetic energy in terms of instantaneous zonal velocity (u), meridional velocity (v), and a prescribed radius. The u and v variables refer to variable names from the input data file(s). The radius is specified in great-circle degrees and may be a fixed value or refer to a column header.

eval_pdi(<u variable name>, <v variable name>, <radius>) Evaluate the (instantaneous) potential dissipation index in terms of instantaneous zonal velocity (u), meridional velocity (v), and a prescribed radius. The u and v variables refer to variable names from the input data file(s). The radius is specified in great-circle degrees and may be a fixed value or refer to a column header.

radial_profile(<variable name>, <bin count>, <bin width>) Calculate a radial profile of the specified variable at each time slice around the nodal feature point. Grid point values are binned into bin count bins where each bin has a width of bin width great-circle degrees.

radial_wind_profile(<u variable name>, <v variable name>, <bin count>, <bin width>) Calculate a radial profile of the azimuthal wind at each time slice around the nodal feature point. Grid point values are binned into bin count bins where each bin has a width of bin width great-circle degrees.

lastwhere(<profile column name>, <op>, <value>) Calculate the last element of an array (typically calculated with radial_profile or radial_wind_profile) that satisfies a given threshold. Options for op include >,>=,<,<=,=,!=.

value(<profile column name>, <index>) Calculate the value of a profile array at the specified distance using linear interpolation.

max_closed_contour_delta(<variable name>, <radius>) Calculate the largest value that satisfies the closed contour criteria around the nodal feature with specified radius.

region_name(<filename>) Determine the region of a given nodal feature in terms of its longitude-latitude coordinates. The filename refers to a region file containing polygons in latitude-longitude space.

  --lonname <string> ""

Name of the longitude variable in the data files.

  --latname <string> ""

Name of the latitude variable in the data files.


NodeFileFilter

Overview

NodeFileFilter is used for filtering spatial data using nodefile information. For each input datafile and timeslice, the nodefile is used to identify nodal features present at that time. Filtering is then performed via one or more of --bydist, --bycontour, or --nearbyblobs.

Arguments

  --in_nodefile <string> ""

The filename of the input nodefile (typically output from DetectNodes or StitchNodes).

  --in_nodefile_type <string> ["SN"] [DN|SN]

The type of nodefile indicated by --in_nodefile: "DN" if from DetectNodes or "SN" if from StitchNodes.

  --in_fmt <string> ""

A comma-separated list of names of the columns within the input nodefile.

  --in_data <string> ""

The input data file (if needed for supplemental calculations).

  --in_data_list <string> ""

A text file containing a list of input data files (if needed for supplemental calculations).

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --out_data <string> ""

The output data file, if --in_data is used.

  --out_data_list <string> ""

A list of output data files, one corresponding to each input file in --in_data_list.

  --var <variable> ""

A comma-separated list of variables to filter and write to the output.

  --maskvar <string> ""

The name of the variable to write to the output file containing the binary mask.

  --preserve <string> "var1;var2;..."

A comma-separated list of variables to copy from the input data to the output data.

  --fillvalue <string> [<value>|nan|att]

Add the specified _FillValue attribute to all output variables. If set to att then the value of this attribute will be inherited from the input data, if present.

  --bydist <double> [0.0]

Unmask regions within the specified great-circle distance of each node.

  --bycontour <string> [var,delta,dist,minmaxdist]

Unmask regions within a closed contour based around the nodal feature point (or associated minima/maxima if minmaxdist is non-zero). The filter criteria are analogous to --closedcontourcmd from DetectNodes and based on the following four arguments.

var <variable> is the variable to use for determining the presence of the closed contour.

delta <double> is the depth (if positive) or height (if negative) of the closed contour.

dist <double> is the maximum distance to use for searching out the closed contour.

minmaxdist <double> is the great-circle distance away from the candidate to search for the minima/maxima. If delta is positive (negative), the pivot is a local minimum (maximum).

  --nearbyblobs <string> [var,dist,op,value[,maxdist]]

Unmask regions where a particular field satisfies a given threshold, as long as that threshold is satisfied within a given dist of the nodal feature. For each point found, a breadth-first search is applied to find all points connected to that point that also satisfy the threshold. The search is only performed within maxdist (if specified). The arguments are as follows.

var <variable> is the variable to use for the thresholding operation.

dist <double> is the distance where the algorithm searches for seed points that satisfy the threshold.

op <string> is the operator used to determine if the threshold is satisfied (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

maxdist <double> is the maximum distance to search for points connected to the seed points.

  --invert <bool>

Invert the mask generated by --bydist, --bycontour, or --nearbyblobs.


NodeFileCompose

Overview

NodeFileCompose is used for taking snapshots of fields or compositing fields either around a nodal feature or over a particular geographic region.

Arguments

  --in_nodefile <string> ""

The filename of the input nodefile (typically output from DetectNodes or StitchNodes).

  --in_nodefile_type <string> ["SN"] [DN|SN]

The type of nodefile indicated by --in_nodefile: "DN" if from DetectNodes or "SN" if from StitchNodes.

  --in_fmt <string> ""

A comma-separated list of names of the columns within the input nodefile.

  --in_data <string> ""

The input data file.

  --in_data_list <string> ""

A text file containing a list of input data files.

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --out_grid <string> [XY|RAD|RLL]

The type of grid to use as output. Options are as follows.

XY: Cartesian stereographic projection with eastwards X coordinate vector and northwards Y coordinate vector. Grid spacing is equidistant.

RAD: Radial stereographic projection with azimuthal coordinate vector and radial coordinate vector. Azimuthal grid spacing is equiangular; radial grid spacing is equidistant.

RLL: Regular longitude-latitude projection, only available for fixed position grids (where --fixlon and --fixlat are specified).

  --out_data <string> ""

The output data file.

  --var <variable> ""

A comma-separated list of variables that will be processed by this command.

  --varout <string> ""

An optional comma-separated list of variables of the same length as --var that replaces these variables in the output data. This argument is useful if the input variables are the product of functional operations and using function names as NetCDF variables is undesired. By default this argument is identical to --var.

  --snapshots

Output a concatenated array to --out_data consisting of snapshots of the regridded data at each time slice. The output will be written to the output data file with variable name "<varout>_snapshots".

  --op <string> [mean|min|max,...]

A comma-separated list of reduction operations to perform on the data. The output will be written to the output data file with variable name "<varout>_<op>" where <op> is the specified operation.

mean: Calculate the grid point mean over all composited snapshots.

min: Calculate the grid point minimum over all composited snapshots.

max: Calculate the grid point maximum over all composited snapshots.

  --histogram <string> [var,offset,binsize]

Bin data at each grid point from variable var into a histogram. Bins are equal sized with width binsize, and with the first bin occupying the interval [offset,offset+binsize]. Results are stored as a sparse array with name "<varout>_hist".
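The binning rule can be sketched in Python for a single grid point. This is an assumption-laden illustration rather than the NodeFileCompose implementation: the bin index is taken to be floor((value - offset) / binsize), so the treatment of values lying exactly on a bin edge or below offset may differ from the actual tool, and a Counter stands in for the sparse array.

```python
import math
from collections import Counter


def bin_index(x, offset, binsize):
    """Index of the equal-width bin containing x (bin 0 starts at offset)."""
    return math.floor((x - offset) / binsize)


# Made-up values of one variable at one grid point across several snapshots.
values = [0.0, 0.4, 1.2, 1.9, 5.0]

# Sparse histogram: only bins that receive at least one value are stored.
hist = Counter(bin_index(x, 0.0, 1.0) for x in values)

print(sorted(hist.items()))  # [(0, 2), (1, 2), (5, 1)]
```

Storing only the occupied bins is what keeps the result compact when most bins are empty.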

  --dx <double> [0.5]

The horizontal grid spacing of the XY grid or the radial grid spacing on the RAD grid (in great circle degrees). Also the horizontal grid spacing of the RLL grid (in degrees).

  --resx <integer> [11]

The number of grid cells in each coordinate direction on the XY or RLL grid, or the number of grid cells in the radial direction on the RAD grid.

  --resa <integer> [16]

The number of grid cells in the azimuthal direction on the RAD grid.

  --fixlon <double> [-999.0]

The fixed geographic longitude (center of the grid) for composites or snapshots. If set to -999. then this argument is ignored.

  --fixlat <double> [-999.0]

The fixed geographic latitude (center of the grid) for composites or snapshots. If set to -999. then this argument is ignored.

  --max_time_delta <string> ""

A string of the form [#d][#h][#m][#s] indicating the maximum time difference where data could be substituted when not available at the time specified in the nodefile. For instance, if the nodefile indicates a nodal feature is present at 06:00 and data is only available at 05:00 and 07:30, then a --max_time_delta of 2h would permit either of these times to be substituted for the missing data at 06:00. When multiple data points are available, the closest data point will be used (in this case 05:00).
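The substitution rule in this example can be reproduced with a short Python sketch. It is only an illustration: parse_delta and closest_time are hypothetical helpers, the parser assumes non-negative integer fields in the order d, h, m, s, and the calendar date is made up.

```python
import re
from datetime import datetime, timedelta


def parse_delta(text):
    """Parse a string of the form [#d][#h][#m][#s] into a timedelta."""
    m = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if m is None or not text:
        raise ValueError(f"invalid time delta: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def closest_time(target, available, max_delta):
    """Return the closest available time within max_delta, or None."""
    best = min(available, key=lambda t: abs(t - target))
    return best if abs(best - target) <= max_delta else None


target = datetime(2020, 1, 1, 6, 0)
available = [datetime(2020, 1, 1, 5, 0), datetime(2020, 1, 1, 7, 30)]

print(closest_time(target, available, parse_delta("2h")))   # 2020-01-01 05:00:00
print(closest_time(target, available, parse_delta("30m")))  # None
```

With a limit of 2h both times are eligible and the closer one (05:00) is chosen; with a limit of 30m neither qualifies.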

  --lonname <string> ""

Name of the longitude variable in the data files.

  --latname <string> ""

Name of the latitude variable in the data files.


DetectBlobs [MPI]

Overview

DetectBlobs is used for identifying areal features. This executable is analogous to the "map" step in the "MapReduce" framework. Candidate regions are selected based on information at a single time slice. Features are marked using a binary mask and output stored in NetCDF format.

Arguments

  --in_data <string> ""

The input data file.

  --in_data_list <string> ""

A text file containing a list of input data files. Only one of --in_data or --in_data_list may be specified.

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --out <string> ""

The output NetCDF file containing the binary mask, used if --in_data is specified.

  --out_list <string> ""

A text file containing, one per line, a list of output NetCDF files corresponding to the input files specified in --in_data_list.

  --thresholdcmd <string> "<cmd1>;<cmd2>;..."

Tag grid points that satisfy a threshold criterion (there must exist a point within a given distance of each grid point that satisfies a given equality or inequality). Threshold commands are separated by a semicolon. Each threshold command takes the form "var,op,value,dist". These arguments are as follows.

var <variable> is the name of the variable used for the thresholding.

op <string> is the operator that must be satisfied for threshold (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

dist <double> is the great-circle distance away from the candidate to search for a point that satisfies the threshold. Search is performed by breadth-first search over the grid. Note that this argument, if specified, can greatly slow down processing particularly for larger values.

  --filtercmd <string> "<cmd1>;<cmd2>;..."

Filter out contiguous regions (blobs) that do not satisfy a minimum count of points satisfying a threshold. Filter commands are specified as "var,op,value,count". These arguments are as follows.

var <variable> is the name of the variable used for the filtering.

op <string> is the operator that must be satisfied for filtering (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

count <integer> is the number of grid points within each blob that must satisfy the filter (only used when var is a variable).

  --geofiltercmd <string> "<cmd1>;<cmd2>;..."

Filter out contiguous regions (blobs) that do not satisfy some geometric property. Geometric filter commands are specified as "prop,op,value". These arguments are as follows.

prop <string> is one of "area" or "areafrac".

op <string> is the operator that must be satisfied for filtering (options include >,>=,<,<=,=,!=).

value <double> is the value on the right-hand-side of the comparison.

  --outputcmd <string> "<cmd1>;<cmd2>;..."

Include additional data in the output file. Each output command takes the form "var,varout". These arguments are as follows.

var <variable> is the name of the input variable used for output.

varout <string> is the name of the variable to write in the output file.

  --timefilter <string> ""

A regular expression used to match only those time values to be retained. Several predefined values are also available, as follows.

3hr filter every 3 hourly (equivalent to "....-..-.. (00|03|06|09|12|15|18|21):00:00").

6hr filter every 6 hourly (equivalent to "....-..-.. (00|06|12|18):00:00").

daily filter daily (equivalent to "....-..-.. 00:00:00").
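The effect of these patterns can be checked with a short Python sketch. It assumes timestamps are formatted as "YYYY-MM-DD HH:MM:SS" and that the whole timestamp must match; Python's re module is used for illustration only, and its regular expression dialect may differ from the one used by the executable. The function name retained is hypothetical.

```python
import re

# Shorthand names and their equivalent patterns, as given in the guide.
TIME_FILTERS = {
    "3hr": r"....-..-.. (00|03|06|09|12|15|18|21):00:00",
    "6hr": r"....-..-.. (00|06|12|18):00:00",
    "daily": r"....-..-.. 00:00:00",
}

def retained(timestamp, timefilter):
    """Return True if the timestamp matches the shorthand or custom pattern."""
    pattern = TIME_FILTERS.get(timefilter, timefilter)
    return re.fullmatch(pattern, timestamp) is not None

print(retained("2020-09-16 06:00:00", "6hr"))
print(retained("2020-09-16 03:00:00", "6hr"))
```
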

  --minlat <double> [0.0]

The minimum latitude for tagged points.

  --maxlat <double> [0.0]

The maximum latitude for tagged points. If --maxlat and --minlat are equal then these arguments are ignored.

  --minabslat <double> [0.0]

The minimum absolute value of the latitude for tagged points. This argument has no effect if set to zero.

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --tagvar <string> ""

The name of the output variable containing the binary tags.

  --lonname <string> ""

Name of the longitude variable in the data files.

  --latname <string> ""

Name of the latitude variable in the data files.

  --verbosity <integer> [0]

Set the verbosity level of execution.


StitchBlobs

Overview

StitchBlobs is used for tracking areal features in time, assigning connected features a unique global id. Given input as a time-dependent binary mask variable, blobs that overlap in sequential time steps will be assigned the same global identifier. The stitching algorithm first builds a graph with contiguous blobs at each time slice representing nodes, and edges where two sequential areal objects are deemed to be connected in time. Connected sub-graphs are then tagged with the same global identifier.
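The graph step can be sketched with a union-find structure in Python. This is an illustration under stated assumptions rather than the StitchBlobs code: nodes are written as (time index, blob index) pairs, the edges are supplied by the caller, and identifiers are numbered from 1 in order of first appearance. The function name stitch is hypothetical.

```python
def stitch(num_blobs_per_time, edges):
    """Assign one global id to every connected sub-graph of blobs.

    num_blobs_per_time[t] is the number of blobs at time slice t.
    edges is a list of ((t, i), (t + 1, j)) pairs deemed connected in time.
    Returns a dict mapping each (t, i) node to its global id.
    """
    parent = {}
    for t, n in enumerate(num_blobs_per_time):
        for i in range(n):
            parent[(t, i)] = (t, i)

    def find(node):
        # Follow parent links to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    ids = {}
    result = {}
    for node in sorted(parent):
        root = find(node)
        if root not in ids:
            ids[root] = len(ids) + 1
        result[node] = ids[root]
    return result

# Two blobs at t=0, one at t=1, two at t=2; one feature persists throughout.
tracks = stitch([2, 1, 2], [((0, 0), (1, 0)), ((1, 0), (2, 1))])
```
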

Arguments

  --in <string> ""

The input data file containing a binary mask variable (typically output from DetectBlobs).

  --in_list <string> ""

A text file containing a list of input data files. Only one of --in or --in_list may be specified.

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --out <string> ""

The output NetCDF file containing the tagged blobs, used if --in is specified.

  --out_list <string> ""

A text file containing, one per line, a list of output NetCDF files corresponding to the input files specified in --in_list.

  --var <variable> ""

The name of the variable in the input file providing the binary tag.

  --outvar <string> ""

The name of the variable in the output file to contain the global feature id.

  --minsize <integer> [1]

The minimum number of grid points per time slice for a given areal feature.

  --mintime <integer> [1]

The minimum number of time slices for a given tracked areal feature. Features that are not present for at least --mintime time slices are removed.

  --min_overlap_prev <double> [0.0] (%)

Given areal features at sequential times t1 < t2 that overlap in at least one grid point, --min_overlap_prev denotes the minimum percentage of the area of the feature at time t1 that must be overlapped by the feature at time t2 for the two features to be considered connected in time.

  --max_overlap_prev <double> [100.0] (%)

Given areal features at sequential times t1 < t2 that overlap in at least one grid point, --max_overlap_prev denotes the maximum percentage of the area of the feature at time t1 that may be overlapped by the feature at time t2 for the two features to be considered connected in time.

  --min_overlap_next <double> [0.0] (%)

Given areal features at sequential times t1 < t2 that overlap in at least one grid point, --min_overlap_next denotes the minimum percentage of the area of the feature at time t2 that must be overlapped by the feature at time t1 for the two features to be considered connected in time.

  --max_overlap_next <double> [100.0] (%)

Given areal features at sequential times t1 < t2 that overlap in at least one grid point, --max_overlap_next denotes the maximum percentage of the area of the feature at time t2 that may be overlapped by the feature at time t1 for the two features to be considered connected in time.
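The four overlap arguments can be read together as one test on a pair of features. The Python sketch below is a simplified illustration: it assumes every grid cell has the same area, so that a cell count stands in for area, and it treats the bounds as inclusive. Both are assumptions, and connected_in_time is a hypothetical name.

```python
def connected_in_time(prev_cells, next_cells,
                      min_overlap_prev=0.0, max_overlap_prev=100.0,
                      min_overlap_next=0.0, max_overlap_next=100.0):
    """Decide whether features at t1 (prev) and t2 (next) are connected.

    prev_cells and next_cells are collections of grid cell indices.
    Equal cell areas are assumed, so overlap is measured by cell count.
    """
    prev_cells, next_cells = set(prev_cells), set(next_cells)
    shared = prev_cells & next_cells
    if not shared:
        return False  # the features must overlap in at least one grid point
    pct_prev = 100.0 * len(shared) / len(prev_cells)
    pct_next = 100.0 * len(shared) / len(next_cells)
    return (min_overlap_prev <= pct_prev <= max_overlap_prev
            and min_overlap_next <= pct_next <= max_overlap_next)

# The shared cells {3, 4} cover 50% of the t1 feature and 25% of the t2 feature.
feature_t1 = {1, 2, 3, 4}
feature_t2 = {3, 4, 5, 6, 7, 8, 9, 10}
```

With the defaults (0 and 100) every pair that shares a grid point is connected; raising a minimum or lowering a maximum removes edges from the stitching graph.
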

  --restrict_region <string> "lat0,lat1,lon0,lon1,count"

Filter out blobs that are not present in the specified longitude-latitude region [lon0,lon1]x[lat0,lat1] for at least count time slices.

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --minlat <double> [-90.0]

The minimum latitude for tagged points.

  --maxlat <double> [90.0]

The maximum latitude for tagged points. If --maxlat and --minlat are equal then these arguments are ignored.

  --minlon <double> [0.0]

The minimum longitude for tagged points.

  --maxlon <double> [360.0]

The maximum longitude for tagged points. If --maxlon and --minlon are equal then these arguments are ignored.

  --lonname <string> ""

Name of the longitude variable in the data files.

  --latname <string> ""

Name of the latitude variable in the data files.

  --thresholdcmd <string> "<cmd1>;<cmd2>;..."

Filter out areal features that do not satisfy a given threshold at each time slice. Thresholds are specified as "quantity,value". These arguments are as follows.

quantity <string> is one of minarea/maxarea (minimum/maximum area in steradians) or minarealfraction/maxarealfraction (minimum/maximum fraction of a longitude-latitude grid box covered by feature).

value <double> is the numerical value associated with the quantity.

  --verbosity <integer> [0]

Set the verbosity level of execution.

  --flatten

Instead of outputting global identifiers, simply output a binary value to indicate the presence of a stitched object.


BlobStats

Overview

BlobStats is used for summarizing the properties of areal features, either those detected by DetectBlobs or those tracked by StitchBlobs.

Arguments

  --in_file <string> ""

The input data file containing a binary or indexed mask variable (typically output from DetectBlobs or StitchBlobs).

  --in_list <string> ""

A text file containing a list of input data files. Only one of --in_file or --in_list may be specified.

  --findblobs <string> ""

If data is a binary mask variable, this argument will use the flood fill algorithm to identify contiguous patches and assign them a unique index at each time. This does not connect blobs together in time, which requires use of StitchBlobs.
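A flood fill over a structured grid can be sketched in Python as follows. This is an illustration, not the BlobStats code: the mask is assumed to be a list of rows indexed [lat][lon], labels are numbered from 1 in scan order, and the diag_connect and periodic_lon parameters are hypothetical stand-ins that mirror the behavior described for --diag_connect and --regional.

```python
from collections import deque

def label_blobs(mask, diag_connect=False, periodic_lon=True):
    """Label contiguous patches of nonzero cells in a 2D mask (0 = no blob)."""
    nlat, nlon = len(mask), len(mask[0])
    labels = [[0] * nlon for _ in range(nlat)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diag_connect:
        # Also treat cells that touch only at a vertex as connected.
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    next_label = 0
    for j0 in range(nlat):
        for i0 in range(nlon):
            if not mask[j0][i0] or labels[j0][i0]:
                continue
            next_label += 1
            labels[j0][i0] = next_label
            queue = deque([(j0, i0)])
            while queue:
                j, i = queue.popleft()
                for dj, di in steps:
                    jj, ii = j + dj, i + di
                    if periodic_lon:
                        ii %= nlon  # wrap around in longitude
                    if not (0 <= jj < nlat and 0 <= ii < nlon):
                        continue
                    if mask[jj][ii] and not labels[jj][ii]:
                        labels[jj][ii] = next_label
                        queue.append((jj, ii))
    return labels

mask = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
labels = label_blobs(mask)
```

In this example the two cells in the first row share a label because the grid wraps in longitude; with periodic_lon=False they become separate blobs.
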

  --in_connect <string> ""

A connectivity file that describes the unstructured grid.

  --diag_connect <bool>

When the data is on a structured grid, consider grid cells to be connected in the diagonal (across the vertex).

  --regional <bool>

Used to indicate that a given latitude-longitude grid should not be periodic in the longitudinal direction.

  --out_file <string> ""

The output text file containing summary statistics about each blob.

  --var <string> ""

The name of the binary or indexed mask variable in the input file(s).

  --sumvar <string> ""

A comma delimited list of variables which are summed over each blob and written to the output, weighted by the associated gridpoint area.

  --out <string> ""

A comma delimited list of output quantities to write to the summary file. Options are as follows:

minlat: southernmost latitude within the blob.

maxlat: northernmost latitude within the blob.

minlon: westernmost longitude within the blob.

maxlon: easternmost longitude within the blob.

meanlon: average of minlon and maxlon.

meanlat: average of minlat and maxlat.

centlon: centroid longitude, calculated in Cartesian coordinates with area weighting.

centlat: centroid latitude, calculated in Cartesian coordinates with area weighting.

area: total area of the blob.
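The difference between meanlon/meanlat and centlon/centlat can be illustrated with a short Python sketch of an area-weighted centroid computed in Cartesian coordinates. The function name blob_centroid is hypothetical, and reporting longitude in the range [0, 360) is an assumption, not a statement about the BlobStats output.

```python
import math

def blob_centroid(lons_deg, lats_deg, areas):
    """Area-weighted centroid of grid points, computed on the unit sphere."""
    x = y = z = 0.0
    for lon, lat, area in zip(lons_deg, lats_deg, areas):
        lam, phi = math.radians(lon), math.radians(lat)
        x += area * math.cos(phi) * math.cos(lam)
        y += area * math.cos(phi) * math.sin(lam)
        z += area * math.sin(phi)
    # Project the weighted Cartesian sum back to longitude and latitude.
    centlon = math.degrees(math.atan2(y, x)) % 360.0
    centlat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return centlon, centlat

# Two equal-area points on either side of the prime meridian.
centlon, centlat = blob_centroid([350.0, 10.0], [0.0, 0.0], [1.0, 1.0])
```

For these two points the average of minlon and maxlon is 180 degrees, on the opposite side of the globe, whereas the Cartesian centroid lies on the prime meridian (up to rounding).
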

  --out_headers <bool>

Output headers to describe each column in the data.

  --out_fulltime <bool>

Output the full timestamp on each line.


GenerateConnectivityFile

Overview

Given an unstructured mesh in either SCRIP or Exodus format, generate a connectivity file that can be used as input to --in_connect.

Arguments

  --in_mesh <string> ""

The input mesh file.

  --in_concave <bool>

A flag indicating that some grid cells in the mesh are concave polygons.

  --out_type <string> [FV|CGLL|DGLL]

A string indicating the type of connectivity to generate. Options include FV (standard finite volume mesh), CGLL (continuous Galerkin using Gauss-Lobatto-Legendre nodes, typical for spectral element model output), or DGLL (discontinuous Galerkin using Gauss-Lobatto-Legendre nodes, typical for discontinuous Galerkin model output).

  --out_np <integer> [4]

If --out_type is CGLL or DGLL, this refers to the number of points per element.

  --out_connect <string> ""

The output connectivity file.


Climatology [MPI]

Overview

Calculate certain climatological statistics over a dataset, including the long-term daily mean (LTDM).

Arguments

  --in_data <string> ""

The input data file.

  --in_data_list <string> ""

A text file containing a list of input data files. Only one of --in_data or --in_data_list may be specified.

  --out_data <string> ""

The processed output data file.

  --var <string> ""

The name of the variable in the input file(s) over which the climatological statistics are calculated.

  --memmax <string> "2G"

The maximum amount of memory to use for storage of the working data arrays. This value is specified as one of #K (kilobytes), #M (megabytes), or #G (gigabytes).

  --period <string> "daily"

The type of climatological mean to calculate. Options include daily, monthly, seasonal, or annual.

  --type <string> "mean"

The type of statistical calculation to perform. Options include mean or meansq (for mean of squares, used in calculating the variance).
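The two statistics combine into a variance through the identity variance = meansq - mean^2. The Python sketch below shows the arithmetic on a handful of sample values; it is a generic illustration of the identity (population variance), not a description of how the outputs of Climatology are post-processed, and variance_from_means is a hypothetical name.

```python
def variance_from_means(mean, meansq):
    """Population variance from the mean and the mean of squares."""
    return meansq - mean * mean

samples = [1.0, 2.0, 3.0, 4.0]
mean = sum(samples) / len(samples)                   # 2.5
meansq = sum(s * s for s in samples) / len(samples)  # 7.5
variance = variance_from_means(mean, meansq)         # 7.5 - 6.25 = 1.25
```
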

  --include_leap_days <bool>

Include leap days in the statistical calculation. By default leap days are skipped since the sample size tends to be smaller.

  --missingdata <bool>

If set this indicates that the input data contains missing values.

  --temp_file_path <string> "."

The path to use for storing temporary files.

  --keep_temp_files <bool>

If set to true, temporary files will not be deleted after the calculation completes.

  --verbose <bool>

Enable verbose output.


FourierFilter

Overview

Apply a discrete Fourier filter along the specified dimension.

Arguments

  --in_data <string> ""

The input data file.

  --out_data <string> ""

The output data file.

  --var <string> ""

A comma-delimited list of variable(s) to apply the filtering to.

  --preserve <string> ""

A comma-delimited list of variable(s) that should be copied from input file to output directly.

  --dim <string> ""

The name of the dimension to apply the filter to.

  --modes <integer> ""

The number of modes to retain from filtering. A value of 1 indicates that only the mean should be maintained, 2 indicates the mean and the longest wavelength mode are maintained, etc.
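The meaning of the mode count can be illustrated with a small discrete Fourier low-pass filter in pure Python. This is a sketch of the idea only: it assumes a real-valued, evenly spaced series, uses a direct O(n^2) transform for clarity, and the function name fourier_lowpass is hypothetical. The exact conventions of the executable may differ.

```python
import cmath
import math

def fourier_lowpass(values, modes):
    """Retain wavenumbers 0 .. modes-1 of a real series and drop the rest."""
    n = len(values)
    # Forward discrete Fourier transform.
    coeffs = [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
              for k in range(n)]
    # Wavenumber k and its conjugate partner n - k describe the same mode,
    # so both are retained or dropped together.
    kept = [c if min(k, n - k) < modes else 0.0 for k, c in enumerate(coeffs)]
    # Inverse transform; the result is real up to rounding error.
    return [(sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

n = 8
signal = [1.0 + math.cos(2 * math.pi * t / n)
          + 0.5 * math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
mean_only = fourier_lowpass(signal, 1)     # constant series equal to the mean
mean_and_wave1 = fourier_lowpass(signal, 2)  # mean plus longest wavelength
```
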