SyCLoPS User Manual¶
The SyCLoPS paper: The System for Classification of Low-Pressure Systems (SyCLoPS): An All-in-One Objective Framework for Large-scale Datasets (Han & Ullrich, 2025). Link to the article: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2024JD041287
Documentation about the TempestExtremes (TE) software can be found at https://climate.ucdavis.edu/tempestextremes.php
Some more details about the TE commands in SyCLoPS can be found in the supporting information (SI) text S6 of the SyCLoPS manuscript and in TE_commands.sh
on Zenodo. All the other files mentioned below are also available via SyCLoPS @ Zenodo.
Known issues
TempestExtremes currently lacks full support for computing parameters on data containing missing values (e.g., 1e20). It is therefore not directly applicable to some model outputs and reanalyses that have missing values where a data level lies below the surface. We are working on this issue, and users can expect a newer TE version with fixes in the near future.
In SyCLoPS_classifier.py, the alternative criteria for the regional data mode have typos (this bug does not affect the default classification process):

On line 336, it should be

df_stlc=dfin[~(cond_hal) & ~(cond_dry) & ~(cond_trop) & (cond_cv) & ((dfin.WS200PMX>=25+WS200PMXadj)|(dfin.DEEPSHEAR>11)) & (cond_tlc)]

On line 338, it should be

df_stlc=dfin[~(cond_hal) & ~(cond_dry) & ~(cond_trop) & (cond_cv) & ((dfin.WS200PMX<25+WS200PMXadj)|(dfin.DEEPSHEAR<=11)) & (cond_tlc)]

Please correct these two lines before running the script in the regional model data mode. We may fix the typos in the next version of SyCLoPS.
Table of Contents:
- Look-up Tables and the Classification Flowchart (1.1-1.6)
- Tips for running SyCLoPS (2.1-2.7)
- SyCLoPS Catalogs Usages and Applications (3.1-3.2)
The two main steps to implement SyCLoPS are:

1. Run TE_commands.sh either line-by-line (recommended) or as a whole script (when everything is ready to go).
2. Execute python SyCLoPS_classifier.py to run the main classifier program. This program preprocesses the data output by TE, computes the LPSAREA parameter and the track information for labeling the quasi-stationary (QS) tracks, performs the main classification process, and outputs the classified catalog. Please change the constant variables in the first section of the script (e.g., number of processors to use, filenames) to your desired settings before running it.
There are several optional steps for SyCLoPS applications which are discussed in section 3.2 of this manual.
If this manual does not answer your questions about SyCLoPS, please contact the author Yushan Han at yshhan@ucdavis.edu
1. Look-up Tables and the Classification Flowchart¶
This section reproduces several tables and the SyCLoPS workflow diagram from the SyCLoPS manuscript with some more details.
1.1 Variable requirements¶
Variable Name | Pressure Level (hPa) |
---|---|
U-component Wind (U) | 925, 850, 200$^{c}$ |
V-component Wind (V) | 925, 850, 200$^{c}$ |
Temperature (T) | 850 |
Relative Humidity (R)$^{a}$ | 850, 100$^{d}$ |
Mean Sea Level Pressure (MSL) | Sea Level |
Geopotential (Z) | Surface, 925, 850$^{e}$ , 700, 500, 300$^{c}$ |
Relative Vorticity (VO)$^{b}$ | 500 |
a. Specific humidity can be converted to R with additional temperature information.
b. VO can also be calculated from U and V at 500 hPa if not directly available. See comments in "TE_commands.sh".
c. These default values can be replaced with 250 hPa. See Appendix B for details.
d. The daily frequency for R at 100 hPa is sufficient to maintain good performance of the SyCLoPS classification.
e. Z at 850 hPa is optional if the data set has missing/fill values where the data plane intersects the surface. See the high-altitude condition described in 1.4.
P.S. The optional 10 m U and V component wind variables are used to calculate the maximum surface wind speed (WS) of a Low-Pressure System (LPS) as reference information in the classified catalog. They do not play a role in the detection and classification process.
1.2 LPS Initialism Table¶
Initialism | Full Term | Definition |
---|---|---|
HAL | High-altitude Low | LPSs found at high altitudes without a warm core |
THL | Thermal low | Shallow systems featuring a dry and warm lower core |
HATHL | High-altitude Thermal Low | LPSs found at high altitudes with a warm core |
DOTHL | Deep (Orographic) Thermal Low | Non-shallow LPSs featuring a dry and warm lower core often driven by topography |
TC | Tropical Cyclone | LPSs that would be named in IBTrACS |
TD | Tropical Depression | Tropical systems that have developed a weak upper-level warm core and are strong enough to be recorded as TDs in IBTrACS |
TLO | Tropical Low | Non-shallow tropical systems that fall short of TD requirements |
MD | Monsoon Depression | TDs developing in a monsoonal environment. A monsoonal environment is considered to be dominated by westerly winds (resulting in asymmetric wind fields in monsoon LPSs) and to be very humid. Labeled as "TD(MD)" in the classified catalog. TDs that fall short of the monsoonal system condition are labeled "TD" |
ML | Monsoon Low | TLOs developing in a monsoonal environment. Labeled as "TLO(ML)" in the classified catalog. TLOs that fall short of the monsoonal system condition are labeled "TLO" |
MS | Monsoonal System | Monsoon LPSs (MDs plus MLs) |
TLC | Tropical-Like Cyclone | Non-tropical LPSs that resemble TCs (typically smaller than TCs). For example, they can have gale-force sustained surface wind, well-organized convection (sometimes with an eyewall) and a deep warm core |
SS (STLC) | Subtropical Storm (Subtropical Tropical-Like Cyclone) | A type of TLC in the subtropics, represented by Mediterranean hurricanes |
PL (PTLC) | Polar Low (Polar Tropical-Like Cyclone) | A type of TLC typically found north of the polar front |
SC | Subtropical Cyclone | A type of LPS that is typically associated with an upper-level cut-off low south of the polar jet and has a shallow warm core |
EX | Extratropical Cyclone | Most typical non-tropical cyclones |
DS | Disturbance | Shallow LPSs or waves with weak surface circulations. DSD, DST and DSE are dry, tropical and extratropical DSs |
QS | Quasi-stationary | LPSs that stay relatively localized as labeled by the QS track condition |
1.3 The Input LPS Catalog Column Table (for SyCLoPS_input.parquet)¶
Column | Unit | Description |
---|---|---|
TID | - | LPS track ID (0-based) in both the input and classified catalog |
ISOTIME | - | UTC timestamp (ISO time) of the LPS node in both catalogs |
LON | ° | Longitude of the LPS node in both catalogs |
LAT | ° | Latitude of the LPS node in both catalogs |
MSLP | Pa | Mean sea level pressure at the LPS node in both catalogs |
MSLPCC20 | Pa | Greatest positive closed contour delta of MSLP over a 2.0° Great Circle Distance (GCD), representing the core region of an LPS |
MSLPCC55 | Pa | Greatest positive closed contour delta of MSLP over a 5.5° GCD |
DEEPSHEAR | $\mathrm{m\:s^{-1}}$ | Average deep-layer wind speed shear between 200 hPa and 850 hPa over a 10.0° GCD |
UPPTKCC | $\mathrm{m^{2}\:s^{-2}}$ | Greatest negative closed contour delta of the upper-level thickness between 300 hPa and 500 hPa over a 6.5° GCD, referenced to the maximum value within 1.0° GCD |
MIDTKCC | $\mathrm{m^{2}\:s^{-2}}$ | Greatest negative closed contour delta of the middle-level thickness between 500 hPa and 700 hPa over a 3.5° GCD, referenced to the maximum value within 1.0° GCD |
LOWTKCC$^{a}$ | $\mathrm{m^{2}\:s^{-2}}$ | Greatest negative closed contour delta of the lower-level thickness between 700 hPa and 925 hPa over a 3.5° GCD, referenced to the maximum value within 1.0° GCD |
Z500CC | $\mathrm{m^2\:s^{-2}}$ | Greatest positive closed contour delta of geopotential at 500 hPa over a 3.5° GCD referenced to the minimum value within 1.0° GCD |
VO500AVG | $\mathrm{s^{-1}}$ | Average relative vorticity over a 2.5° GCD |
RH100MAX | % | Maximum relative humidity at 100 hPa within 2.5° GCD |
RH850AVG | % | Average relative humidity over a 2.5° GCD at 850 hPa |
T850 | K | Air temperature at 850 hPa at the LPS node |
Z850 | $\mathrm{m^2\:s^{-2}}$ | Geopotential at 850 hPa at the LPS node |
ZS | $\mathrm{m^2\:s^{-2}}$ | Geopotential at the surface at the LPS node |
U850DIFF | $\mathrm{m\:s^{-1}\:sr}$ | Difference between the weighted area mean of positive and negative values of 850 hPa U-component wind over a 5.5° GCD |
WS200PMX | $\mathrm{m\:s^{-1}}$ | Maximum poleward value of 200 hPa wind speed within 1.0° GCD longitude |
RAWAREA$^{b}$ | $\mathrm{km^2}$ | The raw defined size (see appendix E) of the LPS |
LPSAREA | $\mathrm{km^2}$ | The adjusted defined size of the LPS in both catalogs (see appendix E) |
a. 925 hPa may be replaced by 850 hPa if data at this level are not available in some datasets.
b. This parameter in the column is for user reference only. It does not affect any results in the SyCLoPS LPS node or track classification.
1.4 Classification Condition Table¶
Condition Name | Conditions |
---|---|
High-altitude Condition$^{a}$ | Z850 > ZS |
Dryness Condition | RH850AVG < 60% |
Cyclonic Condition | VO500AVG >= (<) 0 $\mathrm{s^{-1}}$ if LAT >= (<) 0° |
Tropical Condition | RH100MAX > 20%; DEEPSHEAR < 18 $\mathrm{m\:s^{-1}}$; T850 > 280 K |
Transition Condition | Tropical Condition = True; DEEPSHEAR $>$ 10 $\mathrm{m\:s^{-1}}$ or RH100MAX < 55% |
TC Condition | MSLPCC20 > 215 Pa; LOWTKCC < 0 $\mathrm{m^2\:s^{-2}}$; UPPTKCC < -107.8 $\mathrm{m^2\:s^{-2}}$ |
TD Condition | MSLPCC55 > 160 Pa; UPPTKCC < 0 $\mathrm{m^2\:s^{-2}}$ |
MS Condition | U850DIFF > 0 $\mathrm{m\:s^{-1}}$; RH850AVG > 85% |
TLC Condition$^{b}$ | MSLPCC20 > 190 Pa; LOWTKCC and MIDTKCC $<$ 0 $\mathrm{m^2\:s^{-2}}$; (LPSAREA < $5.5\times10^{5}$ $\mathrm{km^2}$; LPSAREA > 0 $\mathrm{km^2}$) or (MSLPCC20 > 420 Pa; MSLPCC20 : MSLPCC55 > 0.5) |
SC Condition | LOWTKCC < 0 $\mathrm{m^2\:s^{-2}}$; Z500CC $>$ 0 $\mathrm{m^2\:s^{-2}}$; WS200PMX$^{c}$ > 30 $\mathrm{m\:s^{-1}}$ |
TC Track Condition | 8+ 3-hourly TC-labeled nodes in an LPS track |
MS Track Condition | 10+ 3-hourly "TLO(ML)" or "TD(MD)"-labeled nodes |
SS Track Condition | 2+ 3-hourly TLC-labeled nodes ("SS(STLC)" or "PL(PTLC)") |
PL Track Condition | 2+ 3-hourly TLC-labeled nodes ("SS(STLC)" or "PL(PTLC)") |
QS Track Condition | See SI text S3 for details |
a. In some records, this can be as simple as checking whether T850 (or Z850) data are available (i.e., whether they have null/missing values or not).
b. See section 5.3 in the SyCLoPS paper for a possible alternative.
c. The WS200PMX criteria used in this framework may be supplemented by other parameters in some regional models. See Supporting Information (SI) text S4 for details.
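To illustrate how rows of this table translate into code, here is a minimal pandas sketch (not the actual SyCLoPS_classifier.py implementation; the function names and example values are hypothetical) expressing the Dryness and Tropical Conditions as boolean masks over the input-catalog columns from 1.3:

```python
import pandas as pd

def tropical_condition(dfin: pd.DataFrame) -> pd.Series:
    """Tropical Condition from table 1.4: humid at 100 hPa,
    weak deep-layer shear, and warm at 850 hPa."""
    return (dfin.RH100MAX > 20) & (dfin.DEEPSHEAR < 18) & (dfin.T850 > 280)

def dryness_condition(dfin: pd.DataFrame) -> pd.Series:
    """Dryness Condition from table 1.4: dry lower troposphere."""
    return dfin.RH850AVG < 60

# Hypothetical two-node input catalog: node 0 is tropical and moist,
# node 1 is sheared, cold, and dry
dfin = pd.DataFrame({"RH100MAX": [30.0, 10.0],
                     "DEEPSHEAR": [8.0, 25.0],
                     "T850": [295.0, 270.0],
                     "RH850AVG": [90.0, 40.0]})
```

Combining such masks with boolean operators (as in the corrected lines of SyCLoPS_classifier.py shown in the known issues above) is how the classifier selects each LPS category.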
1.5 The Classified LPS Catalog Column Table (for SyCLoPS_classified.parquet)¶
Column | Unit |
Description |
---|---|---|
TID | - | LPS track ID (0-based) in both the input and classified catalog |
ISOTIME | - | UTC timestamp (ISO time) of the LPS node in both catalogs |
LON | ° | Longitude of the LPS node in both catalogs |
LAT | ° | Latitude of the LPS node in both catalogs |
MSLP | Pa | Mean sea level pressure at the LPS node in both catalogs |
WS* | $\mathrm{m\:s^{-1}}$ | Maximum wind speed at the 10-m level within 2.0° GCD |
Full_Name | - | The full LPS name based on the classification |
Short_Label | - | The assigned LPS label (the abbreviation of the full name) |
Tropical_Flag | - | 1 if the LPS is designated as a tropical system, otherwise 0 |
Transition_Zone | - | 1 if the LPS is in the defined transition zone, otherwise 0 |
Track_Info | - | "TC", "MS", "SS(STLC)", "PL(PTLC)", "QS" denoted for TC, MS, SS, PL and QS tracks; "EXT", "TT" denoted for extratropical and tropical transition completion nodes |
IKE* | $\mathrm{TJ}$ | The integrated kinetic energy computed based on the LPS size blobs that are used to define RAWAREA |
* These two columns are for user reference only. They do not affect any results in the SyCLoPS LPS node or track classification.
1.6 SyCLoPS Classification Flowchart and Assigned Labels and Full Names¶
Section numbers in the figure refer to the section numbers in the SyCLoPS manuscript.

2. Tips for running SyCLoPS¶
2.1 Installing TE¶
Please refer to the TE GitHub page for instructions on how to install TE: https://github.com/ClimateGlobalChange/tempestextremes
It is recommended that you download and compile the TE source using make, as TE may not be updated frequently on the conda channel.
Please refer to the TE documentation for detailed explanations of each function, argument, and operation: https://climate.ucdavis.edu/tempestextremes.php
To install and run TE in parallel, please make sure that your computer has an MPI implementation installed (e.g., Open MPI). To use MPI, simply add srun -n 128 or mpirun -np 128 to the beginning of the applicable TE commands to enable parallel computation. For more details, see the MPI documentation provided online or by your supercomputer host. DetectNodes, DetectBlobs and VariableProcessor support MPI computation as of TE version 2.2.3. The parallelization is achieved by writing the inputfile as a list of files, with each row containing the variables of one time slice. See the next subsection for details.
If you have any questions about installing the TE software, please contact Paul Ullrich at paullrich@ucdavis.edu.
2.2 Operations in TE_commands.sh¶
In TE_commands.sh, we recommend reading this part of the manual and the comments in the shell script, then running the commands one by one.

First, set the installation path of TE on your computer with TEMPESTEXTREMESDIR=~/tempestextremes/bin.

Prepare a list of files (in txt format) containing all the variables (see table in 1.1) required by the first DetectNodes operation as the $inputfile. The files in the list should be arranged in time slices, like this:

Variable1_TimeSlice1;Variable2_TimeSlice1;Variable3_TimeSlice1;...
Variable1_TimeSlice2;Variable2_TimeSlice2;Variable3_TimeSlice2;...
Variable1_TimeSlice3;Variable2_TimeSlice3;Variable3_TimeSlice3;...

The $outputfile txt file should contain the same number of lines as the inputfile, i.e., a filename for each time slice on each line. In SyCLoPS version 4, we provide a sample $inputfile using ERA5 files on NERSC's Perlmutter. Below is a sample shell script that lists 4 different variables (Z, MSL, U and ZS (the constant surface geopotential)) with different time slices to generate the input file along with a corresponding output file (the outputfile is generated simultaneously):
'''
ERA5DIR=/global/cfs/projectdirs/m3522/cmip6/ERA5
mkdir -p LPS
rm -rf ERA5_example_in.txt # the input file
rm -rf ERA5_example_out.txt # the output file
for f in $ERA5DIR/e5.oper.an.pl/*; do
# In this example ERA5 directory, variables are stored in folders named by years and months (e.g., 202001,202002)
yearmonth=$(basename $f)
year=${yearmonth:0:4}
echo "..${yearmonth}"
if [[ $year -gt '1978' ]] && [[ $year -lt '2023' ]]
then
for zfile in $f/*128_129_z*; do
zfilebase=$(basename $zfile)
yearmonthday=${zfilebase:32:8}
mslfile=`ls $ERA5DIR/e5.oper.an.sfc/${yearmonth}/*128_151_msl*`
ufile=`ls $ERA5DIR/e5.oper.an.pl/${yearmonth}/*128_131_u.*${yearmonthday}*`
zsfile=./e5.oper.invariant.Zs.ll025sc.nc
echo "$zfile;$mslfile;$ufile;$zsfile" >> ERA5_example_in.txt
echo "LPS/era5.LPS.node.${yearmonthday}.txt" >> ERA5_example_out.txt
done
fi
done
'''
Please also note that TE uses the time series of the first file in a row of a list (in the example above, the $zfile or geopotential file) to determine the time slices to look for in the rest of the files in that row. Therefore, it is recommended to put the variable file with the shortest time period at the beginning of each row to avoid raising errors. For example, if the geopotential file is divided into days and the other variables' files are divided into months, then the geopotential file should be placed at the beginning of each row.
Here is another example shell script that lists 4 different variables (Z, MSL, U10 and ZS, the constant surface geopotential) with different time slices for the inputfile in a customized data directory, with filenames containing the data's time period (e.g., 20100101):
'''
#!/bin/bash
DIR="/path/to/your/folder"
# Extract unique dates (YYYYMMDD) from filenames
dates=$(ls "$DIR" | grep -oP '\d{8}' | sort -u)
# Redirect the output of this loop to your $inputfile, e.g., > input_list.txt
for date in $dates; do
echo "msl_${date}.nc;u10_${date}.nc;z_${date}.nc;zs_${date}.nc"
done
'''
If you use MPI in applicable TE commands, each thread will take one time slice (row) in the $inputfile at a time and output a corresponding output file. If you have only one time slice to run, you can change the --in_data_list/--in_list and --out_file_list/--out_data_list/--out_list arguments to the corresponding single-sequence arguments (e.g., --in_data "Variable1_TimeSlice1;Variable2_TimeSlice1;Variable3_TimeSlice1" and --out_data TimeSlice1_output.txt); please refer to the TE documentation/terminal prompts for the specific argument name of each operation.

--latname and --lonname need to be specified in the following commands only if the latitude and longitude variables in the given dataset use names other than the default "lat" and "lon". Specify --logdir to store temporary log files in the desired folder.

DetectNodes: This command detects candidate LPS nodes and computes the 15 parameters needed for the classification:
This step can be time consuming. It's highly recommended to run this command in parallel, feeding it with a list of files ordered by time slices.
The time dimension in the invariant surface geopotential file for ZS/Z0 should be removed (averaged) prior to the following procedures. It can be achieved by something like: "ncwa -a time ZS_in.nc ZS_out.nc".
"WS" (the near-surface maximum wind speed within 2.0 GCD of the LPS node) is an optional parameter for reference purporse. Add
_VECMAG(VAR_10U,VAR_10V),max,2.0
to the end of the--outputcmd
if you want to output this parameter. "Z850" is also not needed if your data has missing values where the 850 hPa data level is below the surface.If you use a different time resolution than the default 3 hours used in the Zenodo SyCLoPS dataset, you should change "3hr" in the
--time_filter
to, for example, "6hr" (and the same for any operations that follow), or you may delete this argument if the data itself has a 6-hourly time resolution._CURL{16,2.5}(u(500hPa),v(500hPa)),min,0
(if using 25-50km resolution models) or_CURL{8,2.5}(u(500hPa),v(500hPa)),min,0
(if using lower resolution models) can replace "VO(500hPa),avg,2.5" for VO500AVG if the relative vorticity (VO) is not directly available. The results will be slightly different, but close enough for the purpose. Another option is to calculate VO at 500 hPa using U and V at 500 hPa and precede normally.
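If you would rather precompute VO outside TE, the spherical relative vorticity can be sketched with NumPy as below (an illustrative finite-difference implementation, not part of SyCLoPS; np.gradient uses one-sided differences at the grid edges):

```python
import numpy as np

def relative_vorticity(u, v, lat, lon):
    """Relative vorticity (s^-1) on a regular lat-lon grid:
    zeta = (1/(R cos(phi))) * (dv/dlambda - d(u cos(phi))/dphi),
    computed with centered finite differences via np.gradient.
    u, v: 2D arrays of shape (lat, lon) in m/s; lat, lon: 1D arrays in degrees."""
    R = 6.371e6  # Earth radius (m)
    lat_r = np.deg2rad(lat)
    lon_r = np.deg2rad(lon)
    coslat = np.cos(lat_r)[:, None]
    dv_dlam = np.gradient(v, lon_r, axis=1)          # dv/dlambda
    ducos_dphi = np.gradient(u * coslat, lat_r, axis=0)  # d(u cos(phi))/dphi
    return (dv_dlam - ducos_dphi) / (R * coslat)
```

The resulting field can then be written back to a NetCDF file and fed to DetectNodes as the VO input.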
StitchNodes: This command stitches all detected nodes in sequence with parameters formatted in a csv file.
- If you are using a 6-hour detection rate in DetectNodes, you may consider either doubling the "4.0" in the --range argument (used for the default 3-hourly resolution) to "8.0" and adding a new --prioritize MSLP argument at the end of the command, or increasing the --range argument to "6.0" instead. The --prioritize argument will prevent false connections at the supposed end of a track when "range" is greater than "mergedist" (the node merging distance) in DetectNodes (see SI text S6 for further details). A 6.0° GCD range is also sufficient to cover the vast majority of fast-moving extratropical cyclones. Follow these general rules when using data with other time resolutions.
The following TE commands, which are reserved for computing LPSAREA and generating blob files, can be omitted if you are not labeling tropical-like cyclones (TLCs, which include polar lows and subtropical storms) and are not interested in analyzing LPS precipitation/size blobs. When running SyCLoPS_classifier.py, there is an option to skip the LPSAREA calculation and simplify the extratropical branch labels to SC, EX, and DSE only. Skipping these procedures saves about 50% of the SyCLoPS runtime.
VariableProcessor: There are two operations involving VariableProcessor. The first computes a smoothed 850 hPa relative vorticity (RV) field, and the second reverses the sign of RV in the Southern Hemisphere to obtain the cyclonic RV.
These commands can be run in parallel.
Your inputfile should contain files of U and V at 850 hPa for the first VariableProcessor operation.
If you are specifically interested in LPSs close to mountainous regions above the 850 hPa level, it is recommended that you use 700 hPa data instead at this step for labeling LPS size blobs and precipitation blobs.
DetectBlobs: This step generates LPS size blobs. A similar command to generate LPS precipitation blobs is introduced in the TE_optional.sh file on Zenodo.
This step can be time consuming. It's highly recommended to run this command in parallel, feeding it with a list of files ordered by time slices.
Your inputfile should include the cyclonic vorticity files output by the second VariableProcessor operation and files of U and V at 925 hPa. Make sure that each line has variable files of the same time range.
Again, if you are specifically interested in LPSs on land or near high mountains, it may be best to use 700 hPa U and V for the relative vorticity threshold and 850 or 700 hPa U and V for the wind speed threshold.
BlobStats: Generate useful information for calculating LPSAREA and tagging blobs with labels.
- This step cannot be run in parallel but could potentially be time-consuming if "sumvar" is used to compute IKE for each LPS node. One may opt to use the GNU parallel tool to run multiple commands simultaneously, each using a single thread. To accomplish this, users should first create a txt file containing a list of TE commands to be parallelized (e.g., broken down by years) and a list of corresponding input and output files. Here is a sample bash script to generate a list of BlobStats commands (with IKE calculation for each blob) to be parallelized by years:
'''
rm -f blobstats_commands_list.txt  # start from an empty command list
for year in {1979..2022}
do
echo "${TEMPESTEXTREMESDIR}/BlobStats --in_list DetectBlobs_size_output_${year}.txt \
--findblobs --out_file ERA5_size_blob_stats_${year}.txt --var 'block_tag' \
--out 'centlon,centlat,minlat,maxlat,minlon,maxlon,area' \
--sumvar '_PROD(_SUM(_POW(U(925hPa),2),_POW(V(925hPa),2)),0.5)' \
--out_fulltime --latname latitude --lonname longitude" >> blobstats_commands_list.txt
done
'''
Then run parallel -j 32 < blobstats_commands_list.txt (32 is the number of threads to use) to start the parallelized BlobStats processes. To compute IKE, one should provide the output NetCDF files from DetectBlobs as well as the 925 hPa U and V files in each StitchBlob_size_input_{year}.txt file.
2.3 Tips on applying SyCLoPS to climate model outputs¶
Most high-resolution climate model outputs do not have all the required variables at 3-hourly resolution, but they are mostly available at 6-hourly resolution, which is good enough for tracking most LPSs. However, relative humidity (RH) at 100 hPa is usually only available on a daily-mean basis. In the SyCLoPS paper, we show that using daily-mean relative humidity or a 6-hourly detection rate will not lower SyCLoPS performance (except that TLC detection skill decreases as the detection rate decreases).
To use daily-mean relative humidity as the input, one must oversample the daily-mean RH files before putting them into TE, since TE's DetectNodes assumes that all the input files have the same data frequency that matches your detection rate (in this case, 6-hourly). For example, a typical daily-mean RH file contains only one data point per day, usually at T12:00 or T00:00 of each day. You would need to resample it to 4 data points per day at 00,06,12,18 UTC to match your 6-hourly detection frequency. You can replicate the daily average four times for each date, or you can do a linear interpolation between time steps. TE may update a feature to address this inconvenience in the future.
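For a single grid point, the oversampling can be sketched with pandas (an illustrative example with made-up values; in practice the same operation would be applied to the whole RH field, e.g., with xarray or CDO):

```python
import pandas as pd

# Daily-mean RH values stamped at 00 UTC (hypothetical example data)
daily = pd.Series(
    [55.0, 60.0, 65.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Replicate each daily mean at 00, 06, 12, 18 UTC to match a
# 6-hourly detection rate (forward-fill within each day)
six_hourly = daily.resample("6h").ffill()

# Alternatively, interpolate linearly between the daily time steps
six_hourly_interp = daily.resample("6h").interpolate()
```

Either replication or interpolation is acceptable for the classification, as long as all input files end up with the same time frequency as your detection rate.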
Another option is to use specific humidity and temperature to calculate relative humidity, since some models have 6-hourly specific humidity data available. Check the metadata of your model's daily-mean RH data to find out the formula it uses to calculate RH at very low temperatures.
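For reference, such a conversion can be sketched as follows (an illustrative NumPy function using the Bolton (1980) saturation vapor pressure approximation over liquid water; this is not part of SyCLoPS, and your model may use a different formula, especially for ice-phase saturation at very low temperatures):

```python
import numpy as np

def specific_humidity_to_rh(q, T, p):
    """Approximate relative humidity (%) from specific humidity q (kg/kg),
    temperature T (K) and pressure p (Pa).
    Uses the Bolton (1980) saturation vapor pressure formula over liquid
    water; an illustrative sketch, not your model's exact formula."""
    e_s = 611.2 * np.exp(17.67 * (T - 273.15) / (T - 29.65))  # sat. vapor pressure (Pa)
    w = q / (1.0 - q)                  # mixing ratio from specific humidity
    w_s = 0.622 * e_s / (p - e_s)      # saturation mixing ratio
    return 100.0 * w / w_s

# Example: q = 0.01 kg/kg at 300 K and 100000 Pa gives a moderate RH
rh = specific_humidity_to_rh(0.01, 300.0, 100000.0)
```

Because the arguments are NumPy-broadcastable, the same function can be applied directly to full 6-hourly q and T fields.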
Climate model outputs may not contain the 300 hPa and 200 hPa data used in the default settings of SyCLoPS, but they typically have 250 hPa data available. We have also shown in the paper that using 250 hPa data instead of 200 hPa and 300 hPa does not degrade SyCLoPS performance with some minor adjustments (see Appendix B of the paper). In this scenario, just type "Y(Yes)" when asked if you want to use 250 hPa data instead of the default 200/300 hPa when running SyCLoPS_classifier.py.
2.4 Tips on applying SyCLoPS to regional model outputs/data¶
Because of the nature of the closed contour criteria used in SyCLoPS and TE, false LPS tracks will be detected near the four edges of the regional domain. Hence, it is recommended to define a ~2° buffer zone surrounding the four boundaries of your regional domain and remove tracks in that zone in post-processing.
Another option is to define --minlat, --maxlat, --minlon, --maxlon in the DetectNodes command. If your domain boundaries are not parallel to latitude or longitude, you can create a mask file as input to DetectNodes to define a buffer zone of your domain (with the grid labeled "1" as the buffer zone), then add something like --thresholdcmd BUFFER_ZONE_MASK,=,1,0.0 to the DetectNodes command.

When running SyCLoPS_classifier.py, type "Y(Yes)" when asked if you are running with regional model data, so that the program will opt to use the alternative criteria designed for regional models (see SyCLoPS Supporting Information S4 for details) at several points in the classification process. [Check the known issues at the beginning of this manual that may affect this part.]
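The buffer-zone post-processing can be sketched in pandas (a minimal illustration on a hypothetical mini-catalog; drop_buffer_tracks and the domain bounds are made-up names, and the real classified catalog dfc is described in section 3.1):

```python
import pandas as pd

def drop_buffer_tracks(dfc, lon0, lon1, lat0, lat1, buffer=2.0):
    """Remove every LPS track that has at least one node inside the
    ~2 degree buffer zone along the four edges of a regional domain
    (assumes the domain does not wrap around the dateline)."""
    in_buffer = (
        (dfc.LON < lon0 + buffer) | (dfc.LON > lon1 - buffer) |
        (dfc.LAT < lat0 + buffer) | (dfc.LAT > lat1 - buffer)
    )
    bad_tids = dfc.loc[in_buffer, "TID"].unique()
    return dfc[~dfc.TID.isin(bad_tids)]

# Hypothetical mini-catalog: track 0 stays interior, track 1 touches the edge
dfc = pd.DataFrame({"TID": [0, 0, 1, 1],
                    "LON": [110.0, 111.0, 100.5, 103.0],
                    "LAT": [20.0, 21.0, 25.0, 26.0]})
kept = drop_buffer_tracks(dfc, lon0=100, lon1=120, lat0=10, lat1=40)
```

Removing the whole track (rather than only the offending nodes) avoids keeping track fragments whose genesis or lysis is an artifact of the domain boundary.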
3. SyCLoPS Catalogs Usages and Applications¶
3.1 How to select different types of LPS nodes and tracks in the classified catalog:¶
To open the classified catalog:
import numpy as np
import pandas as pd
ClassifiedCata='SyCLoPS_classified.parquet' # your path to the classified catalog
dfc=pd.read_parquet(ClassifiedCata) # open the parquet format file. PyArrow package required.
dfc
TID | LON | LAT | ISOTIME | MSLP | WS | Full_Name | Short_Label | Tropical_Flag | Transition_Zone | Track_Info | LPSAREA | IKE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 56.75 | 70.00 | 1979-01-01 00:00:00 | 97686.00 | 12.66622 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 1182834 | 160.5 |
1 | 0 | 56.75 | 69.75 | 1979-01-01 03:00:00 | 97869.81 | 12.43663 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 1039577 | 139.0 |
2 | 0 | 57.50 | 69.50 | 1979-01-01 06:00:00 | 98085.94 | 12.29883 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 952895 | 126.0 |
3 | 0 | 57.75 | 69.25 | 1979-01-01 09:00:00 | 98294.25 | 11.26188 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 877000 | 117.0 |
4 | 0 | 59.25 | 69.25 | 1979-01-01 12:00:00 | 98454.31 | 10.92470 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 863035 | 118.5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7781101 | 379301 | 336.00 | -60.50 | 2022-12-31 09:00:00 | 97146.94 | 12.96320 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 1131118 | 186.5 |
7781102 | 379301 | 337.25 | -60.75 | 2022-12-31 12:00:00 | 97272.25 | 13.02440 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 1037363 | 167.5 |
7781103 | 379301 | 339.00 | -61.00 | 2022-12-31 15:00:00 | 97358.38 | 13.75519 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 1065815 | 160.0 |
7781104 | 379301 | 340.00 | -60.75 | 2022-12-31 18:00:00 | 97431.12 | 13.85164 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 950077 | 131.0 |
7781105 | 379301 | 341.25 | -60.75 | 2022-12-31 21:00:00 | 97573.38 | 14.35034 | Extratropical Cyclone | EX | 0.0 | 0.0 | Track | 551042 | 81.0 |
7781106 rows × 13 columns
If desired, the input and output (classified) catalogs can also be combined to produce a larger catalog:
# InputCata='SyCLoPS_input.parquet'
# dfin=pd.read_parquet(InputCata)
# dfc=pd.concat([dfc,dfin],axis=1)
Task 1. Select a single type of LPS node (e.g., TC):
dftc=dfc[dfc.Short_Label=='TC']
Task 2. Select two types of LPS node (e.g., EX and SC):
dfexsc=dfc[(dfc.Short_Label=='EX') | (dfc.Short_Label=='SC')]
Task 3. Select two types of TLC node (including SS(STLC) and PL(PTLC)):
dftlc=dfc[dfc.Short_Label.str.contains('TLC')]
dftlc
TID | LON | LAT | ISOTIME | MSLP | WS | Full_Name | Short_Label | Tropical_Flag | Transition_Zone | Track_Info | LPSAREA | IKE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
24 | 1 | 347.00 | 68.00 | 1979-01-02 00:00:00 | 100267.10 | 20.49998 | Subtropical Tropical-like Cyclone (Subtropical... | SS(STLC) | 0.0 | 0.0 | Track | 369392 | 62.5 |
53 | 2 | 359.50 | 58.00 | 1979-01-01 21:00:00 | 101263.30 | 13.46975 | Subtropical Tropical-like Cyclone (Subtropical... | SS(STLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 102890 | 3.0 |
55 | 2 | 1.50 | 56.00 | 1979-01-02 03:00:00 | 101188.20 | 15.56540 | Subtropical Tropical-like Cyclone (Subtropical... | SS(STLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 124718 | 14.0 |
57 | 2 | 3.25 | 54.00 | 1979-01-02 09:00:00 | 100931.10 | 17.09763 | Subtropical Tropical-like Cyclone (Subtropical... | SS(STLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 171076 | 11.5 |
58 | 2 | 3.75 | 53.00 | 1979-01-02 12:00:00 | 100894.20 | 18.27007 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 196833 | 7.5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7780923 | 379281 | 229.00 | 51.00 | 2022-12-30 21:00:00 | 98289.44 | 14.45700 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 340496 | 51.0 |
7780924 | 379281 | 230.00 | 51.00 | 2022-12-31 00:00:00 | 98490.81 | 14.61593 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 266978 | 30.5 |
7780925 | 379281 | 230.50 | 51.25 | 2022-12-31 03:00:00 | 98671.88 | 14.43227 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 223329 | 25.5 |
7780926 | 379281 | 231.00 | 51.50 | 2022-12-31 06:00:00 | 98916.38 | 13.51526 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 362436 | 20.5 |
7780927 | 379281 | 231.25 | 51.50 | 2022-12-31 09:00:00 | 99244.94 | 12.39370 | Polar Low (Extratropical Tropical-like Cyclone) | PL(PTLC) | 0.0 | 0.0 | Track_SS(STLC)_PL(PTLC) | 280448 | 17.0 |
208514 rows × 13 columns
Task 4. Select all nodes in TC tracks and get the track IDs (TID) of all TC tracks:
dftc2=dfc[dfc.Track_Info.str.contains('TC')]
tctid=pd.unique(dftc2.TID)
print(tctid)
[ 17 21 59 ... 378821 378875 379061]
Task 5. Select all TC nodes in MS tracks:
dftcms=dfc[(dfc.Short_Label=='TC') & (dfc.Track_Info.str.contains('MS'))]
Task 6. Select all DST and all TLO (including TLO and TLO(ML)) nodes in TC tracks:
dftc3=dfc[((dfc.Short_Label=='DST')|(dfc.Short_Label.str.contains('TLO'))) & (dfc.Track_Info.str.contains('TC'))]
Task 7. Select LPS track IDs (TIDs) that have at least 5 non-tropical LPS nodes that are not DSE:
dfex=dfc[(dfc.Tropical_Flag==0)&(dfc.Short_Label!='DSE')]
counts=dfex.groupby('TID')['TID'].count()
extrackid=counts[counts>=5].index.to_numpy() # indexing pd.unique() with the groupby mask can misalign, so use the groupby index directly
Task 8. Select tracks that are simultaneously a TC track, an SS track, and a PL(PTLC) track:
tcsspl_trackid=pd.unique(dfc[(dfc.Track_Info.str.contains('TC')) & (dfc.Track_Info.str.contains('SS')) & (dfc.Track_Info.str.contains('PL'))].TID)
dftcsspl=dfc[dfc.TID.isin(tcsspl_trackid)]
Task 9. Select all non-tropical (extratropical) LPS nodes:
dfexnode=dfc[(dfc.Tropical_Flag==0)]
Task 10. Select all tropical LPS nodes in TC tracks that are not undergoing extratropical transition:
dftc3=dfc[(dfc.Track_Info.str.contains('TC')) & (dfc.Tropical_Flag==1) & (dfc.Transition_Zone==0)]
Task 11. Select all tropical transition completion nodes:
dftt=dfc[dfc.Track_Info.str.contains('TT')]
Task 12. Select TC tracks that do not undergo extratropical transition:
tcnoext_trackid=pd.unique(dfc[(dfc.Track_Info.str.contains('TC')) & ~(dfc.Track_Info.str.contains('EXT'))].TID)
Task 13. Select potential easterly wave (EW) nodes:
dfew=dfc[~(dfc.Track_Info.str.contains('M')) & ~(dfc.Track_Info.str.contains('Q')) & ~(dfc.Short_Label.str.contains('M')) & (dfc.Tropical_Flag==1)]
Task 14. Select all LPS nodes within a bounded region in January:
dflps=dfc[(dfc.LAT>=30) & (dfc.LAT<=50) & (dfc.LON>=280) & (dfc.LON<=350) & (dfc.ISOTIME.dt.month==1)]
Task 15. Select all PL nodes in PL(PTLC) tracks in the Nordic Seas from 1979 to 1999:
dfpl=dfc[(dfc.LAT>=45) & (dfc.LAT<=85) & (((dfc.LON>=320) & (dfc.LON<360)) | ((dfc.LON>=0) & (dfc.LON<=70))) & \
(dfc.Track_Info.str.contains('PL')) & (dfc.Short_Label=='PL(PTLC)') & (dfc.ISOTIME.dt.year>=1979) & (dfc.ISOTIME.dt.year<=1999)]
dfpl.LON.hist()
<Axes: >
3.2 Other applications based on the classified catalog:¶
Here we introduce two additional uses of SyCLoPS: calculating the integrated kinetic energy (IKE) accumulation or the precipitation contribution percentage of a specific type of LPS.
To perform this task, users need to run Blob_idtag.py
and TE_optional.sh
. Users can opt to run the additional TE commands within `Blob_idtag_app.py` (lines 13-15). The procedure can be divided into five steps:
- The additional TE commands in
TE_optional.sh
detect precipitation blobs and calculate blob statistics (properties) using BlobStats in addition to the size blobs already detected inTE_commands.sh
. - Both size and precipitation blobs are masked with a unique ID (1-based, e.g., 1, 2, 3, 4, 5, ...) through StitchBlobs.
- The blob-tagging Python script (
Blob_idtag.py
) pairs precipitation blobs to LPS nodes in the same way as we did for size blobs. - The Python script assigns tags (labels) to different blobs according to their paired labeled LPS nodes and the blob IDs given by BlobStats. The assigned tags are then used to remask blobs with the tag numbers (e.g., 1=TC, 2=MS, 3=SS, 4=PL, 5=others) in the NetCDF files output by StitchBlobs.
- Finally, run TE commands demonstrated in the "additional steps" in
TE_optional.sh
to extract 3-hourly precipitation and 925 hPa IKE at each grid point contained within each size/precipitation blob that is associated with a tag number (i.e., a type of LPS).
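Step 4's remasking can be sketched with NumPy as below; the blob-ID array and the blob-ID-to-tag mapping are hypothetical stand-ins for the StitchBlobs output field and the pairing result:

```python
import numpy as np

# Hypothetical 2D blob-ID mask for one time frame (0 = no blob),
# standing in for a StitchBlobs output field.
blob_ids = np.array([[0, 1, 1],
                     [2, 2, 0],
                     [0, 3, 3]])

# Hypothetical mapping from blob ID to tag (1=TC, 2=MS, ..., 5=others),
# derived from the paired, labeled LPS nodes.
id_to_tag = {1: 1, 2: 5, 3: 2}

# Remask: replace each blob ID with its tag; keep 0 where no blob exists.
tag_mask = np.zeros_like(blob_ids)
for bid, tag in id_to_tag.items():
    tag_mask[blob_ids == bid] = tag
```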
In step four, the Python script uses the tagging arrangement described in the last section of the SyCLoPS manuscript. However, there are many ways to assign these tags. In the manuscript, we define TC blobs (tag=1) as blobs paired with TC nodes in TC tracks, which corresponds to these nodes in the classified catalog:
import numpy as np

tcid=dfc[(dfc.Short_Label=='TC') & (dfc.Track_Info.str.contains('TC'))].index.values
blobtag=np.ones(len(dfc))*5 #5 = Other systems
blobtag[tcid]=1 #1 = TCs
# Subsequent codes in the Python script: ...
However, one can also define blobs paired with all TC nodes (not only those in TC tracks) as TC blobs with tag=1:
tcid=dfc[dfc.Short_Label=='TC'].index.values
blobtag=np.ones(len(dfc))*5 #5 = Other systems
blobtag[tcid]=1
# Subsequent codes in the Python script: ...
One may also define blobs paired with all tropical LPS nodes in TC tracks as TC blobs with tag=1:
tcid=dfc[(dfc.Tropical_Flag==1) & (dfc.Track_Info.str.contains('TC'))].index.values
blobtag=np.ones(len(dfc))*5 #5 = Other systems
blobtag[tcid]=1
# Subsequent codes in the Python script: ...
If you are using a multiple-tag system (e.g., tags 1, 2, 3, 4, and more), please be careful not to have overlapping paired LPS nodes among different tags (i.e., make them all mutually exclusive). The example below shows a bad practice:
tcid=dfc[(dfc.Tropical_Flag==1) & (dfc.Track_Info.str.contains('TC'))].index.values
msid=dfc[(dfc.Tropical_Flag==1) & (dfc.Track_Info.str.contains('MS'))].index.values
blobtag=np.ones(len(dfc))*5 #5 = Other systems
blobtag[tcid]=1 #1=TCs
blobtag[msid]=2 #2=MSs
# Subsequent codes in the Python script: ...
The above code will produce overlapping LPS nodes within tcid
and msid
(any tropical node whose track is labeled both TC and MS falls into both sets). Because the assignments run in order, some TC node tags will be overwritten by the subsequent MS tags.
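One simple way to keep the tag sets disjoint (a sketch, assuming the TC tag should take precedence over the MS tag) is to remove the TC node indices from the MS set before assigning; the toy catalog below is a hypothetical stand-in for the classified catalog:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the classified catalog (hypothetical values):
dfc = pd.DataFrame({
    'Tropical_Flag': [1, 1, 1, 0, 1],
    'Track_Info': ['Track_TC_MS', 'Track_TC', 'Track_MS', 'Track_SS', 'Track_MS'],
})

tcid = dfc[(dfc.Tropical_Flag==1) & (dfc.Track_Info.str.contains('TC'))].index.values
msid = dfc[(dfc.Tropical_Flag==1) & (dfc.Track_Info.str.contains('MS'))].index.values
msid = np.setdiff1d(msid, tcid)  # drop nodes already claimed by the TC tag

blobtag = np.ones(len(dfc))*5    # 5 = Other systems
blobtag[tcid] = 1                # 1 = TCs
blobtag[msid] = 2                # 2 = MSs
```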
You may also output just one kind of tag (e.g., just tag=1) for a group of LPSs:
msid=dfc[((dfc.Short_Label.str.contains('M')) | (dfc.Short_Label=='TC')) & (dfc.Track_Info.str.contains('MS'))].index.values
blobtag=np.ones(len(dfc))*0 # Other systems are all labeled 0
blobtag[msid]=1 #1=MSs
Another example:
ssid=dfc[(dfc.Short_Label=='SS(STLC)') & (dfc.Track_Info.str.contains('SS')) & ~(dfc.Track_Info.str.contains('TC'))].index.values
blobtag=np.ones(len(dfc))*0 # Other systems are all labeled 0
blobtag[ssid]=1 #1=SSs
In the above two examples, blobs that are not tagged (masked) "1" will be tagged (masked) "0". In binary masking, "0" means a blob is not detected. Hence, the final output NetCDF blob files will only contain the blobs with tag (mask)=1 that are associated with the desired LPS group.
After tags are assigned to blobs as described in the Python script, they are used to alter the original blob masks in the NetCDF files output by StitchBlobs. Grouping the blob IDs by their assigned tags gives something like this:
Tag number | Blob IDs |
---|---|
1 | 50, 139, 236, 337, 438, 553, 554, 663, ... |
2 | 46, 137, 235, 335, 434, 436, 550, 660, ... |
3 | 121, 244, 709, 719, 849, 861, 935, 1153, ... |
4 | 1261, 1324, 1431, 1535, 1637, 1748, 1753, 185, ... |
5 | 1, 2, 3, 4, 5, 6, 7, 8, ... |
The output NetCDF files with these alterations will contain blobs with their assigned tag numbers. For example, if tags 1-5 are used, grid points in each blob will be masked 1, 2, 3, 4, or 5.
Finally, after implementing the last step (step 5) in TE, one can calculate the accumulated IKE of each type of LPS over a period of time by summing each time frame of the output NetCDF files. To calculate the precipitation contribution percentage of a type of LPS, first compute the total precipitation over a period by summing each time frame of the 3-hourly (or other frequency) precipitation file without any blob masks. Then repeat the procedure with the precipitation blob masks output by TE applied. The (annual/seasonal) contribution percentage of that LPS type is then the ratio of the masked sum to the unmasked total.
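As a minimal sketch of the percentage calculation (with synthetic random arrays standing in for the precipitation frames and the TE-output precipitation blob mask):

```python
import numpy as np

rng = np.random.default_rng(0)
precip = rng.random((8, 4, 4))                # (time, lat, lon) 3-hourly precipitation frames
tc_mask = rng.integers(0, 2, size=(8, 4, 4))  # 1 inside TC precipitation blobs, 0 elsewhere

total_precip = precip.sum()                   # total precipitation over the period
tc_precip = (precip * tc_mask).sum()          # precipitation falling inside TC blobs
tc_pct = 100 * tc_precip / total_precip       # TC contribution percentage
```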