Application of agglomerative hierarchical clustering method in precipitation forecast assessment

doi:10.11755/j.issn.1006-7639(2022)-04-0690

Abstract

Precipitation forecast products generated by different methods and for different lead times usually come with a large number of verification results, yet effective means of analyzing these results comprehensively and systematically are still lacking. In this study, agglomerative hierarchical cluster analysis is introduced to classify and analyze the verification results of different forecast products. The sample set consists of the gridded precipitation forecasts of each participant in the CMA national forecast technology and method competition from June to September 2019, the guidance forecast (SCMOC) of the National Meteorological Center, the forecast of the seamless analysis and forecasting leading-edge system of the Chinese Academy of Meteorological Sciences, the objective forecast products of 31 provinces (municipalities and autonomous regions), and the global model forecasts of ECMWF (European Centre for Medium-Range Weather Forecasts) and NCEP (National Centers for Environmental Prediction). The results show that agglomerative hierarchical clustering clearly distinguishes the similarities and differences among forecast products. Different sets of evaluation indicators lead to different clustering results, but forecast products with high similarity are still grouped into the same subclass. The four inter-class similarity measures differ in how clearly they separate the characteristics of the categories: from clearest to fuzziest, the order is Ward, Complete, Average and Single. In addition, precipitation forecast skill differs among administrative regions and forecast products. The rain/no-rain forecast accuracy in North China and East China is higher than in other regions. Most objective forecasts outperform the ECMWF model forecast in rain/no-rain accuracy and precipitation relative error but are worse than the ECMWF model for heavy precipitation, which indicates that interpreting model output for heavy precipitation forecasts remains difficult.
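The clustering procedure named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes Euclidean distances between score vectors and uses the standard Lance-Williams update formulas for the Single, Complete, Average and Ward linkages. The function names `agglomerate` and `flat_clusters` and the six (PC, TS) score pairs are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def agglomerate(points, method="ward"):
    """Merge clusters bottom-up; return [(left_members, right_members, height), ...]."""
    clusters = {i: (i,) for i in range(len(points))}  # cluster id -> member indices
    # Start from Euclidean distances between all pairs of single-point clusters.
    dist = {frozenset(p): math.dist(points[p[0]], points[p[1]])
            for p in combinations(clusters, 2)}
    merges, next_id = [], len(points)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of active clusters
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        na, nb = len(clusters[a]), len(clusters[b])
        for m in clusters:
            if m in (a, b):
                continue
            d_am = dist.pop(frozenset((a, m)))
            d_bm = dist.pop(frozenset((b, m)))
            nm = len(clusters[m])
            # Lance-Williams update: distance from the merged cluster to cluster m.
            if method == "single":
                d = min(d_am, d_bm)
            elif method == "complete":
                d = max(d_am, d_bm)
            elif method == "average":
                d = (na * d_am + nb * d_bm) / (na + nb)
            elif method == "ward":
                sq = ((na + nm) * d_am ** 2 + (nb + nm) * d_bm ** 2
                      - nm * d_ab ** 2) / (na + nb + nm)
                d = math.sqrt(max(sq, 0.0))
            else:
                raise ValueError(f"unknown method: {method}")
            dist[frozenset((next_id, m))] = d
        left, right = clusters.pop(a), clusters.pop(b)
        merges.append((left, right, d_ab))
        clusters[next_id] = left + right
        next_id += 1
    return merges


def flat_clusters(n_points, merges, k):
    """Replay the first n_points - k merges to obtain k flat clusters."""
    groups = {(i,) for i in range(n_points)}
    for left, right, _ in merges[:n_points - k]:
        groups -= {left, right}
        groups.add(left + right)
    return sorted(sorted(g) for g in groups)


# Hypothetical (PC, TS) scores for six forecast products; not values from the paper.
scores = [(0.80, 0.30), (0.82, 0.31), (0.81, 0.29),
          (0.60, 0.10), (0.62, 0.12), (0.95, 0.50)]
results = {m: flat_clusters(len(scores), agglomerate(scores, m), 3)
           for m in ("single", "complete", "average", "ward")}
for method, groups in results.items():
    print(method, groups)
```

For this well-separated toy data all four linkages return the same three groups, `[[0, 1, 2], [3, 4], [5]]`. Differences between linkages of the kind the abstract describes would be expected only when the samples are less cleanly separated.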

Key words: agglomerative hierarchical clustering, intelligent grid forecast, precipitation forecast verification, similarity measurement methods, comprehensive analysis

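Abstract (translated from the Chinese version):

Precipitation forecast products based on different methods and lead times are often accompanied by a large number of verification and evaluation results, and effective means are still lacking for analyzing them comprehensively and systematically so that each forecast can be better understood. Taking as an example a sample set composed of the gridded forecast data of the national intelligent forecasting technology and method exchange competition from June to September 2019, the guidance forecast of the National Meteorological Center, the forecast products of the seamless analysis and forecasting leading-edge system of the Chinese Academy of Meteorological Sciences, the objective forecast products of 31 provinces (municipalities and autonomous regions), and the global model forecast data of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the U.S. National Centers for Environmental Prediction, this paper applies agglomerative hierarchical cluster analysis to classify and analyze the different verification results of different precipitation forecast products. The results show that agglomerative hierarchical clustering clearly reflects the overall performance of the precipitation forecast products in the sample set and the differences among them. Clustering results based on different numbers of precipitation evaluation indices differ markedly, but highly similar forecast products are always assigned to the same subclass. The choice of inter-class similarity measure affects how clearly the differences in class characteristics appear; from clearest to fuzziest, the order is Ward, Complete, Average, Single. Precipitation forecast skill varies with administrative region and forecast product: rain/no-rain forecast accuracy in North China and East China is higher than in other regions, and the great majority of objective forecasts outperform the ECMWF model forecast in rain/no-rain accuracy and precipitation relative error but fall short of ECMWF in heavy precipitation forecasting, indicating that the interpretation and application of model output for heavy precipitation forecasts remains quite difficult.

Key words (translated from the Chinese version): agglomerative hierarchical clustering, intelligent grid forecast, precipitation forecast verification, similarity measurement methods, comprehensive analysis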

CLC Number:

P456

QIAO Jinrong, YUAN Xinpeng, LIANG Xudong, XIE Yanxin. Application of agglomerative hierarchical clustering method in precipitation forecast assessment[J]. Journal of Arid Meteorology, 2022, 40(4): 690-699.

乔锦荣, 原新鹏, 梁旭东, 谢衍新. 凝聚层次聚类方法在降水预报评估中的应用[J]. 干旱气象, 2022, 40(4): 690-699.

Figures/Tables 11

Tab.1

Province (municipality, autonomous region)	Within 3 h	Within 24 h
Tibet, Xinjiang, Qinghai, Ningxia	10	25
Other provinces	20	50

Fig.1 Schematic diagram of hierarchical clustering method

Fig.2 Schematic diagram of three similarity measurement methods for hierarchical clustering

Fig.3 Hierarchical clustering structure and heat map for grid forecast samples based on Ward similarity method

Tab.2 Number of the 34 forecast samples whose 3-hour forecasts outperform the ECMWF product

Verification index	23:00	02:00	05:00	08:00	11:00	14:00	17:00	20:00
PC	24	23	24	25	30	31	28	26
MRE	28	30	30	31	32	31	31	29
TS	6	13	12	2	7	5	9	8
B	0	0	0	0	0	0	0	0
Bias	15	14	16	17	11	17	22	21
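For readers unfamiliar with the indices in Tab.2, the sketch below computes PC, TS and Bias from a 2×2 contingency table using the standard textbook definitions. The paper's exact formulas, including those for MRE and B, are not reproduced in this section, so this should be read as an assumption rather than as the authors' method. The function names, the 0.1 mm rain/no-rain threshold and the sample values are hypothetical.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms and correct negatives for one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def percentage_correct(h, m, fa, cn):
    """PC: fraction of all cases in which forecast and observation agree."""
    return (h + cn) / (h + m + fa + cn)


def threat_score(h, m, fa, cn):
    """TS: hits divided by all cases where the event was forecast or observed."""
    return h / (h + m + fa) if (h + m + fa) else float("nan")


def bias_score(h, m, fa, cn):
    """Bias: forecast event count divided by observed event count."""
    return (h + fa) / (h + m) if (h + m) else float("nan")


# Hypothetical precipitation amounts in mm; not data from the paper.
forecast = [0.0, 1.2, 5.0, 0.0, 12.0, 0.3, 0.0, 25.0]
observed = [0.0, 0.0, 4.0, 2.0, 15.0, 0.0, 0.0, 8.0]
counts = contingency(forecast, observed, threshold=0.1)
print(counts)                       # (3, 1, 2, 2)
print(percentage_correct(*counts))  # 0.625
print(threat_score(*counts))        # 0.5
print(bias_score(*counts))          # 1.25
```

A Bias above 1 means the event was forecast more often than it was observed, and a Bias below 1 means it was forecast less often.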

Fig.4 Differences between the 34 forecast samples and the ECMWF product in 3-hour rain/no-rain percentage correct (a), MRE (b), TS score (c) and bias amplitude (d), and the Bias score (e)

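Tab.3 Provinces (municipalities and autonomous regions) in subclasses L₁₁₁₁ and L₁₁₁₂

Subclass	Region	Provinces (municipalities and autonomous regions)
L₁₁₁₁	North China	Inner Mongolia, Tianjin, Hebei, Shanxi
L₁₁₁₁	Northeast China	Liaoning
L₁₁₁₁	Central China	Hubei
L₁₁₁₁	East China	Shandong
L₁₁₁₁	Northwest China	Shaanxi, Ningxia, Xinjiang
L₁₁₁₂	East China	Zhejiang, Jiangsu, Anhui, Jiangxi, Fujian
L₁₁₁₂	Central China	Henan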

Fig.5 Variation with time of the 3-hour verification indices for forecast samples in different regions: (a) PC, (b) TS, (c) B, (d) MRE

Fig.6 Hierarchical clustering structure and heat map for 24-hour verification indices

Tab.4 Correlation coefficients of clustering results between different similarity measurement methods

Method	Single	Complete	Average	Ward
Single	1	0.76	0.96	0.44
Complete		1	0.79	0.60
Average			1	0.59
Ward				1
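The upper-triangular values in Tab.4 can be read more easily after a little arithmetic. The sketch below copies the six coefficients from the table, fills in the symmetric matrix, and reports the most and least similar pair of methods together with each method's mean correlation with the other three. How the coefficients themselves were computed is not stated in this section, so the code only rearranges the published numbers; the variable and function names are hypothetical.

```python
methods = ["Single", "Complete", "Average", "Ward"]

# Upper-triangular coefficients copied from Tab.4.
upper = {
    ("Single", "Complete"): 0.76,
    ("Single", "Average"): 0.96,
    ("Single", "Ward"): 0.44,
    ("Complete", "Average"): 0.79,
    ("Complete", "Ward"): 0.60,
    ("Average", "Ward"): 0.59,
}


def corr(a, b):
    """Look up a coefficient in either order; the diagonal is 1 by definition."""
    if a == b:
        return 1.0
    return upper.get((a, b), upper.get((b, a)))


# Full symmetric matrix in the row and column order of Tab.4.
matrix = [[corr(a, b) for b in methods] for a in methods]

most_similar = max(upper, key=upper.get)
least_similar = min(upper, key=upper.get)
mean_with_others = {
    m: sum(corr(m, o) for o in methods if o != m) / (len(methods) - 1)
    for m in methods
}

print("most similar pair:", most_similar)    # ('Single', 'Average')
print("least similar pair:", least_similar)  # ('Single', 'Ward')
for m, value in mean_with_others.items():
    print(f"{m}: mean correlation with other methods = {value:.2f}")
```

On these numbers the Single and Average results are the closest pair (0.96), and Ward has the lowest mean correlation with the other three methods (about 0.54).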

Fig.7 Comparison of clustering effect between Single (a), Complete (b) and Ward methods
