Chinese Journal of Agrometeorology


Performance Comparison of Different Interpolation Methods on Missing Values for Time Series Data: A Case Study of Meteorological and Hydrological Data in a Subtropical Small Watershed

GAN Lei, ZHOU Jiao-gen, SHI Jin, LI Xi, SHEN Jian-lin, LV Dian-qing, LI Yu-yuan, WU Jin-shui

  1. College of Resources and Environmental Sciences, Hunan Normal University, Changsha 410081, China; 2. Key Laboratory of Agro-ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; 3. College of Engineering, Hunan Agricultural University, Changsha 410128, China
  • Received: 2017-07-13 Online: 2018-03-20 Published: 2018-03-23
  • About the author: GAN Lei (1992-), female, master's student, working mainly on hydro-ecology and environmental research. E-mail: 805150477@qq.com
  • Funding:

    National Key Technology R&D Program of China (2014BAD14B02); Special Research Fund for Public Welfare Industries of the Ministry of Water Resources (201501055); Key Discipline Construction Project of Geography, Hunan Province (20110101)

Abstract (from the Chinese version):

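Effective estimation of missing values in point-source time series data can improve data quality. To examine how well different estimation methods fill missing values in such data, and which factors govern their performance, daily meteorological and hydrological data (maximum air temperature, minimum air temperature, solar radiation, rainfall and surface runoff) from long-term fixed-site observations in a typical small subtropical watershed were used as a case study. With root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r) as validation metrics, five estimation methods were compared in terms of performance and its main influencing factors: the linear interpolation method (LIM), K-nearest neighbor interpolation method (KNNM), spline interpolation method (SIM), polynomial interpolation method (PIM) and kernel density estimation method (KDEM). The results showed that: (1) LIM, SIM and KDEM generally outperformed the other two methods; (2) for missing meteorological values (maximum air temperature, minimum air temperature and solar radiation), the five methods gave RMSE of 1.81 to 6.35, MAE of 1.30 to 4.20 and r of 0.70 to 0.98 (P<0.05), whereas for missing hydrological values (rainfall and surface runoff) they gave RMSE of 12.54 to 26.28, MAE of 3.60 to 14.21 and r of 0.07 to 0.72, so every method estimated meteorological data better than hydrological data; (3) the coefficient of variation (CV) of each data set was linearly correlated with the evaluation metrics (RMSE, MAE and r) (P<0.05) and is an important factor affecting estimation performance.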

Keywords: missing values, estimation methods, coefficient of variation, time series

Abstract:

Effective estimation of missing values in point-source time series data can improve data quality. Daily meteorological and hydrological data sets (maximum air temperature, minimum air temperature, solar radiation, rainfall and stream flow) were collected from long-term field observations in a typical small watershed in the subtropical zone. Five interpolation methods were compared on these five data sets: the linear interpolation method (LIM), K-nearest neighbor interpolation method (KNNM), spline interpolation method (SIM), polynomial interpolation method (PIM) and kernel density estimation method (KDEM). The root mean square error (RMSE), mean absolute error (MAE) and Pearson correlation coefficient (r) were used to evaluate the strengths and weaknesses of each method. The results showed that: (1) LIM, SIM and KDEM generally outperformed the other two methods. (2) For the meteorological data (maximum temperature, minimum temperature and solar radiation), RMSE ranged from 1.81 to 6.35, MAE from 1.30 to 4.20 and r from 0.70 to 0.98 (P<0.05). In contrast, for the hydrological data (rainfall and stream flow), RMSE and MAE were relatively high, at 12.51 to 26.28 and 3.60 to 14.21, respectively, and r was low (0.07 to 0.72). The methods therefore estimated missing meteorological values better than missing hydrological values. (3) The coefficient of variation (CV) of each data set was linearly correlated with the evaluation indices (RMSE, MAE and r) (P<0.05) and was an important factor affecting the estimation performance of the methods.

Key words: Missing values, Interpolation methods, Coefficient of variation, Time series