Chinese Journal of Agrometeorology, 2022, Vol. 43, Issue (03): 229-239. doi: 10.3969/j.issn.1000-6362.2022.03.006

• Section: Agrometeorological Information Technology

An Outlier Detection Method for Automatic Soil Moisture Observation Data Based on Characteristic Curves

ZHOU Xiao-tian, CHEN Yi-ling, LI Yun, LI Chang-jun, ZHANG Ping, ZHANG Qian-ru

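  1. Key Laboratory for Meteorological Disaster Prevention and Mitigation of Shandong/Shandong Meteorological Information Centre, Jinan 250031, China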
  • Received: 2021-05-24; Online: 2022-03-20; Published: 2022-03-22
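  • Corresponding author: CHEN Yi-ling, senior engineer; research interests: meteorological data quality control and meteorological archive management. E-mail: lotushumor@126.com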
  • About the author: ZHOU Xiao-tian, E-mail: xtzhou1981@163.com
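  • Funding:
    Shandong Provincial Development and Reform Commission, "Shandong Modern Agricultural Meteorological Service Support Project" [Lu Fa Gai Nong Jing (2017) No. 97]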

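Chinese abstract: Taking feature extraction and morphological matching of soil moisture time series as its basic operations, this paper proposes a new outlier detection method for automatic soil moisture observation data based on characteristic curves. First, the length and range of the series to be checked, X, and of the template series, Y, are determined, and X and Y are decomposed with empirical mode decomposition (EMD) to obtain the feature-reconstructed series C and Q, respectively. The dynamic time warping (DTW) algorithm is then used to match and align the reconstructed series, yielding series C' and Q'. A variation series D' is computed from C' and Q', and the abnormal elements or segments of D' whose variation coefficient exceeds the threshold are flagged, which finally locates the outliers in the checked series X. Application examples show that: (1) the method requires no external influencing factors such as soil physical constants or meteorological conditions, and avoids introducing parameters such as upper/lower thresholds and rate-of-change thresholds into the soil moisture calculation; (2) the method uses continuous soil moisture data from the same depth at a single station, requires no multi-station comparison, and does not strictly require X and Y to have the same length, so the calculation is more flexible and widely applicable; (3) the workflow is clear and its inputs and outputs are simple and well defined, making the method well suited to software implementation and operational deployment.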
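The abstract does not publish code. As a rough illustration of the alignment step, the sketch below applies textbook dynamic time warping to two already-reconstructed series that need not have the same length, in line with point (2). The absolute-difference local cost, the tie-breaking rule and the function name dtw_align are assumptions made for this example, not the authors' implementation; the EMD step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def dtw_align(
    c: Sequence[float], q: Sequence[float]
) -> Tuple[List[float], List[float], List[Tuple[int, int]]]:
    """Align two series of possibly different lengths with classic DTW.

    Returns (c_aligned, q_aligned, path): the warped series C' and Q',
    which have equal length, and the warping path as (i, j) index pairs
    into c and q.
    """
    n, m = len(c), len(q)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # cost[i][j]: minimal cumulative cost of aligning c[:i] with q[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(c[i - 1] - q[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both series
                cost[i - 1][j],      # advance in c only
                cost[i][j - 1],      # advance in q only
            )
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    c_aligned = [c[a] for a, _ in path]
    q_aligned = [q[b] for _, b in path]
    return c_aligned, q_aligned, path


# Usage: a checked series of length 4 against a template of length 5.
c_al, q_al, path = dtw_align([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(path)  # [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
```

Here the element 1 of the shorter series is matched twice so that both warped series end up with five elements; this is what lets the later element-wise comparison proceed without a length-consistency requirement.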

Keywords: Characteristic curve, Soil moisture, EMD, DTW, Outliers

English abstract: A new outlier detection method for automatic soil moisture observation data based on characteristic curves is proposed. Its core idea is feature extraction from, and morphological matching between, two soil moisture time series. The procedure is as follows. First, the method takes X as the series to be checked and Y as the corrected template series, and determines the range and elements of both series. Second, X and Y are decomposed by empirical mode decomposition (EMD) to obtain the reconstructed series C and Q, respectively, where C is the sum of the intrinsic mode functions (IMFs) of X and Q is the sum of the IMFs of Y. Third, the dynamic time warping (DTW) algorithm aligns C and Q, yielding series C' and Q'. Fourth, the variation series D' is computed, whose elements are the variation coefficients between C' and Q'; each element of D' is then traversed, and those greater than the threshold are flagged as exceedances. The threshold is derived from the standard deviations of series X and Y. Finally, the outliers in the checked series X are located through the mapping between X and D'. Application examples show that: (1) the method requires no external factors such as soil physical constants or meteorological conditions, and avoids introducing parameters such as upper and lower bounds or rate-of-change (slope) thresholds into the calculation; (2) the method uses continuous soil moisture data from the same depth at a single station rather than multi-station comparison, and does not strictly require X and Y to have equal length, so the calculation is more flexible and widely applicable; (3) the workflow is clear and all inputs and outputs are well defined, making the method well suited to software implementation and operational deployment. Its technical route may also serve other agrometeorological data quality control research.
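The abstract names the variation coefficient and the standard-deviation-based threshold but gives neither formula. The sketch below therefore assumes, purely for illustration, that D'_k = |C'_k − Q'_k| and that threshold = k·(σ_X + σ_Y)/2, where σ is the population standard deviation and k is a hypothetical tuning factor; the function names are likewise hypothetical. It covers the last two steps: flagging exceedances in D' and mapping them back to X through a DTW warping path.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def variation_series(c_aligned: Sequence[float], q_aligned: Sequence[float]) -> List[float]:
    """Hypothetical D': absolute deviation |C'_k - Q'_k| for each aligned pair."""
    if len(c_aligned) != len(q_aligned):
        raise ValueError("aligned series must have equal length")
    return [abs(a - b) for a, b in zip(c_aligned, q_aligned)]


def detection_threshold(x: Sequence[float], y: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical threshold: k times the mean population std dev of X and Y."""
    return k * (pstdev(x) + pstdev(y)) / 2.0


def locate_outliers(
    x: Sequence[float],
    y: Sequence[float],
    c_aligned: Sequence[float],
    q_aligned: Sequence[float],
    path: Sequence[Tuple[int, int]],
    k: float = 1.0,
) -> List[int]:
    """Map exceedances in D' back to sorted indices of the checked series X.

    path[idx] = (i, j) means aligned element idx came from C-side index i;
    C has the same length as X because EMD preserves series length.
    """
    d = variation_series(c_aligned, q_aligned)
    threshold = detection_threshold(x, y, k)
    flagged = {path[idx][0] for idx, value in enumerate(d) if value > threshold}
    return sorted(flagged)


# Usage: EMD is skipped here, so X and Y stand in for C' and Q'
# with a one-to-one warping path.
x = [20.0, 21.0, 35.0, 22.0]
y = [20.0, 21.0, 21.5, 22.0]
identity_path = [(i, i) for i in range(len(x))]
print(locate_outliers(x, y, x, y, identity_path))  # [2]
```

Using a set before sorting matters when DTW maps several aligned elements to the same X index: the index is reported once, however many of its aligned copies exceed the threshold.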
