Chinese Journal of Agrometeorology (中国农业气象) ›› 2021, Vol. 42 ›› Issue (04): 330-343. doi: 10.3969/j.issn.1000-6362.2021.04.007

• Section: Agrometeorological Information Technology •

Comparison of Gap-filling Methods for Long-term Continuous Missing Data in Carbon Flux Observation by Eddy Covariance Method of Forest Ecosystem

ZHOU Yu, HUANG Hui, ZHANG Jin-song, MENG Ping, SUN Shou-jia

  1. Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China; 2. Key Laboratory of Tree Breeding and Cultivation, National Forestry and Grassland Administration, Beijing 100091, China; 3. Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
  • Received: 2020-09-21 Online: 2021-04-20 Published: 2021-04-15
  • Corresponding author: ZHANG Jin-song, researcher, research interest: forest meteorology, E-mail: zhangjs@caf.ac.cn
  • About the author: ZHOU Yu, E-mail: zhouyucaf@126.com
  • Supported by:
    Fundamental Research Funds for Central Public-interest Research Institutes (CAFYBB2018ZA001; CAFYBB2017ZX002)

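Summary: Long continuous data gaps are common in carbon flux records measured by the eddy covariance method at forest flux sites. To assess how well different gap-filling methods handle such gaps, a mixed Quercus variabilis plantation ecosystem in the low hilly area of North China was taken as a case study. Half-hourly net ecosystem carbon exchange (NEE) data from March 1 to November 30, 2017, processed and quality-controlled with EddyPro, served as the benchmark dataset. Five classes of artificial gap datasets containing continuous gaps of 1, 3, 7, 15 and 31 d were generated at random, with 10 replicates. The gaps were filled with six methods: mean diurnal variation with a fixed window (MDV), mean diurnal variation with a variable window (MDC), look-up table (LUT), non-linear regression (NLR), marginal distribution sampling (MDS) and artificial neural network (ANN). The filled values were compared with the actual observations, and statistical parameters were analyzed to evaluate the accuracy and stability of each method and thus its range of applicability. The results showed that, in the daytime, when the continuous gap was shorter than 15 d, the R² (coefficient of determination) between filled and observed data was relatively high for ANN and low for NLR, while the relative root mean square error (RRMSE) between filled and observed data was low for LUT and high for NLR. When the gap reached 15 consecutive days, R² was significantly lower for NLR (P<0.05) and did not differ significantly among the other methods; RRMSE was significantly lower for LUT (P<0.05) and did not differ significantly among the other methods. When the gap reached 31 consecutive days, apart from the significantly lower R² of NLR (P<0.05), R² and RRMSE did not differ significantly among the methods; the mean absolute error (MAE) of MDV showed many outliers, and the MAE values of the methods began to diverge. As the gap length increased, R² declined for every method except MDV, and the R² between filled and observed NEE differed significantly between the 1 d and 31 d gap scenarios (P<0.05); the RRMSE of MDV and MDS increased and differed significantly between the 1 d and 31 d gap scenarios (P<0.05), while the RRMSE differences of the other methods were relatively insignificant. At night, in every gap scenario, R² was higher for ANN and lower for LUT, and the difference between the two was significant (P<0.05); the RRMSE of LUT was the highest and differed significantly from that of the other methods (P<0.05). In the scenario where the continuous gap was greater than 31 d, the RRMSE differences among the methods were not significant. Apart from the significantly higher MAE of LUT (P<0.05), MAE showed no clear difference among the other methods. As the gap length increased, the R² of MDC, MDS and ANN declined, while the R² of MDV and LUT never differed significantly; the RRMSE differences among the methods did not change significantly. In reproducing the diurnal course of half-hourly NEE on a typical sunny day, MDC performed relatively best. In conclusion, NLR is suitable when meteorological data are complete and the continuous NEE gap is shorter than 7 d; MDV or MDC is suitable when meteorological data are unavailable or severely missing and the continuous NEE gap is shorter than 15 d; LUT and MDS are suitable when few meteorological data are missing and the continuous NEE gap is shorter than 15 d; ANN is the most widely applicable and can be used when few meteorological data are missing and the continuous NEE gap is as long as 31 d.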

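Keywords: Eddy covariance, Gap-filling, Net ecosystem carbon exchange, Mean Diurnal Variation with fixed window (MDV), Mean Diurnal Variation with variable window (MDC), Look-Up Table (LUT), Non-Linear Regression (NLR), Marginal Distribution Sampling (MDS), Artificial Neural Network (ANN)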

Abstract: In mountainous forest ecosystems, 20% to 65% of the annual carbon flux data observed by the eddy covariance method are often missing, and continuous gaps can last half a month or even a month. Obtaining complete and reliable flux data therefore requires a sound method for filling the gaps. To explore the validity and performance of different gap-filling methods, the half-hourly NEE (Net Ecosystem Exchange) data from March 1 to November 30, 2017, of a mixed Quercus variabilis plantation ecosystem in the low hilly region of North China, calculated by EddyPro, were used as the benchmark dataset. Five types of artificial gap datasets, with continuous gaps of 1, 3, 7, 15 and 31 days, were generated at random and repeated 10 times. Mean Diurnal Variation with fixed window (MDV), Mean Diurnal Variation with variable window (MDC), Look-Up Table (LUT), Non-Linear Regression (NLR), Marginal Distribution Sampling (MDS) and Artificial Neural Network (ANN) were then used to fill the artificial gaps. By comparing the filled data with the actual observations, the accuracy, stability and scope of application of each method were evaluated through statistical parameters. The results indicated that gap filling performed significantly better in the daytime than at night. In the daytime, when the continuous gap was shorter than 15 days, the R² (coefficient of determination) between filled and observed NEE was relatively high for ANN and low for NLR, while the Relative Root Mean Square Error (RRMSE) between filled and observed NEE was low for LUT and high for NLR.
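The artificial-gap experiment described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the function names `insert_gap` and `mdv_fill`, the alignment of gaps to whole days, the ±7-day window and the synthetic sine-shaped series are all hypothetical choices, since the abstract does not give the window sizes or the sampling details.

```python
import math
import random

STEPS_PER_DAY = 48  # half-hourly records, as in the benchmark dataset


def insert_gap(series, gap_days, rng):
    """Blank one continuous gap of `gap_days` whole days at a random start day.

    Returns the gapped copy (missing values are None) and the start index.
    Aligning the gap to day boundaries is an assumption of this sketch.
    """
    n_days = len(series) // STEPS_PER_DAY
    start = rng.randrange(n_days - gap_days + 1) * STEPS_PER_DAY
    gapped = list(series)
    for i in range(start, start + gap_days * STEPS_PER_DAY):
        gapped[i] = None
    return gapped, start


def mdv_fill(series, half_window_days):
    """Fill each None with the mean of the same half-hour slot on the
    neighbouring days within +/- half_window_days (a fixed window)."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        same_slot = [
            series[j]
            for d in range(-half_window_days, half_window_days + 1)
            for j in [i + d * STEPS_PER_DAY]
            if 0 <= j < len(series) and series[j] is not None
        ]
        if same_slot:  # with no valid neighbour the value stays missing
            filled[i] = sum(same_slot) / len(same_slot)
    return filled


# Synthetic, perfectly periodic "NEE" for 30 days (illustration only).
nee = [-10.0 * math.sin(2 * math.pi * (i % STEPS_PER_DAY) / STEPS_PER_DAY)
       for i in range(30 * STEPS_PER_DAY)]
rng = random.Random(0)
gapped, start = insert_gap(nee, gap_days=3, rng=rng)
filled = mdv_fill(gapped, half_window_days=7)
```

With this hypothetical ±7-day window, a 15-day gap would leave its central day unfilled because every same-slot neighbour also falls inside the gap. That is the kind of situation a variable window, as in MDC, is meant to address.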
When the gap reached 15 consecutive days, R² was significantly lower for NLR (P<0.05) and did not differ significantly among the other methods; the RRMSE of LUT was significantly lower (P<0.05), and the RRMSE did not differ significantly among the other methods. When the gap reached 31 consecutive days, apart from the significantly lower R² of NLR (P<0.05), there was no significant difference in R² or RRMSE among the methods. The Mean Absolute Error (MAE) of MDV showed more outliers, and the MAE values of the methods began to diverge. As the gap length increased, the R² of every method except MDV declined and differed significantly between the continuous 1-day and 31-day gap scenarios (P<0.05). Moreover, the RRMSE of MDV and MDS increased and differed significantly between the continuous 1-day and 31-day gap scenarios (P<0.05), while the RRMSE differences of the other methods were relatively insignificant. At night, in each gap scenario, the R² of ANN was higher and that of LUT was lower, with a significant difference between the two (P<0.05); the RRMSE of LUT was the highest and differed significantly from that of the other methods (P<0.05). In the scenario where the continuous gap was greater than 31 days, the RRMSE differences among the methods were not significant; apart from the significantly higher MAE of LUT (P<0.05), there was no significant difference in MAE among the other methods. As the gap length increased, the R² of MDC, MDS and ANN declined, while the R² of MDV and LUT never differed significantly; moreover, the RRMSE differences among the methods did not change significantly. In reproducing the diurnal course of half-hourly NEE on a typical sunny day, MDC performed relatively best.
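The three statistics used in the comparison can be written out as a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: R² is taken as the squared Pearson correlation between observed and filled values, and RRMSE as the root of the summed squared errors divided by the summed squared observations. The function names are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error between observed and filled values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rrmse(obs, pred):
    """Relative RMSE, assumed here as sqrt(sum((p - o)^2) / sum(o^2))."""
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    sq_obs = sum(o ** 2 for o in obs)
    return math.sqrt(sq_err / sq_obs)


def r_squared(obs, pred):
    """Coefficient of determination, assumed here as squared Pearson r."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Toy values: the filled series is offset from the observations by a constant.
observed = [1.0, 2.0, 3.0, 4.0]
filled = [1.5, 2.5, 3.5, 4.5]
scores = (r_squared(observed, filled), rrmse(observed, filled), mae(observed, filled))
```

In this toy case the constant offset leaves R² at 1 while MAE and RRMSE are non-zero, which illustrates why a correlation-type statistic and the error statistics can rank methods differently.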
Because their interpolation strategies differ, the gap-filling methods performed differently. ANN generally worked well, while NLR performed relatively poorly; LUT performed significantly better in the daytime than at night, when it underestimated NEE. There was no significant difference among MDV, MDC and MDS. Moreover, the performance of each gap-filling method was related to the duration of the continuous gap. In conclusion, NLR is suitable when meteorological data are complete and the continuous NEE gap is shorter than 7 days. MDV and MDC are suitable when meteorological data are unavailable or severely missing and the continuous NEE gap is shorter than 15 days, with MDC preferred. LUT and MDS are suitable when few meteorological data are missing and the continuous NEE gap is shorter than 15 days. ANN has relatively wide applicability and can be used when few meteorological data are missing and the continuous NEE gap is as long as 31 days. In addition to site factors, differences in the time steps and window sizes selected by the gap-filling methods also affect the filled flux data and hence the applicability of each method. This study considered a single site with one year of carbon flux data, excluding winter; it ignored the actual distribution of gaps when constructing the artificial gap datasets, and the selected gap-filling methods used different time steps and window sizes. The results may therefore not apply to all sites, but they can serve as a reference for selecting gap-filling methods at other sites.
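The applicability ranges stated in the conclusion can be collected into a small lookup, sketched below. The function name `recommend_methods` and the three status labels are hypothetical, and the sketch assumes that complete meteorological data also satisfy the "few meteorological data missing" condition given for LUT, MDS and ANN; the abstract itself does not state this.

```python
def recommend_methods(met_data, gap_days):
    """Return the gap-filling methods the conclusion lists as suitable.

    met_data: 'complete', 'minor_gaps' or 'unavailable' (hypothetical labels;
              'unavailable' also stands for severely missing data).
    gap_days: length of the continuous NEE gap in days.
    """
    if met_data not in ("complete", "minor_gaps", "unavailable"):
        raise ValueError(f"unknown met_data status: {met_data!r}")

    methods = []
    if met_data == "complete" and gap_days < 7:
        methods.append("NLR")
    if met_data in ("complete", "minor_gaps"):
        # Assumption: complete data count as "few meteorological data missing".
        if gap_days < 15:
            methods += ["LUT", "MDS"]
        if gap_days <= 31:
            methods.append("ANN")
    if met_data == "unavailable" and gap_days < 15:
        methods += ["MDC", "MDV"]  # MDC listed first because it is preferred
    return methods


example = recommend_methods("minor_gaps", 20)
```

An empty list means that none of the six methods is described as suitable for that combination, for example when meteorological data are unavailable and the gap is 15 days or longer.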
At the same time, if the gaps were caused by abnormal weather such as precipitation and dew, the carbon flux data obtained by the above methods may differ considerably from the actual values and be significantly overestimated, especially for MDV and MDC, which do not consider meteorological factors. To estimate this part of the carbon flux accurately, a better approach may be to combine an open-path eddy covariance observation system with a closed-path one and derive a corresponding data correction method.

Key words: Eddy covariance, Gap-filling, Net ecosystem carbon exchange, Mean Diurnal Variation (MDV), Look-Up Table (LUT), Non-Linear Regression (NLR), Marginal Distribution Sampling (MDS), Artificial Neural Network (ANN)