Chinese Journal of Agrometeorology ›› 2021, Vol. 42 ›› Issue (04): 330-343. doi: 10.3969/j.issn.1000-6362.2021.04.007

Comparison of Gap-filling Methods for Long-term Continuous Missing Data in Carbon Flux Observation by Eddy Covariance Method of Forest Ecosystem

ZHOU Yu, HUANG Hui, ZHANG Jin-song, MENG Ping, SUN Shou-jia   

  1. Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China; 2. Key Laboratory of Tree Breeding and Cultivation, National Forestry and Grassland Administration, Beijing 100091; 3. Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037
  • Received:2020-09-21 Online:2021-04-20 Published:2021-04-15

Abstract: Annual carbon flux records observed by the eddy covariance method in mountainous forest ecosystems often have 20% to 65% of their data missing, and continuous gaps may last as long as half a month or even a month. To obtain complete and reliable flux data, the missing values must be imputed with an appropriate method. To explore the validity and performance of different gap-filling methods, half-hourly NEE (Net Ecosystem Exchange) data from March 1 to November 30, 2017, calculated by EddyPro for a mixed Quercus variabilis plantation ecosystem in the low-hill region of North China, were used as a benchmark dataset. Five types of artificial gap datasets were generated by randomly removing 1, 3, 7, 15 or 31 consecutive days of data, with each scenario repeated 10 times. Six methods were then used to fill the artificial gaps: Mean Diurnal Variation with fixed window (MDV), Mean Diurnal Variation with variable window (MDC), Look-Up Table (LUT), Non-Linear Regression (NLR), Marginal Distribution Sampling (MDS), and Artificial Neural Network (ANN). By comparing the imputed data with the observed data, the interpolation accuracy, stability and scope of each method were evaluated through statistical parameters. The results indicated that interpolation performed significantly better during the daytime than at night. During the daytime, when the continuous gap was shorter than 15 days, ANN gave a relatively high R2 (coefficient of determination) between interpolated and observed NEE and NLR a relatively low one, while LUT gave a relatively low Relative Root Mean Square Error (RRMSE) between interpolated and observed NEE and NLR a relatively high one. 
When the gap reached 15 consecutive days, the R2 of NLR was significantly lower (P<0.05), while the differences in R2 among the other methods were not significant; the RRMSE of LUT was significantly lower (P<0.05), and the differences in RRMSE among the other methods were not significant. When the gap reached 31 consecutive days, apart from the significantly lower R2 of NLR (P<0.05), there was no significant difference in R2 or RRMSE among the methods. The Mean Absolute Error (MAE) of MDV had more outliers, and the MAE values of the methods began to diverge. As the gap length increased, the R2 of every method except MDV showed a downward trend, with a significant difference between the 1-day and 31-day continuous gap scenarios (P<0.05). Moreover, the RRMSE of MDV and MDS showed an increasing trend, with a significant difference between the 1-day and 31-day continuous gap scenarios (P<0.05), while the differences in RRMSE for the other methods were relatively insignificant. At night, in every gap scenario, the R2 of ANN was higher and that of LUT was lower, with a significant difference (P<0.05); the RRMSE of LUT was the highest and differed significantly from that of the other methods (P<0.05). In the scenario where the gap was greater than 31 days, the differences in RRMSE among the methods were not significant; except for LUT, which had a significantly higher MAE (P<0.05), there was no significant difference in MAE among the other methods. As the gap length increased, the R2 of MDC, MDS and ANN showed a downward trend, and there was consistently no significant difference in R2 between MDV and LUT; moreover, the differences in RRMSE for each method showed no significant change. MDC performed relatively best at reproducing the diurnal course of NEE at the half-hourly scale on a typical sunny day. 
Because their interpolation strategies differ, the gap-filling methods performed differently. ANN generally worked well, while NLR performed relatively poorly; LUT performed significantly better during the day than at night, and underestimated NEE at night. There was no significant difference among MDV, MDC and MDS. Moreover, the performance of each gap-filling method was related to the duration of the continuous gap. In conclusion, NLR is suitable for scenarios where meteorological data are complete and NEE data are missing for less than 7 days. MDV and MDC are suitable for scenarios where meteorological data are unavailable or severely incomplete and NEE data are missing for less than 15 days, with MDC preferred. LUT and MDS are suitable for scenarios where little meteorological data is missing and NEE data are continuously missing for less than 15 days. ANN has relatively wide applicability and can be used in scenarios where little meteorological data is missing and NEE data are continuously missing for up to 31 days. In addition to site factors, differences in the time steps and window sizes selected by the gap-filling methods also affect the imputation of missing flux data, which in turn affects the applicability of each method. Because this study considered only a single site with one year of carbon flux data excluding winter, ignored the actual distribution of gaps when constructing the artificial gap datasets, and used gap-filling methods with different time steps and window sizes, the results may not apply to all sites, but they can provide a reference for selecting gap-filling methods at other sites. 
In addition, if the gaps were caused by abnormal weather such as precipitation and dew, the carbon flux obtained by the above methods may differ considerably from the actual flux and be significantly overestimated, especially for MDV and MDC, which do not consider meteorological factors. To estimate this part of the carbon flux accurately, a better approach may be to combine the open-path eddy covariance observation system with the closed-path eddy covariance observation system to derive a corresponding data correction method.
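
The evaluation procedure described in the abstract (remove a continuous block of half-hourly NEE, fill it, and compare the filled values with the observations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the synthetic NEE series, the function names `insert_gap`, `mdv_fill` and `scores`, the 7-day fixed window, and the normalization of RRMSE by the absolute mean of the observations. Only the simplest method, MDV with a fixed window, is shown.

```python
import numpy as np

STEPS_PER_DAY = 48  # half-hourly records per day


def insert_gap(nee, gap_days, rng):
    """Return a copy of nee with one random continuous gap, plus the gap mask."""
    gap_len = gap_days * STEPS_PER_DAY
    start = rng.integers(0, nee.size - gap_len + 1)
    mask = np.zeros(nee.size, dtype=bool)
    mask[start:start + gap_len] = True
    gapped = nee.copy()
    gapped[mask] = np.nan
    return gapped, mask


def mdv_fill(gapped, window_days=7):
    """Fixed-window MDV: replace each missing half-hour with the mean of the
    valid values at the same time of day within +/- window_days.
    Values with no valid neighbour in the window stay NaN."""
    days = gapped.reshape(-1, STEPS_PER_DAY)  # rows = days, columns = time of day
    filled = days.copy()
    for d, s in zip(*np.where(np.isnan(days))):
        lo = max(0, d - window_days)
        hi = min(days.shape[0], d + window_days + 1)
        neighbours = days[lo:hi, s]           # read from the unfilled data only
        valid = neighbours[~np.isnan(neighbours)]
        if valid.size:
            filled[d, s] = valid.mean()
    return filled.ravel()


def scores(obs, pred):
    """R2 (squared Pearson correlation), RRMSE and MAE on filled points.
    RRMSE is normalized by |mean(obs)| here, which is an assumption."""
    ok = ~np.isnan(pred)
    obs, pred = obs[ok], pred[ok]
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "R2": np.corrcoef(obs, pred)[0, 1] ** 2,
        "RRMSE": rmse / abs(obs.mean()),
        "MAE": np.mean(np.abs(err)),
    }


# Synthetic 60-day series: daytime uptake (negative NEE) plus noise.
rng = np.random.default_rng(0)
t = np.arange(60 * STEPS_PER_DAY)
phase = 2 * np.pi * (t % STEPS_PER_DAY) / STEPS_PER_DAY
nee = 2.0 - 8.0 * np.maximum(0.0, np.sin(phase)) + rng.normal(0, 0.5, t.size)

gapped, mask = insert_gap(nee, gap_days=3, rng=rng)
filled = mdv_fill(gapped, window_days=7)
result = scores(nee[mask], filled[mask])
```

With a fixed window, a gap longer than twice the window leaves its central days unfilled, because no valid neighbour falls inside the window; this is one reason window choice can limit the gap length a window-based method handles.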

Key words: Eddy covariance, Gap-filling, Net ecosystem carbon exchange, Mean Diurnal Variation (MDV), Look-Up Table (LUT), Non-Linear Regression (NLR), Marginal Distribution Sampling (MDS), Artificial Neural Network (ANN)