Summary: | Machine learning techniques are attractive tools for building statistical models with a high degree of nonlinearity. They require a large amount of training data and are therefore particularly well suited to analysing remote sensing data. This work attempts to use advanced statistical machine learning methods to predict the bias between Sea Surface Temperature (SST) derived from infrared remote sensing and ground “truth” from drifting buoy measurements. A large dataset of collocations between satellite SST and in situ SST is explored. Four regression models are used: simple multi-linear regression, Least Absolute Shrinkage and Selection Operator (LASSO), Generalised Additive Model (GAM) and random forest. For geostationary satellites, for which a large number of collocations is available, results show that the random forest model is the best of the four at predicting the systematic errors; it is also computationally fast, making it a good candidate for operational processing. It explains nearly 31% of the total variance of the bias, compared with about 24% for the multi-linear regression model.
|