Automatic prediction of village-wise soil fertility for several nutrients in India using a wide range of regression methods

In regions with low-quality soils, such as the Indian state of Maharashtra, sustainable land management is essential to improve soil quality and to keep the nutrients that determine crop yield at adequate levels. Computing a soil fertility index for each nutrient and each location makes it possible to build village-wise fertility maps, which are very useful for fertility management. Automatic prediction of these fertility indices would reduce the number of chemical nutrient measurements that must be performed across cultivated lands. The current study predicts fertility indices for soil organic carbon and four important soil nutrients (phosphorus pentoxide, iron, manganese and zinc) using a wide collection of 76 regressors belonging to 20 families, including neural networks, deep learning, support vector regression, random forests, bagging and boosting, lasso and ridge regression, Bayesian models and more. The best results are achieved by extremely randomized regression trees (extraTrees), which reach an acceptable prediction accuracy (average squared correlations between 0.57 and 0.70) while also being relatively fast. Other high-performing regressors are random forests, regularized random forests, the generalized boosting regression model and epsilon-support vector regression.
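The accuracy figures above are squared correlations between measured and predicted fertility indices. As a minimal sketch of how such a score could be computed, assuming the squared Pearson correlation is meant and that scores are averaged over several test partitions, the following Python code may help. The function names and the idea of averaging over folds are illustrative assumptions, not the study's actual evaluation code.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sums of centered products; the 1/n factors cancel in the ratio below.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (cov * cov) / (var_t * var_p)


def average_squared_correlation(folds):
    """Mean squared correlation over (y_true, y_pred) pairs, one per test fold.

    Averaging over folds is an assumption made for this sketch.
    """
    scores = [squared_correlation(t, p) for t, p in folds]
    return sum(scores) / len(scores)
```

For instance, with the toy (not measured) values `squared_correlation([1, 2, 3, 4], [1, 3, 2, 4])`, the score falls in the same range as the values reported above, which gives a feel for how closely predictions of that quality track the measurements.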

keywords: Extremely randomized regression trees, Indian agriculture, Machine learning, Regression, Soil fertility index