Application of machine learning to agricultural soil data

Agriculture is a major sector of the Indian economy. A key advantage of classifying and predicting soil parameters is that it saves the time of specialized technicians who would otherwise perform expensive chemical analyses. In this context, this PhD thesis was developed in three stages: 1) Classification of soil data: we used chemical soil measurements to classify several relevant soil parameters: village-wise fertility indices; soil pH and type; soil nutrients, in order to recommend suitable amounts of fertilizer; and the preferable crop. 2) Regression on generic data: we carried out an experimental comparison of many regressors on a large collection of generic datasets selected from the University of California, Irvine (UCI) machine learning repository. 3) Regression on soil data: we applied the regressors used in stage 2 to the soil datasets, directly predicting their numeric values. The prediction accuracy was evaluated on the ten soil problems, as an alternative to predicting the discretized values (classification) developed in stage 1.
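The contrast between stage 1 (predicting a discretized class) and stage 3 (predicting the numeric value directly) can be illustrated with a minimal sketch. It assumes a soil parameter such as pH, and uses hypothetical class boundaries (acidic below 6.5, neutral from 6.5 to below 7.5, alkaline otherwise) that are illustrative only and not taken from the thesis. A numeric regression output can be scored with RMSE, and it can also be mapped onto the same classes so that it is comparable with a classifier's accuracy.

```python
import math

# Hypothetical pH class boundaries (illustrative assumption, not the thesis's thresholds).
PH_BOUNDS = (6.5, 7.5)
PH_LABELS = ("acidic", "neutral", "alkaline")


def quantize(value, bounds=PH_BOUNDS, labels=PH_LABELS):
    """Map a numeric soil value to a class label using ascending upper bounds."""
    for bound, label in zip(bounds, labels):
        if value < bound:
            return label
    # Values at or above the last bound fall into the final class.
    return labels[-1]


def rmse(y_true, y_pred):
    """Root mean squared error of a direct numeric (regression) prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def quantized_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted value falls in the same class as the true value."""
    hits = sum(quantize(t) == quantize(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


if __name__ == "__main__":
    # Made-up example values, used only to exercise the functions.
    y_true = [5.8, 6.9, 8.1, 7.4]
    y_pred = [6.1, 7.0, 7.9, 7.6]
    print(rmse(y_true, y_pred))
    print(quantized_accuracy(y_true, y_pred))  # 0.75: only 7.4 vs 7.6 crosses a class boundary
```

Scoring both ways shows why the two views can disagree: a small numeric error near a class boundary (7.4 against 7.6 here) counts as a misclassification, while a larger error inside one class does not.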

keywords: Indian, agriculture, soil type, machine learning, random forest, soil fertility index, fertilizer recommendation, regression, UCI machine learning repository, cubist, extreme learning machine, extremely randomized regression tree