A $1.4 Billion Fertilizer Manufacturer
Indian Government agriculture departments release data for acreage of individual crops across all states and districts of the country. Rainfall data and reservoir levels are also released by other Government departments. The quality and consistency of such datasets have been questioned. While it is common knowledge that fertilizer demand varies predictably with rainfall, few examples of large-scale application of predictive analytics to this domain exist in India.
Government data is typically published as MS Word or PDF documents with non-uniform syntax. There are thousands of files, each containing tens of thousands of data elements. The domain is also strongly local and crop-dependent. Data blending, cleansing, and then the application of models are needed for any structured decision support from this wealth of data.
Business managers like to drill down into and visualize the data, and also to run scenario analyses of the form "If rainfall in district x is N cm in the next season, how many tons of product K should we ship?"
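A scenario question of this kind can be illustrated with a minimal sketch. It assumes, purely for illustration, that shipped tonnage for one district and one product varies roughly linearly with seasonal rainfall. The history values, function names, and the linear form are all hypothetical and are not taken from the delivered system.

```python
# Hypothetical history for one district and one product:
# (seasonal rainfall in cm, tons shipped). Invented numbers, for illustration only.
HISTORY = [(60.0, 900.0), (75.0, 1050.0), (90.0, 1200.0), (105.0, 1350.0)]


def fit_demand_model(rows):
    """Fit tons = intercept + slope * rainfall by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def scenario_tons(model, rainfall_cm):
    """Answer 'if rainfall is N cm, how many tons?' for a fitted model."""
    slope, intercept = model
    # Demand cannot be negative, so clamp the estimate at zero.
    return max(0.0, intercept + slope * rainfall_cm)


model = fit_demand_model(HISTORY)
estimate = scenario_tons(model, 80.0)
```

In practice a separate model would be fitted per district and product, and the manager would vary the rainfall input to compare scenarios.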
A single, central data mart is designed. Data is automatically brought into a staging area from the Excel files, taking care of format variations. Tools are built for automatic verification of data quality and integrity. Crop-wise, region-wise data is then extracted into sub-tables and cleansed.
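The kind of automatic quality and integrity check described above might look like the following sketch. The field names (state, district, crop, year, acreage) and the three rules shown (required fields, numeric non-negative acreage, no duplicate keys) are assumptions chosen for illustration; the actual checks used in the project are not documented here.

```python
# Assumed column names for a staged acreage row (hypothetical).
REQUIRED = ("state", "district", "crop", "year", "acreage")


def validate_rows(rows):
    """Return a list of (row_index, problem) tuples for staged rows."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Rule 1: every required field must be present and non-blank.
        missing = [f for f in REQUIRED
                   if row.get(f) is None or not str(row[f]).strip()]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))
            continue
        # Rule 2: acreage must parse as a non-negative number.
        try:
            acreage = float(str(row["acreage"]).replace(",", ""))
        except ValueError:
            problems.append((i, "acreage not numeric"))
            continue
        if acreage < 0:
            problems.append((i, "negative acreage"))
        # Rule 3: one row per (state, district, crop, year).
        key = tuple(str(row[f]).strip().lower() for f in REQUIRED[:4])
        if key in seen:
            problems.append((i, "duplicate key"))
        seen.add(key)
    return problems


staged = [
    {"state": "Punjab", "district": "Ludhiana", "crop": "Wheat",
     "year": 2015, "acreage": "1,250"},
    {"state": "punjab", "district": "Ludhiana ", "crop": "wheat",
     "year": 2015, "acreage": "1300"},   # duplicate after normalization
    {"state": "Punjab", "district": "", "crop": "Rice",
     "year": 2015, "acreage": "900"},    # blank district
    {"state": "Punjab", "district": "Moga", "crop": "Rice",
     "year": 2015, "acreage": "n/a"},    # unparseable acreage
]
report = validate_rows(staged)
```

Rows flagged in the report would be held back in staging for correction rather than loaded into the crop-wise, region-wise sub-tables.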
These data sets are used to build crop-wise, region-wise acreage prediction models, using sophisticated time series, dynamic regression and other methods. Acreage and crop-sowing practices, in turn, determine expected fertilizer consumption. The deliverable is a software tool that integrates the ensemble models.
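The idea of an ensemble can be sketched as follows. Three deliberately simple forecasters stand in for the time series and dynamic regression models, whose details are not given here, and their one-step-ahead forecasts are combined by a weighted average. The function names, the equal default weights, and the sample series are assumptions for illustration only.

```python
def naive_forecast(series):
    """Next value equals the last observed value."""
    return series[-1]


def moving_average_forecast(series, window=3):
    """Next value equals the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def trend_forecast(series):
    """Fit a least-squares line over the time index and extrapolate one step.

    Requires at least two observations.
    """
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    sty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    slope = sty / stt
    return mean_y + slope * (n - mean_t)


def ensemble_forecast(series, models, weights=None):
    """Weighted average of the individual model forecasts."""
    if weights is None:
        weights = [1.0] * len(models)  # equal weights by default
    total = sum(weights)
    return sum(w * m(series) for w, m in zip(weights, models)) / total


# Hypothetical acreage history for one crop in one district (invented numbers).
acreage = [100.0, 110.0, 120.0, 130.0]
models = [naive_forecast, moving_average_forecast, trend_forecast]
next_season = ensemble_forecast(acreage, models)
```

A common refinement is to set the weights from each model's error on held-out seasons, so that models that have predicted a given crop and region well contribute more to the combined forecast.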
Tableau, R Shiny, ggplot2, OCR, MySQL stored procedures.
End-to-end system successfully built and delivered.