Journal section "Regional economics"

Clustering Russian Federation Regions According to the Level of Socio-Economic Development with the Use of Machine Learning Methods

Ketova K.V., Kasatkina E.V., Vavilova D.D.

Volume 14, Issue 6, 2021

Ketova K.V., Kasatkina E.V., Vavilova D.D. Clustering Russian Federation regions according to the level of socio-economic development with the use of machine learning methods. Economic and Social Changes: Facts, Trends, Forecast, 2021, vol. 14, no. 6, pp. 70–85. DOI: 10.15838/esc.2021.6.78.4

Abstract

The paper addresses the problem of clustering Russian Federation regions according to their socio-economic development, taking into account the sectoral structure of the gross regional product. Classical machine learning methods serve as the tool for solving the clustering problem. The object of the study is the differentiation of regions according to various socio-economic indicators. The subject of the study is the practice of using machine learning methods for clustering objects. The initial database includes actual statistical data on the socio-economic development of RF constituent entities and the sectoral structure of their gross regional product as of 2019. We identify clusters of regions according to their socio-economic development using machine learning methods implemented in Python, a high-level programming language, together with data-processing libraries: Pandas, Sklearn, SciPy, etc. The initial data were preprocessed: data categories were digitized, indicators were converted to specific values, and indicators were standardized. The initial data set for 2019 contains 5,525 records on 65 indicators of socio-economic development for 85 regions of the Russian Federation. Based on principal component analysis, we identify 15 basic indicators of a region's socio-economic development.
According to these indicators, five regional clusters were identified using k-means clustering: the first cluster is characterized by a high share of wholesale and retail trade, real estate transactions, and professional, scientific and technological activities in the GRP structure; the second cluster specializes in manufacturing, wholesale and retail trade, real estate transactions, agriculture and forestry; the third cluster has a mixed economy, with values of the main socio-economic indicators close to the averages for the Russian Federation; regions of the fourth cluster show a high level of unemployment and a high share of public administration, military and social security; the fifth cluster specializes in mining.
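
The pipeline described above (standardization, indicator selection based on principal component analysis, then k-means with five clusters) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the data are random numbers shaped like the 2019 set (85 regions, 65 indicators), the column names are hypothetical, and both the 90% variance threshold and the rule for ranking indicators by variance-weighted squared loadings are assumptions, since the abstract does not state how the 15 indicators were derived from the principal components.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 85 regions x 65 indicators.
# Column names are hypothetical placeholders.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(85, 65)),
    columns=[f"indicator_{i}" for i in range(65)],
)

# Standardize every indicator to zero mean and unit variance.
scaled = StandardScaler().fit_transform(data)

# Fit PCA, keeping enough components to explain 90% of the variance
# (assumed threshold; the abstract does not give one).
pca = PCA(n_components=0.9, svd_solver="full")
pca.fit(scaled)

# Assumed selection rule: score each indicator by its squared loadings,
# weighted by the variance share of each component, and keep the top 15.
weights = pca.explained_variance_ratio_
importance = (pca.components_ ** 2 * weights[:, None]).sum(axis=0)
top15 = np.argsort(importance)[::-1][:15]
selected = data.columns[top15]

# Cluster the regions on the 15 selected indicators with k-means, k = 5.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled[:, top15])

# Number of regions assigned to each cluster.
cluster_sizes = pd.Series(labels).value_counts().sort_index()
print(cluster_sizes)
```

With real data, `kmeans.cluster_centers_` could be inspected per indicator to characterize each cluster, for example by its dominant GRP sectors.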

Keywords

gross regional product, cluster analysis, socio-economic indicators, machine learning, industry structure, principal component analysis
