Application of Data Mining in Tax Processes Improvement: A Literature Review and Classification

Document Type: Research Paper


1 Ph.D. Candidate, Department of Accounting, Faculty of Management and Accounting, Center Tehran Branch, Islamic Azad University, Tehran, Iran.

2 Assistant Prof., Department of Accounting, Faculty of Management and Accounting, Center Tehran Branch, Islamic Azad University, Tehran, Iran.

3 Assistant Prof., Department of IT Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran.


Objective: Data mining can improve the efficiency and effectiveness of tax processes by extracting useful knowledge and insights from tax data. Through a systematic literature review, this paper examines the state of research in the field, classifies the existing studies, identifies research gaps, and provides a roadmap for researchers and practitioners.
Methods: Given the importance of the subject, this study reviews research on data mining and taxation conducted between 2000 and 2021 and examines the tax processes and practical domains these studies address. The reviewed studies were categorized according to the proposed framework and analyzed in terms of processes, practical domains, and data mining techniques. The distribution of papers by publication year and by journal is also presented. Tax processes were divided into four groups: submission, examination, collection, and taxpayer services. The practical domains defined were tax payment, tax refund, shell corporation identification, identification of non-filer taxpayers, risk-based tax audit selection, tax debt management, and tax comments analysis. The classification framework for data mining techniques comprised clustering, association analysis, classification and prediction, regression, time series, anomaly detection, and visualization.
Results: Most of the reviewed studies addressed the examination (inspection) process, and 94 percent of these focused on the practical domain of "risk-based tax audit selection". The most widely used technique category was "classification and prediction", and three algorithms, namely neural networks, decision trees, and support vector machines, were used more often than any others.
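The classification framework and the kind of statistic reported in the results can be made concrete as a small data model. The Python sketch below is illustrative only and is not the authors' tooling: the category labels come from the framework, while the `Study` record, the `tally` and `domain_share` helpers, the assumption of exactly one practical domain per study, and the example records are all hypothetical. It shows how tagged studies could be counted by year, venue, process, or technique, and how a share such as "the proportion of examination-process studies that address risk-based tax audit selection" could be computed.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Process(Enum):
    """The four tax-process groups used by the framework."""
    SUBMISSION = "submission"
    EXAMINATION = "examination"
    COLLECTION = "collection"
    TAXPAYER_SERVICES = "taxpayer services"


class Domain(Enum):
    """The seven practical domains defined in the review."""
    TAX_PAYMENT = "tax payment"
    TAX_REFUND = "tax refund"
    SHELL_CORPORATION = "shell corporation identification"
    NON_FILERS = "identification of non-filer taxpayers"
    AUDIT_SELECTION = "risk-based tax audit selection"
    TAX_DEBT = "tax debt management"
    TAX_COMMENTS = "tax comments analysis"


class Technique(Enum):
    """The seven data mining technique categories."""
    CLUSTERING = "clustering"
    ASSOCIATION = "association analysis"
    CLASSIFICATION_PREDICTION = "classification and prediction"
    REGRESSION = "regression"
    TIME_SERIES = "time series"
    ANOMALY_DETECTION = "anomaly detection"
    VISUALIZATION = "visualization"


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one reviewed study (not from the paper).

    Assumes one practical domain per study; a study may use several techniques.
    """
    title: str
    year: int
    venue: str
    process: Process
    domain: Domain
    techniques: frozenset

    def __post_init__(self):
        # The review covers studies published from 2000 to 2021.
        if not 2000 <= self.year <= 2021:
            raise ValueError(f"{self.title}: year {self.year} is outside 2000-2021")


def tally(studies, key):
    """Count studies per category; `key` returns one category or a collection of them."""
    counts = Counter()
    for s in studies:
        value = key(s)
        if isinstance(value, (set, frozenset, list, tuple)):
            counts.update(value)  # multi-valued tag, e.g. several techniques
        else:
            counts[value] += 1
    return counts


def domain_share(studies, process, domain):
    """Fraction of the studies assigned to `process` that address `domain`."""
    in_process = [s for s in studies if s.process is process]
    if not in_process:
        return 0.0
    return sum(s.domain is domain for s in in_process) / len(in_process)


# Hypothetical placeholder records for illustration; these are not reviewed studies.
EXAMPLE_STUDIES = [
    Study("Study A", 2015, "Journal X", Process.EXAMINATION, Domain.AUDIT_SELECTION,
          frozenset({Technique.CLASSIFICATION_PREDICTION})),
    Study("Study B", 2019, "Journal X", Process.EXAMINATION, Domain.AUDIT_SELECTION,
          frozenset({Technique.CLUSTERING, Technique.ANOMALY_DETECTION})),
    Study("Study C", 2019, "Conference Y", Process.COLLECTION, Domain.TAX_DEBT,
          frozenset({Technique.CLASSIFICATION_PREDICTION})),
]

if __name__ == "__main__":
    print(tally(EXAMPLE_STUDIES, lambda s: s.year))
    print(tally(EXAMPLE_STUDIES, lambda s: s.techniques))
    print(domain_share(EXAMPLE_STUDIES, Process.EXAMINATION, Domain.AUDIT_SELECTION))
```

Using enums rather than free-text labels keeps the tagging consistent with the framework's fixed category lists, so a typo in a category name fails immediately instead of silently creating a new category in the counts.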
Conclusion: Tax administrations now hold huge databases, which traditional methods and tools cannot analyze effectively given organizations' limited resources and the large volume of available data. Data mining can improve the performance of various tax processes and support decision-making and the adoption of appropriate approaches. There is good potential for applying data mining techniques in all of the proposed practical domains, and more research is needed on the submission and collection processes. Reinforcement learning, deep learning, graph analysis, and big data analytics are recommended for future research. Proposing practical frameworks for using data mining techniques in tax systems and tax administrations is also recommended; to the best of the authors' knowledge, no study has yet addressed this issue, although there is a clear need for one. Finally, a major gap in this field that needs to be addressed is the integration of internal and external data sources, which can improve effectiveness.