Bot Detection Based on Data Characterization

Authors

DOI:

https://doi.org/10.54580/R0801.14

Keywords:

Bot Detection, Meta-Learning, Multi-Classifiers, Data Description

Abstract

In recent years, mitigating bot threats has become a challenging task. Besides the enormous impact of the malicious activities that bots carry out, the growth in internet usage has contributed significantly to this situation. Malicious bots are directly associated with damage to IT infrastructure, economic losses, and the dissatisfaction of human users in certain service delivery environments, among other harms. The problem becomes more complex because human users sometimes use mobile applications with their own user accounts to gain access privileges to certain commercial services. Combined with the increasing sophistication of bots, this means that, under certain circumstances, human activity patterns exhibit the same characteristics as bot activity. As a result, detection is becoming both harder and more important. In this study, we propose a meta-learning-based approach that supports detection by characterizing user data (from both bots and humans). The characterization process relies on a multi-classifier built from data from previous episodes, which used a Proactive Forest-based classifier. A statistical analysis is performed to select the most appropriate multi-classifier (Bagging, Boosting, Voting, or Stacking). Performance, measured as the percentage of correctly characterized instances, showed that the Voting multi-classifier performed best, correctly characterizing 99.6% of instances on average.
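
The comparison of the four multi-classifier families named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the base learners (a decision tree, Gaussian naive Bayes, and logistic regression) are stand-ins chosen for illustration because Proactive Forest is not part of scikit-learn, and the function name `compare_multiclassifiers` and all hyperparameters are hypothetical. Accuracy under 5-fold cross-validation is used here as a proxy for the percentage of correctly characterized instances; the statistical analysis used in the study to select the multi-classifier is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_multiclassifiers(n_samples=400, seed=0):
    """Return the mean cross-validated accuracy (in %) of four ensemble types.

    Assumption: synthetic two-class data stands in for labelled bot/human
    sessions; the study's real data set is not available here.
    """
    X, y = make_classification(
        n_samples=n_samples, n_features=10, n_informative=5, random_state=seed
    )

    # Illustrative base learners (assumed; the study used Proactive Forest).
    base = [
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=seed)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ]

    candidates = {
        "Bagging": BaggingClassifier(
            DecisionTreeClassifier(random_state=seed),
            n_estimators=25,
            random_state=seed,
        ),
        "Boosting": AdaBoostClassifier(n_estimators=50, random_state=seed),
        "Voting": VotingClassifier(estimators=base, voting="hard"),
        "Stacking": StackingClassifier(
            estimators=base,
            final_estimator=LogisticRegression(max_iter=1000),
            cv=3,
        ),
    }

    scores = {}
    for name, model in candidates.items():
        # 5-fold cross-validated accuracy, reported as a percentage.
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        scores[name] = 100.0 * acc.mean()
    return scores


scores = compare_multiclassifiers()
best = max(scores, key=scores.get)
for name, pct in scores.items():
    print(f"{name}: {pct:.1f}% correctly classified")
print("Highest mean accuracy:", best)
```

On synthetic data the ranking of the four families carries no evidential weight; the sketch only shows how such a comparison could be set up before applying a statistical test to the per-fold scores.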

Published

2026-06-27

How to Cite

Chissingui, H. J. (2026). Bot Detection Based on Data Characterization. Revista Angolana De Ciencias, 8(1), e080114. https://doi.org/10.54580/R0801.14
