Bot Detection Based on Data Characterization
DOI: https://doi.org/10.54580/R0801.14
Keywords: Bot Detection, Meta-Learning, Multi-Classifiers, Data Description
Abstract
In recent years, mitigating bot threats has become a challenging task. Beyond the enormous impact of the malicious activities that bots perpetrate, the growth in internet usage has contributed significantly to the current situation. Damage to IT infrastructure, economic losses, and the dissatisfaction of human users in certain service delivery environments, among other effects, are directly associated with malicious bots. The problem becomes even more complex because, on some occasions, human users use mobile applications with their user accounts to gain access privileges to certain commercial services. In addition, bots are increasingly sophisticated, so that under certain circumstances human activity patterns exhibit the same characteristics as bot activity. At this level of development, detection becomes both more complex and more vital. In this study, we propose a meta-learning-based approach that supports detection through the characterization of user data (for both bots and humans). The characterization process relies on a multi-classifier built from data from previous episodes, which used a Proactive Forest-based classifier. A statistical analysis is performed to select the most appropriate multi-classifier (Bagging, Boosting, Voting, or Stacking). Performance, measured as the percentage of correctly characterized instances, showed that the Voting multi-classifier performed best, correctly characterizing 99.6% of instances on average.
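The comparison of the four multi-classifier families named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the study's bot/human records, substitutes ordinary decision trees, a random forest, and logistic regression for the Proactive Forest base classifier (which scikit-learn does not provide), and ranks candidates by mean cross-validated accuracy instead of the statistical analysis the study performs. All names and parameter values below are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled sessions (1 = bot, 0 = human).
# Assumption: this is NOT the data used in the study.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)


def base_learners():
    """Heterogeneous base classifiers (illustrative stand-ins only)."""
    return [
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ]


# One candidate per multi-classifier family mentioned in the abstract.
candidates = {
    "Bagging": BaggingClassifier(
        DecisionTreeClassifier(max_depth=5), n_estimators=20, random_state=0
    ),
    "Boosting": AdaBoostClassifier(n_estimators=20, random_state=0),
    "Voting": VotingClassifier(estimators=base_learners(), voting="hard"),
    "Stacking": StackingClassifier(
        estimators=base_learners(),
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    ),
}

# Mean 5-fold accuracy, i.e. the fraction of correctly classified instances.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Pick the candidate with the highest mean accuracy.
best = max(scores, key=scores.get)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("Selected multi-classifier:", best)
```

On synthetic data the ranking carries no meaning for bot detection; the sketch only shows the shape of such a comparison. A faithful replication would need the study's data, its Proactive Forest base classifier, and a proper statistical test across repeated runs.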