DOI: 10.3390/s24010189 ISSN: 1424-8220

Android Ransomware Detection Using Supervised Machine Learning Techniques Based on Traffic Analysis

Amnah Albin Ahmed, Afrah Shaahid, Fatima Alnasser, Shahad Alfaddagh, Shadha Binagag, Deemah Alqahtani
  • Electrical and Electronic Engineering
  • Biochemistry
  • Instrumentation
  • Atomic and Molecular Physics, and Optics
  • Analytical Chemistry

In today’s digitalized era, the usage of Android devices is being extensively witnessed in various sectors. Cybercriminals inevitably adapt to new security technologies and utilize these platforms to exploit vulnerabilities for nefarious purposes, such as stealing users’ sensitive and personal data. This may result in financial losses, discredit, ransomware, or the spreading of infectious malware and other catastrophic cyber-attacks. Due to the fact that ransomware encrypts user data and requests a ransom payment in exchange for the decryption key, it is one of the most devastating types of malicious software. The implications of ransomware attacks can range from a loss of essential data to a disruption of business operations and significant monetary damage. Artificial intelligence (AI)-based techniques, namely machine learning (ML), have proven to be notable in the detection of Android ransomware attacks. However, ensemble models and deep learning (DL) models have not been sufficiently explored. Therefore, in this study, we utilized ML- and DL-based techniques to build efficient, precise, and robust models for binary classification. A publicly available dataset from Kaggle consisting of 392,035 records with benign traffic and 10 different types of Android ransomware attacks was used to train and test the models. Two experiments were carried out. In experiment 1, all the features of the dataset were used. In experiment 2, only the best 19 features were used. The deployed models included a decision tree (DT), support vector machine (SVM), k-nearest neighbor (KNN), ensemble of (DT, SVM, and KNN), feedforward neural network (FNN), and tabular attention network (TabNet). Overall, the experiments yielded excellent results. DT outperformed the others, with an accuracy of 97.24%, precision of 98.50%, and F1-score of 98.45%. Whereas, in terms of the highest recall, SVM achieved 100%. The acquired results were thoroughly discussed, in addition to addressing limitations and exploring potential directions for future work.

More from our Archive