PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification


DOI: 10.35377/saucis...1839585 ISSN: 2636-8129


Cagatay Neftali Tülü, İhsan Deniz

Cyber attackers exploit various online channels to spread attention-grabbing, intriguing, or fear-inducing content. By prompting users to engage with this content and click on embedded links, they redirect them to phishing websites that closely resemble legitimate ones in order to steal confidential information or carry out other fraud. Mobile applications and browsers must therefore be able to identify such fake and harmful websites before users access them. This study introduces PhishShield, a two-stage approach for early detection of phishing websites. In the first stage, an LSTM-based deep learning model is pre-trained to score a domain name's likelihood of being phishing from the character sequence of its address. In the second stage, a machine learning model trained on this phishing score and additional features (such as domain and SSL details) predicts whether a website is phishing or legitimate. The results show that the random forest classifier achieves the highest second-stage accuracy, 0.984. Two distinct datasets, Dataset1 and Dataset2, are prepared for training: the first-stage deep learning model is trained on Dataset1, and the second-stage machine learning model is trained on Dataset2.
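To make the data flow between the two stages concrete, the following is a minimal sketch of the input preparation only, not the authors' implementation. It shows how a domain name might be turned into a fixed-length integer sequence for a character-level model, and how a first-stage score might be combined with other features for the second-stage classifier. The vocabulary, the maximum length of 32, the function names, and the two example features (domain age and SSL validity) are all assumptions chosen for illustration; the abstract only states that "domain and SSL details" are used.

```python
import string

# Assumed character vocabulary; the paper's actual vocabulary is not given.
ALPHABET = string.ascii_lowercase + string.digits + "-."
PAD_ID, UNK_ID = 0, 1                      # reserved ids: padding, unknown character
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 32                               # assumed fixed sequence length


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of integer character ids.

    Longer domains are truncated on the right; shorter ones are
    right-padded with PAD_ID.
    """
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in domain.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def build_stage_two_features(phishing_score, domain_age_days, has_valid_ssl):
    """Assemble a second-stage feature vector.

    phishing_score stands for the first-stage model output; the other two
    features are hypothetical examples of domain and SSL details.
    """
    return [
        float(phishing_score),
        float(domain_age_days),
        1.0 if has_valid_ssl else 0.0,
    ]
```

In a full pipeline, the output of `encode_domain` would be fed to the sequence model, and vectors from `build_stage_two_features` would be passed to a classifier such as a random forest.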
