An Explainable Hybrid Pipeline for Malware Classification: Benchmark Construction, Feature Reduction, and Security-Oriented Evaluation
Carmelo Ardito, Giuseppe Loseto, Riccardo Di Pietro, Nicola Epicoco, Alessandro Massaro

Malware classification increasingly relies on machine learning models that combine static and dynamic evidence, yet their practical use is often limited by dataset inconsistency, high-dimensional feature spaces, and insufficient transparency. This paper presents an explainable hybrid malware-classification pipeline built on an aligned public dataset in which static and dynamic features are matched at sample level and share the same class space. The framework combines a Random Forest static branch, a calibrated XGBoost dynamic branch, and a weighted late-fusion stage whose branch weights are derived from inner-validation weighted-F1 rather than from test performance. On the corrected no-leak benchmark, static reduction compresses the static space from 771 to 258 features, while sparse-aggressive reduction compresses the dynamic space from 21,918 to 374 features. An early-fusion XGBoost baseline achieves the best multiclass aggregate scores, whereas the validation-weighted calibrated hybrid provides the strongest false-negative-first Benign vs. Malware profile, reaching malware recall 0.9998, benign recall 0.8053, and one false negative on the test set. The study shows that, once leakage is removed and fusion is validation-driven, the preferred hybrid architecture depends on the operational objective rather than on a single aggregate metric.
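
To make the validation-weighted late-fusion idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each branch outputs a class-probability matrix over the same class space, and that branch weights are taken proportional to each branch's weighted-F1 on a held-out validation split. The proportional normalization and the function names `weighted_f1`, `branch_weights`, and `late_fusion` are illustrative assumptions; the text above states only that the weights are derived from inner-validation weighted-F1.

```python
import numpy as np


def weighted_f1(y_true, y_pred, n_classes):
    """Support-weighted mean of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        # Weight each class F1 by its support (number of true samples).
        total += f1 * np.sum(y_true == c)
    return float(total / len(y_true))


def branch_weights(f1_static, f1_dynamic):
    """Assumed rule: weights proportional to validation weighted-F1."""
    s = f1_static + f1_dynamic
    if s == 0:
        return 0.5, 0.5  # fall back to equal weights
    return f1_static / s, f1_dynamic / s


def late_fusion(p_static, p_dynamic, w_static, w_dynamic):
    """Convex combination of branch probabilities, then argmax per sample."""
    fused = w_static * np.asarray(p_static) + w_dynamic * np.asarray(p_dynamic)
    return fused, fused.argmax(axis=1)
```

In this sketch the weights are computed once from validation predictions and then frozen before any test sample is scored, which is the property that keeps test performance out of the fusion stage.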