SYNERGY ANALYSIS OF ENSEMBLE FEATURE SELECTION ON PERFORMANCE AMELIORATION OF INTRUSION DETECTION SYSTEM
S. Vijayalakshmi and V. Prasanna Venkatesan
Department of Banking Technology, Pondicherry University, Pondicherry, India
ABSTRACT
The unparalleled, massive generation of online data by social media platforms, digital banking, networking applications and communication portals has mandated the application of data preprocessing techniques at the initial stage, so that machine learning models can more easily discern patterns and associations in data analysis and classification tasks. To realize this, effective feature extraction and selection methods have been proposed to simplify the structure of the data and the relationships within it. This underpins the need for implementing feature selection in the initial stages of the machine learning pipeline, where a suitable representation of the data becomes available to describe the problem more effectively and clearly. The pruned data generated by these techniques is aimed at effective and timely analysis of organizational information to detect any impending threats in the flow of network packets. Collective decisions generated from multiple feature selection techniques surpass the results generated by a single feature selection method. This collective ensemble strategy applied to feature selection techniques helps in ameliorating the performance of the intrusion detection system (IDS) deployed in the organizational network. The employment of ensemble design in feature selection methods holistically improves IDS performance by enhancing classification efficiency, robustness and stability in accentuating the association between feature sets and attack signatures (attack class-oriented feature subset mapping), even when there is disturbance or distortion in the training dataset. This paper thoroughly analyses the efficacy of improving IDS performance through the application of ensemble architecture to feature selection techniques, empowered by the adoption of the DESIRE (Diversity, Equity, Scalability, Inclusivity, Reproducibility (stability) and Enhanced Performance) characteristics, as highlighted in the respective graphs using the NSL-KDD dataset.
The diversity-generating mechanism instituted in the ensemble architecture through data perturbation, function perturbation and hybrid perturbation strategies promises comprehensive coverage of the training set by incorporating cross-validation strategies and random sampling techniques.
KEYWORDS
Ensemble Feature Selection, Intrusion, Diversity, Equity, Scalability, Inclusivity, Reproducibility (Sensitivity), Performance, Classification Efficiency.
1. INTRODUCTION
Data analysis and preprocessing have become a mandatory step in this big data era for arriving at a better understanding of the characteristics of the data and the relationships existing between them. This enables the classification/detection model to easily interpret the underlying fabric of the data composition and discern insightful (attack) patterns faster [1] [2]. Organizations and business corporations are heavily bombarded with network traffic data comprising both benign and malicious elements. It becomes imperative for any network security engineer to build a detection system that corners and isolates infiltrators (intruders) to stop the threat from perpetuating across the entire vicinity. To improve the efficacy of the intrusion detection system (IDS), the dimensionality of the data has to be curtailed to a great extent with the application of feature
Figure 1. Benefits of Feature Selection
Mrs. S. Vijayalakshmi, M.C.A., M.Phil., is currently pursuing a Ph.D. in the Department of Banking Technology, Pondicherry University. Her research interests include Artificial Intelligence, Cyber Security, Deep Learning and applications of DL models in security engineering, mainly in domains such as Intrusion/Anomaly Detection Systems. She has 12 years of teaching and research experience and has scholarly publications in internationally reputed conferences and blind peer-reviewed journals.