International Journal of Computer Networks & Communications (IJCNC)

PERFORMANCE EVALUATION OF DIFFERENT KERNELS FOR SUPPORT VECTOR MACHINE USED IN INTRUSION DETECTION SYSTEM

Md. Al Mehedi Hasan¹, Shuxiang Xu², Mir Md. Jahangir Kabir² and Shamim Ahmad¹

¹Department of Computer Science and Engineering, University of Rajshahi, Bangladesh.

²School of Engineering and ICT, University of Tasmania, Australia.

ABSTRACT

The success of any Intrusion Detection System (IDS) is a complicated problem due to its nonlinearity and the quantitative or qualitative network traffic data stream with numerous features. As a result, in order to get rid of this problem, several types of intrusion detection methods with different levels of accuracy have been proposed which leads the choice of an effective and robust method for IDS as a very important topic in information security. In this regard, the support vector machine (SVM) has been playing an important role to provide potential solutions for the IDS problem. However, the practicability of introducing SVM is affected by the difficulties in selecting appropriate kernel and its parameters. From this viewpoint, this paper presents the work to apply different kernels for SVM in ID Son the KDD’99 Dataset and NSL-KDD dataset as well as to find out which kernel is the best for SVM. The important deficiency in the KDD’99 data set is the huge number of redundant records as observed earlier. Therefore, we have derived a data set RRE-KDD by eliminating redundant record from KDD’99train and test dataset prior to apply different kernel for SVM. This RRE-KDD consists of both KDD99Train+ and KDD99 Test+ dataset for training and testing purposes, respectively. The way to derive RRE-KDD data set is different from that of NSL-KDD data set. The experimental results indicate that Laplace kernel can achieve higher detection rate and lower false positive rate with higher precision than other kernel son both RRE-KDD and NSL-KDD datasets. It is also found that the performances of other kernels are dependent on datasets.

KEYWORDS

Intrusion Detection, KDD’99, NSL-KDD, Support Vector Machine, Kernel, Kernel Selection

1. Introduction

In spite of having great advantages of Internet, still then it has compromised the stability and security of the systems connected to it. Although static defense mechanisms such as firewalls and software updates can provide a reasonable level of security, more dynamic mechanisms such as intrusion detection systems (IDSs) should also be utilized [1]. Intrusion detection is the process of monitoring events occurring in a computer system or network and analyzing them for signs of intrusions. The IDSs are simply classified as host-based or network-based. The former is operated on information collected from within an individual computer system and the latter collect raw network packets and analyze for signs of intrusions. There are two different detection techniques employed in IDS to search for attack patterns: Misuse and Anomaly. Misuse detection systems find known attack signatures in the monitored resources. The anomaly detection systems find attacks by detecting changes in the pattern of utilization or behavior of the system [2].

As network attacks have been increased significantly over the past few years, Intrusion Detection Systems (IDSs) have become a necessary addition to the security infrastructure of most organizations [3]. Deploying highly effective IDS systems is extremely challenging and has emerged as a significant field of research, because it is not theoretically possible to set up a system with no vulnerabilities [4]. Several machine learning (ML) algorithms, for instance Neural Network [5], Genetic Algorithm [6, 7], Fuzzy Logic [4, 8, 9], clustering algorithm [10] and more have been extensively employed to detect intrusion activities from large quantity of complex and dynamic data sets.In recent times, support vector machine (SVM) has been extensively applied to provide potential solutions for the IDS problem. But, the selection of an appropriate kernel and its parameters for a certain classification problem influence the performance of the SVM. The reason behind it is that different kernel functions construct different SVMs and affect the generalization ability and learning ability of SVM. However, there is no theoretical method for selecting kernel function and its parameters. Literature survey showed that for all practical purposes, most of the researchers applied Radial Basis Function (RBF) kernel to build SVM based intrusion detection system [11, 12, 13, 14] and found the value of its parameter by using different technique and moreover some research paper did not mention value of the kernel parameter [13]and some others used the default value of the software package used [15].Surprisingly still there are many other kernel functions which are not yet applied in intrusion detection. But the nature of classification problem requires applying of different kernels for SVM to ensure optimal result [13]. This requirement motivated us to apply different kernel functions for SVM rather than just of using RBF in IDS, which, in turn, may provide better accuracy and detection rate. At the same time, we have also tried to find out parameter value to the corresponding kernel.

The remainder of the paper is organized as follows: Section 2 provides the description of the KDD’99 and NSL-KDD dataset. We outline mathematical overview of SVM in Section 3. Dataset and Experimental setup is presented in Section 4. Preprocessing and SVM model selection are drawn in Section 5 and 6 respectively. Finally, Section 7 reports the experimental result followed by conclusion in Section 8.

2. KDDCUP’99 DATASET

Under the sponsorship of Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), MIT Lincoln Laboratory has collected and distributed the datasets for the evaluation of researches in computer network intrusion detection systems [16]. The KDD’99 dataset is a subset of the DARPA benchmark dataset prepared by Sal Stofo and Wenke Lee [17]. The KDD data set was acquired from raw tcp dump data for a length of nine weeks. It is made up of a large number of network traffic activities that include both normal and malicious connections. The KDD99 data set includes three independent sets; ‘‘whole KDD’’, ‘‘10% KDD’’, and ‘‘corrected KDD’’. Most of researchers used ‘‘10% KDD’’ and ‘‘corrected KDD’’ as training and testing set, respectively [18]. The training set contains a total of 22 training attack types. The ‘‘corrected KDD’’ testing set includes an additional 17 types of attacks and excludes 2 types (spy, warezclient) of attacks from training set. There are 37 attack types which are included in the testing set, as shown in Table 1 and Table 2. The simulated attacks fall in one of the four categories [1, 18, 19]: (a) Denial of Service Attack (DoS), (b) User to Root Attack (U2R), (c) Remote to Local Attack (R2L), (d) Probing Attack. A connection in the KDD-99 dataset is represented by 41 features, each of which is in one of the continuous, discrete and symbolic form, with significantly varying ranges [20].

Table 1: Attacks in KDD’99 Training Dataset

2.1. INHERENT PROBLEMS OF THE KDD’99 AND OUR PROPOSED SOLUTION

Statistical analysis on KDD’99 dataset found important issues which highly affects the performance of evaluated systems and results in a very poor evaluation of anomaly detection approaches [13]. The most important deficiency in the KDD data set is the huge number of redundant records. Analyzing KDD train and test sets, Mohbod Tavallaee found that about 78% and 75% of the records are duplicated in the train and test set, respectively [15]. This large amount of redundant records in the train set will cause learning algorithms to be biased towards the more frequent records, and thus prevent it from learning infrequent records which are usually more harmful to networks such as U2R attacks. The existence of these repeated records in the test set, on the other hand, will cause the evaluation results to be biased by the methods which have better detection rates on the frequent records.

To solve these issues, we have derived a new data set RRE-KDD by eliminating redundant record from KDD’99 train and test dataset (10% KDD and corrected KDD), so the classifiers will not be biased towards more frequent records. This RRE-KDD dataset consists of KDD99Train+ and KDD99Test+ dataset for training and testing purposes, respectively. The numbers of records in the train and test sets are now reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion.

2.2. NSL-KDD DATASET

To overcome the problem of KDD’99 dataset, researchers have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set [15]. The development of NSL-KDD dataset was different than our approach. The NSL-KDD dataset also does not include redundant records in the train set, so the classifiers will not be biased towards more frequent records. The numbers of records in the train and test sets are also reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.

3. SVM CLASSIFICATION

The theory of support vector machine (SVM) is from statistics and the basic principle of SVM is finding the optimal linear hyper plane in the feature space that maximally separates the two target classes [21, 22, 23]. There are two types of data namely linearly separable and non-separable data. To handle these data, two types of classifier, linear and non-linear, are used in pattern recognition field.

Considering the maximum margin classifier, there is hard margin SVM, applicable to a linearly separable dataset, and then modifies it to handle non-separable data. This leads to the following constrained optimization problem:

International Journal of Computer Networks & Communications (IJCNC)

Share this: