**Generalized Legendre Polynomials for Support Vector Machines (SVMs) Classification**

Ashraf Afifi^{1} and E.A.Zanaty^{2}

^{1}Department of Computer Engineering, Computers and Information Technology College, Taif University, Al-Hawiya 21974, Kingdom of Saudi Arabia

^{2}Computer Science Dept., Faculty of Science, Sohag University, Sohag, Egypt.

**Abstract**

In this paper, we introduce a set of new kernel functions derived from the generalized Legendre polynomials to obtain more robust and more accurate support vector machine (SVM) classification. The generalized Legendre kernel functions measure how similar two given vectors are by mapping the inner product of the two vectors into a higher-dimensional space. The proposed kernel functions satisfy Mercer's condition and the orthogonality properties needed to reach the optimal result with a low number of support vectors (SVs). The new set of Legendre kernel functions can therefore be used in classification applications as effective substitutes for commonly used kernels such as the Gaussian, polynomial and wavelet kernel functions. The proposed kernel functions are evaluated against existing kernels, namely the Gaussian, polynomial, wavelet and Chebyshev kernels, on various non-separable data sets with several attributes. The proposed kernel functions give competitive classification results in comparison with the other kernel functions. On the basis of the test results, we show that the proposed kernel functions are more robust to changes in the kernel parameter and generally reach the minimal number of SVs for classification.

**Keywords**

*Legendre Polynomials, Kernel Functions, Functional Analysis, SVMs, Classification Problem.*

**1. Introduction**

Support Vector Machines (SVMs) have become popular tools for data classification because they handle large data sets and are practical to apply [1-3]. The performance of SVMs depends on the choice of kernel function [4-6]. Different kernel functions produce different SVMs [7-9] and may lead to different performance [10-11]. Some work has been done on constraining kernels using prior knowledge; however, the optimal choice of kernel for a given problem is still an open research problem [12]. Chapelle and Schölkopf [13] proposed a kernel that incorporates invariant transformations. The disadvantage is that such kernels are most likely suitable only for linear SVM classifiers. Hastie et al. [14] compared multi-class SVM algorithms applied to challenging data sets. Zanaty et al. [15-17] combined GF and RBF functions to obtain new kernel functions that exploit their respective strengths. In [18-19], Hermite kernel functions were defined to improve the performance of SVMs in a variety of applications. Meng and Wenjian [20] proposed orthogonal polynomials to improve generalization performance in both classification and regression tasks. Maji et al. [21] presented an exact evaluation of intersection kernel SVMs that is logarithmic in time. They showed that the procedure is approximate and that its classification accuracy is acceptable, but that runtimes increase significantly compared with the established radial basis function (RBF) and polynomial (POLY) kernels because of the large number of SVs for every classifier [14, 21]. Ozer et al. [22] presented kernel functions derived from the Chebyshev polynomials. They constructed various kernel functions that can capture highly non-linear boundaries in Euclidean space. In Jiang and Ching [23], supervised kernel learning with an SVM classifier was successfully applied in biomedical diagnosis, such as distinguishing different types of tumour tissue from noisy Raman spectra; see [24-25] for further details.

The main difficulty in data classification remains choosing the most suitable SVM kernel for a specific application, especially since different functions and parameters can perform very differently [19-22]. An important research area in SVMs is therefore the design of an effective kernel function for a particular application, particularly because the variety of current applications demands different methods [22].

In this paper, Legendre kernel functions are constructed to improve the classification accuracy of SVMs for both linear and non-linear data sets. We propose a family of Legendre kernel functions aimed at improving SVM classification accuracy. This class of Legendre kernel functions satisfies the Mercer conditions and gives competitive performance in comparison with the other common kernel functions on the same benchmark data sets. The proposed kernels can be used to classify complex data with numerous attributes.

The remainder of the paper is organized as follows. Section 2 describes SVM classification. Section 3 introduces the proposed kernel functions, including the generalized Legendre kernels. Section 4 presents a functional analysis of the proposed Legendre kernels. Experimental and comparative results are given in Section 5. Finally, Section 6 presents the conclusion.

**2. Support Vector Machine (SVM)**

The objective in Eq.(2) is minimized subject to the N constraints given in Eq.(3). This formulation includes the trade-off between a cost function term and a sum of squared errors, governed by the trade-off parameter γ. To solve this primal minimization problem, we construct the dual maximization of Eq.(2) using the Lagrangian form:

After eliminating the variables w and e, we obtain the following solution:

The kernel trick is applied here as follows, where *α_k* and *b* are the solution of the linear system given by Eq.(7) and

According to Eq.(9), the kernel functions are applied to each pair of elements separately. For a given pair of input vectors *x* and *z*, the resulting kernel can be formulated as:

**Figure 1.** List of kernel expressions
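As a reference point, the description in this section (a regularization term plus a sum of squared errors weighted by γ, N equality constraints, elimination of w and e, and a linear system in *α_k* and *b*) matches the standard least-squares SVM classifier. Assuming that formulation, the primal problem, Lagrangian, resulting linear system and decision function can be sketched as follows. The feature map φ, the matrix Ω and the vector 1_N are notational assumptions, and the correspondence to the numbered equations cited above is not confirmed by the text.

```latex
\begin{aligned}
&\min_{w,b,e}\; J(w,e) = \tfrac{1}{2}\, w^{T} w + \tfrac{\gamma}{2} \sum_{k=1}^{N} e_k^{2}
\quad \text{s.t.} \quad y_k \left[ w^{T} \varphi(x_k) + b \right] = 1 - e_k, \quad k = 1, \dots, N, \\[4pt]
&\mathcal{L}(w,b,e;\alpha) = J(w,e) - \sum_{k=1}^{N} \alpha_k \left\{ y_k \left[ w^{T} \varphi(x_k) + b \right] - 1 + e_k \right\}, \\[4pt]
&\begin{bmatrix} 0 & y^{T} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad \Omega_{kl} = y_k\, y_l\, K(x_k, x_l), \\[4pt]
&y(x) = \operatorname{sign}\!\left( \sum_{k=1}^{N} \alpha_k\, y_k\, K(x, x_k) + b \right).
\end{aligned}
```

In this form the kernel K(x_k, x_l) = φ(x_k)^T φ(x_l) is the only place where the feature map enters, which is why the choice of kernel determines the classifier.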

**3. Proposed Kernel Functions**

To improve the classification accuracy of SVMs, different kernel functions are required for different applications. We expect Legendre functions to be effective kernels for numerous applications. From the solution of Legendre's differential equation, the Legendre polynomials may be written using Rodrigues' formula,

$$P_n(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n} \left( x^2 - 1 \right)^n,$$

and applying the general Leibniz rule for repeated differentiation.

**3.1 Legendre Recurrence**

Eq.(11) is differentiated with respect to t on both sides, to obtain further terms without expanding the Taylor series directly, and rearranged to obtain:
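Assuming Eq.(11) is the standard generating function of the Legendre polynomials (an assumption, since the text refers to its square-root quotient but the equation itself is not shown), this step can be written out as follows: the generating function, its derivative with respect to t, and the rearranged form.

```latex
\begin{aligned}
\frac{1}{\sqrt{1 - 2xt + t^{2}}} &= \sum_{n=0}^{\infty} P_n(x)\, t^{n}, \\[4pt]
\frac{x - t}{\left( 1 - 2xt + t^{2} \right)^{3/2}} &= \sum_{n=1}^{\infty} n\, P_n(x)\, t^{n-1}, \\[4pt]
\frac{x - t}{\sqrt{1 - 2xt + t^{2}}} &= \left( 1 - 2xt + t^{2} \right) \sum_{n=1}^{\infty} n\, P_n(x)\, t^{n-1}.
\end{aligned}
```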

Replacing the quotient of the square root with its definition in (11), and equating the coefficients of powers of t in the resulting expansion, gives Bonnet's recursion formula:
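In its usual form, Bonnet's recursion reads (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x), with P_0(x) = 1 and P_1(x) = x. A minimal Python sketch of evaluating P_n(x) this way is given below; the function name `legendre` is chosen here for illustration and is not taken from the paper.

```python
def legendre(n: int, x: float) -> float:
    """Evaluate the Legendre polynomial P_n(x) with Bonnet's recursion.

    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
    """
    if n < 0:
        raise ValueError("order must be non-negative")
    p_prev, p_curr = 1.0, float(x)  # P_0(x) and P_1(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        # Advance one order: (P_{k-1}, P_k) -> (P_k, P_{k+1})
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


if __name__ == "__main__":
    for order in range(5):
        print(order, legendre(order, 0.5))
```

The recursion needs only the two previous values, so evaluating an order-n polynomial costs O(n) operations and avoids the repeated differentiation in Rodrigues' formula.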

**Theorem 1:** Taylor series expansion of Legendre's differential equation:

**3.2 Orthogonality of Legendre Functions**

This completes the proof.
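The orthogonality property this subsection refers to is, in its standard form, that the integral of P_m(x) P_n(x) over [-1, 1] is zero for m ≠ n and 2/(2n + 1) for m = n. The sketch below checks this numerically; it assumes the classical scalar Legendre polynomials and uses composite Simpson integration, and the helper names are illustrative rather than taken from the paper.

```python
def legendre(n: int, x: float) -> float:
    """P_n(x) via Bonnet's recursion."""
    p_prev, p_curr = 1.0, float(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


def simpson(f, a: float, b: float, intervals: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; `intervals` is rounded up to an even number."""
    if intervals % 2:
        intervals += 1
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def inner_product(m: int, n: int) -> float:
    """Approximate the integral of P_m(x) * P_n(x) over [-1, 1]."""
    return simpson(lambda x: legendre(m, x) * legendre(n, x), -1.0, 1.0)


if __name__ == "__main__":
    for m in range(4):
        print([round(inner_product(m, n), 6) for n in range(4)])
```

The printed matrix should be close to diagonal, with diagonal entries near 2/(2n + 1).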

**3.3 Generalized Legendre Kernels**

Here we propose a general way of expressing the kernel function, to resolve the ambiguity about how to apply Legendre kernels. To our knowledge, there was a preceding work defining the Legendre polynomials recursively for vector inputs. Thus, for vector inputs, we define the generalized Legendre polynomials as:

**Figure 2.** List of the generated Legendre kernel functions up to 4th order.
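One way such a vector-input recursion can be realized is sketched below. It rests on two assumptions that the text does not confirm: that the generalized polynomials follow Bonnet's recursion with the product of x and P_{n-1}(x) taken as an inner product whenever P_{n-1}(x) is a vector (mirroring the generalized Chebyshev construction of [22]), and that the kernel is the unweighted sum of the products P_j(x) P_j(z)^T for j = 0, ..., n. All function names are hypothetical.

```python
from typing import List, Union

Vector = List[float]
Term = Union[float, Vector]  # even orders are scalars, odd orders are vectors


def _times_x(x: Vector, term: Term) -> Term:
    """Product x * P^T: inner product for a vector term, scaled copy of x for a scalar."""
    if isinstance(term, list):
        return sum(a * b for a, b in zip(x, term))
    return [term * a for a in x]


def generalized_legendre(x: Vector, order: int) -> List[Term]:
    """Return [P_0(x), ..., P_order(x)] under the assumed vector recursion."""
    terms: List[Term] = [1.0]
    if order >= 1:
        terms.append([float(a) for a in x])
    for n in range(2, order + 1):
        lead = _times_x(x, terms[n - 1])
        prev = terms[n - 2]
        a, b = (2 * n - 1) / n, (n - 1) / n
        if isinstance(lead, list):  # odd order: vector-valued term
            terms.append([a * u - b * v for u, v in zip(lead, prev)])
        else:  # even order: scalar-valued term
            terms.append(a * lead - b * prev)
    return terms


def legendre_kernel(x: Vector, z: Vector, order: int) -> float:
    """Assumed kernel: sum over j of P_j(x) * P_j(z)^T (no weighting term)."""
    total = 0.0
    for px, pz in zip(generalized_legendre(x, order), generalized_legendre(z, order)):
        if isinstance(px, list):
            total += sum(u * v for u, v in zip(px, pz))
        else:
            total += px * pz
    return total


if __name__ == "__main__":
    # Inputs are assumed to be already scaled to [-1, 1].
    print(legendre_kernel([0.2, -0.4], [0.7, 0.1], order=4))
```

For one-dimensional inputs this recursion reduces to the classical Legendre polynomials, which gives a quick sanity check on the construction.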

**4. Functional Analysis**

**Theorem 3:** A kernel is a valid SVM kernel if it satisfies the Mercer conditions [29].

__Proof:__ If the kernel does not satisfy the Mercer conditions, the SVM might not find the optimal parameters and might instead return suboptimal ones. In addition, if the Mercer conditions are not satisfied, the Hessian matrix of the optimization problem might not be positive definite. We therefore examine whether the generalized Legendre kernel satisfies the Mercer conditions.
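For reference, Mercer's condition in its usual form is stated below, together with the equivalent finite-sample statement that every Gram matrix built from the kernel is positive semi-definite. The symbols g, c_i and m are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
&\iint K(x, z)\, g(x)\, g(z)\, dx\, dz \;\ge\; 0
\quad \text{for every } g \text{ with } \int g(x)^{2}\, dx < \infty, \\[4pt]
&\sum_{i=1}^{m} \sum_{j=1}^{m} c_i\, c_j\, K(x_i, x_j) \;\ge\; 0
\quad \text{for all } x_1, \dots, x_m \text{ and all real } c_1, \dots, c_m.
\end{aligned}
```

Any kernel that can be written as a finite sum of products f_j(x) f_j(z) satisfies the second inequality, because the double sum then becomes a sum of squares.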

**5. Experimental Results and Discussions**

A multi-class problem is a classification problem with more than two classes. Current SVMs [24] are binary classifiers, i.e., they can separate two classes. To handle more than two classes, these classifiers must be extended; the aim is to carry the generalization ability of the binary classifiers over to the multi-class domain. Multi-class SVMs are usually implemented by combining several two-class SVMs. In the multi-class experiments, we trained an SVM for every class individually, in a one-against-all scheme. In every experiment, we used the SVM toolbox available at [32]. The classifier is built to read two input data files, the training data and the test data (for more details see [11, 18]).

The classification experiments are carried out on an image segmentation data set with the classes Brickface, Foliage, Sky, Cement, Window, Path and Grass [30-31]. The data has 7 different image classes, with 210 samples for training and a separate 2100 samples for testing. Every vector has 18 elements with different maximum and minimum values. For training, we have 30 samples for the class (+1) and 180 samples for the class (−1) and, likewise for testing, 300 against 1800 samples for every class. In the test step, the kernel functions showed different performance on different classes, and there was a winning kernel giving the best performance on every class, as shown in Table (3). The proposed Legendre kernel performed better than the standard kernels.

The best performance values with the fewest SVs are shown in bold. Table (3) lists the test results for every class with the different kernel functions. We carry out several evaluations to compare the proposed kernel with existing counterparts, including the Gaussian (GF) [25], polynomial (POLY) [14], wavelet [7] and Chebyshev [22] kernel functions. The performance of the proposed kernel with SVMs, in terms of classification accuracy (ACC) and kernel parameter against SV number, is evaluated on the data sets in Table (3). As shown in Table (3), the generalized Legendre kernel shows better generalization ability than the existing GF, POLY, wavelet and Chebyshev kernels. For instance, the best ACC is obtained for Brickface, Foliage, Window, Path and Grass with the fewest SVs, namely 17, 6, 4, 6 and 15 respectively. Although the ACC of the generalized Legendre kernel and the GF kernel is similar for the Foliage data, the generalized Legendre kernel uses the fewest SVs. Figs.(1-7) show the ACC against SV number for the POLY, GF, wavelet, Chebyshev and generalized Legendre kernels on the Brickface, Foliage, Sky, Cement, Window, Path and Grass data. The generalized Legendre kernel functions reach a minimum of 4 SVs while maintaining the generalization ability on the data set. The relation between ACC and SV shows that, as the number of SVs increases, the ACC increases and asymptotically approaches a high performance value.
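The one-against-all setup described above can be illustrated with a short sketch of how the binary labels are formed. The label list below is a synthetic placeholder with 30 entries per class, built only to reproduce the 30 versus 180 split quoted for training; it is not the actual data, and the helper name is hypothetical.

```python
from collections import Counter
from typing import List

CLASSES = ["Brickface", "Foliage", "Sky", "Cement", "Window", "Path", "Grass"]


def one_vs_all_labels(labels: List[str], positive: str) -> List[int]:
    """Relabel a multi-class target as +1 for `positive` and -1 for every other class."""
    return [1 if label == positive else -1 for label in labels]


# Placeholder training labels: 7 classes x 30 samples = 210 samples.
train_labels = [name for name in CLASSES for _ in range(30)]

if __name__ == "__main__":
    for target in CLASSES:
        counts = Counter(one_vs_all_labels(train_labels, target))
        # One binary SVM would be trained per class on these +1 / -1 labels.
        print(f"{target}: +1 = {counts[1]}, -1 = {counts[-1]}")
```

Each of the seven binary problems is imbalanced by construction (one class against the other six), which is worth keeping in mind when comparing accuracy figures across kernels.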

Figs. (8-12) show the relation between the kernel parameter and the SV number for the POLY, GF, wavelet, Chebyshev and Legendre kernel functions when applied to the Brickface, Foliage, Sky, Cement, Window, Path and Grass data. These figures show that, as the kernel parameter increases, the Chebyshev, Legendre and wavelet kernels increase their performance and asymptotically reach a low performance value. As the kernel parameter increases, however, the Chebyshev and wavelet kernels need more SVs than the Legendre kernel. During the evaluations, we observed that the generalized Legendre kernel function generally reaches the minimal SV number.

**Figure 3.** Data classification results with different kernel functions.

**Figure 1.** SV vs. ACC for Brickface data.

**Figure 2.** SV vs. ACC for Sky data.

**Figure 3.** SV vs. ACC for Foliage data.

**Figure 4.** SV vs. ACC for Cement data.

**Figure 5.** SV vs. ACC for Window data.

**Figure 6.** SV vs. ACC for Path data.

**Figure 7.** SV vs. ACC for Grass data.

**Figure 8.** Polynomial kernel parameter vs. SV number.

**Figure 9.** Gaussian kernel parameter vs. SV number.

**Figure 10.** Wavelet kernel parameter vs. SV.

**Figure 11.** Chebyshev kernel parameter vs. SV.

**Figure 12.** Legendre kernel parameter vs. SV.

**6. Conclusion**

In this paper, the classification accuracy of SVMs has been improved by mapping the training data into a feature space with the help of families of Legendre functions. A class of Legendre kernel functions based on the properties of common kernels is proposed, which can serve many applications in training. Normalization plays a vital role for the generalized Legendre kernel, so all data should be normalized to [-1,1] before using the kernel function. Based on the simulation results, choosing the order of the Legendre polynomials from a set of integers is usually sufficient to obtain a good classification result from the generalized Legendre kernel function.
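The normalization step mentioned above can be sketched as a per-feature linear rescaling to [-1, 1]. Min-max scaling per column is an assumption made here for illustration; the paper does not state which normalization was used, and the function name is hypothetical.

```python
from typing import List, Sequence


def scale_to_unit_interval(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Linearly rescale each column of `rows` so its values lie in [-1, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.
            0.0 if hi == lo else 2.0 * (value - lo) / (hi - lo) - 1.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


if __name__ == "__main__":
    data = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
    for row in scale_to_unit_interval(data):
        print(row)
```

In practice the minima and maxima would be computed on the training data and reused for the test data, so that both are mapped with the same transformation.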

We have compared the classification performance of the Legendre kernel function with existing kernels such as the GF, POLY and wavelet kernels. According to the test results, the generalized Legendre kernel shows the lowest number of support vectors in almost every evaluation. Specifically, we suggest that this stems from the orthogonality property of the Legendre polynomials. This property of the kernel function can be important and useful in applications where the number of support vectors matters greatly, as in feature selection. Generalized Legendre kernel functions can therefore be regarded as a valid substitute for the GF, POLY and wavelet kernel functions for some particular data sets. The test results suggest that their results are similar to those of the kernel functions derived from the generalized Legendre polynomials of the first kind. Thus, we have not included that family of kernel functions and its results in this study. Also, since a detailed treatment of the properties of the generalized Legendre polynomials is beyond the scope of this study, we have not examined these properties specifically, even though such a study could help in designing new kernel functions derived from generalized Legendre polynomials; this could be the goal of future work.

**References**

[1] Vapnik V. N., (1995) “The nature of statistical learning theory”, Springer-Verlag, New York, NY, USA.

[2] Kim H., Pang S., Je H., Kim D., Bang S.Y., (2003) “Constructing support vector machine ensemble”, Pattern Recognition, vol. 36, no. 12, pp. 2757–2767.

[3] Du P., Peng J., Terlaky T., (2009) “Self-adaptive support vector machines: modeling and experiments”, Computational Management Science, vol. 6, no. 1, pp. 41–51.

[4] Boser B. E., Guyon I. M., Vapnik V. N., (1995) “A training algorithm for optimal margin classifiers”, Proc. Fifth Ann. Workshop Computing Learning Theory, pp. 144-152.

[5] Vapnik V.N., (1999) “An overview of statistical learning theory”, IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988-999.

[6] Cortes C., Vapnik V.N., (1995) “Support-vector networks”, Machine Learning, vol. 20, pp. 273-297.

[7] Scholkopf B., (1997) “Support vector learning”, PhD dissertation, Technische Universitat Berlin, Germany.

[8] Vapnik V.N., Golowich S., and Smola A., (1997) “Support vector method for function approximation, regression estimation and signal processing”, Advances in Neural Information processing Systems, vol. 9, Cambridge, Mass.: MIT Press.

[9] Mangasarian O.L., Musicant D.R., (1999) “Successive over relaxation for support vector machines”, IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1032-1037.

[10] Aronszajn N. , (1950) “Theory of Reproducing Kernels”, Trans. Am. Math. Soc., vol. 68, pp. 337-404.

[11] Shawe-Taylor J., Bartlett P.L., Williamson R.C., Anthony M., (1998) “Structural risk minimization over data-dependent hierarchies”, IEEE Trans. Information Theory, vol. 44, no. 5, pp. 1926-1940.

[12] Williamson R. C., Smola A., Schölkopf B., (1999) “Entropy numbers, operators and support vector kernels”, MA: MIT Press, Cambridge, pp. 44-127.

[13] Chapelle O. and Schölkopf B., (2002) “Incorporating invariances in non-linear support vector machines”, In T. G. Dietterich, S. Becker, and Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14, pp. 594-609, Cambridge, MA: MIT Press.

[14] Hastie T., Hsu C-W., Lin C-J, (2005) “A comparison of methods for multi-class support vector Machines”, IEEE Transactions on Neural Networks, no. 13, pp.415–425.

[15] Zanaty E.A., Sultan Aljahdali, (2008) “Improving the accuracy of support vector machines”, Proceedings of 23rd International Conference on Computers and Their Applications, April, pp. 75-83, Cancun, Mexico.

[16] Zanaty E.A, Sultan Aljahdali, R.J. Cripps, (2009) “Accurate support vector machines for data classification”, Int. J. Rapid Manufacturing, vol. 1, no. 2, pp. 114-127.

[17] Zanaty E.A., Ashraf Afifi, (2011) “Support vector machines (SVMs) with universal kernels”, International Journal of Artificial Intelligence, vol. 25, pp. 575-589.

[18] E. A. Zanaty, Ashraf Afifi, (2018) “Generalized Hermite kernel function for support vector machine classifications”, International Journal of Computer Applications, In press, Accepted 07 Jun 2018.

[19] Vahid H. M., Javad H., (2016) “New Hermite Orthogonal Polynomial Kernel and Combined Kernels in Support Vector Machine Classifier”, Elsevier, Pattern Recognition, vol. 60, pp. 921–935.

[20] Meng T., Wenjian W., (2017) “Some Sets of Orthogonal Polynomial Kernel Functions”, Elsevier, Applied Soft Computing, vol. 61, pp. 741–756.

[21] Maji S., Berg A.C., Malik J. ,(2008) “Classification using intersection kernel support vector machines is efficient”, IEEE Conference on Computer Vision and Pattern Recognition, June pp.1–8.

[22] Ozer Sedat, Chen ChiH., Cirpan HakanA., (2011) “A set of new Chebyshev kernel functions for support vector machine pattern classification”, Pattern Recognition, no. 44, pp.1435–1447.

[23] Hao Jiang, Wai-Ki Ching, (2012) “Correlation kernels for support vector machines classification with applications in cancer data”, Computational and Mathematical Methods in Medicine, doi:10.1155/2012/205025, pp.1-7.

[24] A. Kyriakides, E. Kastanos, K. Hadjigeorgiou, and C. Pitris, (2011) “Classification of Raman spectra using the correlation kernel” , Journal of Raman Spectroscopy, vol. 42, no. 5, pp. 904–909.

[25] A. Kyriakides, E. Kastanos, K. Hadjigeorgiou, and C. Pitris, (January 2011) “Support vector machines with the correlation kernel for the classification of Raman spectra”, Advanced Biomedical and Clinical Diagnostic Systems IX, vol. 7890 of Proceedings of SPIE, pp. 78901B-1–78901B-7, San Francisco, Calif, USA.

[26] Zhi-B., Hong C., Xin H., Luis C. P., Martin C., Alfonso, Rojas D., Hector P., Hector F., (2018) “A novel Formulation of Orthogonal Polynomial Kernel Functions for SVM Classifiers: The Gegenbauer family”, Elsevier, Pattern Recognition, vol. 84, pp. 211–225.

[27] Mercer T., (1909) “Functions of positive and negative type and their connection with the theory of integral equations”, Philosophical Trans. of the Royal Soc. of London, Series A, pp. 415-446.

[28] Williamson R.C., Smola A.J., and Scholkopf B., (1998) “Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators”, Technical Report 19, NeuroCOLT.

[29] Scholkopf B., Smola A.J., (2001) “Learning with kernels: support vector machines, regularization, optimization, and beyond”, MIT Press.

[30] Web: http://www.liacc.up.pt/ML/old/statlog/datasets.html

[31] Web: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/

[32] Web: http://svm.sourceforge.net/docs/3.00/api/
