Abstract: Machine learning methods for traffic anomaly detection suffer from several problems: feature selection relies on expert experience, raw features have insufficient expressive power, noise and outliers in the data reduce model robustness, and detection rates for minority classes are low on imbalanced, high-dimensional datasets. To address these problems, an improved LightGBM-based traffic anomaly detection method with feature enhancement is proposed. First, the isolation forest (iForest) method is used to handle outliers, and the outlier-treated data are used to train a one-dimensional convolutional denoising auto-encoder (CDAE) with global average pooling (GAP), which indirectly removes noise from the data and yields a low-dimensional, enhanced representation of the original features. Next, adaptive synthetic sampling (ADASYN) is applied to the outlier-treated data for data augmentation, and the trained CDAE is used to extract features from the augmented data. The resulting low-dimensional features serve as input to LightGBM, whose hyperparameters are tuned by Bayesian optimization. Finally, the resulting CDAE+LightGBM ensemble model is used to classify anomalous traffic precisely. The proposed method attains an accuracy of 87.80% and an F1 score of 87.75% in a five-class classification task on the NSL-KDD dataset. Experimental results demonstrate that the proposed approach significantly improves detection accuracy and strengthens the ability to identify unknown attacks. Tests on the CICIDS2017 scenario dataset further verify the feasibility of the proposed method, which is superior to comparable deep learning algorithms.
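To illustrate why ADASYN suits the minority-class problem described above, the following is a minimal sketch of its adaptive allocation rule as given in the original ADASYN formulation: each minority sample receives synthetic samples in proportion to the share of majority-class points among its k nearest neighbours, so harder-to-learn samples are augmented more. This is not the authors' implementation. The function name `adasyn_allocation`, the toy two-dimensional data, and the rounding of per-sample counts are assumptions made for illustration, and the interpolation step that actually generates the synthetic samples is omitted.

```python
import math


def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Return the number of synthetic samples assigned to each minority point.

    Hypothetical helper: covers only the allocation step of ADASYN,
    not the interpolation that creates the synthetic samples.
    """
    # Total synthetic samples needed; beta=1.0 requests full class balance.
    total = (len(majority) - len(minority)) * beta
    # Label 0 = minority, 1 = majority; minority points come first.
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]

    ratios = []
    for i, x in enumerate(minority):
        # Distances from x to every other point in the whole dataset.
        others = [
            (math.dist(x, p), label)
            for j, (p, label) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        # Fraction of majority-class points among the k nearest neighbours.
        ratios.append(sum(label for _, label in others[:k]) / k)

    norm = sum(ratios)
    if norm == 0:
        # No minority point has a majority neighbour: nothing to allocate.
        return [0] * len(minority)
    # Normalise the ratios into a distribution and scale by the total
    # (rounding is a simplification for this sketch).
    return [round(r / norm * total) for r in ratios]


# Toy data: three minority points far from the majority class,
# one surrounded by majority points, and one on the border.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (4.0, 4.0)]
majority = [
    (5.0, 5.2), (5.3, 5.0), (4.8, 4.9),
    (9.0, 9.0), (9.1, 9.0), (9.0, 9.2), (8.8, 9.1),
    (9.3, 9.3), (8.9, 8.7), (9.2, 8.8), (9.4, 9.0),
]
counts = adasyn_allocation(minority, majority, k=2)
print(counts)  # [0, 0, 0, 4, 2]
```

In this toy case the three well-separated minority points receive no synthetic samples, while the point surrounded by majority samples receives four and the borderline point receives two. In practice a maintained library implementation of ADASYN would normally be preferred over a hand-written one.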