1 Introduction
As computer technology develops rapidly, malware is becoming more complex: it propagates faster and is better at lying dormant, causing damage, and infecting systems. New anti-malware technologies are therefore required.
Many companies have released anti-malware software, most of which is signature-based and detects known malware very quickly. However, such software often fails to detect new variants and unknown malware. With metamorphic and polymorphic techniques, even a layman can easily develop new variants of known malware using automated malware generation tools. Traditional signature-based detection is therefore no longer adequate on its own, and heuristic methods have started to emerge.
Over the past few years, the application of immune mechanisms to computer security has developed into a new field that has attracted many researchers. Forrest first applied immune theory to computer anomaly detection in 1994 [1]. Since then, researchers have proposed a variety of malware detection models with some success.
2 A Hierarchical Artificial Immune Model for Virus Detection
As viruses become more complex, existing anti-virus methods are ineffective at detecting their various forms, especially new variants and unknown viruses. Inspired by the immune system, a hierarchical artificial immune system (AIS) model based on matching in three layers is proposed to detect a variety of forms of viruses. In the bottom layer, a candidate virus gene library is generated in a guided rather than stochastic way, using statistical information about viral key codes. A detecting virus gene library is then derived from the candidate virus gene library using negative selection. In the middle layer, a novel storage method preserves the potential relevance between different signatures at the individual level, so that the mutual cooperative information of each instruction in a virus program can be collected. In the top layer, an overall matching process considerably reduces information loss. Experimental results indicate that the proposed model recognizes obfuscated viruses efficiently, with an average recognition rate of 94%, including new variants of viruses and unknown viruses.
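The bottom layer described above can be illustrated with a minimal sketch. It assumes that the "statistical information" is the number of virus files a fragment appears in, and that a candidate "matches self" when it occurs verbatim in a benign file; neither detail is specified in the abstract, and the function names are hypothetical.

```python
from collections import Counter


def fragments(data: bytes, length: int):
    """Yield every contiguous fragment of `length` bytes in `data`."""
    for i in range(len(data) - length + 1):
        yield data[i:i + length]


def candidate_library(viruses, length, top_k):
    """Guided (non-random) candidates: the fragments found in the most virus files.

    Assumption: file frequency stands in for the paper's statistics.
    """
    counts = Counter()
    for virus in viruses:
        counts.update(set(fragments(virus, length)))  # count each file once
    return [frag for frag, _ in counts.most_common(top_k)]


def negative_selection(candidates, benign_files, length):
    """Discard every candidate that also occurs in a benign ("self") file.

    Assumption: exact matching is used as the self-matching rule.
    """
    self_fragments = set()
    for benign in benign_files:
        self_fragments.update(fragments(benign, length))
    return [c for c in candidates if c not in self_fragments]


viruses = [b"\x90\x90\xeb\xfe\xcc", b"\xeb\xfe\xcc\x90\x90"]
benign_files = [b"\x90\x90\x00\x00"]
candidates = candidate_library(viruses, length=2, top_k=3)
detectors = negative_selection(candidates, benign_files, length=2)
```

Here the fragment `\x90\x90` is frequent in the virus files but also appears in the benign file, so negative selection removes it from the detecting library.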
3 A Virus Detection System Based on Artificial Immune System
A virus detection system (VDS) based on artificial immune system (AIS) is proposed in this paper. VDS first generates the detector set from the virus files in the dataset. Negative selection is then applied to the detector set to eliminate autoimmune detectors, and clonal selection is applied to increase the diversity of the detector set in the non-self space. Two novel hybrid distances, called hamming-max and shift r bit-continuous distance, are proposed to calculate the affinity vector of each file using the detector set. The affinity vectors of the training set and the testing set are used to train and test classifiers respectively. VDS compares the detection rates of three classifiers, k-nearest neighbor (KNN), RBF networks and SVM, for detector lengths of 32 bits and 64 bits. The experimental results show that the proposed VDS has a strong detection ability and good generalization performance.
Fig1. The average detection rate of SVM, KNN and RBF network when L = 32, with files randomly selected from the dataset
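The affinity-vector step of the VDS can be sketched as follows. The abstract does not define the two hybrid distances, so this is only one plausible reading: each detector is slid over the file in byte-aligned windows, and the best score over all windows is kept, using either the number of agreeing bits or the longest run of contiguous agreeing bits. All function names are hypothetical.

```python
def hamming_similarity(a: int, b: int, length: int) -> int:
    """Number of bit positions (out of `length`) where a and b agree."""
    return length - bin(a ^ b).count("1")


def longest_common_run(a: int, b: int, length: int) -> int:
    """Length of the longest run of contiguous agreeing bits."""
    agree = ~(a ^ b) & ((1 << length) - 1)  # 1 where the bits are equal
    best = run = 0
    for i in range(length):
        if (agree >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def windows(data: bytes, length_bits: int):
    """Yield each byte-aligned window of `length_bits` bits as an integer."""
    n = length_bits // 8
    for i in range(len(data) - n + 1):
        yield int.from_bytes(data[i:i + n], "big")


def affinity_vector(data: bytes, detectors, length_bits: int, measure):
    """One entry per detector: the best score over all windows of the file."""
    wins = list(windows(data, length_bits))
    return [
        max((measure(d, w, length_bits) for w in wins), default=0)
        for d in detectors
    ]


detectors = [0xDEADBEEF, 0x00000000]
sample = b"\x00\xde\xad\xbe\xef\x00"
vec_hamming = affinity_vector(sample, detectors, 32, hamming_similarity)
vec_run = affinity_vector(sample, detectors, 32, longest_common_run)
```

The resulting vectors, one per file, are the kind of fixed-length feature that could then be passed to a KNN, RBF network or SVM classifier.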
4 A Malware Detection Model Based on a Negative Selection Algorithm with Penalty Factor
A malware detection model based on a negative selection algorithm with penalty factor (NSAPF) is proposed in this paper. The model extracts a malware instruction library (MIL), containing instructions that tend to appear in malware, through deep instruction analysis with respect to instruction frequency and file frequency. From the MIL, the model creates a malware candidate signature library (MCSL) and a benign program malware-like signature library (BPMSL) by splitting programs in order into short bit strings. Depending on whether a signature matches "self", the NSAPF further divides the MCSL into two malware detection signature libraries (MDSL1 and MDSL2), and uses these as a two-dimensional reference for detecting suspicious programs. The model classifies a suspicious program as malware or benign according to its matching values against MDSL1 and MDSL2. Introducing a penalty factor C into the negative selection algorithm lets the model overcome a drawback of traditional negative selection algorithms, which equate harmfulness with the "self"/"nonself" distinction, and focus instead on the harmfulness of the code itself. This greatly improves the effectiveness of the model and allows it to satisfy different user requirements in terms of true positive and false positive rates. Experimental results confirm that the proposed model achieves a better true positive rate on completely unknown malware and better generalization ability while keeping a low false positive rate. The balance between the true positive and false positive rates can be tuned by adjusting the penalty factor C.
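The division of the candidate library and the role of a penalty factor can be illustrated with a rough sketch. It makes several assumptions that the abstract does not confirm: the MIL filtering step is skipped, programs are split into non-overlapping chunks, "matching self" means exact membership in the benign signature set, and the scoring rule (MDSL2 matches discounted by `1 / (1 + C)`) is only a guess at one simple way a penalty factor could enter, not the paper's formula. All names are hypothetical.

```python
def signatures(data: bytes, length: int):
    """Split a program, in order, into non-overlapping chunks of `length` bytes."""
    return [data[i:i + length] for i in range(0, len(data) - length + 1, length)]


def build_libraries(malware, benign, length):
    """Divide the candidate signatures by whether they also match "self"."""
    mcsl = set().union(*(signatures(m, length) for m in malware))
    bpmsl = set().union(*(signatures(b, length) for b in benign))
    mdsl1 = mcsl - bpmsl  # candidates that never occur in benign programs
    mdsl2 = mcsl & bpmsl  # candidates that also occur in benign programs
    return mdsl1, mdsl2


def match_values(program, mdsl1, mdsl2, length):
    """Two-dimensional reference: fraction of signatures found in each library."""
    sigs = signatures(program, length)
    if not sigs:
        return 0.0, 0.0
    m1 = sum(s in mdsl1 for s in sigs) / len(sigs)
    m2 = sum(s in mdsl2 for s in sigs) / len(sigs)
    return m1, m2


def is_malware(program, mdsl1, mdsl2, length, penalty_c, threshold):
    """Assumed rule: MDSL2 matches count less as penalty_c grows."""
    m1, m2 = match_values(program, mdsl1, mdsl2, length)
    return m1 + m2 / (1.0 + penalty_c) > threshold


mdsl1, mdsl2 = build_libraries([b"AAAABBBB"], [b"BBBBCCCC"], length=4)
flag_loose = is_malware(b"BBBBCCCC", mdsl1, mdsl2, 4, penalty_c=0, threshold=0.3)
flag_strict = is_malware(b"BBBBCCCC", mdsl1, mdsl2, 4, penalty_c=9, threshold=0.3)
flag_virus = is_malware(b"AAAABBBB", mdsl1, mdsl2, 4, penalty_c=9, threshold=0.3)
```

Under this assumed rule, a very large `penalty_c` makes MDSL2 matches negligible, which approaches traditional negative selection, while a small value lets signatures shared with benign programs still contribute to the decision. That is one way a single parameter could trade true positives against false positives.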
Fig2. Flowchart for MSEM
Fig3. Overall accuracy comparison
Fig4. Training time comparison