Summary: An article in the International Journal of Computer Science and Information Security by Heba Ezzat Ibrahim, Sherif M. Badr, and Mohamed A. Shaheen compares phase decision trees to level decision trees. The authors state that decision trees are a useful and commonly used tool for detecting intrusions into computer networks. A decision tree breaks data down by its likely attributes, and percentage-detected values are then assigned to the resulting nodes.
For their paper, the authors compared phase and level models. The phase model is divided into three stages: the first detects whether the incoming data is normal or an attack, the second detects whether an attack is DoS (denial of service), probe, R2L (remote to local), or U2R (user to root), and the third identifies the specific attack type within the category found in the previous step. This model differs from the level model in that its steps are sequential.
The level model arranges each stage as a separate process that detects and labels attacks on its own, regardless of whether the previous step has completed. Instead of treating detection as one process (a single tree), it treats each phase as a separate section (three trees). This allows it to detect network attacks that the phase model may miss as false negatives in its first step.
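To make the difference in control flow concrete, here is a minimal Python sketch of the two arrangements as I understand them from the paper. It is not the authors' implementation: the three stage functions are hypothetical stand-ins for trained decision trees, and the feature names (conn_rate, failed_logins), thresholds, and attack-type labels are all invented for illustration.

```python
from typing import Dict, List

Record = Dict[str, float]


def is_attack(record: Record) -> str:
    """Stage 1 stand-in: normal vs. attack (threshold is invented)."""
    return "attack" if record["conn_rate"] > 100 else "normal"


def attack_category(record: Record) -> str:
    """Stage 2 stand-in: coarse attack category (thresholds are invented)."""
    if record["conn_rate"] > 80:
        return "DoS"
    if record["failed_logins"] > 3:
        return "R2L"
    return "none"


def attack_type(record: Record) -> str:
    """Stage 3 stand-in: specific attack type (labels are placeholders)."""
    if record["conn_rate"] > 80:
        return "flood-like"
    if record["failed_logins"] > 3:
        return "password-guessing"
    return "none"


def phase_model(record: Record) -> List[str]:
    """Sequential: later stages run only if stage 1 flags an attack."""
    first = is_attack(record)
    if first == "normal":
        return [first]  # a stage 1 false negative ends the process here
    return [first, attack_category(record), attack_type(record)]


def level_model(record: Record) -> List[str]:
    """Independent: every stage labels the record regardless of the others."""
    return [is_attack(record), attack_category(record), attack_type(record)]


# A borderline record that the stage 1 stand-in misses
borderline = {"conn_rate": 90.0, "failed_logins": 0.0}
print(phase_model(borderline))  # stops after stage 1
print(level_model(borderline))  # stages 2 and 3 still label it
```

With the borderline record, the phase sketch stops at "normal", while the level sketch still reports a category and type from its other two stages. This is the false-negative case described above.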
The authors found that the phase model detects threats more frequently than the level model. Additionally, they found that the phase approach classifies new attacks more frequently. The level model does show more 100% detection rates than the phase model, but on average its detection rates are not higher. Not only does the phase model show better consistency, it also reflects how real-world attack prevention software processes an incoming threat through logical steps instead of trying all options simultaneously.
The authors successfully explained how a network attack can be detected through the use of decision trees. I personally do not have any background in computer networks, but I did not find it too difficult to understand the reasoning for running the two separate models to compare consistency. This topic does not specifically relate to the intelligence field; however, it does relate to cyber security through computer network defense.
Although parts of this paper are well argued, I found several issues. The first is that the authors never clearly define the 23 types of attacks that are drawn from the second stage. Without this information, I find it difficult to trust the accuracy of their results, because I cannot tell exactly what they are detecting. They also do not thoroughly describe how they process the data, beyond noting that the records are either new attacks or partitioned data. Additionally, the authors state that the data set they use (the KDDCUP'99 data set) is the best available but has some inherent problems. These problems are never explained well (although the authors did eliminate duplicate entries); instead, the authors say that other people have used the data set and that this justifies their use of it.