Abstract— Botnets represent one of the most aggressive cyber security threats faced by organizations, as they provide a platform for many illegal activities such as distributed denial of service attacks, click fraud, phishing and malware dissemination. A variety of techniques that use different feature sets have been proposed for effective botnet traffic classification and analysis, but several challenges remain unaddressed, such as the effect of the feature set of the network flow exporter. In this paper we explore an open source network traffic flow exporter (with a set of features) using different protocol filters.
Our evaluation shows that the choice of flow exporter and protocol filters indeed affects the performance of botnet traffic classification.

Keywords— Botnet, cyber security, flow exporter, protocol filter, traffic classification.

I. INTRODUCTION

A botnet is a collection of compromised computers connected over the Internet and remotely controlled by a botmaster. The individual compromised machines are called bots. Botnets are created to conduct different malicious activities such as distributed denial of service (DDoS) attacks, click-fraud scams, spreading spam, stealing victims' personal information and exploiting users' significant computational resources through malicious bots [1]. The bots keep updating themselves and are controlled by the botmaster to carry out malicious instructions for different illegal activities. Hence, with the significantly increasing rate of reported infections and illegal activities, botnets pose a serious threat to cyber security.
A significant aspect of botnet architecture is the communication scheme, which has evolved considerably over the years to enhance botnet functionality and evade detection. The architecture includes the compromised bots, which communicate with a command and control (C&C) server to fetch instructions from the botmaster. Botnets used the Internet Relay Chat (IRC) protocol for communication until the early 2000s.
However, IRC-based bots are highly vulnerable because they use a centralized topology. The complete botnet can be disrupted just by shutting down the IRC server. Also, the messages may easily be revealed by continuous monitoring of network traffic, and the messages captured from packets can then be analyzed further. Since 2003, botnets have evolved and started using more sophisticated techniques that involve decentralized topologies such as peer-to-peer (P2P) and ubiquitous protocols such as DNS and HTTP. In the P2P communication scheme, individual bots act as both client and server, which makes the botnet more resilient because there is no fixed central point that could be exploited. However, the P2P topology also has its limitations, including higher latency in command and control transmission, which in turn affects bot synchronization. The use of techniques like encryption and fluxing has also helped botnets avoid detection. Therefore, botnet identification and detection have become highly challenging.
Many botnet detection approaches based on network traffic analysis and classification have been proposed. Some of the research in this category aims to build a generalized model for botnet detection, whereas other work focuses on specific types of botnets. In the early 2000s, most proposed systems specifically targeted botnets using IRC [2]. However, more recent research focuses on P2P and HTTP based botnets [3], [4].
The botnet monitoring and detection techniques used for botnet classification should be active and continuous, as botnets use automatic update mechanisms. Continuous monitoring also potentially enables the detection systems to learn new patterns and adapt to changes as botnets evolve. Therefore, machine learning techniques (i.e., classification and clustering) are an apt solution that can be deployed.
Clustering and classification are used to enable automatic pattern recognition on a meaningful representation of network traffic. Hence, the most significant component of these systems is the extraction of meaningful features (attributes) from network traffic. Extracting these features is very challenging.
To this end, various botnet detection and analysis systems have proposed their own feature sets to represent network traffic, which consists of network packets. A network packet is divided into two major parts: 1) the packet header, which contains control information for the protocols used over the network, and 2) the packet payload, which contains the application data carried over the network. Some botnet detection and analysis approaches use network packet headers [4], whereas others use packet payloads [5].
Flow based feature extraction methods are commonly used by the approaches that rely on packet headers [4]. In these approaches, the packets are aggregated into flows and statistics are then computed over each flow. Flow exporters are used for generating flows and extracting such features. Various botnets, however, use encryption to hide their identity and evade detection systems that analyze the packet payload for embedded communication information. Flow exporters remain effective in this setting because they summarize the traffic using only network packet headers.
Hence, an open source flow exporter along with a machine learning technique is used here to perform botnet traffic classification.

II. BACKGROUND AND RELATED WORK

Bots are vulnerable hosts that have been infected by self-propagating malware, called a bot program, which is designed to perform various malicious activities. The botmaster controls the network of infected bots, known as a botnet.
Initially, the infected bots receive commands from the botmaster through the C&C channel and perform malicious operations such as DDoS, phishing, spamming, identity theft and stealing users' significant information [1]. A bot goes through five stages to create and maintain a botnet [1]. The first stage is infection, where the attacker infects the victim by exploiting existing vulnerabilities using different exploitation techniques.
The second stage is secondary injection, where shell code is executed on the infected machine to fetch the image of the bot binary. The bot binary then installs itself on the infected machine, which turns the machine into a bot. The third stage is connection, where the bot binary establishes the C&C channel used by the botmaster. The fourth stage, which starts once the connection is established, is the malicious stage, where the botmaster sends commands to the botnet. The fifth stage is the update and maintenance of the bots by the botmaster.

A. Related Work

Although a significant amount of research has been done on botnet detection, detection techniques based on network traffic flow analysis have only emerged in the last few years.

Gu et al. developed BotMiner, which detects botnets using a group behavior analysis approach. It uses clustering to find and group similar C&C communication behavior, and later employs Snort [6].
The data set included non-malicious data from the campus network and malicious data from running bot binaries in a sandbox environment. The captured traffic files were converted into flows, and the flow exporter provided features such as the total number of packets per flow, the average number of bytes per packet and the average number of bytes per second. The results showed that BotMiner could detect botnets with detection rates (DRs) between 75% and 100%.

Strayer et al. proposed an IRC botnet detection system that used machine learning techniques (classification and clustering) [2].
First, classification was used to filter chat-type traffic, and then clustering was used to find group activities in the filtered traffic. Lastly, an analyzer was applied to the clusters for botnet detection. The data set was gathered from a controlled testbed running a bot binary.
They evaluated the classifiers against a multidimensional flow correlation technique that they designed and proposed.

Zeidanloo et al. developed a detection system that focused on P2P and IRC-based botnets [5]. Using filtering, classification and clustering, it aimed to detect botnet group behavior in a given traffic file.
A flow based technique was used to analyze traffic, and payload inspection was used for traffic filtering.

Zhao et al. investigated a botnet detection system based on flow intervals [3]. Flow features of the captured traffic were used with Bayesian network and decision tree classifiers to detect botnets.
They evaluated and analyzed normal and malicious attack traffic. The results showed DRs over 90% with false positive rates (FPRs) under 5%.

Haddadi et al. proposed a botnet detection approach based on botnet traffic analysis [4]. Normal and malicious traffic was generated by establishing HTTP and DNS communication with the publicly available domain names of botnet C&C servers and legitimate web servers. NetFlow with machine learning algorithms was proposed to detect the botnets. The results showed a 97% DR and a 3% FPR.

The recent literature on botnet detection focuses more on the P2P and HTTP protocols [4].
This includes the use of different data mining or machine learning techniques, such as neural networks, decision trees or statistical methods, applied to flow features. In most cases, normal traffic files are combined with attack traffic files to evaluate the performance of the proposed botnet detection systems.

Finally, this paper aims to use the features exported by an open source flow exporter and to analyze the flow exporter's effect on the performance of botnet classification.

III. METHODOLOGY

Earlier botnet traffic analysis work in the literature used some network flow information, which included packet headers.
Most of these studies focus on certain protocols such as HTTP and DNS, which indicates the use of protocol filtering in analyzing traffic data. No packet payload related information is incorporated. The possibility of detecting botnets using only features extracted from traffic flows is explored.
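Protocol filtering of flow records can be sketched as follows. This is only an illustrative sketch, not the filter used in this work: it assumes that each flow is a dictionary with the hypothetical fields `proto`, `src_port` and `dst_port`, and it recognizes HTTP and DNS by their well-known ports (80 and 53), which misses traffic on non-standard ports.

```python
# Illustrative protocol filter over flow records (not the paper's implementation).
# Assumption: protocols are recognized by well-known port numbers only.
HTTP_PORT = 80  # well-known HTTP port
DNS_PORT = 53   # well-known DNS port


def protocol_of(flow):
    """Label a flow record as 'HTTP', 'DNS' or 'OTHER' from its ports."""
    ports = {flow["src_port"], flow["dst_port"]}
    if flow["proto"] == "TCP" and HTTP_PORT in ports:
        return "HTTP"
    if flow["proto"] in ("UDP", "TCP") and DNS_PORT in ports:
        return "DNS"
    return "OTHER"


def filter_flows(flows, wanted):
    """Keep only the flows whose protocol label is in the `wanted` set."""
    return [f for f in flows if protocol_of(f) in wanted]


# Hypothetical flow records used only to exercise the filter.
flows = [
    {"proto": "TCP", "src_port": 49152, "dst_port": 80},
    {"proto": "UDP", "src_port": 50000, "dst_port": 53},
    {"proto": "TCP", "src_port": 49153, "dst_port": 22},
]
http_only = filter_flows(flows, {"HTTP"})
```

Passing `{"HTTP", "DNS"}` instead of `{"HTTP"}` would keep both protocols, so the same classifier can be evaluated under different filters.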
A. Traffic Data Set

Traffic files from botnets that use HTTP as the communication protocol, or an HTTP based P2P topology that looks like normal HTTP traffic, are used for analysis. The botnet traffic files publicly available on the NETRESEC [7] and Snort [8] websites are used for this research. The botnets and the domain name list are as follows:

1) Alexa: Alexa Internet, Inc. [9] ranks websites based on their page views and unique site users. This ranking is then published as a list of the most popular websites.
2) Zeus: Zeus is one of the best known botnets. It collects banking data through man-in-the-browser keystroke logging and form grabbing, and can be used for any identity theft attack [10].
3) Citadel: The Citadel botnet is an enhanced version of Zeus, developed by fixing Zeus bugs and adapting it to new security platforms [11]. It stole more than $500 million and infected more than 5 million personal computers across different countries.
4) Conficker: In a survey, the Conficker botnet was listed among Damballa's top 10 botnets of the year. It was responsible for DDoS attacks and for stealing banking credentials using distributed computing resources, and it also infected many medical devices [12].
5) Cutwail: The Pushdo trojan was originally used to distribute other malware such as Zeus. It has its own spam module, known as Cutwail, which is responsible for a large portion of the world's daily spam traffic [13].
6) Kelihos: This botnet is mainly involved in DDoS attacks and email spam. It is also capable of stealing Bitcoin wallets and of spreading links over various social networking websites.
B. Flow Generation

Flow generation tools are responsible for summarizing network packet headers. They collect packets with similar properties, such as IP addresses and port numbers, aggregate them into flows, and then compute statistics such as the number of packets per flow.
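The aggregation step can be sketched as below. This is a minimal illustration rather than the behavior of any particular exporter: it assumes packets have already been parsed into dictionaries with hypothetical fields (`ts`, `src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `length`), it groups them into unidirectional flows keyed by the usual 5-tuple, and it ignores flow timeouts. The per-flow statistics are the packet count, the average bytes per packet and the average bytes per second.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Running counters for one unidirectional flow."""
    first_seen: float
    last_seen: float
    packets: int = 0
    total_bytes: int = 0


def aggregate_flows(packets):
    """Group parsed packets into flows keyed by the 5-tuple."""
    flows = {}
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
        stats = flows.get(key)
        if stats is None:
            stats = flows[key] = FlowStats(first_seen=p["ts"], last_seen=p["ts"])
        stats.packets += 1
        stats.total_bytes += p["length"]
        stats.first_seen = min(stats.first_seen, p["ts"])
        stats.last_seen = max(stats.last_seen, p["ts"])
    return flows


def flow_features(stats):
    """Compute simple per-flow statistics from the counters."""
    duration = stats.last_seen - stats.first_seen
    return {
        "packets": stats.packets,
        "bytes_per_packet": stats.total_bytes / stats.packets,
        # A single-packet flow has zero duration; report 0.0 in that case.
        "bytes_per_second": stats.total_bytes / duration if duration > 0 else 0.0,
    }


# Hypothetical parsed packets: two in one direction, one in the reverse direction.
sample_packets = [
    {"ts": 0.0, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 1234, "dst_port": 80, "proto": "TCP", "length": 100},
    {"ts": 2.0, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 1234, "dst_port": 80, "proto": "TCP", "length": 300},
    {"ts": 1.0, "src_ip": "10.0.0.2", "dst_ip": "10.0.0.1",
     "src_port": 80, "dst_port": 1234, "proto": "TCP", "length": 60},
]
sample_flows = aggregate_flows(sample_packets)
```

Because the key is direction-sensitive, the reply packet forms its own flow, which matches the unidirectional view of traffic; a bidirectional exporter would instead normalize the key so that both directions map to the same flow.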
To collect and analyze traffic flow data, the following three network components should work together: 1) the flow exporter, which generates the flow data, 2) the flow collector, which collects the flow data from the exporter, and 3) the flow analyzer, which analyzes the collected data.

Tranalyzer is a lightweight unidirectional flow exporter and analyzer that uses an extended version of the NetFlow feature set. It exports in both binary and ASCII formats and hence does not require an additional collector.

C. Flow Analysis

Different machine learning approaches are widely used for botnet detection, such as the C4.5 algorithm, SVM, ANN, Bayesian networks and Naïve Bayes.

C4.5: This is a decision tree algorithm that builds a tree-structured graph in which the internal nodes represent conditions applied to attributes, the leaf nodes denote the class labels, and the paths from the root to the leaves represent the classification rules. It aims to find the smallest decision tree and then converts the trained tree into an if-then rule set.

IV. CONCLUSION

Botnets are considered one of the most predominant and aggressive threats against cyber security. Effective botnet detection is very challenging because of the complexity of botnets and the changing technology to which they now adapt automatically. Many requirements for effective botnet detection, including early detection, novelty detection and adaptability, are largely unaddressed by most existing detection schemes.
Hence, a botnet detection approach that can adapt to botnet evolution is needed. To address this problem, various automatic botnet detection approaches use network traffic analysis. Different systems use particular flow based network traffic feature sets in their analysis of the traffic.
The selection of the feature set and protocol filter is very important and can greatly affect the performance of botnet detection systems.