Analysis of Malware Impact on Network Traffic using Behavior-based Detection Technique

Malware is a software or computer program that is used to carry out malicious activity. Malware is made with the aim of harming user ’ s device because it can change user’s data, use up bandwidth and other resources without user's permission. Some research has been done before to identify the type of malware and its effects. But previous research only focused on grouping the types of malware that attack via network traffic. This research analyzes the impact of malware on network traffic using behavior-based detection techniques. This technique analyzes malware by running malware samples into an environment and monitoring the activities caused by malware samples. To obtain accurate results, the analysis is carried out by retrieving API call network information and network traffic activities. From the analysis of the malware API call network, information is generated about the order of the API call network used by malware. Using the network traffic, obtained malware activities by analyzing the behavior of network traffic malware, payload, and throughput of infected traffic. Furthermore, the results of the API call network sequence used by malware and the results of network traffic analysis, are analyzed so that the impact of malware on network traffic can be determined.


INTRODUCTION
In the era of IoT (Internet of Things), one of the biggest threats on the internet today is malicious software, usually called malware because almost all of the main causes of internet security problems are malware.Malware is a program that was created with the intention of damaging it by breaking into a computer system.Malware has various types, namely viruses, worms, spyware, adware, trojans, keyloggers, rootkits, botnets and phishing [1].Malware can be spread using various attack vector, including website and USB Flash Drive [2].One example of malware is botnet, botnet is usually used to send spam and phishing host websites that make it difficult to track and blacklist [3].In addition to botnets, malware that is often used to infiltrate a system is spyware.According to statistical data, 70-80% of spyware comes from websites that are considered safe by internet users [4].
Based on reports from ShadowServer, there are thousands of new malware samples that are received every day [3].Each malware sample is analyzed one by one with the aim to find out what type of malware, how big the threat is and how to deal with it.Complete information of malware sample can be retrieved by using static analysis, a malware analysis conducted by examining the source code of malware [5].However, for some cases this analysis is ineffective and inefficient because malware creators started using executable packing and obfuscation techniques, this technique can create several new malware variants from one malware [6].In order to overcome this problem, dynamic analysis is needed.
There are a lot of threats against network traffic, including eavesdropping, IP Poisoning, ARP Cache Poisoning [7], and malicious network traffic containing malware and virus.Dynamic Analysis usually analyzes malware behavior such as network activity, API calls, file operations and registry modification records by executing samples in a virtual environment.The disadvantage of dynamic analysis is that this method requires a lot of time and resources for executing malware [8].
On Windows Operating System, every executable program needs to make a set of API calls.For example, for file management there are several API calls namely, OpenFile, DeleteFile, FindClose, FindFirstFile, GetFileSize [9].Hence, API call information obtained after conducting dynamic analysis with behavior-based techniques is useful to find out what activities are carried out and what behaviors are owned by the malware so that it can be known what type of malware is and how much the threat is to the system based on the API call being run.
Analysis of malware on network traffic using dynamic analysis is necessary because there is not many previous researches has been conducted focusing on API Call [21].To get the results of this analysis in a complex way, we need API call network for each malicious activity run by malware using the Cuckoo Sandbox tool.Network APIs obtained after executing malware in a virtual environment will be analyzed in order of their calls and what network APIs are used by the malware so that it can know the activities carried out by malware that utilize the Windows API especially the API network.After all the information about the network API is obtained, an analysis of network traffic is recorded while running malware in a virtual environment using Wireshark.Analysis carried out on the results of capturing network traffic is to look at the behavior of malware on network traffic, the payload carried by malware, and the measurement of traffic between normal traffic and malicious traffic that has been infected with malware [22].
Based on the results of the analysis of the sequence of network APIs and network traffic, the final result of this research is an explanation of how the impact of each malware sample used in this study on network traffic.

LITERATURE STUDY 2.1 Malware
Malware or Malicious Software is software that is made to attack by damaging, disturbing, retrieving important information that involves the confidentiality, integrity and availability of data of a system or application [10].Malware is software that is explicitly designed to perform malicious activities such as Trojans, Viruses, Spyware and Exploit [11].The way malware works is by entering into a system such as an existing computer system through various applications that exist on the system or it can also be through sending data from a device that has been exposed to a virus [12].
Malware can be divided into several types according to the method of operation and its characteristics [13], along with the types of malware.
1. Logic Bomb: malware that has two core parts namely, contents and triggers.This type of malware will be active if the trigger is run.Similar to time bombs, Logic Bombs can be active for a certain period of time.2. Trojan Horse: a program that looks harmless but secretly runs malicious tasks.For example, when the user wants to do the login to a website by entering a username and password, but the website is already installed program password-grabbing, the role of the Trojan Horse is displaying a message error that states that the user entered your username and password, but behind the processes that the Trojan Horse is already taking information user authentication.3. Back Door: a mechanism that goes through security standard inspection processes such as authentication.4. Virus: a malware that will multiply itself by infecting other programs that are running. 5. Worm: malware that is similar to a virus that can copy itself, the difference is that a worm can copy itself from one system to another over a network and does not depend on executable programs.6. Rabbit: a program that uses all resources on a system.An example is the Fork Bomb which always makes a new process repeatedly with an infinite number of loops so that makes the system slow.7. Spyware: a program that takes a computer's information and sends it to other people.8. Adware: a program similar to Spyware but the focus is marketing advertisement.9. Zombies: a term for a computer that has been attacked / hacked without the user's knowledge.
Usually the activity carried out is to spread spam email.

Dynamic Analysis
Malware Analysis is a collection of processes carried out by dissecting malware to understand how it works, how to identify it and how to defeat or eliminate it.This process is needed to adapt the development of malware detection techniques to the growth of new malware.Malware Analysis is divided into three methods namely Static Analysis, Dynamic Analysis and Hybrid Analysis [14].
Dynamic Analysis is a method of detecting malware by running the malware in a virtual environment.This method monitors the behavior of malware, its interactions with the system and its effects on the system.Analysis with this method is required to provide an isolated environment so as not to cause other programs to be affected by running malware directly [15].

Behavior-based technique
Behavior-based is a technique of the Dynamic Analysis method whose purpose is to determine the behavior of malware.This technique is carried out by executing malware and monitoring its behavior in network activity, API calls, file operations and registry modification notes.The drawback of this technique is that it takes resources and time for many, while the advantages of this technique is able to identify or introduction of new malware [16].
Behavior-based technique has several advantages and disadvantages [17].The following are advantages and disadvantages of behavior-based technique.The advantages are: • Detecting malware behavior is complex because this technique detects malware from behavior to the final destination of malware.• Can identify new malware.
• Can identify a lot of malware with less time compared to other techniques.
• Simplify the classification of malware because it uses several tools to automate detection.On another hand, the disadvantages of behavior-based technique are: • Does not detect in detail such as checking for signatures and bits of malware.
• Couldn't find a malware solution.
• It consumes a lot of resources because this detection has to run malware directly in an environment.

Windows Network API
APIs or Application Programming Interfaces are application programming interfaces created by Windows in order to interact with the kernel system.For example, if the application wants to run File Management, the activity will involve several API calls such as OpenFile, DeleteFile, FindClose, FindFirstFile, GetFileSize.There are two API calls that will be used in this research, they are memory API and network API.The Windows API is dynamic-link libraries (DLLs) that are part of the Windows operating system.The advantage of using the Windows API is that it can save development time because it contains dozens of functions already written and waiting to be used.Network API is an API in the Windows operating system that is used in the process of transfer/request data in the network, while also allowing communication between applications through the network.Network API is usually used by malware by hiding malicous code into sending traffic or requesting data on a network [18].

Network Analysis
Network analysis usually includes traffic analysis, protocol analysis, packet analysis, eavesdropping, etc. Network analysis is the process of capturing a network of traffic and checking it in detail to find out what is happening on the network.A network analyzer is a combination of hardware and software.In this study, the network analyzer used is Wireshark.
Wireshark is one of the best network analyzer applications available free.Wireshark has many advantages such as a good graphical user interface (GUI), supports 400 protocols, and is active in development and maintenance.Wireshark can be run on various operating systems such as UNIXbased systems, Mac OS X, and Windows [19].

RESEARCH METHOD
This research method is explained by the conceptual model.conceptual model in this study consists of several important components, namely problems, environment, concepts, methods, IT artifacts, and evaluation.The problem and environment component is one part that forms the basis of research.The problem component contains the factors that are the reasons for this research.The environmental component is the party or part involved in this research such as people and technology.Technology used in this research are malware dynamic analysis tools, Windows 7 OS, virtualization software, Cuckoo Sandbox, and network traffic.Artifacts of this research is analysis of malware impact on network traffic while evaluation method used is malware dynamic analysis

RESULTS AND DISCUSSION
Testing is conducted through network API analysis and network traffic.The first test was performed on the Cuckoo Sandbox that was installed on VMware.Using Cuckoo, information about all network APIs used by malware is obtained.The next test is testing using Wireshark.Wireshark is a network traffic analysis tool intended to monitor network traffic.This test aims to obtain the activity carried out by malware on network traffic so that it can be seen how the behavior of malware, payload, and its impact on network traffic.In this test, the analysis carried out is divided into three parts, namely the analysis of network traffic to determine the behavior of malware, analysis of the payload to determine the activities carried out, and analysis of throughput measurements to determine its impact on network traffic.

Cuckoo Sandbox (API network)
After analyzing Cuckoo for all malware samples, network API sequences can be grouped.Grouping is done based on TID (task ID) for each network API.Next is the grouping of API network sequences after analyzing 30 malware samples.
WSAStarup-InternetCrackUrlA-ObtainUserAgentStringsocket-gethostbyname-connect-closesocket API 9 Socket-ioctlsocket-gethostbyname-connect-selectclosesocket API 10 InternetOpenA-InternetOpenUrlA-InternetReadFile-InternetCloseHandle Using API Call listed on Table 1, experiment is conducted resulted in API network used the most as depicted in Figure 1.Based on the chart above, we can conclude that the API group network the most widely used is API Network Group 2, WSAStartup-socket-setsockopt-closesocket-GetAddrInfoW.This means that half of the 30 malware samples have functions that can use the Windows Socket API to provide connection-oriented two-way communication between client and server malware.With the Windows Socket API, malware servers can affect Windows networks such as Quality of Service (QoS) so that an infected network can decrease QoS on network traffic.
In the second place, the most common use of the API network group is API 1 and 3 groups with a total of 10 malware samples each.It can be said that as many as 10 malwares with the API Network Group 1 have activities to access a URL or IP and then provide file transfer services between client and server so that, without realizing it, the server can send and retrieve data on the client computer.Whereas 10 malwares with API Network Group 3 have activities to send and receive data through a predetermined socket.The API Network Group 3 will affect the throughput of network traffic because the process of sending and receiving data happens.

Wireshark 4.2.1 Malware Behavior in Network Traffic
The testing aims to determine the behavior of malware on network traffic.Following is the malware behavior on network traffic which is grouped into four parts, namely DNS and NetBIOS requests, DNS and rDNS, ICMP echo, suspicious activities [20].
• B2: Connection attempt to IP address failed while DNS request was successful.
• B3: attempt to connect to an IP address in the absence of rDNS.• B4: Connection attempt to IP address with rDNS failed.
• B5: ICMP echoes IP addresses without replies or error messages.
• B6: try a TCP connection to an IP address that has a successful message at ICMP echo request.• B7: use TCP connections to IP addresses that have never been used in DNS, NetBIOS, ICMP.
Figure 2. Malware Behavior From Figure 2, the most detectable malware behavior is B1 or the first behavior.A total of 15 malware samples were detected on B1.This is because half of the 30 malware samples make DNS requests to obtain IPs from each domain that the malware wants to connect to.

Payload
This experiment aims to determine the activities carried out by malware on network traffic.Here is a test from one of the samples  3, the first payload belongs to newup.txt.In the payload there is a command that directs the client to access the 33.exe file so that the HTTP request activity against 33.exe also occurs.Figure 4 is the payload owned by 33.exe.In the payload, there is an MZ code at the beginning of the payload.MZ is the code on Wireshark with a function to download a file with format executable.This can be proven by exporting the file via the File menu -Export Object -HTTP in Wireshark.

Throughput Analysis
In testing this section, aims to determine the impact of malware on network traffic, especially the speed of traffic after being infected with malware.To measure throughput in this analysis, a file test with an executable format is downloaded.This file is 15.17 MB in size.
To measure throughput, the data taken is the number of average bit/s.This data shows the amount of bandwidth used when downloading files that have been provided.In addition, the data shows the size of the data entering the traffic.The results of bandwidth testing can be seen on Table 2. Table 2 describe the normal traffic throughput reaches 4379000 bits/s or 4.3 Mbps.While malware traffic has a throughput between 1138-3716 Kbits/s.In addition, data that was successfully recorded on normal traffic were 17339071 Bytes.While data on malware traffic has a larger size.

Impact on Network Traffic
Based on the results of malware analysis with the API network and network traffic, it can be seen how the impact of malware on network traffic.The impact contained in the table below is a significant impact from different types of malware samples.This sample is categorized as a sample that conducts mining with socket minelitecoin.com:8336as its connection path.This process takes a considerable amount of bandwidth but the size of data recorded during traffic recording cannot be detected.

CONCLUSION
Based on the results of the analysis of the impact of malware on network traffic using behaviorbased detection techniques that have been done, the following conclusions are obtained: a. Analysis of the impact of malware on network traffic using behavior-based detection techniques can be done by looking at the behavior of malware through its activity.The malware activity can be known by arranging the network API sequences used by malware and analyzing network traffic that has been infected with malware.b.Malware activity can be identified by looking at the network API sequences used by malware.
This sequence can be grouped by looking at the Task ID of each process that is running.In one process requires several network API functions so that the activity cannot be determined by only identifying each network API.c. Malware activity can be determined through analysis of network traffic such as payload analysis, bandwidth usage, and malware network behavior.Payload analysis aims to see the activity of a malware by checking the functions or behaviors shown during testing.Throughput analysis aims to examine the impact of malware on network traffic.While the analysis of malware network behavior aims to detect whether the sample can be categorized as malware or not.

Table 1 .
List of API Network

Table 2
Throughput Measurement Result

Table 3
Impact of Malware on Network Traffic