TY - GEN
T1 - Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph
AU - Alabdulmohsin, Ibrahim
AU - Han, Yufei
AU - Shen, Yun
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2016/10/26
Y1 - 2016/10/26
N2 - Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy. © 2016 Copyright held by the owner/author(s).
AB - Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy. © 2016 Copyright held by the owner/author(s).
UR - http://hdl.handle.net/10754/622527
UR - http://dl.acm.org/citation.cfm?doid=2983323.2983700
UR - http://www.scopus.com/inward/record.url?scp=84996538799&partnerID=8YFLogxK
U2 - 10.1145/2983323.2983700
DO - 10.1145/2983323.2983700
M3 - Conference contribution
SN - 9781450340731
SP - 2395
EP - 2400
BT - Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16
PB - Association for Computing Machinery (ACM)
ER -