Cyber Threat Intelligence for Zero-day Attacks from Dark and Surface Web Information

Abstract:

As our online presence has grown in recent years, zero-day cyberthreats have become more commonplace and cause a considerable amount of financial damage each year. To protect end-users from zero-day cyberthreats, it is imperative to provide them with timely and actionable cyberthreat information. In this work, we build a model that identifies cyberthreats for zero-day attacks using heterogeneous information sources, including the dark web. For model building, we investigated three feature extraction techniques and evaluated their effect on classifying zero-day cyberthreat information with different classifiers. Experimental results reveal that including relevant dark web data in model training significantly improves the model's ability to identify zero-day threats, irrespective of the feature extraction technique used; in our case, TF-IDF based extraction yields better results than word- or sentence-based feature extraction. We also demonstrate that this approach is equally beneficial for extracting threat intelligence from multilingual surface web data. Our results are validated with two performance metrics across four classifiers.
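The abstract reports TF-IDF as the strongest of the three feature extraction techniques. The minimal sketch below shows how TF-IDF features feed a text classifier. It is an illustration under stated assumptions and not the authors' pipeline. The toy posts and the "threat"/"benign" labels are hypothetical. The regex tokenizer is a simplification. A nearest-centroid cosine classifier stands in for the four classifiers the work evaluates, which the abstract does not name. The weighting follows scikit-learn's `TfidfVectorizer` defaults: smoothed IDF and L2-normalised vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF, then L2-normalised; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroids(vectors, labels):
    # Sum each class's TF-IDF vectors, then renormalise to unit length.
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, {})
        for t, v in vec.items():
            acc[t] = acc.get(t, 0.0) + v
    result = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc.values()))
        result[label] = {t: v / norm for t, v in acc.items()} if norm else {}
    return result


def predict(doc, idf, cents):
    # Assign the class whose centroid is most similar to the document.
    vec = tfidf(doc, idf)
    return max(cents, key=lambda label: cosine(vec, cents[label]))


# Hypothetical toy corpus for illustration only; not data from this work.
TRAIN = [
    ("selling new zero day exploit for unpatched browser vulnerability", "threat"),
    ("private exploit kit remote code execution zero day", "threat"),
    ("looking for recommendations on a good laptop", "benign"),
    ("best recipes for a quick weeknight dinner", "benign"),
]
docs = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
cents = centroids([tfidf(d, idf) for d in docs], labels)

print(predict("zero day remote code execution exploit for sale", idf, cents))  # prints "threat"
```

Terms shared by many posts, such as "for", receive a lower IDF weight than rarer terms such as "remote" or "execution". This is the property that lets TF-IDF emphasise distinctive threat vocabulary. In practice the same vectors would be passed to stronger classifiers than the stand-in used here.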