Click Fraud Detection with LightGBM

Abstract:

Fraud in today's web advertising landscape poses a serious risk to the Internet economy and the advertising industry. One of the most widespread and daunting problems in online advertising is click fraud. Although online advertisers constantly improve their traffic filtering techniques, they still lack an effective defense for detecting click fraud on their own. An effective fraud detection algorithm is therefore pivotal for online advertising businesses. In this paper, we analyzed click patterns in a large dataset covering 200 million clicks over 4 days. The key idea was to track the journey of a user's click across their portfolio and to flag IP addresses that produce many clicks but never result in an app install. We used LightGBM, a modern machine learning framework based on gradient boosting decision trees (GBDT). The model achieved an average precision of 98%. Throughout our research, the literature review served as the central source for confirming our results.
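The per-IP idea described above (many clicks, no installs) can be illustrated with a short Python sketch. It assumes a hypothetical click record with `ip`, `app`, and `is_attributed` fields, where `is_attributed` marks whether a click led to an app install, and a hypothetical `min_clicks` threshold; none of these names come from this paper. The sketch is a simple rule-based stand-in, not the LightGBM model itself: it counts clicks and installs per IP and flags IPs that exceed the threshold without a single install.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Click(NamedTuple):
    """Hypothetical click record; field names are illustrative assumptions."""
    ip: str
    app: int
    is_attributed: bool  # True if this click led to an app install


def ip_click_stats(clicks: Iterable[Click]) -> Dict[str, Tuple[int, int]]:
    """Aggregate clicks per IP into (number of clicks, number of installs)."""
    counts = defaultdict(lambda: [0, 0])
    for click in clicks:
        counts[click.ip][0] += 1                        # every click counts
        counts[click.ip][1] += int(click.is_attributed)  # installs only
    return {ip: (n, k) for ip, (n, k) in counts.items()}


def flag_suspicious_ips(clicks: Iterable[Click], min_clicks: int = 100) -> List[str]:
    """Flag IPs with at least `min_clicks` clicks and zero installs.

    `min_clicks` is a hypothetical threshold chosen for illustration only.
    """
    stats = ip_click_stats(clicks)
    return sorted(
        ip for ip, (n_clicks, n_installs) in stats.items()
        if n_clicks >= min_clicks and n_installs == 0
    )


if __name__ == "__main__":
    sample = (
        [Click("10.0.0.1", 3, False)] * 150    # heavy clicker, never installs
        + [Click("10.0.0.2", 3, True)]         # one install...
        + [Click("10.0.0.2", 3, False)] * 199  # ...among many clicks
        + [Click("10.0.0.3", 5, False)] * 5    # light clicker
    )
    print(flag_suspicious_ips(sample, min_clicks=100))
```

A fixed threshold like this is brittle; in a learned pipeline such as the one described here, per-IP aggregates of this kind would more plausibly be supplied to the gradient boosting classifier as input features rather than used as a hard rule.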