Active Learning and Autoencoders in Banking Fraud Detection

Fraud detection and operational risk management are two main challenges faced by banks and financial institutions worldwide. Traditionally, static rule-based controls have been effective for uncovering known fraud patterns. However, with rising anti-fraud requirements, it is essential to take fraud detection to the next level and detect emerging fraud types proactively. Unlike traditional solutions on the market, our solution uses advanced analytics, dynamic profiling and machine learning to build highly accurate customer profiles. All transactions linked to an account are continuously monitored across all channels and compared against the customer profile. The result is a massive reduction in the number of false positives, which maintains an excellent customer and user experience. Machine-learning algorithms discover new fraud schemes, helping banks stay on top of emerging threats.

One of the challenges we face is the following: we receive a lot of data from the banks (customer transactions, ebanking activity), which the software uses to perform fraud detection and provide the banks with a score. However, the banks do not generally tell us whether the algorithm's prediction was correct or which fraud cases were identified, as these checks are done manually by other bank departments. We are keen to develop more advanced machine-learning tools for the banks, and we realise the importance of having data (transactions) labelled as fraud or non-fraud by bank staff. However, labelling the data is time consuming and has to be done manually by bank staff. It is therefore practically impossible to label all the data, and we need to limit the number of labelling requests.
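One common way to limit labelling requests is uncertainty sampling, a basic active-learning strategy: only the transactions the current model is least sure about are sent to bank staff. The following is a minimal sketch, not the project's actual method. It assumes that scores lie in [0, 1] and that 0.5 marks maximal uncertainty; the function name `select_for_labelling`, the `budget` parameter and the example scores are all hypothetical.

```python
def select_for_labelling(scores, budget):
    """Return the indices of the `budget` transactions whose fraud
    score is closest to 0.5, i.e. where the model is least certain.

    Assumption: scores are in [0, 1]; this is illustrative only.
    """
    # Rank transaction indices by distance of their score from 0.5.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]


# Hypothetical scores for six transactions.
scores = [0.02, 0.48, 0.97, 0.55, 0.10, 0.70]

# With a budget of two requests, only the two most ambiguous
# transactions are sent to bank staff for manual labelling.
queried = select_for_labelling(scores, budget=2)
print(queried)
```

Transactions scored near 0 or 1 are left unlabelled, on the assumption that a label there would teach the model comparatively little.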

Another challenge concerns fraud detection with new methods. The most advanced companies in the financial industry (PayPal for instance) have been pioneering advanced machine-learning approaches such as deep neural networks. In this project we would like to use a particular form of them, autoencoders, which compress an input down to its core features and then reverse the process to recreate the input from those key features alone. This means that if the autoencoder tries to reconstruct something it does not recognize, that is, something unlike the data it was trained on, it won’t be able to reconstruct the main features. We therefore expect frauds and anomalies to suffer a high reconstruction error, which should lead to a good fraud detection rate.
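The reconstruction-error idea can be illustrated with a very small autoencoder written in NumPy. This is a sketch under stated assumptions, not the project's model: the data are synthetic (normal points lie near a 2-D subspace of a 6-D space, "fraud" points are drawn at random), the network has a single tanh hidden layer trained by plain gradient descent, and the 99th-percentile threshold is an arbitrary illustrative choice.

```python
import numpy as np


def train_autoencoder(X, hidden=2, epochs=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, hidden))  # encoder weights
    W_dec = rng.normal(0.0, 0.1, (hidden, d))  # decoder weights
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)       # compressed representation
        err = H @ W_dec - X          # reconstruction minus input
        # Gradients of 0.5 * mean squared reconstruction error.
        grad_dec = H.T @ err / n
        grad_enc = X.T @ ((err @ W_dec.T) * (1.0 - H ** 2)) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec


def reconstruction_error(X, W_enc, W_dec):
    """Mean squared reconstruction error per row."""
    X_hat = np.tanh(X @ W_enc) @ W_dec
    return np.mean((X - X_hat) ** 2, axis=1)


# Synthetic data (assumption): normal points follow a 2-D latent structure.
rng = np.random.default_rng(42)
A = rng.normal(0.0, 0.5, (2, 6))
X_normal = rng.normal(size=(500, 2)) @ A + rng.normal(0.0, 0.05, (500, 6))
X_fraud = rng.normal(0.0, 3.0, (20, 6))  # points without that structure

# Train on normal data only, then score both groups.
W_enc, W_dec = train_autoencoder(X_normal)
err_normal = reconstruction_error(X_normal, W_enc, W_dec)
err_fraud = reconstruction_error(X_fraud, W_enc, W_dec)

# Flag anything reconstructed worse than 99% of the normal data.
threshold = np.percentile(err_normal, 99)
flagged = err_fraud > threshold
print(f"flagged {flagged.sum()} of {len(X_fraud)} synthetic anomalies")
```

Real transaction data are far less cleanly separated than this toy example, so the threshold would in practice have to be tuned against the false-positive rate the bank can tolerate.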

Deep Learning (incl. autoencoders) is becoming a mature technology, showing results in many fields, such as image and video recognition, text processing or complex robotic tasks such as autonomous car driving. In the fraud detection domain, Deep Learning is still a work in progress, and the specific setup (unbalanced, unlabeled data) is a real challenge. However, the modeling capacity of neural networks is very large and would make it possible to combine both transactional and internal audit data sources, with the aim of reaching high true positive and low false positive rates.

Collaboration: NetGuardians SA