A credit card fraud detection algorithm consists of identifying the transactions that have a high probability of being fraudulent, based on historical fraud patterns. The use of predictive modeling and machine learning in fraud detection has attracted growing interest in recent years. Different detection systems based on machine learning techniques have been successfully applied to this problem, including neural networks, Bayesian learning, artificial immune systems and random forests, among others.

Normally, these algorithms are compared and evaluated using traditional binary classification measures such as the false alert rate, the misclassification error, the receiver operating characteristic (ROC) curve, the Kolmogorov-Smirnov (KS) statistic or the F1-Score. However, these measures may not be the most appropriate criteria for evaluating fraud detection models, because they tacitly assume that all misclassification errors carry the same cost, and likewise that all correctly classified transactions do. This assumption does not hold in practice: wrongly predicting a fraudulent transaction as legitimate carries a significantly different financial cost than the inverse case of flagging a legitimate transaction as fraudulent.
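To make the point concrete, the short Python sketch below uses a hypothetical five-transaction example; the labels, amounts and both "models" are invented for illustration. Two predictions with the same counts of true positives, false positives and false negatives receive exactly the same F1-Score, even though one of them lets a 5,000 EUR fraud through and the other only a 5 EUR one.

```python
# Toy illustration (hypothetical data): the F1-Score only counts errors,
# so it cannot tell a missed 5 EUR fraud from a missed 5,000 EUR fraud.


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), computed from label counts only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def missed_fraud_amount(y_true, y_pred, amounts):
    """Money lost to frauds that were predicted as legitimate."""
    return sum(a for t, p, a in zip(y_true, y_pred, amounts) if t == 1 and p == 0)


# Two frauds (a small one and a large one) among five transactions.
y_true = [1, 1, 0, 0, 0]
amounts = [5.0, 5000.0, 20.0, 35.0, 60.0]

model_a = [1, 0, 0, 0, 0]  # catches the small fraud, misses the large one
model_b = [0, 1, 0, 0, 0]  # catches the large fraud, misses the small one

for name, y_pred in [("A", model_a), ("B", model_b)]:
    print(name, round(f1_score(y_true, y_pred), 3),
          missed_fraud_amount(y_true, y_pred, amounts))
```

Both models get an F1-Score of 0.667, while the amounts lost to missed fraud are 5000.0 and 5.0 respectively.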

To take the different costs of fraud detection into account when evaluating an algorithm, we use a modified cost matrix. The following table presents this cost matrix, which gives the costs associated with the two types of correct classification, namely true positives ($C_{TP_i}$) and true negatives ($C_{TN_i}$), and with the two types of misclassification error, namely false positives ($C_{FP_i}$) and false negatives ($C_{FN_i}$). For a false positive, the associated cost is the administrative cost $C_a$ of analyzing the transaction and contacting the cardholder. A true positive is assigned the same cost, because in that case the cardholder must also be contacted. Finally, when a fraud is not detected, the loss equals the stolen amount, so the cost of a false negative is the amount of the transaction, $Amt_i$. Since each transaction has a different amount, the cost of a false negative depends on each transaction $i$.

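Credit Card Fraud Detection Cost Matrix

$$
\begin{array}{l|cc}
 & \text{Actual positive } (y_i = 1) & \text{Actual negative } (y_i = 0) \\
\hline
\text{Predicted positive } (c_i = 1) & C_{TP_i} = C_a & C_{FP_i} = C_a \\
\text{Predicted negative } (c_i = 0) & C_{FN_i} = Amt_i & C_{TN_i} = 0
\end{array}
$$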
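As a sketch, the matrix can be stored as one row of four costs per transaction. The Python below assumes a single fixed administrative cost $C_a$ shared by every transaction; the value of 2.5 EUR, the amounts, and the names `TransactionCosts` and `build_cost_matrix` are hypothetical.

```python
from typing import List, NamedTuple


class TransactionCosts(NamedTuple):
    """One row of the example-dependent cost matrix (hypothetical layout)."""
    c_tp: float  # fraud correctly flagged: administrative cost C_a
    c_fp: float  # legitimate transaction flagged: administrative cost C_a
    c_fn: float  # fraud missed: the transaction amount Amt_i
    c_tn: float  # legitimate transaction passed: no cost


def build_cost_matrix(amounts: List[float], admin_cost: float) -> List[TransactionCosts]:
    """Return one cost row per transaction; only C_FN varies with the amount."""
    return [TransactionCosts(c_tp=admin_cost, c_fp=admin_cost, c_fn=amt, c_tn=0.0)
            for amt in amounts]


# Hypothetical values: C_a = 2.5 EUR and three transaction amounts.
rows = build_cost_matrix([12.0, 250.0, 1800.0], admin_cost=2.5)
for row in rows:
    print(row)
```

Keeping the costs per row makes explicit that only the false-negative entry changes from one transaction to the next.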

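Then, using the actual ($y_i$) and predicted ($c_i$) labels, the cost of applying an algorithm $f$ to a set $\mathcal{S}$ of $N$ transactions is evaluated as:

$$
\mathrm{Cost}(f(\mathcal{S})) = \sum_{i=1}^{N} \Big[ y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big]
$$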
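The cost equation translates almost directly into code. The following is a minimal Python sketch, assuming the cost matrix described above with one fixed administrative cost for all transactions; the function name `total_cost`, the 2.5 EUR administrative cost and the toy labels and amounts are hypothetical.

```python
from typing import Sequence


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               amounts: Sequence[float], admin_cost: float) -> float:
    """Sum the example-dependent cost over all transactions.

    Per transaction i (y = actual label, c = predicted label):
      y * (c * C_TP + (1 - c) * C_FN) + (1 - y) * (c * C_FP + (1 - c) * C_TN)
    with C_TP = C_FP = admin_cost, C_FN = amount_i and C_TN = 0.
    """
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        c_tp, c_fp, c_fn, c_tn = admin_cost, admin_cost, amt, 0.0
        cost += y * (c * c_tp + (1 - c) * c_fn) + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return cost


# Hypothetical example: 2 frauds among 5 transactions, C_a = 2.5 EUR.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
amounts = [120.0, 900.0, 40.0, 15.0, 70.0]
print(total_cost(y_true, y_pred, amounts, admin_cost=2.5))
```

Here the detected fraud and the false alarm each cost 2.5 EUR and the missed fraud costs its full 900 EUR, so the script prints 905.0.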

To illustrate the cost evaluation measure, we compared different machine learning models trained on a real credit card fraud dataset provided by a large European card processing company. In particular, we evaluated a logistic regression, a decision tree and a random forest. The database contains approximately 750,000 transactions with a fraud ratio of 0.467%, and the total losses due to fraud are 866,410 Euros. We compare the results of the algorithms as measured by the F1-Score and by the Cost defined in the equation above.
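As a rough sanity check on these figures (approximate, since the transaction count is rounded), the fraud ratio implies about

$$
750{,}000 \times 0.00467 \approx 3{,}503 \text{ fraudulent transactions},
$$

with an average fraud amount of roughly $866{,}410 / 3{,}503 \approx 247$ Euros. The figures also give a reference point for the Cost measure, assuming the reported losses are the sum of the fraudulent transaction amounts: a model that flags no transaction ($c_i = 0$ for every $i$) only incurs false-negative and true-negative costs, so

$$
\mathrm{Cost} = \sum_{i=1}^{N} y_i \, Amt_i = 866{,}410 \text{ Euros},
$$

that is, the total fraud losses. A useful model should score well below this value.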

The results show that the best model according to a statistical measure such as the F1-Score is not the one that minimizes the financial Cost. The model that maximizes the F1-Score is the Decision Tree classifier, yet it performs quite poorly when measured by Cost. Conversely, the Random Forest algorithm minimizes the Cost measure but performs poorly as measured by the F1-Score.
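The same kind of disagreement can be reproduced on a toy example. The Python sketch below uses invented data and two hypothetical models, `model_x` and `model_y` (they are not the Decision Tree and Random Forest from the experiment), and assumes an administrative cost of 2.5 EUR. `model_x` catches the two small frauds with no false alarms but misses a 3,000 EUR fraud, while `model_y` catches the large fraud at the price of three false alarms.

```python
# Hypothetical data: 3 frauds (labels 1) among 10 transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
amounts = [10.0, 15.0, 3000.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
ADMIN_COST = 2.5  # assumed C_a, in EUR

models = {
    "model_x": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # precise, misses the big fraud
    "model_y": [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # noisy, catches the big fraud
}


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def total_cost(y_true, y_pred, amounts, admin_cost):
    """Cost with C_TP = C_FP = admin_cost, C_FN = amount and C_TN = 0."""
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        if c == 1:
            cost += admin_cost  # TP or FP: review and cardholder contact
        elif y == 1:
            cost += amt  # FN: the fraud amount is lost
    return cost


for name, y_pred in models.items():
    f1 = f1_score(y_true, y_pred)
    cost = total_cost(y_true, y_pred, amounts, ADMIN_COST)
    print(f"{name}: F1={f1:.3f}  Cost={cost:.1f}")

best_f1 = max(models, key=lambda m: f1_score(y_true, models[m]))
lowest_cost = min(models, key=lambda m: total_cost(y_true, models[m], amounts, ADMIN_COST))
print(best_f1, lowest_cost)
```

`model_x` wins on F1-Score (0.800 against 0.286), but `model_y` has a far lower Cost (35.0 against 3005.0 EUR), so the two measures pick different models.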

This example shows the need for a more business-oriented measure such as Cost. With this kind of measure, companies can make decisions that are better aligned with their objectives. The discussion then moves away from expected levels of false positives or customer satisfaction and focuses on the actual economic impact of detecting electronic fraud.