As cybercriminals constantly update their strategies to avoid detection, traditional fraud detection tools such as expert rules become less effective, because they do not incorporate recent fraud patterns as fast as fraudsters change their behavior. To capture fraudulent behavior quickly, it is important to use advanced machine learning algorithms, such as deep neural networks, support vector machines or random forests. Nevertheless, several factors affect the performance of a fraud detection model: the small number of detected frauds, the different financial costs associated with fraud losses and with the fraud detection process, the short response time required of the system, and the way features are preprocessed and created so that they accurately represent customer behavior.
When constructing a fraud detection model, it is very important to use features that allow accurate classification. Typical models use only raw transactional features, such as the time, amount and place of the transaction. However, these approaches do not take into account the spending behavior of the customer, which is expected to help discover fraud patterns. A standard way to include these behavioral spending patterns is to use aggregated features that create a customer behavioral profile. The transactions each customer made during a given number of preceding hours are grouped, first by card or account number, then by transaction type, merchant group, country or other criteria, and the number of transactions and the total amount spent are then calculated for each group.
Formally, the process of constructing aggregated features begins with selecting the subset of transactions made by a customer during a given number of preceding hours.
The selection is defined over the set of all transactions, using the time and id of each transaction together with a function that calculates the time difference, in hours, between two transactions. Afterwards, new features are calculated by either counting the number of transactions in the subset or summing the amounts of the transactions in the subset. To clarify how the aggregated features are calculated, consider a set of transactions made by a customer between the first and third of January 2015, as shown in the following table. We calculate four aggregated features: the number and the total amount of the transactions made by the customer in the previous 6 and 24 hours. The different aggregated features provide different information on the customer’s spending behavior. Moreover, the total number of aggregated features can grow quite quickly, as the time window can take several values and the number of possible combinations of grouping criteria can be quite large as well.
| Raw features | Aggregated features |
| TrxId | Time | Type | Country | Amt | No. Trx last 6h | Amt last 6h | No. Trx last 24h | Amt last 24h |
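The calculation of the four aggregated features can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the transactions below are hypothetical values for a single customer (they are not the rows of the table), the helper name `aggregate` is made up for illustration, and the current transaction is assumed to be excluded from its own aggregation window.

```python
from datetime import datetime, timedelta

# Hypothetical transactions for one customer; the values are illustrative
# and are NOT the rows of the original table.
transactions = [
    {"id": 1, "time": datetime(2015, 1, 1, 1, 0), "amt": 250.0},
    {"id": 2, "time": datetime(2015, 1, 1, 4, 0), "amt": 400.0},
    {"id": 3, "time": datetime(2015, 1, 1, 23, 0), "amt": 250.0},
    {"id": 4, "time": datetime(2015, 1, 2, 2, 0), "amt": 50.0},
]


def aggregate(transactions, index, hours):
    """Return (count, total amount) of the earlier transactions that fall
    within `hours` before the transaction at position `index`.

    Assumption: the current transaction itself is not counted.
    """
    current = transactions[index]
    window = timedelta(hours=hours)
    previous = [
        t for t in transactions
        if t["time"] < current["time"] and current["time"] - t["time"] < window
    ]
    return len(previous), sum(t["amt"] for t in previous)


for i, trx in enumerate(transactions):
    n6, amt6 = aggregate(transactions, i, 6)
    n24, amt24 = aggregate(transactions, i, 24)
    print(trx["id"], n6, amt6, n24, amt24)
```

In a real system the same selection would additionally be grouped by card or account number and by criteria such as transaction type or country, which is what makes the number of aggregated features grow so quickly.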
However, aggregated features still fail to capture some information. For example, with aggregated features alone it is not possible to obtain an accurate time frame of when a customer is expected to make new transactions. Moreover, when dealing with the time of the transaction, and specifically when analyzing a feature such as the mean transaction time, it is easy to make the mistake of using the arithmetic mean. The arithmetic mean is not a correct way to average time because it does not take into account the periodic behavior of the time feature: for instance, the arithmetic mean of 23:00 and 01:00 is 12:00, although both transactions happen around midnight. In the following figure, a histogram (using a 24-hour clock) of the transactions made by a customer is shown. It can be seen that the arithmetic mean of the transaction times (dashed line) does not accurately represent the actual distribution of times.
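The difference between the two ways of averaging can be illustrated with a short Python sketch. The transaction hours are hypothetical, and the function names are chosen only for this example; the periodic mean is computed in the usual way for circular data, by mapping each time to an angle on a 24-hour circle and averaging the sines and cosines.

```python
import math


def arithmetic_mean_hour(hours):
    """Plain average of the hours, ignoring that the clock wraps at 24."""
    return sum(hours) / len(hours)


def periodic_mean_hour(hours):
    """Mean time of day on a 24-hour circle, returned in [0, 24)."""
    angles = [2 * math.pi * h / 24 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)  # angle in (-pi, pi]
    return (mean_angle * 24 / (2 * math.pi)) % 24


# Hypothetical customer who transacts late at night.
hours = [22.0, 23.0, 1.0, 2.0]

print(arithmetic_mean_hour(hours))  # 12.0, i.e. noon
print(periodic_mean_hour(hours))    # a value at (or wrapping around) midnight
```

The arithmetic mean lands at noon, twelve hours away from every actual transaction, while the periodic mean stays at midnight, where the transactions are concentrated.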
To deal with this issue, a method that models the time of a transaction using the von Mises distribution was proposed in a recent academic paper. The von Mises distribution, also known as the periodic normal distribution, is a distribution on the circle that closely approximates a normal distribution wrapped around it. The times of a subset of transactions made by the same customer are modeled with a von Mises distribution
whose parameters are derived from the periodic mean and the periodic standard deviation of those times. Moreover, the density of the von Mises distribution is normalized using the modified Bessel function of the first kind of order zero.
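For reference, the standard von Mises density can be written as below. The symbols are assumptions introduced here for illustration, not necessarily the notation of the paper: x is the transaction time expressed as an angle on the 24-hour circle, μ is the periodic mean, κ is the concentration parameter, and I_0 is the modified Bessel function of the first kind of order zero. Relating the concentration to the periodic standard deviation σ, for example as κ = 1/σ, is likewise an assumption about the parameterization.

```latex
f(x \mid \mu, \kappa) = \frac{e^{\kappa \cos(x - \mu)}}{2 \pi I_0(\kappa)},
\qquad
I_0(\kappa) = \sum_{m=0}^{\infty} \frac{1}{(m!)^2} \left( \frac{\kappa}{2} \right)^{2m}
```

Larger values of κ concentrate the density around μ, and as κ approaches zero the distribution becomes uniform over the circle.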
Using this periodic distribution, it is possible to correctly estimate the time frame of a customer’s transactions and, with that, to analyze whether the behavior of a transaction is different from the customer’s normal behavior. For example, as can be observed in the following figure, the von Mises distribution (purple area) gives a more realistic representation of the customer’s actual transaction times.
Then, using the estimated distribution, a new set of features can be extracted, for example a binary feature that takes the value one if the time of a new transaction falls within the confidence interval for a given probability. An example is presented in the next figure. Furthermore, other features can be calculated, since the confidence interval can be computed for several probability levels and the time period used to estimate the distribution can have an arbitrary size, such as the last hour, the last day or the last week.
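One way such a binary feature could be computed is sketched below in plain Python. It is a rough illustration under several assumptions, not the procedure of the paper: the concentration is taken as the inverse of the periodic standard deviation, the confidence interval is assumed to be centred on the periodic mean, the interval is found by simple numerical integration rather than with a statistics library, and all function names and the example hours are hypothetical.

```python
import math


def periodic_stats(hours):
    """Periodic mean (radians) and periodic standard deviation of times of day."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    mean = math.atan2(s, c)
    # Mean resultant length, clamped away from 0 and 1 to keep log/division safe.
    resultant = min(max(math.hypot(s, c), 1e-12), 1.0)
    std = max(math.sqrt(-2 * math.log(resultant)), 1e-6)
    return mean, std


def interval_half_width(kappa, prob, steps=10_000):
    """Half-width (radians) of the interval centred on the mean that contains
    `prob` of the von Mises probability mass (numerical integration)."""
    step = math.pi / steps
    # Unnormalised density at midpoints of [0, pi]; it is symmetric about the
    # mean, and subtracting 1 from the cosine avoids overflow for large kappa.
    dens = [math.exp(kappa * (math.cos((k + 0.5) * step) - 1)) for k in range(steps)]
    total = sum(dens)
    acc = 0.0
    for k, d in enumerate(dens):
        acc += d
        if acc >= prob * total:
            return (k + 1) * step
    return math.pi


def within_interval(new_hour, history_hours, prob=0.9):
    """Binary feature: 1 if `new_hour` lies inside the interval, else 0."""
    mean, std = periodic_stats(history_hours)
    kappa = 1.0 / std  # assumption: concentration = 1 / periodic std
    half_width = interval_half_width(kappa, prob)
    angle = 2 * math.pi * new_hour / 24
    # Smallest angular distance between the new time and the periodic mean.
    diff = abs((angle - mean + math.pi) % (2 * math.pi) - math.pi)
    return int(diff <= half_width)


# Hypothetical history of a customer who usually transacts around midnight.
history = [22.0, 23.0, 0.0, 1.0, 2.0, 23.5, 0.5]
print(within_interval(0.5, history))   # a transaction shortly after midnight
print(within_interval(12.0, history))  # a transaction at noon
```

Calling `within_interval` with different values of `prob`, or with histories covering different time periods, yields the additional features described above.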
In the aforementioned study, the authors evaluated the results by using different machine learning algorithms and comparing the different sets of features: raw, aggregated and periodic. The results are shown in the following figure.
In summary, regardless of the machine learning algorithm used, there is a significant increase in predictive power when the periodic features are included, allowing businesses to improve detection rates and react faster to new fraudulent strategies.