Fraud detection in Payments

Fraud detection

In this blog post, I will talk about one of the most important use cases that many companies focus on: detecting fraudulent activity in financial transactions.


Scenario 1: No output labels (unsupervised learning)

In this case we can use an algorithm called One-Class SVM. It is an unsupervised learning algorithm, so it does not need output labels.

One-Class SVM is trained on “healthy” data, in our case normal transactions, and learns what that pattern looks like. When it is given data whose pattern differs from what it was trained on, it classifies that data as an outlier.

If we get a bit more nerdy: One-Class SVM is inspired by how SVMs (support vector machines) separate classes with a maximum-margin hyperplane. With only one class available, the hyperplane instead separates the normal training data from the origin in the kernel feature space, and outliers are detected based on how far they fall from the normal data. That makes it an interesting algorithm for a problem like ours. We train the algorithm on our normal transactions, which produces a model that represents this data; observations that are too different from it are labeled as out-of-class.
The One-Class SVM decision function returns positive values for inliers and negative values for outliers. The more negative the value, the farther the observation lies from the separating hyperplane on the outlier side.

Once the model has been trained, we can use it to predict which transactions are outliers.
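A minimal sketch with scikit-learn's `OneClassSVM`, assuming synthetic two-dimensional data in place of real transactions (the arrays `X_normal` and `X_new` are hypothetical). The model is fitted on normal points only; `predict` returns +1 for inliers and -1 for outliers, and `decision_function` returns the signed value described above.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical "healthy" training data: normal transactions only.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Hypothetical new observations: one typical point and one far-away point.
X_new = np.array([[0.1, -0.2],   # close to the normal data
                  [6.0, 6.0]])   # far from the normal data

# nu bounds the fraction of training points treated as errors.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_normal)

labels = model.predict(X_new)            # +1 = inlier, -1 = outlier
scores = model.decision_function(X_new)  # positive = inlier, negative = outlier

print(labels)
print(scores)
```

Only normal data goes into `fit`; fraud never has to be labelled for the model to flag it.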

With the nu parameter, we decide what fraction of the training data is allowed to be treated as errors: nu is an upper bound on the fraction of training points that end up on the wrong side of the boundary. That means we can roughly control the maximum rate of false positives (a transaction is normal but we predict that it is fraudulent). With One-Class SVM we can therefore decide what matters more for our model: catching as many fraudulent transactions as possible, at the cost of more false positives where we wrongly block customers' credit cards, or accepting that the model misses some fraudulent transactions (more false negatives) in exchange for fewer false positives.
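To see the role of `nu` in practice, the sketch below fits `OneClassSVM` on hypothetical synthetic normal data for a few `nu` values and measures the fraction of training points it flags as outliers. In the underlying formulation `nu` is an upper bound on the fraction of training errors (and a lower bound on the fraction of support vectors), so the flagged fraction should land close to, and roughly at or below, `nu`.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Hypothetical training set of normal transactions (synthetic features).
X_train = rng.normal(size=(1000, 2))


def training_outlier_fraction(nu):
    """Fraction of training points that the fitted model labels -1."""
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_train)
    return float(np.mean(model.predict(X_train) == -1))


fractions = {nu: training_outlier_fraction(nu) for nu in (0.01, 0.05, 0.2)}
for nu, frac in fractions.items():
    print(f"nu={nu:<4}  flagged fraction of training data = {frac:.3f}")
```

A larger `nu` makes the boundary tighter around the normal data, so more transactions get flagged: fewer missed frauds, but more blocked cards.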

Setting nu=0.2 allows up to about 20% of our normal training transactions to be misclassified. With this setting, the model correctly classifies 80.01% of normal transactions as normal and 93.09% of fraudulent transactions as fraudulent.
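The two percentages above are per-class rates: the share of normal transactions predicted as normal, and the share of fraudulent transactions predicted as fraudulent. A small helper for computing them from labels in the dataset's convention (0 = normal, 1 = fraud) and scikit-learn outlier predictions (+1 = inlier, -1 = outlier) might look like this; the function name is hypothetical.

```python
import numpy as np


def per_class_rates(y_true, y_pred_svm):
    """Return (share of normal predicted normal, share of fraud predicted fraud).

    y_true: 0 = normal, 1 = fraud.
    y_pred_svm: +1 = inlier (normal), -1 = outlier (fraud), as returned by predict().
    """
    y_true = np.asarray(y_true)
    predicted_fraud = np.asarray(y_pred_svm) == -1        # map -1 to "fraud"
    normal_rate = np.mean(~predicted_fraud[y_true == 0])  # true negative rate
    fraud_rate = np.mean(predicted_fraud[y_true == 1])    # true positive rate (recall)
    return float(normal_rate), float(fraud_rate)


# Toy example: 5 normal and 2 fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 1, -1, -1, 1]
print(per_class_rates(y_true, y_pred))  # (0.8, 0.5)
```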

In this case false negatives matter most: a false negative means our model predicts a fraudulent transaction as normal, which can be catastrophic!

Scenario 2: Output labels are given (supervised learning)

After identifying the fraudulent transactions, we have a labelled dataset in which every transaction is marked as fraud or normal. The labels are not 100% accurate, but they still give us something to work with.

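So our next step is to use these labels to detect fraudulent activity. Strictly speaking, the two algorithms below are themselves unsupervised outlier detectors; the labels are typically used to set the expected share of fraud and to evaluate the results rather than to fit the models.

The most popular algorithms are:

1. Isolation Forest

2. Local Outlier Factor (LOF)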

Isolation Forest: let's look at the intuition behind this algorithm.

One of the more recent techniques for detecting anomalies is the Isolation Forest. The algorithm is based on the observation that anomalies are data points that are few and different. Because of these properties, anomalies are susceptible to a mechanism called isolation.

This method is fundamentally different from earlier approaches, which build a profile of normal points: it uses isolation as a more effective and efficient means of detecting anomalies than the commonly used distance and density measures. It also has linear time complexity with a low constant and a small memory requirement, and it builds a well-performing model with a small number of trees, each grown on a small sub-sample of fixed size, regardless of the size of the dataset.
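In scikit-learn this corresponds to `IsolationForest`, whose `n_estimators` (number of trees) and `max_samples` (sub-sample size per tree) parameters map onto the ideas above. The sketch below uses hypothetical synthetic data to show that an isolated point receives a lower `score_samples` value, meaning more anomalous, than a typical point.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical data: 2,000 typical points around the origin.
X = rng.normal(size=(2000, 2))

# 100 trees, each grown on a sub-sample of 256 points, regardless of len(X).
forest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
forest.fit(X)

typical_point = np.array([[0.0, 0.0]])
isolated_point = np.array([[8.0, 8.0]])

# score_samples: lower (more negative) means more anomalous.
print(forest.score_samples(typical_point))
print(forest.score_samples(isolated_point))
```

Because each tree only ever sees 256 rows, training cost stays small even when the full dataset has hundreds of thousands of transactions.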

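Typical machine learning methods tend to work better when the patterns they try to learn are balanced, meaning good and bad behaviors appear in roughly equal numbers in the dataset. Fraud data is the opposite: fraudulent transactions are rare, which is exactly the “few and different” situation that isolation exploits.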

How Isolation Forests Work

The Isolation Forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature. The logic is that anomalies are easier to isolate, because only a few conditions are needed to separate them from the normal observations, whereas isolating a normal observation requires more conditions. An anomaly score can therefore be derived from the number of conditions needed to isolate a given observation: the fewer the conditions, the more anomalous the observation.

To construct these separations, the algorithm first builds isolation trees (random decision trees). The score of an observation is then based on its path length, the number of splits from the root to the leaf that isolates it, averaged over all the trees.
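The original Isolation Forest paper (Liu, Ting and Zhou) turns the average path length into a normalized score. With $h(x)$ the path length of observation $x$ in one tree, $E[h(x)]$ its average over the forest, and $n$ the sub-sample size:

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Here $c(n)$ is the average path length of an unsuccessful search in a binary search tree over $n$ points, which normalizes $h(x)$. A score close to 1 indicates an anomaly, scores well below 0.5 indicate normal observations, and if every observation scores around 0.5 the sample has no clearly distinct anomalies. scikit-learn's `IsolationForest.score_samples` returns the negative of this score, so lower values are more anomalous there.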

Each isolation tree is created using the following steps:

1. Randomly sample N instances from your training dataset.

At each node:

2. Randomly choose a feature to split upon.

3. Randomly choose a split value from a uniform distribution spanning from the minimum value to the maximum value of the feature chosen in Step 2.

Steps 2 and 3 are repeated recursively until, in principle, all N instances from your sample are “isolated” in leaf nodes of your isolation tree — one training instance per leaf node. In practice, we don’t need to build the tree so deeply and can apply a height limit.
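The three steps can be sketched from scratch in NumPy. This is a simplified illustration, not scikit-learn's implementation: the function names are hypothetical, the height limit is set to ceil(log2 N) as suggested in the original paper, and leaves that stop early get the usual c(size) correction for the subtree that was never grown.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Steps 2 and 3: random feature, uniform split value, recurse until isolated or too deep."""
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:  # all values equal: this feature cannot split the node
        return {"size": len(X)}
    split = rng.uniform(lo, hi)
    goes_left = X[:, feature] < split
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(X[goes_left], depth + 1, max_depth, rng),
        "right": build_tree(X[~goes_left], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for the part of the tree not grown."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    branch = "left" if x[node["feature"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


def iforest_scores(X_train, X_query, n_trees=100, sample_size=256, seed=0):
    """Anomaly score 2^(-E[h(x)] / c(N)): near 1 = anomaly, well below 0.5 = normal."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(X_train))
    max_depth = math.ceil(math.log2(n))  # height limit
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X_train), size=n, replace=False)  # Step 1: sample N instances
        trees.append(build_tree(X_train[idx], 0, max_depth, rng))
    mean_h = np.array([np.mean([path_length(x, t) for t in trees]) for x in X_query])
    return 2.0 ** (-mean_h / average_path_length(n))


rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 2))           # hypothetical normal data
X_query = np.array([[0.0, 0.0], [6.0, 6.0]])  # a typical point and an isolated one
scores = iforest_scores(X_train, X_query)
print(scores)  # the isolated point should get the higher score
```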


It is evident from this diagram that outliers tend to be isolated at the early levels of a tree, close to the root.

Let's implement this solution on some real data.

The data is the open Credit Card Fraud Detection dataset from Kaggle. Feel free to play with it.

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a Principal Component Analysis (PCA) transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data are not provided. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are ‘Time’ and ‘Amount’. Feature ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature ‘Amount’ is the transaction amount; it can be used, for example, for example-dependent cost-sensitive learning. Feature ‘Class’ is the response variable and takes the value 1 in case of fraud and 0 otherwise.
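A minimal sketch for separating features and labels, assuming the Kaggle CSV has been loaded into a pandas DataFrame (for example with `pd.read_csv`; the local filename is up to you) with the `V1` to `V28`, `Time`, `Amount` and `Class` columns described above. The helper name is hypothetical, and the tiny DataFrame only stands in for the real file.

```python
import pandas as pd


def split_features_and_labels(df):
    """Return (X, y, fraud_fraction) for a frame with V1..V28, Time, Amount, Class."""
    feature_cols = [f"V{i}" for i in range(1, 29)] + ["Time", "Amount"]
    X = df[feature_cols]
    y = df["Class"]            # 1 = fraud, 0 = normal
    fraud_fraction = y.mean()  # share of positive (fraud) rows
    return X, y, fraud_fraction


# Hypothetical stand-in for the real data: 4 rows, 1 of them fraudulent.
toy = pd.DataFrame({f"V{i}": [0.0, 0.1, -0.2, 0.3] for i in range(1, 29)})
toy["Time"] = [0, 10, 20, 30]
toy["Amount"] = [12.5, 80.0, 3.2, 45.0]
toy["Class"] = [0, 0, 1, 0]

X, y, fraud_fraction = split_features_and_labels(toy)
print(X.shape, fraud_fraction)  # (4, 30) 0.25

# On the full dataset the same ratio is 492 / 284,807:
print(f"{492 / 284_807:.4%}")  # 0.1727%
```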


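Given the class imbalance ratio, I measure performance with the Area Under the Precision-Recall Curve (AUPRC). Plain accuracy from the confusion matrix is not meaningful for such unbalanced classification, because a model that labels every transaction as normal would already be about 99.8% accurate.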
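A quick way to see why accuracy misleads here: on a hypothetical label vector with 1% fraud, a "model" that calls every transaction normal is 99% accurate, while its average precision is only equal to the fraud rate. scikit-learn's `average_precision_score` is a step-wise summary of the precision-recall curve, used here as the AUPRC estimate; that choice of estimator is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

# Hypothetical labels: 990 normal (0) and 10 fraudulent (1) transactions.
y_true = np.array([0] * 990 + [1] * 10)

# A useless detector: predicts "normal" for everything and gives every
# transaction the same fraud score.
y_pred = np.zeros_like(y_true)
fraud_score = np.zeros(len(y_true))

print(accuracy_score(y_true, y_pred))                # 0.99
print(average_precision_score(y_true, fraud_score))  # 0.01, i.e. the fraud rate
```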

Exploratory data analysis

Comparison of the amount

Observations: it is evident from the result that fraudulent transactions tend to involve small amounts compared with normal transactions.
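The amount comparison can be reproduced with a pandas group-by on the `Class` column. The sketch assumes a DataFrame with the dataset's `Amount` and `Class` columns; the values below are made up purely to show the shape of the output and say nothing about the real data.

```python
import pandas as pd

# Hypothetical rows standing in for the real transactions.
df = pd.DataFrame({
    "Amount": [25.0, 120.0, 8.5, 60.0, 1.0, 3.5],
    "Class":  [0,    0,     0,   0,    1,   1],
})

# Summary statistics of the transaction amount, separately for normal (0) and fraud (1).
summary = df.groupby("Class")["Amount"].describe()
print(summary[["count", "mean", "50%", "max"]])
```

Looking at the median (`50%`) alongside the mean is worthwhile, since a few very large transactions can pull the mean far from a typical amount.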

Let's talk about another algorithm.

Local outlier factor

Intuition behind the algorithm

The LOF algorithm is an unsupervised outlier detection method which computes the local density deviation of a given data point with respect to its neighbors. It considers as outlier samples that have a substantially lower density than their neighbors.

The number of neighbors considered (parameter n_neighbors) is typically chosen 1) greater than the minimum number of objects a cluster has to contain, so that other objects can be local outliers relative to this cluster, and 2) smaller than the maximum number of nearby objects that can potentially be local outliers. In practice, such information is generally not available, and taking n_neighbors=20 appears to work well in general.
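A minimal sketch with scikit-learn's `LocalOutlierFactor` and `n_neighbors=20`, on hypothetical synthetic data: `fit_predict` labels each training point (+1 inlier, -1 outlier), and `negative_outlier_factor_` holds the negated LOF values, so more negative means more outlying.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical data: a dense cluster plus one point far away from it.
cluster = rng.normal(scale=0.5, size=(200, 2))
X = np.vstack([cluster, [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels = lof.fit_predict(X)  # +1 = inlier, -1 = outlier
lof_scores = lof.negative_outlier_factor_

print(labels[-1])                                # label of the far-away point
print(lof_scores[-1], lof_scores[:200].mean())
```

The far point's neighbors all sit in the dense cluster, so its local density is much lower than theirs, which is exactly the density deviation LOF measures.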

These were the results

Isolation forest

Important Observations obtained

  • Isolation Forest made 73 errors, versus 97 for Local Outlier Factor and 8,516 for SVM.
  • Isolation Forest has an accuracy of 99.74%, compared with 99.65% for LOF and 70.09% for SVM.
  • When comparing precision and recall for the three models, Isolation Forest performed much better than LOF: it detected around 27% of fraud cases, versus just 2% for LOF and 0% for SVM.
  • Overall, Isolation Forest performed much better at identifying fraud cases, detecting around 30% of them.
  • We could improve on this by increasing the sample size or by using deep learning algorithms, at the cost of more computation. More complex anomaly detection models could also help detect more of the fraudulent cases.
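A sketch of how such a comparison of error counts, accuracy and per-class precision and recall could be computed with scikit-learn. It assumes a feature matrix `X` and labels `y` (1 = fraud) and sets each detector's `contamination` to the observed fraud fraction, a common choice but an assumption here, not necessarily what produced the figures above. Synthetic data stands in for the real dataset, so the printed numbers will not match those figures.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, classification_report
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 2,000 normal rows and 20 "fraud" rows shifted away.
X = np.vstack([rng.normal(size=(2000, 5)), rng.normal(loc=4.0, size=(20, 5))])
y = np.array([0] * 2000 + [1] * 20)
outlier_fraction = float(y.mean())  # assumed contamination: the observed fraud share

detectors = {
    "Isolation Forest": IsolationForest(contamination=outlier_fraction, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=outlier_fraction),
}

results = {}
for name, detector in detectors.items():
    raw = detector.fit_predict(X)     # +1 = inlier, -1 = outlier
    y_pred = (raw == -1).astype(int)  # map to the dataset's 0/1 convention
    n_errors = int((y_pred != y).sum())
    results[name] = (n_errors, accuracy_score(y, y_pred))
    print(f"{name}: {n_errors} errors, accuracy {results[name][1]:.4f}")
    print(classification_report(y, y_pred, target_names=["normal", "fraud"], digits=3))
```

The fraud row of each classification report (its recall in particular) is the number to watch; accuracy stays high for any detector on data this imbalanced.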

Hope you have learnt something useful by following this blog :)

ML engineer | Data scientist