Published on

Credit Card Fraud Detection

314 words2 min read–––


With the growth of digital payments, the incidence of credit card fraud has also increased. Fraudulent transactions can cause significant financial losses to credit card companies and their customers. Therefore, it is essential to develop accurate and efficient methods to detect fraudulent transactions.

This web application is designed to predict whether a credit card transaction is fraudulent or not using a machine learning model. The model was trained on a dataset of credit card transactions and uses a random forest classifier to make predictions.

The demo of tbe web application can be accessed at https://credit-card-fraud-detection-parthmaniar.streamlit.app.

Data source

This dataset consists of one million credit card transactions, including both fraudulent and non-fraudulent ones. The primary objective of this study is to develop an accurate and effective model that can predict fraudulent transactions based on their unique features.


  • distance_from_home: the distance from home where the transaction happened.

  • distance_from_last_transaction: the distance from last transaction happened.

  • ratio_to_median_purchase_price: Ratio of purchased price transaction to median purchase price.

  • repeat_retailer: Is the transaction happened from same retailer.

  • used_chip: Is the transaction through chip (credit card).

  • used_pin_number: Is the transaction happened by using PIN number.

  • online_order: Is the transaction an online order.

  • fraud: Is the transaction fraudulent.

As it can be seen from the charts, number of fraud transactions are significantly low when compared to non-fraud transactions.

When we look at the categoric variables for fraud transactions, we can safely say that almost all of the fraud transactions were online did not use pin number. These key points will be important when creating a model.


Model Training

I am using random forest classifier to train the model. Random forest is a popular ensemble learning algorithm that combines multiple decision trees to improve the performance and reduce overfitting.