
My project is about detecting credit card fraud in online transactions. The goal is to identify possible fraud in a dataset gathered from the internet, as part of my personal development.
My aim was to learn different data-cleaning methods, such as filling in values using averages or by category. After producing a data quality report, I moved on to fraud analysis. I split the fraud signals into four types:
- Type I: unusual transaction frequency at both card and merchant level in 3-day, 7-day, 14-day and 28-day time windows, e.g. card_frequency_3
- Type II: unusual transaction amount at both card and merchant level in 3-day, 7-day, 14-day and 28-day time windows, e.g. merchant_amount_to_median_7
- Type III: unusual transaction location at both card and merchant level in 3-day, 7-day, 14-day and 28-day time windows, e.g. merchant_distinct_zip_14
- Type IV: unusual interactions between card and merchant in 3-day, 7-day, 14-day and 28-day time windows, e.g. card_distinct_merch_28
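To make the four variable families more concrete, here is a minimal, standard-library-only Python sketch of how one example variable from each type might be computed for a single transaction. It assumes each transaction is a `(date, card, merchant, merchant_zip, amount)` tuple and that the list is sorted by date; the tuple layout, the function name `window_features` and the sample data are hypothetical, and the exact variable definitions used in the project may differ.

```python
from datetime import date, timedelta
from statistics import median


def window_features(txns, i, days):
    """Example variables for transaction i over a `days`-day window.

    Assumption: txns is a date-sorted list of hypothetical
    (date, card, merchant, merchant_zip, amount) tuples.
    """
    cur_date, card, merch, _zip, amount = txns[i]
    start = cur_date - timedelta(days=days - 1)  # window ends on the current date
    # Only look at transactions up to and including this one,
    # so no variable uses information from the future.
    past = [t for t in txns[: i + 1] if start <= t[0] <= cur_date]
    card_txns = [t for t in past if t[1] == card]
    merch_txns = [t for t in past if t[2] == merch]
    return {
        # Type I: number of transactions on this card (including this one)
        f"card_frequency_{days}": len(card_txns),
        # Type II: this amount relative to the merchant's median amount
        f"merchant_amount_to_median_{days}": amount / median(t[4] for t in merch_txns),
        # Type III: distinct zip codes recorded for this merchant
        f"merchant_distinct_zip_{days}": len({t[3] for t in merch_txns}),
        # Type IV: distinct merchants this card has been used at
        f"card_distinct_merch_{days}": len({t[2] for t in card_txns}),
    }


# Made-up sample data, for illustration only.
sample = [
    (date(2010, 1, 1), "c1", "m1", "10001", 100.0),
    (date(2010, 1, 2), "c1", "m2", "10002", 50.0),
    (date(2010, 1, 3), "c2", "m1", "10003", 300.0),
    (date(2010, 1, 5), "c1", "m1", "10001", 1000.0),
]
features_7 = window_features(sample, 3, 7)
features_3 = window_features(sample, 3, 3)
print(features_7)
print(features_3)
```

Recomputing over the whole history for every transaction is quadratic, which is fine for a sketch; on ~95,000 records a grouped, rolling approach (for example in pandas) would be the more practical choice.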
Later on, I learned that there are various machine learning methods for identifying fraud, such as random forests.
Overall, the project was very promising and I learned a lot from it. There were challenges along the way, especially in understanding the different machine learning methods (some of them are very confusing), but I still managed to understand some of them through YouTube videos, which enriched my knowledge.
As a result, I identified the fraudulent transactions and learned why detecting fraud matters.
The figures behind my business-value estimate are:
- 95000 transactions per year
- 0.3% fraud rate
- Average amount per non-fraud transaction: $377
- Average amount per fraud transaction: $1,524
- Expected business value = 95,000 × 0.3% × $1,524 × 72.37% − 95,000 × (1 − 0.3%) × $377 × 0.07% ≈ $290,000 per year
In other words, fraud left unchecked could cost a business of this size a loss in profit of roughly this order every year.
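The arithmetic in the business-value formula can be checked with a few lines of Python. The post does not define the 72.37% and 0.07% factors, so the parameter names below are an assumed reading: `fraud_catch_rate` as the share of fraud the model catches, and `false_alarm_rate` as the share of legitimate transaction value that is wrongly flagged. The function name `expected_business_value` is hypothetical.

```python
def expected_business_value(n_txns, fraud_rate, avg_fraud_amt, avg_legit_amt,
                            fraud_catch_rate, false_alarm_rate):
    """Yearly value following the formula in the post.

    fraud_catch_rate and false_alarm_rate are ASSUMED interpretations
    of the 72.37% and 0.07% factors, which the post does not define.
    """
    n_fraud = n_txns * fraud_rate          # expected fraudulent transactions
    n_legit = n_txns * (1 - fraud_rate)    # expected legitimate transactions
    savings = n_fraud * avg_fraud_amt * fraud_catch_rate  # fraud dollars stopped
    cost = n_legit * avg_legit_amt * false_alarm_rate     # cost of false alarms
    return savings - cost


value = expected_business_value(95_000, 0.003, 1524, 377, 0.7237, 0.0007)
print(f"${value:,.0f} per year")
```

The exact result is about $289,337, which the post rounds to $290,000.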



