NATO Advanced Study Institute on Mining Massive Data Sets for Security

Statistical techniques for fraud detection, prevention, and evaluation

author: David Hand, Imperial College London
published: Dec. 3, 2007, recorded: September 2007, views: 41970

Description

The talk begins by setting the context: fraud is defined and its breadth outlined; figures are given showing how significant fraud is; and different areas of fraud are examined, including health care fraud, banking fraud, and scientific fraud.

The particular data analytic challenges of banking fraud are described and illustrated in detail. These include the fact that the classes are highly unbalanced (with typically no more than 1 in 1,000 transactions being fraudulent), that class labels may often be incorrect, that there will typically be delays in discovering the true labels, that the transaction arrival times are random, that the data are dynamic, and, perhaps most challenging of all, that the distributions are reactive, changing in response to the implementation of fraud detection systems. The role of mechanistic and empirical models in tackling these problems is described. Both have been widely used, and both have a contribution to make.
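As a rough illustration of why the class imbalance matters, the sketch below works through the arithmetic for a fraud rate of 1 in 1,000. The detector's sensitivity and false positive rate are hypothetical numbers chosen for the example, not figures from the talk.

```python
def always_legit_accuracy(fraud_rate):
    """Accuracy of a rule that labels every transaction as legitimate."""
    return 1.0 - fraud_rate


def precision(fraud_rate, sensitivity, false_positive_rate):
    """Fraction of flagged transactions that are actually fraudulent."""
    true_pos = fraud_rate * sensitivity
    false_pos = (1.0 - fraud_rate) * false_positive_rate
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


fraud_rate = 1 / 1000  # upper bound quoted in the description

# A rule that never flags anything is almost always "right".
print(f"accuracy of flagging nothing: {always_legit_accuracy(fraud_rate):.4f}")

# Hypothetical detector: catches 90% of fraud, wrongly flags 1% of legitimate traffic.
print(f"precision of that detector:   {precision(fraud_rate, 0.90, 0.01):.4f}")
```

Under these assumed rates, most flagged transactions are legitimate even though the detector looks accurate, which is one reason raw error rate says little in this setting.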

Banking data, and in particular banking fraud data, are examined in detail. Raw credit card transaction data have 70-80 variables per transaction, and this can be multiplied many-fold for behavioural data, as in fraud detection problems. Questions arise as to how to aggregate the data: should one try to classify individual transactions or should activity records be constructed?
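One way to picture the second option is to roll individual transactions up into one summary per account. The sketch below is only an assumption about what an activity record might contain: the field names (`account`, `time`, `amount`) and the chosen summaries are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def build_activity_records(transactions):
    """Group transactions by account and summarise each account's behaviour."""
    by_account = defaultdict(list)
    for txn in transactions:
        by_account[txn["account"]].append(txn)

    records = {}
    for account, txns in by_account.items():
        txns.sort(key=lambda t: t["time"])  # order each account's history by time
        amounts = [t["amount"] for t in txns]
        # Time gaps between consecutive transactions on the same account.
        gaps = [later["time"] - earlier["time"] for earlier, later in zip(txns, txns[1:])]
        records[account] = {
            "n_transactions": len(txns),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
            "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        }
    return records


# Hypothetical transactions; "time" is in minutes from an arbitrary origin.
sample = [
    {"account": "A", "time": 0, "amount": 10.0},
    {"account": "B", "time": 5, "amount": 3.0},
    {"account": "A", "time": 60, "amount": 250.0},
    {"account": "A", "time": 90, "amount": 40.0},
]
for account, record in build_activity_records(sample).items():
    print(account, record)
```

A classifier could then be applied to these per-account summaries rather than to each transaction in isolation, at the cost of deciding how much history each record should cover.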

A fundamental aspect of any predictive problem in data analysis is the choice of an appropriate criterion for estimation and performance assessment. In the case of fraud, one needs, in particular, to combine both classification accuracy and timeliness of classification. This means that standard measures of classification performance, such as error rate, AUC, the KS statistic, and information value, are not sufficient. Suitable measures and performance curves are described which combine these aspects and which are now being adopted by the industry.
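For readers unfamiliar with two of the standard measures named above, the following minimal sketch computes AUC and the KS statistic from scores and 0/1 labels. It does not implement the timeliness-aware measures from the talk, which the description does not specify; the scores and labels are made up for the example.

```python
def split_by_class(scores, labels):
    """Separate scores into fraud (label 1) and legitimate (label 0) groups."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores, labels):
    """Probability that a random fraud case scores above a random legitimate one."""
    pos, neg = split_by_class(scores, labels)
    # Count pairwise wins, with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest gap between the two classes' empirical score distributions."""
    pos, neg = split_by_class(scores, labels)
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Made-up suspicion scores; label 1 marks a fraudulent transaction.
scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1]
print("AUC:", auc(scores, labels))
print("KS: ", ks_statistic(scores, labels))
```

Neither number depends on when a fraudulent transaction was flagged, which is the gap the description points to.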

Various statistical approaches (statistical in John Chambers’s sense of ‘greater statistics’) have been developed for fraud detection problems. Some are described and illustrated using data from some of the banks which have been collaborating with us. In particular, we look at supervised classification and anomaly detection methods.

Finally, in the context of banking fraud, some of the deeper but very important conceptual issues are outlined, including the economic imperative, whether fraud is now becoming ‘acceptable’, and what exactly we learn from empirical comparisons.

Scientific fraud is contrasted with banking fraud. They have rather different drivers. In particular, financial gain is generally irrelevant to scientific fraud, which makes it an unusual kind of fraud, although, of course, the impact can be even more serious. Several examples are given, from a range of disciplines. The role of data analytic tools in detecting scientific fraud, and the nature of such tools, are described.

Reviews and comments:

1 Atif Abdul-Rahman, February 10, 2008 at 8:13 p.m.:

This is a very good presentation, with a good balance between breadth of coverage and specificity on issues such as those faced when building a model evaluation framework.

Mr. Hand's paper, Statistical Review of Fraud Detection, 2002 is also worth referencing.

2 vasanti Dutta, September 16, 2009 at 10:27 a.m.:

Very useful presentation giving the necessary details related to the topic.
I would like to read the paper if it can be made available.
Thanks and Regards.

3 mesbah, April 13, 2010 at 10:19 p.m.:

Very good presentation indeed! Very useful, clear, and easy to understand!

4 chika, February 23, 2012 at 5:52 a.m.:

Please, I need your hard copy. Nice presentation.

5 ashok sripati, March 16, 2013 at 5:58 a.m.:

Excellent and informative. Can I have a PDF version of this valuable lecture?

6 Gaber Cerle, March 20, 2013 at 10:35 a.m.:

You can always download the slides :). See links below the description.
