Effective Credit Card Fraud Detection
Table of Contents: Summary, Introduction, Dataset and Data Preprocessing, Deep Learning and Neural Network, Decision Tree, Random Forest, Conclusion, References

Summary

With the rapid development of e-commerce, the number of online card transactions is growing quickly, and financial companies that handle payments made by credit card have many risks to take into consideration. Financial fraud is a growing problem with serious consequences for online payment systems and the transactions they process. For credit cards, the share of fraudulent transactions is very small, around 0.1%, but it still costs a great deal of money and can run into billions. Many techniques have been developed to limit card fraud. E-commerce gives sales more visibility, but it also exposes merchants and customers to hackers, who may use methods such as Trojans and phishing to steal other people's credit card information. Thanks to AI, a transaction can be validated against many parameters rather than a few simple rules. Effective credit card fraud detection is therefore very important: it can detect fraud in time by learning the features of fraudulent behavior from historical transaction data, containing both normal and fraudulent transactions, using machine learning and deep learning techniques.

Introduction

In the large and growing world of e-commerce, many people pay their bills online by credit card. Businesses invest heavily in preventing the risk of credit card fraud committed by hackers. With only the 16-digit card number and the expiration date, someone can pay bills and issue transactions from anywhere. Detecting such fraud is not easy, because fraudulent transactions occur in very small numbers, around 0.1% of the total, which seems low but can cost some companies billions, and companies therefore invest a lot in prevention. To prevent and detect such fraud we need a safe and secure system, and to build one we can use AI, which helps the system learn from the dataset and its historical patterns using the various machine learning and deep learning algorithms that exist.

Dataset and Data Preprocessing

Our system consists of several phases. The first phase deals with data preprocessing, where we prepare a suitable dataset containing as few errors as possible using the NumPy library, which provides the mathematical tools for performing calculations. The other main library used in this phase is Pandas, which is known for its features for working with datasets; for example, it is used to import datasets and manage them in a simple and effective way. This phase deals only with the dataset and normalizes it, which later helps our algorithms produce a correct result. The next step divides the dataset into three parts: the training set, the validation set, and the test set. The training set is used first, and the machine is trained on it until it learns to predict the correct values; the test set is used to evaluate the final model once training is finished, after which the system is ready for real tests. It is important that the classes are distributed consistently across these three sets. For example, if 0.1% of the transactions in the entire dataset are fraudulent, fraudulent transactions should appear in roughly the same proportion in each set. The training set is always larger than the validation and test sets. A minimal preprocessing and splitting sketch is shown below.
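As a rough illustration of this phase, the sketch below loads a transaction table with Pandas, normalizes the features, and performs a stratified train/validation/test split. The file name creditcard.csv, the Class label column, and the 60/20/20 split ratio are assumptions rather than details given in the essay, and scikit-learn (which the essay does not name) is used here for scaling and splitting.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file and label column names; the essay does not specify them.
df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"])        # features
y = df["Class"]                       # 1 = fraud, 0 = normal (assumed encoding)

# Normalize the features so every column is on a comparable scale.
X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# Split off a test set first, then split the rest into training and validation.
# stratify keeps the ~0.1% fraud rate consistent across all three sets.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))        # roughly 60% / 20% / 20%
print(y_train.mean(), y_val.mean(), y_test.mean())  # similar fraud rate in each
```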
Deep Learning and Neural Network

The next phase deals with deep learning and neural networks. Here we use a supervised deep learning model, which trains a machine so that it can accept inputs and perform specific calculations that produce the desired result; because the correct outputs are known during training, this is called a supervised model. The main part is feature extraction, meaning we fetch only the important data that will be used in the calculation. Say there are two features, x1 and x2; we design the system so that it produces a result from these features, so a perceptron is like a formula or equation that gives us a result, for example output = f(w1*x1 + w2*x2 + b), where f is an activation function and b is a bias term. We need to find a solution to these equations, but a single equation is not enough, so we need more than one perceptron: the outputs of one or more perceptrons can be fed as inputs into other perceptrons, giving us many such equations connected to one another. The features are numerical and are multiplied by weights, say w1 and w2 for x1 and x2, so to find the right solution we need to find the optimal values of all these weights over the dataset, and this is why a neural network is used. Computing the output of the entire network for a given input is called the feedforward pass, where we try to produce the desired output. The feedback process uses backpropagation, which traces the error, the deviation between the prediction and the desired value, back through the network and adjusts the weights to reduce it. Training the machine on the given dataset is a long and slow process, so we train it for a given number of epochs. This matters because the model should be trained neither too little nor too much: too little training gives wrong predictions because learning is incomplete, while too much training makes the model overly complex and hurts its ability to predict new data correctly, and once overtraining has happened it cannot simply be undone, so we would have to struggle to retrain on the dataset properly. One method to counter this randomly drops units with a probability between 0 and 1 during training so that the network is trained appropriately; this is called the dropout layer method. The question that then arises is: how do these perceptrons produce their outputs? The answer is simple: they use activation functions, for example a step function that outputs either 0 or 1. There are smoother functions, such as the sigmoid, which outputs values between 0 and 1, and ReLU, which outputs 0 for negative inputs and the input itself otherwise; their property is that the output changes gradually rather than abruptly. A small sketch of such a network follows.
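A minimal sketch of such a feedforward network with a dropout layer, using Keras (a library the essay does not name). It reuses X_train, y_train, X_val, and y_val from the preprocessing sketch above; the layer sizes, the dropout probability of 0.2, and 10 epochs are illustrative assumptions, not values taken from the essay.

```python
import tensorflow as tf

# A small feedforward network for binary fraud classification.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer of perceptrons
    tf.keras.layers.Dropout(0.2),                    # randomly drops units during training
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability that a transaction is fraud
])

# The forward pass computes predictions; backpropagation adjusts the weights
# to reduce the error measured by the loss after each batch.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=10, batch_size=256)
```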
Decision Tree

A decision tree is a supervised machine learning algorithm used for both classification and regression. It works for categorical and continuous variables. It uses a graph- or tree-based decision model to predict the outcome, behaving like a chain of "if this, then that" conditions that ultimately give a particular outcome. Splitting is the process of dividing a node into sub-nodes. A branch is a subsection of the whole tree. The parent node is the one that is divided into sub-nodes, and these sub-nodes are called the children of that parent node. The root node represents the entire sample and is the first node to be split. The leaves are the terminal nodes that are not divided further, and these nodes determine the outcome of the model. Tree nodes are split on the value of a certain attribute, and the edges of the tree indicate the result of a split leading to the next node. Tree depth is an important concept: it shows how many questions are asked before the final prediction is made. The entropy of each attribute is calculated over the problem dataset (for a set with class proportions p_i, entropy = -sum over i of p_i * log2(p_i)). Entropy controls how data is divided in decision trees and where their boundaries are drawn. Information gain indicates how much information a feature provides about the class, and it must be maximized. The dataset is then divided into subsets using the attribute for which entropy is lowest, or equivalently gain is highest; this determines the attribute that best classifies the training data, which becomes the root of the tree, and the process is repeated in each branch. Decision trees work well on large datasets and are extremely fast, but they are prone to overfitting, especially when a tree is particularly deep; pruning can be used to avoid overfitting. The numbers of true negatives, false positives, false negatives, and true positives in the confusion matrix were 284,292, 23, 37, and 445, respectively.

Random Forest

Random forest is a supervised machine learning algorithm used for both classification and regression purposes. It is flexible, easy to use, and offers high accuracy. It is an ensemble of decision tree classifiers in which each tree is trained independently of the others. It has almost the same parameters as a bagging classifier or a decision tree. As the tree nodes expand, additional randomness is added to the model: the best feature is selected from a random subset of features to split each node. This selection generates large diversity among the trees and thus builds a better model. Initially, a set of N random data points is selected from the training set; then a decision tree is constructed from those N data points. The number of trees to build in the forest is decided in advance, and these steps are repeated until the required number of trees is reached. A brief sketch of both models follows.
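To make the two tree-based models concrete, here is a brief sketch using scikit-learn (not named in the essay). It reuses the X_train, y_train, X_test, and y_test splits from the preprocessing sketch; the max_depth of 6 and the 100 trees are illustrative assumptions rather than the essay's settings, so the printed confusion matrices will not necessarily match the figures reported above.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Decision tree: limiting max_depth (an illustrative value) is one way,
# alongside pruning, to counter the overfitting described above.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=42)
tree.fit(X_train, y_train)
print(confusion_matrix(y_test, tree.predict(X_test)))    # layout: [[TN, FP], [FN, TP]]

# Random forest: 100 trees, each grown on a bootstrap sample of the training
# data and choosing splits from random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(confusion_matrix(y_test, forest.predict(X_test)))
```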