Building an algorithm to predict mortgage approval based on historical HMDA lending practices
Author: Brian Byrne
Uploaded: 2021-03-01
Views: 629
https://sites.google.com/view/vinegar...
Data Description
dir: debt payments to total income ratio;
hir: housing expenses to income ratio;
lvr: ratio of size of loan to assessed value of property;
ccs: consumer credit score;
mcs: mortgage credit score;
pbcr: public bad credit record;
dmi: denied mortgage insurance;
self: self-employed;
single: applicant is single;
uria: 1989 Massachusetts unemployment rate in the applicant's industry;
condominiom: property is a condominium;
black: race of applicant is black;
deny: mortgage application denied;
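The variables listed above come from the Boston HMDA sample. As a rough sketch of the preparation assumed in the code that follows, the snippet below reads the data from a CSV file, recodes the yes/no flags as 0/1 and holds out a test set with sklearn's train_test_split; the file name 'hdma.csv' and the yes/no coding are assumptions rather than part of the original material.
#Sketch of the assumed data preparation (file name and yes/no coding are assumptions)
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('hdma.csv')                      #hypothetical file holding the Boston HMDA sample
yes_no = ['pbcr', 'dmi', 'self', 'single', 'condominiom', 'black', 'deny']
df[yes_no] = (df[yes_no] == 'yes').astype(int)    #recode yes/no flags to 1/0

X = df.drop(columns='deny')                       #12 predictors, including race
y = df['deny']                                    #1 = mortgage application denied

#Hold out a test set so the trained algorithms can be checked out of sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)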
Munnell, Tootell, Browne, and McEneaney (1996) at the Boston Fed examined mortgage lending in Boston to determine whether race played a significant role in who was approved for a mortgage. The primary econometric technique they relied upon was logistic regression, with race included as one of the predictors or independent variables. The coefficient on race showed a statistically significant negative impact on the probability of mortgage approval for minority applicants. This finding prompted considerable subsequent debate and discussion. Here we apply machine learning techniques of the type suggested by Varian (2014). The data consist of 2,380 observations on 12 predictors, one of which is race.
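Before turning to the machine learning models, the Boston Fed's baseline approach can be approximated with a standard logit of the deny indicator on the full set of predictors. The sketch below uses statsmodels and the data frame prepared above; it is illustrative only and does not reproduce the exact published specification.
#Sketch of a baseline logistic regression with race among the predictors (statsmodels assumed)
import statsmodels.api as sm

X_logit = sm.add_constant(X.astype(float))        #add an intercept term
logit_model = sm.Logit(y, X_logit).fit()

#The coefficient on 'black' is the quantity debated in Munnell et al. (1996)
print(logit_model.summary())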
We extend the analysis to consider how to train algorithms to automate the lending or mortgage approval process and then test the trained algorithms against actual out-of-sample data (a sketch of this out-of-sample check follows the training code below). We use the sklearn library and import a number of models, including Logistic Regression, SVM, K Nearest Neighbours, Decision Tree and Random Forest classifiers. We then use historical lending patterns to shape eligibility and predict mortgage approval. The algorithms do nothing more than attempt to replicate the historical lending patterns of loan officers. The lending algorithms created here are therefore not state of the art, but they do reflect historical norms, flawed or not. These benchmarks could nevertheless be used to track how lending patterns change.
#Helper that fits a set of classifiers on the training data and returns the fitted models
def models(X_train, y_train):

    #Using Logistic Regression Algorithm on the Training Set
    from sklearn.linear_model import LogisticRegression
    log = LogisticRegression(random_state = 0)
    log.fit(X_train, y_train)

    #Using the KNeighborsClassifier method of the neighbors class to use the Nearest Neighbour algorithm
    from sklearn.neighbors import KNeighborsClassifier
    knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
    knn.fit(X_train, y_train)

    #Using the SVC method of the svm class to use the Support Vector Machine algorithm
    from sklearn.svm import SVC
    svc_lin = SVC(kernel = 'linear', random_state = 0)
    svc_lin.fit(X_train, y_train)

    #Using the SVC method of the svm class to use the Kernel SVM algorithm
    svc_rbf = SVC(kernel = 'rbf', random_state = 0)
    svc_rbf.fit(X_train, y_train)

    #Using the GaussianNB method of the naive_bayes class to use the Naive Bayes algorithm
    from sklearn.naive_bayes import GaussianNB
    gauss = GaussianNB()
    gauss.fit(X_train, y_train)

    #Using the DecisionTreeClassifier method of the tree class to use the Decision Tree algorithm
    from sklearn.tree import DecisionTreeClassifier
    tree = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
    tree.fit(X_train, y_train)

    #Using the RandomForestClassifier method of the ensemble class to use the Random Forest Classification algorithm
    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
    forest.fit(X_train, y_train)

    #Print model accuracy on the training data
    print('[0]Logistic Regression Training Accuracy:', log.score(X_train, y_train))
    print('[1]K Nearest Neighbour Training Accuracy:', knn.score(X_train, y_train))
    print('[2]Support Vector Machine (Linear Classifier) Training Accuracy:', svc_lin.score(X_train, y_train))
    print('[3]Support Vector Machine (RBF Classifier) Training Accuracy:', svc_rbf.score(X_train, y_train))
    print('[4]Gaussian Naive Bayes Training Accuracy:', gauss.score(X_train, y_train))
    print('[5]Decision Tree Classifier Training Accuracy:', tree.score(X_train, y_train))
    print('[6]Random Forest Classifier Training Accuracy:', forest.score(X_train, y_train))

    return log, knn, svc_lin, svc_rbf, gauss, tree, forest
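The training accuracies above only measure how closely each algorithm reproduces the loan officers' decisions in sample. A minimal sketch of the out-of-sample check described earlier, assuming the models helper and the held-out X_test and y_test from the split above, is:
#Sketch of the out-of-sample test (assumes the models helper and the test split defined above)
from sklearn.metrics import accuracy_score, confusion_matrix

fitted = models(X_train, y_train)
for i, clf in enumerate(fitted):
    y_pred = clf.predict(X_test)
    print('Model', i, 'Test Accuracy:', accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))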