Skip to content

A data analysis project that entails using the data from a fictional loan company known as Prosper to perform exploratory data analysis using univariate, bivariate and multivariate visualizations to produce insights that answers questions asked from the data

License

Notifications You must be signed in to change notification settings

mchenryspagg/prosper-loan-project

Repository files navigation

(Prosper Loan Data)

Table of contents

Dataset

  • This data set contains the customer's data from a fictional loan company known as Prosper. This dataset comprises of 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.

  • Out of these 81 variables, the analysis was focused on just 17 variables of interest to help with answering the research questions. A new dataset with the variables of interest was created with a structure of 106312 observations and 17 features.

  • The loan dataset includes :

categorical variables quantitative variables
ListingCreationDate Term
LoanStatus BorrowerAPR
ListingCategory BorrowerRate
EmploymentStatus LenderYield
IsBorrowerHomeowner EmploymentStatusDuration
IncomeRange StatedMonthlyIncome
IncomeVerifiable LoanOriginalAmount
LoanOriginationDate MonthlyLoanPayment
LoanOriginationQuarter

Two new variables were created during data wrangling namely

new categorical variables
LoanOriginationMonth
LoanOriginationYear

Summary of Data Wrangling

  • The first step involved was gathering the Prosper loan data, after which we created a subset from the main dataset consisting of 17 variables and 106312 observations which we worked with.

Data cleaning was then carried out on the new dataset created which involved :

  • Converting EmploymentStatusDuration to integer;
  • Renaming ListCategory (Numeric) variable to ListCategory and its numeric values of 1-20 replaced with its categorical listing;
  • ListingCreationDate and LoanOriginationDate was converted to a datetime datatype;
  • LoanStatus, ListingCategory, EmploymentStatus, IncomeRange, & LoanOriginationQuarter was also converted to category datatype;
  • IncomeRange categories was reordered to be in a set order.

Research Questions

  • After the Data cleaning, we proceeded to define the main variables of interest with our research questions:
    1. Factors that affect a loan's outcome status.
    2. Factors that affects the borrower's APR or interest rate.
    3. Identifying if there are differences between loans depending on how large the original loan amount was?

Links

Solution URL:

Built with

  • Jupyter Notebook
  • Python
  • Pandas, Numpy, seaborn, matplotlib, nbconvert.

Key Insights

The main insights as listed are those which directly or indirectly answers the research questions as earlier portrayed looking at the univariate, bivariate and multivariate visualizations done during the data exploration.

  • A loan outcome status is affected by the loan Term which is the length of the loan. The Term of the loan however is dependent on the LoanOriginalAmount as smaller loan amounts get smaller loan Term and vice versa.

  • It was observed that loans with a Term of 12 months had a higher rate of having a loan outcome status of completed.

  • It was observed that higher loan original amounts are more prone to have a loan outcome status of Defaulted, or PastDue while lower loan amounts have a higher rate of having a loan outcome status of completed.

  • A loan's Term and its Original Amount will have a profound effect on the borrower's annual percentage rate (APR). Lower loan amounts of less than $10,000 and of a 12 month Term will have lower borrower Annual Percentage Rate.

  • The progression of loans obtained across months of the year shows that higher loan amounts are gotten by borrowers in January and December than in other periods of the year, with June been the month with the lowest loan amounts given out on average.

  • Borrowers with a yearly income range of $50,000 and above have access to the higher sizes of loans.

  • Employed borrowers accessed higher loans sizes than borrowers who are not employed or work part-time.

  • It was observed that the higher the loan original amount obtained, the higher the monthly loan payments.

  • 2009 had a very low proportion of loans given out to borrowers which was likely due to low or no business activity within the first quarter of that year while 2013 had a very high proportion of loans given out.

  • The highest percentage of loans across all years had a listing category of Debt Consolidation. This connotes that the biggest reason why people take loans is mainly to pay off their other huge debts such as student loans, home equity loans etc. ...that's quite interesting and worrisome 🤔🤔🤔

Useful resources

Acknowledgments

Special thanks to ALX - T and the entire ALG/ALX group and their sponsors for the sponsorship of this program and giving me the opportunity to be a beneficiary of the Udacity data analysis nanodegree program. Being able to complete a degree of this nature is a huge achievement. Special Thanks also to all the instructors and reviewers for their feedback and reviews. Also special thanks to my session lead whose weekly sessions has helped to make my learning more easier and better. And also to my colleagues for their support and encouragement.

About

A data analysis project that entails using the data from a fictional loan company known as Prosper to perform exploratory data analysis using univariate, bivariate and multivariate visualizations to produce insights that answers questions asked from the data

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published