Skip to content

WayneDW/Automated-Equity-Asset-Selection-and-Allocation

Repository files navigation

This repository is separated into four parts:

1. Data Preprocessing

./crawl.py

Output: ./input/raw_*

crawl ticker list and basic stock from http://www.nasdaq.com/screening/company-list.aspx

crawl financial statement from http://financials.morningstar.com/ratios/r.html?t=BIDU&region=USA&culture=en_US

crawl historical stock prices from https://finance.yahoo.com/

2. Feature engineering

./feature engineering.py

Input: ./input/feature_projection
Output: ./input/feature_label_*, selected_feature_*

2.1 Feature format

Format basic information to the (sample_n, feature_m) matrix

	[ feature1_sample_1, feature2_sample_1, ... feature_m_sample_1]

	...

	[ feature1_sample_m, feature2_sample_m, ... feature_m_sample_n]

Before filter, a sample has feature dimension 80 * 11 (880 financial ratios)

2.2 Missing value filter

Delele the feature if it has more than N% missing values (we can set N as 1, 5, 10)

2.3 feature time-horizon completeness check

Since we need a complete time window to shift, if it doesn't have full 11 data, delete all of this

type of feature, such P/E or Asset turnover

2.4 Feature date filter

Some stock may only have data from 2005 - 2014

Some starts at April 2007, we regard them as 2006

3. Data accuracy check

./dataCheck.py

4. Stock classification based on financial statements

./learning.py

Input: ./input/feature_label_*
Output: ./output/result_*, ./output/tickers_*

4.1 Train the model

Take financial ratios (2006 - Dec.2014) to train the model

Increase training sample by using 2006-2011, 2007-2012, 2008-2013, 2009-2014, 2010-2015

Label based on Sortino ratio

4.2 Predict data and make comparison

In summary, gradient boosting gives us the best performance (highest precision of label 1 in average)

5. Optimize the best weight for your portfolio

./mean_variance_optimization.py

Do a brute-force method to randomly pick 15 stocks from the stock sets and implement mean-variance portfolio with no short constraint

Delelte the portfolio with the worst CVAR

About

Optimal portfolio selection

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages