FinanceDatabase/compression at d7eba6ffee27ad0bf93bddd8733e12de4aef3cfc · webbsledge/FinanceDatabase

categories
README.md
compression.ipynb
cryptos.bz2
currencies.bz2
equities.bz2
etfs.bz2
funds.bz2
indices.bz2
moneymarkets.bz2

README.md

This compression notebook determines which storage and compression formats are best suited to the database. It tries a variety of formats, including CSV, pickle and HDF, and the format is chosen based on these findings. File size matters most here, because anyone who accesses the database must download the data file each time (unless it is stored locally). For that reason, I test not only how long the files take to read locally but also how long they take to read remotely.
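The kind of comparison described above can be sketched roughly as follows. This is not the notebook's code: it uses only the Python standard library, a synthetic table instead of the real database files, and a hypothetical `bandwidth_bytes_per_s` parameter to estimate remote loading time as download time plus local read time. The helper names (`make_rows`, `to_csv_bytes`, `from_csv_bytes`, `benchmark`) are illustrative assumptions.

```python
import bz2
import csv
import io
import lzma
import pickle
import time


def make_rows(n=2000):
    """Build a synthetic table; the real database files are not used here."""
    return [
        {"symbol": f"SYM{i}", "name": f"Company {i}", "sector": f"Sector {i % 11}"}
        for i in range(n)
    ]


def to_csv_bytes(rows):
    """Serialise a list of dicts to UTF-8 encoded CSV bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def from_csv_bytes(data):
    """Parse UTF-8 encoded CSV bytes back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def benchmark(rows, bandwidth_bytes_per_s=1_000_000):
    """Return {format: (size_bytes, local_read_s, estimated_remote_s)}.

    bandwidth_bytes_per_s is a hypothetical connection speed used only to
    estimate the download cost; it is not measured.
    """
    csv_bytes = to_csv_bytes(rows)
    candidates = {
        "csv": (csv_bytes, from_csv_bytes),
        "csv.bz2": (
            bz2.compress(csv_bytes),
            lambda blob: from_csv_bytes(bz2.decompress(blob)),
        ),
        "pickle.xz": (
            lzma.compress(pickle.dumps(rows)),
            lambda blob: pickle.loads(lzma.decompress(blob)),
        ),
    }
    results = {}
    for name, (blob, load) in candidates.items():
        start = time.perf_counter()
        loaded = load(blob)
        local_s = time.perf_counter() - start
        if loaded != rows:
            raise ValueError(f"{name} did not round-trip the data")
        remote_s = local_s + len(blob) / bandwidth_bytes_per_s
        results[name] = (len(blob), local_s, remote_s)
    return results


if __name__ == "__main__":
    for fmt, (size, local_s, remote_s) in benchmark(make_rows()).items():
        print(f"{fmt:10s} {size:8d} bytes  local {local_s:.4f}s  remote ~{remote_s:.4f}s")
```

Timings from a sketch like this depend heavily on the machine and the data, so only the relative ordering on realistic files is meaningful.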

It follows the methodology described here: https://towardsdatascience.com/still-saving-your-data-in-csv-try-these-other-options-9abe8b83db3a

The conclusion is that Pickle (xz) gives the most efficient loading. However, loading pickle files is a security risk, since unpickling can execute arbitrary code, so I've chosen the next best option instead: CSV BZ2, which loads in about the same time.
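To illustrate the trade-off, here is a minimal sketch, assuming only the Python standard library. The `Payload` class and `load_csv_bz2` helper are hypothetical names made up for this example. `Payload` stands in for a tampered pickle file and names a harmless callable (`sorted`); a malicious file could name any importable callable instead. The CSV BZ2 path, by contrast, only produces text fields.

```python
import bz2
import csv
import io
import pickle


class Payload:
    """Stand-in for a tampered pickle: __reduce__ names a callable to run on load."""

    def __reduce__(self):
        # Harmless here (sorted), but any importable callable could be named instead.
        return (sorted, ([3, 1, 2],))


def load_csv_bz2(blob):
    """Decompress BZ2 bytes and parse them as CSV; the result holds plain strings only."""
    text = bz2.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Unpickling calls the callable chosen by whoever wrote the file,
# so the result is the callable's return value, not a Payload instance.
unpickled = pickle.loads(pickle.dumps(Payload()))

# Parsing a BZ2-compressed CSV never executes anything stored in the file.
sample = bz2.compress(b"symbol,name\nAAPL,Apple Inc.\n")
records = load_csv_bz2(sample)
```

Because users download these files from a remote source, a format that cannot run code on load is the safer default, even if it is marginally less efficient.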
