Note: I'm not a machine learning expert. This is my first machine learning project. Pull requests for improvements are welcome.
This code is public domain. Use at your own risk. No warranties.
87072 malware binaries from virussign.com
4047 benign (known-good) binaries collected from Windows and Linux
Trained on the first 20 KB of each malware file.
Training consisted of 3 epochs.
accuracy: 0.9991
loss: 0.0029
val_accuracy: 0.9986
val_loss: 0.0043
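The README doesn't say how the first 20 KB of a file becomes model input. Below is a minimal preprocessing sketch that rests on several assumptions. "20kb" is read as 20 × 1024 bytes. Files shorter than that are zero-padded. Each byte is scaled to the range [0, 1]. The function name `file_to_vector` is hypothetical, and the repository's actual preprocessing may differ.

```python
CHUNK_SIZE = 20 * 1024  # assumption: "20kb" means 20 * 1024 bytes


def file_to_vector(path: str, chunk_size: int = CHUNK_SIZE) -> list[float]:
    """Hypothetical helper: read the first chunk_size bytes of a file and
    return them as floats in [0, 1], zero-padded to a fixed length."""
    with open(path, "rb") as f:
        data = f.read(chunk_size)  # reads at most chunk_size bytes
    # Pad short files with zero bytes so every input has the same length.
    padded = data + bytes(chunk_size - len(data))
    # Scale each byte value (0-255) into [0, 1].
    return [b / 255.0 for b in padded]
```

Before passing the result to a Keras model, you would normally convert it with `numpy.asarray(...)` and add a batch dimension. The same helper with `chunk_size=40 * 1024` would produce input for the 40 KB model.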
Trained on the first 40 KB of each malware file.
classify.py shows an example of using the 20 KB model and the 40 KB model together so that
each one cancels out the other's false positives and false negatives.
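One simple way to combine the two models is to average their malware probabilities and apply a single threshold. Here is a sketch of that idea. The averaging rule, the 0.5 threshold, and the function name `combine_scores` are all assumptions for illustration. This is not necessarily the rule classify.py uses.

```python
def combine_scores(p20: float, p40: float, threshold: float = 0.5) -> str:
    """Hypothetical ensemble rule: average the malware probabilities from the
    20 KB and 40 KB models, then compare the mean to a threshold."""
    mean = (p20 + p40) / 2
    # When one model is confidently right, it can outvote a weaker mistake by the other.
    return "malware" if mean >= threshold else "benign"


if __name__ == "__main__":
    print(combine_scores(0.9, 0.2))  # 20 KB model is confident, 40 KB model is not
    print(combine_scores(0.6, 0.1))  # a weak positive from one model is outvoted
```

Averaging lets a confident prediction from one model override a borderline mistake from the other. A stricter rule, such as requiring both models to agree before flagging a file, would trade fewer false positives for more false negatives.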
git clone https://github.com/9cb14c1ec0/MalwareVision
cd MalwareVision
python3 -m venv .venv
source .venv/bin/activate
pip install tensorflow numpy keras
python3 classify.py
To use an NVIDIA GPU, install the tensorflow[and-cuda] package via pip (pip install "tensorflow[and-cuda]").
Otherwise, the model runs on the CPU.
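To check whether TensorFlow can see a GPU after installing tensorflow[and-cuda], you can list its physical GPU devices. The sketch below uses `tf.config.list_physical_devices("GPU")`, which exists in TensorFlow 2.1 and later. The wrapper name `gpu_names` is hypothetical, and the wrapper returns `None` when TensorFlow is not installed.

```python
def gpu_names():
    """Hypothetical helper: return the names of GPUs visible to TensorFlow,
    or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with .name and .device_type attributes.
    return [d.name for d in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = gpu_names()
    if names is None:
        print("TensorFlow is not installed")
    elif names:
        print("GPUs visible to TensorFlow:", names)
    else:
        print("No GPU visible; TensorFlow will run on the CPU")
```

An empty list usually means the CUDA libraries were not found, so classification falls back to the CPU.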