tarunsingh272/ml-as-a-service-pipeline

EDGAR Sentiment Analysis Data Pipeline on Kubernetes Engine

The goal of this project is to build a data pipeline that creates a labeled dataset, train a machine learning model on that dataset, and deploy the model as a Flask app on a Kubernetes cluster. Everything is managed in the cloud.

Let's Get Started!

This project has 4 stages:

  1. Annotation Pipeline
    • This is the starting point for the main pipeline.
    • It generates a labeled dataset using the Azure Text Analytics API.
    • The entire dataset is stored in an AWS S3 bucket.
  2. Machine Learning Pipeline
    • This is the second pipeline.
    • The labeled dataset created in the Annotation Pipeline is used to train the model.
    • The trained model is stored in an S3 bucket.
  3. REST Flask App
    • The trained model is served by a Python Flask REST app.
    • The Flask app is tested inside a Docker container.
    • The Docker container is deployed on Google Kubernetes Engine (GKE).
  4. Inference Pipeline
    • The Inference Pipeline is an automated sentiment analysis pipeline.
    • It scrapes EDGAR earnings call transcript data and stores it in the cloud.
    • Using the Flask web app from Stage 3, it predicts the sentiment of each document.
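
To make the labeling idea in Stage 1 concrete, here is a minimal sketch of turning sentiment scores into class labels. It assumes the sentiment API returns one score between 0 and 1 per sentence (as Azure Text Analytics v2.1 does); the threshold values and the function names are hypothetical and are not taken from this project.

```python
from typing import List, Tuple

# Hypothetical cut-offs; the project does not document its thresholds.
NEGATIVE_BELOW = 0.4
POSITIVE_ABOVE = 0.6


def score_to_label(score: float) -> str:
    """Map a sentiment score in [0, 1] to a coarse class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < NEGATIVE_BELOW:
        return "negative"
    if score > POSITIVE_ABOVE:
        return "positive"
    return "neutral"


def label_sentences(scored: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Turn (sentence, score) pairs into (sentence, label) training rows."""
    return [(text, score_to_label(score)) for text, score in scored]
```

Rows produced this way could be written to the S3 bucket as the training set for Stage 2; the actual storage format used by the project may differ.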

Getting Started

These instructions will get a copy of the project up and running in your local environment, using cloud infrastructure.

git clone https://github.com/kashishshah881/ml-as-a-service-pipeline

Prerequisites

Python 3.7
AWS Account
GCP Account
Microsoft Azure Account

Installing

Install the required Python packages:

pip3 install -r requirements.txt

Steps For Running on AWS EC2 Cloud

Step 1:
  • Create multiple AWS S3 buckets.
  • Configure an IAM role with full S3 bucket access in your local environment. Learn More Here
  • Create a GCP account. Get Started Here
  • Create an Azure account. Get Started Here
  • Request a Metaflow Sandbox to run your pipeline on AWS Batch.
Step 2:
  • Once everything is set up, configure Metaflow's sandbox: run metaflow configure sandbox on the CLI and enter the API keys from Step 1.
  • Configure the input/output buckets on AWS S3 and enter the bucket names in Annotation Pipeline, ML Pipeline, Inference Pipeline and Flask App.
  • Lastly, add the Azure API keys Here
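
Step 2 asks for bucket names and API keys to be entered in several files. As an alternative that is not part of this project, the same settings could be read from environment variables in one place. The sketch below assumes hypothetical variable names (PIPELINE_INPUT_BUCKET, PIPELINE_OUTPUT_BUCKET, AZURE_TEXT_ANALYTICS_KEY) and a hypothetical PipelineConfig class.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the project itself expects the values
# to be edited directly into each pipeline file.
SETTING_NAMES = (
    "PIPELINE_INPUT_BUCKET",
    "PIPELINE_OUTPUT_BUCKET",
    "AZURE_TEXT_ANALYTICS_KEY",
)


@dataclass(frozen=True)
class PipelineConfig:
    input_bucket: str
    output_bucket: str
    azure_api_key: str


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Read the settings from a mapping, failing fast if any is missing or empty."""
    missing = [name for name in SETTING_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return PipelineConfig(*(env[name] for name in SETTING_NAMES))
```

Each pipeline could then call load_config() once at start-up, which also keeps the keys out of version control.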
Step 3:

Run on CLI

  • Make the pipeline scripts executable

chmod a+x Annotation\ Pipeline/index.py ML\ Pipeline/index.py Inference\ Pipeline/index.py

  • Running the Annotation Pipeline

./Annotation\ Pipeline/index.py run --with sandbox

  • Running the Machine Learning Pipeline

./ML\ Pipeline/index.py run --with sandbox

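  • Building and pushing a Docker image of the Flask app

cd REST\ Flask\ App/
# Tag the image at build time so that the push below can find it
docker build -t yourhubusername/reponame .
# Prompts for the password; the --email flag has been removed from docker login
docker login --username=yourhubusername
docker push yourhubusername/reponame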

Step 4:

Once the Dockerized Flask app from Step 3 is in your Docker Hub repo, create a Kubernetes cluster on Google Cloud Platform and deploy your Docker image from Docker Hub. Learn More Here
Your Flask app is now up and accessible from anywhere in the world!

Step 5:

Add the required Tickerfile bucket location in Inference Pipeline
Add the bucket location in Inference Pipeline
Add the IP address and port number obtained from the GCP Kubernetes cluster in Inference Pipeline
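
The IP address and port from Step 5 are what the Inference Pipeline needs to reach the Flask app. The sketch below shows how such a call might look using only the standard library. The /predict path and the "text" JSON field are assumptions, since the README does not document the app's routes; check the Flask app's source for the real ones.

```python
import json
import urllib.request


def build_predict_request(host: str, port: int, text: str) -> urllib.request.Request:
    """Build a JSON POST request for the deployed Flask app."""
    # "/predict" and the "text" field are hypothetical, not confirmed by the project.
    url = f"http://{host}:{port}/predict"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host: str, port: int, text: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply."""
    request = build_predict_request(host, port, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes it possible to check the URL and payload without a running cluster.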

Authors

  • Kashish Shah - Design, Architecture and Deployment - LinkedIn
  • Manogana Mantripragada - Machine Learning Engineer - LinkedIn
  • Dhruv Panchal - Research - LinkedIn

License

This project is licensed under the Commons Clause License. See the LICENSE.md file for details.
