PYSPARK

Apache Spark has become one of the most widely used and best-supported open-source tools for machine learning and data science.

For more information on PySpark, see these links: Apache Spark; Apache Spark for classification and regression; linear regression notebook.

The data is taken from a Kaggle competition.