An-Article-A-Day

Day-1 | Build Book Recommender Systems.

book-shelf

This article is about building a recommendation system for books using a books dataset.

  • Virtually everyone has had an online experience where a website makes personalized recommendations. Amazon tells you “Customers Who Bought This Item Also Bought”, and Udemy tells you “Students Who Viewed This Course Also Viewed”. Building recommender systems today requires specialized expertise in analytics, machine learning, and software engineering, and learning new skills and tools is difficult and time-consuming.
  • Check out the code here - DAY1
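
As a rough illustration of the "readers who liked this also liked" idea (not the article's exact method), an item-based similarity sketch with pandas could look like this; the file and column names are assumptions:

```python
# Minimal item-based similarity sketch (illustrative file/column names, not the article's code).
import pandas as pd

ratings = pd.read_csv("ratings.csv")  # hypothetical file: user_id, book_title, rating

# Pivot into a user x book matrix of ratings
matrix = ratings.pivot_table(index="user_id", columns="book_title", values="rating")

# Correlate one book's ratings with every other book's ratings
target = matrix["The Hobbit"]  # hypothetical title
similar = matrix.corrwith(target).dropna().sort_values(ascending=False)
print(similar.head(10))
```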

Day-2 | Summarize Trump’s State of the Union Address

trump

This article is about building a text summarizer to summarize Trump's State of the Union address instead of listening to the entire 82-minute speech.

  • Automatic text summarization is the process of creating a short, concise, and coherent version of a longer document.

  • Check out the code here - Day2
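
As a rough sketch of extractive summarization (not the article's exact pipeline), a frequency-based sentence scorer with NLTK could look like this, assuming the punkt and stopwords corpora are downloaded:

```python
# Minimal extractive summarizer: score sentences by the frequency of their non-stopword tokens.
import heapq
from collections import Counter
from nltk import sent_tokenize, word_tokenize  # requires nltk.download("punkt")
from nltk.corpus import stopwords               # requires nltk.download("stopwords")

def summarize(text, n_sentences=3):
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in word_tokenize(text) if w.isalpha() and w.lower() not in stop]
    freq = Counter(words)
    scores = {s: sum(freq.get(w.lower(), 0) for w in word_tokenize(s)) for s in sent_tokenize(text)}
    return " ".join(heapq.nlargest(n_sentences, scores, key=scores.get))

print(summarize(open("speech.txt").read()))  # hypothetical transcript file
```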

Day-3 | Time Series Forecasting using TPOT Automated ML in Python

tpot-logo

This article is about time series forecasting using TPOT, an automated machine learning library in Python.

  • TPOT is meant to be an assistant that gives you ideas on how to solve a particular machine learning problem by exploring pipeline configurations that you might have never considered, then leaves the fine-tuning to more constrained parameter tuning techniques such as grid search.

  • So TPOT helps you find good algorithms. Note that it isn’t designed for automating deep learning — something like AutoKeras might be helpful there.

  • AutoML algorithms aren’t as simple as fitting one model on the dataset; they are considering multiple machine learning algorithms (random forests, linear models, SVMs, etc.) in a pipeline with multiple preprocessing steps (missing value imputation, scaling, PCA, feature selection, etc.), the hyperparameters for all of the models and preprocessing steps, as well as multiple ways to ensemble or stack the algorithms within the pipeline.

  • There are so many interesting directions to explore with TPOT and AutoML. I’d like to compare TPOT with auto-sklearn, MLBox, Auto-Keras, and others. I’d also like to see how it performs with a greater variety of data, other imputation strategies, and other encoding strategies. A comparison with LightGBM, CatBoost, and deep learning algorithms would also be interesting.

  • For more info, read the complete Article & Documentation

  • Check out the complete code here - Day3
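
A minimal sketch of how TPOT is driven, using a built-in regression dataset as a stand-in for the article's engineered time-series features:

```python
# Minimal TPOT usage sketch; the diabetes dataset stands in for lag/rolling features from the article.
from tpot import TPOTRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)                 # evolves pipelines of preprocessors + models
print(tpot.score(X_test, y_test))
tpot.export("tpot_best_pipeline.py")       # writes the winning pipeline as plain sklearn code
```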

Day-4 | ResNet for Traffic Sign Classification with PyTorch

pytorch-logo

This article uses ResNet with PyTorch and the fastai library to classify German traffic sign images.

  • PyTorch was one of the most popular frameworks in 2018. PyTorch is a Python-based scientific computing package that is similar to NumPy, but with the added power of GPUs. It is also a deep learning framework that provides maximum flexibility and speed when implementing and building deep neural network architectures.

  • The fastai library is a high-level library built on PyTorch, which allows us to build models using only a few lines of code. It provides many different datasets that can be loaded in directly, and it also provides functionality for downloading images given a file containing the URLs of those images.

  • fastai allows us to perform transfer learning with less code and time by giving us the ability to set different learning rates for different parts of the network. This allows us to train the earlier layers less than the later layers.

  • Get more info from the official PyTorch guide, Deep Learning with PyTorch, Introduction to PyTorch, and ResNet for Traffic Sign Classification using PyTorch.

  • Code for - Day4
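
A minimal transfer-learning sketch in the fastai v1 style (the API current when this article was written); the data folder layout is an assumption:

```python
# Minimal fastai (v1-style API) transfer learning with a ResNet backbone.
from fastai.vision import ImageDataBunch, cnn_learner, models, accuracy, get_transforms

data = ImageDataBunch.from_folder("data/traffic_signs",   # hypothetical folder with one subfolder per class
                                  valid_pct=0.2, size=224, ds_tfms=get_transforms())
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                               # train the new head first
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))     # discriminative learning rates: earlier layers learn less
```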

Day-5 | Google Facets & Bokeh for Data Visualisation in Python

data-visualization-tools-in-python-15-638

Visualising machine learning datasets with Google's FACETS, an open-source tool from Google for easily learning patterns from large amounts of data, and with Bokeh.

Day-6 | Develop an NLP Model in Python & Deploy It with Flask

flask

In reality, generating predictions is only part of a machine learning project, although it is the most important part. This article is about how to deploy a model to production using Flask.
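
A minimal sketch of serving a trained model behind a Flask endpoint; the model file and request format are assumptions, not the article's exact app:

```python
# Minimal Flask prediction service around a pickled sklearn pipeline (vectorizer + classifier).
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
model = pickle.load(open("model.pkl", "rb"))  # hypothetical pickled pipeline

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json()["text"]          # e.g. {"text": "some user message"}
    prediction = model.predict([text])[0]
    return jsonify({"prediction": str(prediction)})

if __name__ == "__main__":
    app.run(debug=True)
```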

Day-7 | PyViz: Simplifying the Data Visualisation process in Python.

pyviz

Today's article is an overview of the PyViz ecosystem, which aims to make data visualization in Python easier to use, easier to learn, and more powerful.

  • Choosing the best visualization tool for the job from among Python's many plotting libraries is tricky and confusing. PyViz tries to plug this gap. It helps streamline the process of working with small and large datasets (from a few points to billions) in a web browser, whether doing exploratory analysis, making simple widget-based tools, or building full-featured dashboards.

  • HoloViews: Declarative objects for instantly visualizable data, building Bokeh plots from convenient high-level specifications.

  • The open source libraries which constitute PyViz are:

    • GeoViews
    • Datashader
    • hvPlot
    • Param
    • Holoviews
    • Bokeh
    • Panel
  • Get more info by reading this article by Parul Pandey and the PyViz documentation.

  • The dataset used in practice is Uniqlo (FastRetailing) Stock Price Prediction from Kaggle.

  • Here is the Data and Notebook of Day-7
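
A minimal hvPlot sketch: the .hvplot accessor builds an interactive Bokeh plot directly from a pandas DataFrame. The CSV and column names are assumptions standing in for the Uniqlo stock data:

```python
import pandas as pd
import hvplot.pandas  # noqa: F401 -- registers the .hvplot accessor on DataFrames

df = pd.read_csv("uniqlo_stock.csv", parse_dates=["Date"])          # hypothetical file
plot = df.hvplot.line(x="Date", y="Close", title="Closing price")   # interactive Bokeh plot
hvplot.show(plot)  # opens in a browser; in a Jupyter notebook the object renders inline
```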

Day-8 | Linear Regression with PySpark and MLlib

pyspark

Apache Spark has become one of the most commonly used and supported open-source tools for machine learning and data science.
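
A minimal PySpark MLlib linear-regression sketch; the input file and column names are illustrative assumptions, not the article's dataset:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("linear-regression").getOrCreate()
df = spark.read.csv("housing.csv", header=True, inferSchema=True)   # hypothetical file

# Assemble the feature columns into a single vector column, as MLlib expects
assembler = VectorAssembler(inputCols=["rooms", "age", "distance"], outputCol="features")
data = assembler.transform(df).select("features", "price")

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = LinearRegression(featuresCol="features", labelCol="price").fit(train)
print(model.evaluate(test).rootMeanSquaredError)
```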

Day-9 | Logistic Regression with PySpark

merge

Day-10 | K-Means with PySpark

k-means_convergence

  • k-means clustering aims to converge on an optimal set of cluster centers (centroids) and cluster memberships based on distance from these centroids via successive iterations. Intuitively, the better the positioning of the initial centroids, the fewer iterations the k-means algorithm will need to converge.
  • Today I learned how to implement k-means using PySpark in Python.
  • Article on K-means using PySpark is here
  • Notebook of Day-9
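
A minimal PySpark K-Means sketch; the file and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans").getOrCreate()
df = spark.read.csv("customers.csv", header=True, inferSchema=True)  # hypothetical file

features = VectorAssembler(inputCols=["age", "income", "spend_score"],
                           outputCol="features").transform(df)

model = KMeans(k=3, seed=42, featuresCol="features").fit(features)
for center in model.clusterCenters():   # the converged centroids
    print(center)
```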

Day-11 | Classification with PySpark

binary

  • Binary classification involves classifying data into two groups, e.g. whether or not a customer buys a particular product (Yes/No), based on independent variables such as gender, age, location, etc.
  • As the target variable is not continuous, a binary classification model predicts the probability of the target being Yes or No.
  • Today I learned how to implement binary classification using PySpark in Python.
  • Article I learned from today.
  • Notebook of Day-10
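
A minimal PySpark binary-classification sketch using logistic regression; the file and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("binary-classification").getOrCreate()
df = spark.read.csv("customers.csv", header=True, inferSchema=True)   # hypothetical file

data = VectorAssembler(inputCols=["age", "income"], outputCol="features") \
    .transform(df).select("features", "label")   # label: 1 = bought the product, 0 = did not

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
preds = model.transform(test)
print(BinaryClassificationEvaluator(labelCol="label").evaluate(preds))  # area under the ROC curve
```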

Day-12 | NLP with PySpark

what-is-nlp

  • A natural language or ordinary language is any language that has evolved naturally in humans over time through use and repetition, without conscious planning or premeditation. Natural languages can take different forms, such as speech, signing, or text.

  • Natural language processing is a field of Artificial Intelligence that explores computational methods for interpreting and processing natural language, in either textual or spoken form.

  • To implement NLP we have some useful tools available:

    • CoreNLP
    • NLTK, the most widely-mentioned NLP library for Python
    • TextBlob
    • Gensim
    • SpaCy
  • NLTK environment setup and installation in Apache Spark, covering:

    • Word tokenize
    • Remove Stopwords
    • Remove punctuations
    • Part of speech tagging
    • Named Entity Recognition
    • Lemmatization
    • Text Classification
  • Notebook - Day-12
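
A minimal NLTK sketch covering the preprocessing steps listed above (tokenization, stopword and punctuation removal, POS tagging, lemmatization):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads: punkt, stopwords, averaged_perceptron_tagger, wordnet
text = "The quick brown foxes were jumping over the lazy dogs."

tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]   # tokenize, drop punctuation
tokens = [t for t in tokens if t not in stopwords.words("english")]     # remove stopwords
tagged = nltk.pos_tag(tokens)                                           # part-of-speech tagging
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in tokens]                      # lemmatization
print(tagged)
print(lemmas)
```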

Day-13 | Decision Tree & Random Forest in PySpark

decisiontree

Day-14 | Bringing the best out of Jupyter Notebooks for Data Science

jupyter

  • Enhancing Jupyter Notebook’s productivity with these Tips & Tricks.

  • Exploring Jupyter Notebooks’ features which can enhance our productivity while working with them.

  • Notebook - Day-14

Day-15 | Data Visualisation with Matplotlib & Seaborn

plots

  • Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.

  • Seaborn is a library for making statistical graphics in Python. It is built on top of matplotlib and closely integrated with pandas data structures.

  • Check out the official documentation here: Seaborn, Matplotlib

  • Notebook - Day-15
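
A minimal Matplotlib + Seaborn sketch using one of seaborn's built-in example datasets:

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # small example dataset shipped with seaborn

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(tips["total_bill"], bins=20)                                      # plain matplotlib histogram
axes[0].set_title("Total bill (matplotlib)")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker", ax=axes[1])  # seaborn statistical scatter
axes[1].set_title("Bill vs. tip (seaborn)")
plt.tight_layout()
plt.show()
```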

Day-16 | Introduction to PyTorch

pytorch-logo

PyTorch is a Python machine learning package based on Torch, which is an open-source machine learning package based on the programming language Lua. PyTorch has two main features:

  • Tensor computation (like NumPy) with strong GPU acceleration

  • Automatic differentiation for building and training neural networks

  • Notebook - Day-16
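
A minimal sketch of those two features, GPU-capable tensor computation and autograd:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 3, device=device)       # NumPy-like tensor math, on the GPU when available
y = x @ x.t() + 1.0

w = torch.tensor(2.0, requires_grad=True)  # autograd records operations on w
loss = (w * 3.0 - 5.0) ** 2
loss.backward()                            # fills w.grad with d(loss)/dw
print(y.shape, w.grad)
```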

Day-17 | Multi-Class Text Classification with Scikit-Learn

text-scikit

  • There are lots of applications of text classification in the commercial world.

  • However, the vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering (spam vs. ham) or sentiment analysis (positive vs. negative).

  • In most cases, our real-world problems are much more complicated than that.

  • Notebook - Day-17
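
A minimal multi-class text classification sketch with scikit-learn, using the built-in 20 newsgroups data as a stand-in dataset: TF-IDF features feeding a linear classifier:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(max_iter=1000))   # handles all 20 classes natively
clf.fit(train.data, train.target)
print(accuracy_score(test.target, clf.predict(test.data)))
```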

Day-18 | Convolutional Neural Network with Keras

keras

  • CNNs have wide applications in image and video recognition, recommender systems, and natural language processing. CNNs, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output.

  • Today, I practised implementing a CNN with Keras.

  • Notebook - Day-18
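
A minimal CNN sketch with the tf.keras API for 28x28 grayscale images, showing the convolution, pooling, and dense pattern:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # e.g. 10 digit classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)  # once image data is loaded
```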

Day-19 | Ensemble Learning - Random Forest, AdaBoost, Gradient Boosting

ensemble-learning-jun29-1

  • Ensemble learning, in general, is a model that makes predictions based on a number of different models. By combining individual models, the ensemble model tends to be more flexible🤸‍♀️ (less bias) and less data-sensitive🧘‍♀️ (less variance).

  • Two most popular ensemble methods are bagging and boosting.

  • With a basic understanding of ensemble learning, I learnt Random Forest, AdaBoost, and Gradient Boosting, and their implementation in Python with sklearn.

  • Notebook - Day-19
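
A minimal scikit-learn sketch comparing the three ensemble methods on a built-in dataset (a stand-in for the article's data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=42),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting (boosting)": GradientBoostingClassifier(n_estimators=200, random_state=42),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())   # 5-fold cross-validated accuracy
```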

Day-20 | Essentials of Deep Learning – Sequence to Sequence Modelling with Attention (using Python)

sequence modeling

  • We know that to solve sequence modelling problems, Recurrent Neural Networks are our go-to architecture.
  • Suppose you have a series of statements:

Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to the office. Joe left the milk. Joe went to the bathroom.

And you have been asked the below question:

Where was Joe before the office?

  • The appropriate answer would be “kitchen”. A quick glance makes this seem like a simple problem. But to understand the complexity – there are two dimensions which the system has to understand:

  • The underlying working of the English language and the sequence of characters/words which make up the sentence.
  • The sequence of events which revolve around the people mentioned in the statements.
  • This can be considered a sequence modelling problem, as understanding the sequence is important to make any prediction around it.

  • Notebook - Day-20

Day-21 | Lingvo: A TensorFlow Framework for Sequence Modeling

linguo

  • Lingvo is the international language Esperanto word for “language”. This naming alludes to the roots of the Lingvo framework — it was developed as a general deep learning framework using TensorFlow with a focus on sequence models for language-related tasks such as machine translation, speech recognition, and speech synthesis.
  • The above picture demonstrates an overview of the Lingvo framework, outlining how models are instantiated, trained, and exported for evaluation and serving.
  • To jump straight into the code, check out the TensorFlow GitHub page.
  • More details about Lingvo and some of the advanced features it supports are in the article.

Day-22 | 9 obscure Python libraries for data science

python libraries

  • Python is an amazing language. In fact, it's one of the fastest growing programming languages in the world.
  • The entire ecosystem of Python and its libraries makes it an apt choice for users (beginners and advanced) all over the world. One of the reasons for its success and popularity is its set of robust libraries that make it so dynamic and fast.
  • This article is about Python libraries for data science tasks other than the commonly used ones like pandas, scikit-learn, and matplotlib.
  • Article - Day-22

Day-23 | TextBlob

textblob

  • TextBlob is a Python library for processing textual data: an NLP framework offering part-of-speech tagging, sentiment analysis, and more.
  • As an NLP library for Python, TextBlob has been around for a while; I had heard many good things about it, such as its part-of-speech tagging and sentiment analysis.
  • Notebook - Day-23
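
A minimal TextBlob sketch showing part-of-speech tagging and sentiment analysis (run python -m textblob.download_corpora once to fetch the required corpora):

```python
from textblob import TextBlob

blob = TextBlob("TextBlob makes simple NLP tasks surprisingly pleasant to work with.")
print(blob.tags)          # part-of-speech tags, e.g. [('TextBlob', 'NNP'), ('makes', 'VBZ'), ...]
print(blob.noun_phrases)  # extracted noun phrases
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
```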

Day-24 | Is your Machine Learning Model Biased?

machine learning

  • Machine learning models are being increasingly used to make decisions that affect people’s lives. With this power comes a responsibility to ensure that the model predictions are fair and not discriminating.
  • This article is about how to measure your model’s fairness and decide on the best fairness metrics.
  • Article - Day-24
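
As one illustration of a fairness check (demographic parity: comparing the positive-prediction rate across groups), with made-up arrays rather than the article's data:

```python
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])                # model predictions
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # sensitive attribute per example

rate_a = y_pred[group == "A"].mean()
rate_b = y_pred[group == "B"].mean()
print("positive rate A:", rate_a, "positive rate B:", rate_b)
print("demographic parity difference:", abs(rate_a - rate_b))  # 0 would be parity under this metric
```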

Day-25 | A Gentle Introduction to Graph Neural Networks

graphical neural network

  • Recently, Graph Neural Networks (GNNs) have gained increasing popularity in various domains, including social networks, knowledge graphs, recommender systems, and even the life sciences.
  • The power of GNN in modeling the dependencies between nodes in a graph enables the breakthrough in the research area related to graph analysis.
  • This article aims to introduce the basics of Graph Neural Network and two more advanced algorithms, DeepWalk and GraphSage.
  • Article - Day-25

Day-26 | An Excessively Deep Dive Into Natural Gradient Optimization

gradient descent

  • All modern deep learning models are trained using gradient descent. At each step of gradient descent, your parameter values begin at some starting point, and you move them in the direction of greatest loss reduction.
  • You do this by taking the derivative of your loss with respect to your whole vector of parameters, otherwise called the Jacobian.
  • However, this is just the first derivative of your loss, and it doesn’t tell you anything about curvature, or how quickly your first derivative is changing. There is more about this in the article below.
  • Article - Day-26
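
A minimal plain (first-order) gradient descent sketch in NumPy, minimizing a simple quadratic loss; this is illustrative, not the article's natural-gradient method:

```python
import numpy as np

TARGET = np.array([3.0, -2.0])

def loss(theta):
    return np.sum((theta - TARGET) ** 2)

def grad(theta):
    return 2.0 * (theta - TARGET)       # derivative of the loss w.r.t. the parameter vector

theta = np.zeros(2)
lr = 0.1
for _ in range(100):
    theta -= lr * grad(theta)           # step in the direction of greatest loss reduction

print(theta, loss(theta))               # theta approaches [3, -2]
```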

Day-27 | Google open-sources GPipe

google

  • Google open-sources GPipe, a library for efficiently training large deep neural networks

  • GPipe is a library for “efficiently” training deep neural networks (layered functions modeled after neurons) under Lingvo, a TensorFlow framework for sequence modeling.

  • It’s applicable to any network consisting of multiple sequential layers, Google AI software engineer Yanping Huang said in a blog post, and allows researchers to “easily” scale performance.

  • Article - Day-27

Day-28 | Simulators

simulator

  • Simulators: The Key Training Environment for Applied Deep Reinforcement Learning
  • Deep reinforcement learning (DRL) is one of the most exciting fields in AI right now.
  • It’s still early days, but there are obvious and underserved markets to which this technology can be applied today: enterprises that want to automate or optimize the efficiency of industrial systems and processes.
  • Article - Day-28

Day-29 | creditR

creditR

  • creditR: An Amazing R Package to Enhance Credit Risk Scoring and Validation
  • Machine learning is disrupting multiple and diverse industries right now. One of the biggest industries to be impacted – finance.
  • Functions like fraud detection, customer segmentation, employee or client retention are primary machine learning targets.
  • Notebook - Day-29

Day-30 | Web Scraping, Text Mining and Sentiment Analysis
