Commit

GitBook: [#199] No subject
orico authored and gitbook-bot committed Apr 30, 2022
1 parent 90b07fa commit 53ba9dc
Showing 39 changed files with 270 additions and 35 deletions.
Binary file modified .gitbook/assets/image (10).png
Binary file modified .gitbook/assets/image (11).png
Binary file modified .gitbook/assets/image (12).png
Binary file modified .gitbook/assets/image (13).png
Binary file modified .gitbook/assets/image (14).png
Binary file modified .gitbook/assets/image (15).png
Binary file modified .gitbook/assets/image (16).png
Binary file modified .gitbook/assets/image (17).png
Binary file added .gitbook/assets/image (18).png
Binary file added .gitbook/assets/image (19).png
Binary file added .gitbook/assets/image (20).png
Binary file added .gitbook/assets/image (21).png
Binary file added .gitbook/assets/image (22).png
Binary file added .gitbook/assets/image (23).png
Binary file added .gitbook/assets/image (24).png
Binary file added .gitbook/assets/image (25).png
Binary file modified .gitbook/assets/image (7).png
Binary file modified .gitbook/assets/image (8).png
20 changes: 11 additions & 9 deletions SUMMARY.md
@@ -70,14 +70,16 @@
* [DOE Tools](experimental-design/doe-tools.md)
* [A/B Testing](experimental-design/a-b-testing.md)
* [Factorial Design](experimental-design/factorial-design.md)
* [Page 1](page-1.md)
* [Additional Data Science Skills](additional-data-science-skills/README.md)
* [Expanding Your Data Science Skills](additional-data-science-skills/expanding-your-data-science-skills.md)
* [Product Management](additional-data-science-skills/product-management.md)
* [User Experience Design (UX)](additional-data-science-skills/user-experience-design-ux.md)
* [Business](additional-data-science-skills/business.md)
* [Marketting](additional-data-science-skills/marketting.md)
* [Humor](additional-data-science-skills/humor.md)
* [Product](product/README.md)
* [Expanding Your Data Science Skills](product/expanding-your-data-science-skills.md)
* [Product Vision & Strategy](product/product-vision-and-strategy.md)
* [Product Managers](product/product-managers.md)
* [Product Management Resources](product/product-management-resources.md)
* [Product Tools](product/product-tools.md)
* [User Experience Design (UX)](product/user-experience-design-ux.md)
* [Business](product/business.md)
* [Marketing](product/marketting.md)
* [Humor](product/humor.md)
* [Business Domains](business-domains/README.md)
* [Follow the regularized leader](business-domains/follow-the-regularized-leader.md)
* [Growth](business-domains/growth.md)
@@ -94,7 +96,7 @@
* [Design Patterns](mlops-engineering/design-patterns.md)
* [Full-Stack & Ops](mlops-engineering/full-stack-and-ops.md)
* [MLOps](mlops-engineering/mlops.md)
* [MLOps Monitoring & Alerts](mlops-monitoring-and-alerts.md)
* [MLOps Monitoring & Alerts](mlops-engineering/mlops-monitoring-and-alerts.md)
* [Data Engineering](data-engineering/README.md)
* [SQL](data-engineering/sql.md)
* [Patterns](data-engineering/patterns.md)
4 changes: 2 additions & 2 deletions business-domains/fraud-detection.md
@@ -2,11 +2,11 @@

1. [Machine Learning for Credit Card Fraud Detection](https://fraud-detection-handbook.github.io/fraud-detection-handbook/Foreword.html) - Practical Handbook, [Git](https://github.com/Fraud-Detection-Handbook/fraud-detection-handbook)

![](<../.gitbook/assets/image (14).png>)
![](<../.gitbook/assets/image (17).png>)
2. [Fraud detection](https://towardsdatascience.com/frauddetection-f801b781410b) on money pools, using social-network and pool-size features, with further optimization using the F-beta score.
3. [Fraud detection Objectives.](https://nethone.com/post/beginners-guide-to-machine-learning)

![](<../.gitbook/assets/image (12).png>)
![](<../.gitbook/assets/image (13).png>)
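Item 2 above mentions optimizing with the F-beta score, which treats recall as β times as important as precision. A minimal plain-Python sketch, assuming hypothetical confusion-matrix counts (`tp`, `fp`, `fn`); in fraud detection β > 1 is often a natural choice when a missed fraud costs more than a false alarm:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R); beta > 1 favors recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Hypothetical counts: 40 frauds caught, 10 false alarms, 20 frauds missed.
print(round(f_beta(40, 10, 20, beta=1.0), 4))  # -> 0.7273
print(round(f_beta(40, 10, 20, beta=2.0), 4))  # -> 0.6897
```

With precision 0.8 and recall 0.67, raising β pulls the score toward the weaker recall, which is the behavior you want when misses are expensive.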

Data sets

28 changes: 14 additions & 14 deletions data-engineering/data-engineering-questions-and-training.md
@@ -15,12 +15,12 @@
10. Have you worked with data science teams? What were your responsibilities?
11. What are the considerations when choosing Spark vs. BigQuery?
12. What are the differences between ETL and ELT? [1](https://www.guru99.com/etl-vs-elt.html), [2](https://www.xplenty.com/blog/etl-vs-elt/), [3](https://blog.panoply.io/etl-vs-elt-the-difference-is-in-the-how)\
![](../../.gitbook/assets/0)\
![](../.gitbook/assets/0)\
By guru99/david taylor

![](../../.gitbook/assets/1)
<img src="../.gitbook/assets/1" alt="" data-size="original">

![](../../.gitbook/assets/2)
<img src="../.gitbook/assets/2" alt="" data-size="original">

By mark smallcombe\

@@ -34,11 +34,11 @@ By mark smallcombe\
6. [What does the CAP Theorem actually say?](https://www.fullstack.cafe/blog/cap-theorem-interview-questions)
7. How does it affect real-world applications? (In the real world, latency is availability.)
8. [Another great resource for CAP questions](https://github.com/henryr/cap-faq)
9. ![](../../.gitbook/assets/3)
10. ![](../../.gitbook/assets/4)
11. ![](../../.gitbook/assets/5)
12. ![](../../.gitbook/assets/6)
13. ![](../../.gitbook/assets/7)\
9. ![](../.gitbook/assets/3)
10. ![](../.gitbook/assets/4)
11. ![](../.gitbook/assets/5)
12. ![](../.gitbook/assets/6)
13. ![](../.gitbook/assets/7)\

2. Explain the differences between NoSQL databases {MongoDB | DynamoDB | ..} and relational databases {Postgres | MySQL}, and when you would choose one over the other. Give an example of a project where you had to make this choice, and walk through your reasoning.

@@ -106,33 +106,33 @@ By mark smallcombe\
11. Outage handling and the differences between stream-based processing vs concurrent isolated worker-based processing using

Q: You have a real-time stream. In the context of recovering from an outage, which is better: a stream-based processing system, or a worker-based one that can be triggered on different time ranges?\
![](../../.gitbook/assets/8)
![](../.gitbook/assets/8)

By nielsen Ilai Malka

1. [How to design a](https://github.com/donnemartin/system-design-primer#system-design-interview-questions-with-solutions) .

![](../../.gitbook/assets/9)
![](../.gitbook/assets/9)

* How would you design and implement an API rate limiter?
1. [The twitter question](https://github.com/donnemartin/system-design-primer/blob/master/solutions/system\_design/twitter/README.md)

![](../../.gitbook/assets/10)\
![](../.gitbook/assets/10)\


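For the API rate limiter question, one common answer is a token bucket: each client has a bucket that refills at a fixed rate and each request spends a token. A minimal single-process sketch, assuming an injectable clock so it can be tested deterministically; a production limiter would usually keep bucket state in a shared store, which is out of scope here:

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the request passes."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock: capacity 2, refill 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # -> [True, True, False]
fake_now[0] = 1.0
print(bucket.allow())  # -> True (one token refilled)
```

The capacity controls how large a burst is tolerated, while the rate sets the sustained throughput; a fixed-window counter is simpler but allows double bursts at window boundaries.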
1. [More than 2000 questions for data engineers](https://github.com/OBenner/data-engineering-interview-questions)

![](../../.gitbook/assets/11)\
![](../.gitbook/assets/11)\


1. [More data engineering questions](https://realpython.com/data-engineer-interview-questions-python/)

![](../../.gitbook/assets/12)\
![](../.gitbook/assets/12)\


1. [Even more questions](https://www.softwaretestinghelp.com/data-engineer-interview-questions/)

![](../../.gitbook/assets/13)
![](../.gitbook/assets/13)

References:

4 changes: 2 additions & 2 deletions data-engineering/lakes-and-warehouses.md
@@ -14,11 +14,11 @@

1. [Data Lake vs Data Warehouse](https://www.talend.com/resources/data-lake-vs-data-warehouse/)

![](<../.gitbook/assets/image (12) (1) (1).png>)
![](<../.gitbook/assets/image (21).png>)
2. [Top 5 differences between DL & DWH](https://www.bluegranite.com/blog/bid/402596/top-five-differences-between-data-lakes-and-data-warehouses)
3. [Amazon on DL vs DWH](https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/)

![](<../.gitbook/assets/image (13) (1) (1) (1).png>)
![](<../.gitbook/assets/image (25).png>)
4. [Snowflake vs Delta Lake vs Firebolt](https://www.firebolt.io/blog/snowflake-vs-databricks-vs-firebolt) - "Databricks Delta Lake and Delta Engine is a lakehouse. You choose it as a data lake, and for data lakehouse-based workloads including ELT for data warehouses, data science and machine learning, even static reporting and dashboards if you don’t mind the performance difference and don’t have a data warehouse.\


2 changes: 1 addition & 1 deletion data-engineering/patterns.md
@@ -1,7 +1,7 @@
# Patterns

1. [Slowly Changing Dimensions (SCD)](https://adatis.co.uk/introduction-to-slowly-changing-dimensions-scd-types/) by adatis - what can you do when the information in your table changes, types 0-6. Image by Adatis.co.uk.\
![](<../.gitbook/assets/image (17).png>)\
![](<../.gitbook/assets/image (22).png>)\


&#x20;
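To make one of the SCD types concrete, a minimal plain-Python sketch of SCD Type 2, which keeps history by closing the current row and appending a new version. The row layout (`key`, `city`, `valid_from`, `valid_to`, `is_current`) is a hypothetical schema chosen for illustration, not the one from the Adatis article:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    key: str                  # business key, e.g. a customer id
    city: str                 # the tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None while the row is the current version
    is_current: bool


def scd2_upsert(table: List[DimRow], key: str, city: str, as_of: date) -> None:
    """SCD Type 2: expire the current version and append a new one on change."""
    current = next((r for r in table if r.key == key and r.is_current), None)
    if current is not None:
        if current.city == city:
            return  # no change, history stays as is
        current.valid_to = as_of
        current.is_current = False
    table.append(DimRow(key, city, as_of, None, True))


dim: List[DimRow] = []
scd2_upsert(dim, "c1", "London", date(2021, 1, 1))
scd2_upsert(dim, "c1", "Paris", date(2022, 4, 30))
for row in dim:
    print(row)
```

By contrast, Type 1 would simply overwrite `city` in place and lose the London row; Type 2 trades extra rows for the ability to query the attribute as of any date.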
2 changes: 1 addition & 1 deletion data-science.md
@@ -8,7 +8,7 @@ This article provides an overview of TDSP and its main components. We provide a

![by The DS lifecycle, Microsoft Documentation](https://lh5.googleusercontent.com/6uVYD4xbDkj2HG\_rfP7fWQUn5eERj0nl\_m-kKPpuyYX4q6R0g95WAduUFmIrSWVOd0P6dptgZG-1gkqWX-PvX4Png\_ocJwI8VVxnj5WaZHCyetwvCLMwaKnp6g5b4goekVy9RuWV)

![by The DS lifecycle, Microsoft Documentation](<.gitbook/assets/image (11).png>)
![by The DS lifecycle, Microsoft Documentation](<.gitbook/assets/image (12).png>)

[**Google’s famous MLops**](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#mlops\_level\_0\_manual\_process)

2 changes: 1 addition & 1 deletion datasets.md
@@ -9,7 +9,7 @@

1. [Various Bias types](https://queue.acm.org/detail.cfm?id=3466134) by queue.acm

![](<.gitbook/assets/image (12) (1).png>)
![](<.gitbook/assets/image (18).png>)

1. [Overfitting your test set: a statistician's viewpoint, a great article](https://lukeoakdenrayner.wordpress.com/2019/09/19/ai-competitions-dont-produce-useful-models/?fbclid=IwAR1WM5U7imq-2LFPifyCoTPp-MFwPoGROMLr2TZWAp41qgVeLdT-\_2bkLyk\&blogsub=confirming#subscribe-blog). Bottom line: use the Bonferroni correction.
2. Understanding what the next stage in DL (& ML) algorithm development is: basic approach - [Andrew Ng](https://www.youtube.com/watch?v=F1ka6a13S9I) on YouTube
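The Bonferroni correction recommended above controls the family-wise error rate by testing each of m hypotheses at α / m instead of α. A minimal sketch; the p-values are made-up numbers standing in for, say, several models compared against a baseline on the same test set:

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, per hypothesis, whether it stays significant after Bonferroni."""
    m = len(p_values)
    threshold = alpha / m  # each test is held to a stricter level
    return [p <= threshold for p in p_values]


# Hypothetical p-values from 5 model-vs-baseline comparisons on one test set.
p_vals = [0.001, 0.02, 0.04, 0.008, 0.3]
print(bonferroni(p_vals))  # -> [True, False, False, True, False]
```

Here the per-test threshold is 0.05 / 5 = 0.01, so the 0.02 and 0.04 results, which would pass an uncorrected 0.05 cut-off, are no longer treated as wins.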
41 changes: 41 additions & 0 deletions mlops-engineering/mlops-monitoring-and-alerts.md
@@ -0,0 +1,41 @@
# MLOps Monitoring & Alerts

## **MONITORING & ALERTS**

* [**Monitor! Stop being a blind DS**](https://towardsdatascience.com/monitor-stop-being-a-blind-data-scientist-ac915286075f)
* [**Monitor your dependencies! Stop being a blind DS**](https://towardsdatascience.com/monitor-your-dependencies-stop-being-a-blind-data-scientist-a3150bd64594)
* [**Data science observability for executives**](https://towardsdatascience.com/data-science-observability-for-executives-a054411faecc)
* [**Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance**](https://towardsdatascience.com/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance-d9b1d02ac158)**,** [**YouTube**](https://www.youtube.com/watch?v=QcevzK9ZuDg)**, uses** [**alibi-explain**](https://docs.google.com/document/d/1dXELAcJn9KCPSRMDvZoumUyHx8K8Yn7wfFxesSpbNCM/edit#heading=h.xs1o8m3ro5iy) **(see compendium) and alibi-detect (see compendium)**
* [**MLflow, HyperparameterHunter, Hyperopt, concept drift, unit tests.**](https://towardsdatascience.com/putting-ml-in-production-ii-logging-and-monitoring-algorithms-91f174044e4e)
* [**Meta-anomaly detection aggregated over multiple models.**](https://www.anodot.com/blog/monitoring-machine-learning/)
* [**Vidhya on monitoring data & models**](https://www.analyticsvidhya.com/blog/2019/10/deployed-machine-learning-model-post-production-monitoring/)

### **Drift**

1. [**Data & concept drifts**](https://deepchecks.com/how-to-monitor-ml-models-in-production/)**,** [**2**](https://www.explorium.ai/blog/understanding-and-handling-data-and-concept-drift/)
2. **(good)** [**Inferring Concept Drift Without Labeled Data**](https://concept-drift.fastforwardlabs.com)**, by Cloudera Fast Forward Labs. Also covers stream-based drift.**
3. **Arize.ai**&#x20;
   1. **Data, concept,** [**feature drifts**](https://towardsdatascience.com/using-statistical-distance-metrics-for-machine-learning-observability-4c874cded78) **- various comparisons between train/prod/validation time windows, different models, A/B tests, etc., and how to measure drift**
2. [**Model store, Feature store, evaluation store**](https://towardsdatascience.com/the-only-3-ml-tools-you-need-1aa750778d33)
   3. [**Monitor model performance in production**](https://towardsdatascience.com/the-playbook-to-monitor-your-models-performance-in-production-ec06c1cc3245) **- real-time, biased, delayed, and no ground truth.**
4. [use cases - i.e., how to use statistical differences/distances](https://towardsdatascience.com/using-statistical-distance-metrics-for-machine-learning-observability-4c874cded78)
4. [**Some advice on Medium**](https://towardsdatascience.com/concept-drift-and-model-decay-in-machine-learning-a98a809ea8d4)**: relabel using the latest model (can we even trust it?), then retrain.**
5. [**Adversarial Validation Approach to Concept Drift Problem in User Targeting Automation Systems at Uber**](https://arxiv.org/abs/2004.03045) **- Previous research on concept drift mostly proposed model retraining after observing performance decreases. However, this approach is suboptimal because the system fixes the problem only after suffering from poor performance on new data. Here, we introduce an adversarial validation approach to concept drift problems in user targeting automation systems. With our approach, the system detects concept drift in new data before making inference, trains a model, and produces predictions adapted to the new data.**&#x20;
6. **Drift estimator between datasets using a random forest; the formula is in the Medium article above, and the code is in** [**MLBox**](https://github.com/AxeldeRomblay/MLBox/blob/811dbcb04fc7f5501e82f3e78aa6c119f426ee78/python-package/mlbox/preprocessing/drift/drift\_estimator.py)
7. [**Alibi-detect**](https://docs.google.com/document/d/1dXELAcJn9KCPSRMDvZoumUyHx8K8Yn7wfFxesSpbNCM/edit#heading=h.y6mpsp4co5t9) **- an open-source Python library by Seldon, focused on outlier, adversarial, and drift detection.**

![Alibi Detection Drift Features](https://lh4.googleusercontent.com/sASV5qq3CTmv0gx6Tl3DiwACMnwsW9wj1yNHF5sFIFbQr4BFFgAVgfcWsnrHxNnQtQKa-b5-IdbC-OElnQIr117lxaH3TGCuz1CmpgU6mof3i9VkPR3LyzdD9S0ujTmWj7o88Iep)
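One common statistical distance for the feature-drift comparisons listed above is the Population Stability Index, PSI = sum over bins of (a - e) * ln(a / e), where e and a are the fractions of the reference (e.g. training) and current (e.g. production) samples falling in each shared bin. A stdlib-only sketch; the quantile binning, the epsilon for empty bins, and the synthetic data are assumptions, and the often-quoted 0.1 / 0.25 alert levels are rules of thumb rather than standards:

```python
import bisect
import math
import random
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    """Fraction of `values` in each bin; `edges` are the sorted inner cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_left counts how many edges are strictly below v.
        counts[bisect.bisect_left(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` relative to `expected`."""
    ref = sorted(expected)
    # Quantile cut points from the reference sample (equal-width bins are
    # another common choice).
    edges = [ref[len(ref) * k // bins] for k in range(1, bins)]
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    total = 0.0
    for ei, ai in zip(e, a):
        ei, ai = max(ei, eps), max(ai, eps)  # avoid log(0) on empty bins
        total += (ai - ei) * math.log(ai / ei)
    return total


# Hypothetical data: a training-time feature vs. two production windows.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [random.gauss(0.0, 1.0) for _ in range(5000)]
prod_shifted = [random.gauss(0.5, 1.0) for _ in range(5000)]
print(round(psi(train, prod_same), 3), round(psi(train, prod_shifted), 3))
```

Each term (a - e) * ln(a / e) is non-negative, so PSI is zero only when the binned distributions match; the same function can compare any pair of windows (train vs. prod, last week vs. this week).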

### Tool Comparisons

1. [State of MLOps](https://www.stateofmlops.com) (by me), [medium](https://towardsdatascience.com/mlops-monitoring-market-review-66904f0863bb) article, open-source [AirTable](https://airtable.com/shr4rfiuOIVjMhvhL).
2. [MLOps.toys](https://mlops.toys) - A curated list of MLOps projects by [Aporia](https://aporia.com)
3. [Neptune.AI](https://mlops.neptune.ai) MLOps tools landscape
4. [Twimlai](https://twimlai.com/solutions/) ML AI solutions
5. [Ambiata](https://www.ambiata.com/blog/2020-12-07-mlops-tools/) how to choose the best MLOps tools
6. [lakeFS](https://lakefs.io/the-state-of-data-engineering-in-2021/) on the state of data engineering, including monitoring and observability
7. [The NLP Pandect](https://github.com/ivan-bilan/The-NLP-Pandect#mlops-for-nlp) - MLOps for NLP
8. [ml-ops.org](https://ml-ops.org)
9. [Awesome production ML](https://github.com/EthicalML/awesome-production-machine-learning/)

![Awesome production ML](../.gitbook/assets/image.png)
4 changes: 2 additions & 2 deletions natural-language-processing/name-matching.md
@@ -12,7 +12,7 @@
1. [first and last name dataset](https://github.com/philipperemy/name-dataset), facebook 533M records, philippe remy
2. [data.world name datasets](https://data.world/datasets/names)
3. [Kaggle](https://www.kaggle.com/fivethirtyeight/fivethirtyeight-most-common-name-dataset/version/108), Name datasets, by fivethirtyeight\
![](<../.gitbook/assets/image (16).png>)
![](<../.gitbook/assets/image (20).png>)
4. [gender by name dataset](https://archive.ics.uci.edu/ml/datasets/Gender+by+Name)
5. [paper](http://www.lrec-conf.org/proceedings/lrec2008/pdf/291\_paper.pdf) - a ground-truth dataset for matching culturally diverse romanized person names

@@ -21,4 +21,4 @@
1. [Dedupe](https://www.reddit.com/r/datasets/comments/4zrozk/request\_name\_matching\_dataset/) - a Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution
2. [nama](https://github.com/bradhackinen/nama) - fast, flexible name matching for large datasets
3. [name matcher](https://github.com/athenianco/names-matcher) by athenianco\
![](<../.gitbook/assets/image (13) (1).png>)
![](<../.gitbook/assets/image (15).png>)
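The libraries above go well beyond this, but the core idea of fuzzy name matching can be sketched with the standard library's `difflib.SequenceMatcher`. The normalization steps (accent stripping, punctuation removal, token sorting) and the 0.85 threshold are assumptions for illustration, not taken from Dedupe, nama, or names-matcher:

```python
import difflib
import unicodedata
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase, strip accents, drop punctuation, and sort name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in stripped.lower())
    # Sorting tokens makes "Smith, John" and "John Smith" compare equal.
    return " ".join(sorted(cleaned.split()))


def best_match(query: str, candidates: List[str],
               threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return the most similar candidate and its ratio, or None below threshold."""
    q = normalize(query)
    scored = [(c, difflib.SequenceMatcher(None, q, normalize(c)).ratio())
              for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


names = ["John Smith", "Jon Smyth", "José Álvarez", "Maria Garcia"]
print(best_match("Smith, John", names))
print(best_match("Jose Alvarez", names))
print(best_match("Completely Different", names))
```

This compares the query against every candidate, which is fine for small lists; the dedicated libraries exist largely to avoid that quadratic cost (blocking, indexing) and to learn better similarity functions.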
4 changes: 2 additions & 2 deletions natural-language-processing/nlp.md
@@ -1,4 +1,4 @@
# Natural Language Programming (Many Topics)
# Natural Language Processing (Many Topics)

### **NLP - a reality check**

@@ -485,7 +485,7 @@

This is the official code accompanying a paper on the [Hebrew Psychological Lexicons](https://www.aclweb.org/anthology/2021.clpsych-1.6.pdf), which was presented at CLPsych 2021.

![Summary Hebrew Psych Lexicon](<../.gitbook/assets/image (10).png>)
![Summary Hebrew Psych Lexicon](<../.gitbook/assets/image (11).png>)

**Reference papers:**

2 changes: 2 additions & 0 deletions product/README.md
@@ -0,0 +1,2 @@
# Page 1

11 changes: 11 additions & 0 deletions product/business.md
@@ -0,0 +1,11 @@
# Business

[Unusual Ventures](https://www.field-guide.unusual.vc) - "A tactical field guide provides you with all of the best practices and tools founders need to solve the most challenging early-stage problems". However, I see it as a guide for data scientists who want to understand go-to-market, product-led growth, and other topics in that space, and need a good introduction.

![](https://files.gitbook.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-Mgd48oS5\_duTKOVE\_Et%2F-MkVBtp7BJSNV\_p-NgWd%2F-MkVDHbsRF\_b2\_Ogjkgu%2Fimage.png?alt=media\&token=65daf790-ac08-431e-a3cb-3d1a5cb2843d)\
Unusual Ventures Field guide Index\


![](https://files.gitbook.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-Mgd48oS5\_duTKOVE\_Et%2F-MkVBtp7BJSNV\_p-NgWd%2F-MkVCr8pibrDfwzK4dNn%2Fimage.png?alt=media\&token=89576131-b6d9-4e06-a3c7-065f99194547)\
Unusual Ventures Product Field guide

![](https://files.gitbook.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-Mgd48oS5\_duTKOVE\_Et%2F-MkVBtp7BJSNV\_p-NgWd%2F-MkVCx3OidZV5yE3BWhs%2Fimage.png?alt=media\&token=a70b5ad9-850e-427c-bdc6-f542a9902a2e)\
Unusual Ventures Modern GTM Field guide
3 changes: 3 additions & 0 deletions product/expanding-your-data-science-skills.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Expanding Your Data Science Skills

As a data scientist, I believe we should all acquire deep knowledge and skills in other domains, to better understand how our work connects to product, business, user experience, marketing, full-stack, and Ops.
4 changes: 4 additions & 0 deletions product/humor.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Humor

1. [**Risitas learns about MLOPS**](https://www.youtube.com/watch?v=1C\_l5ICJlEo\&feature=youtu.be)
2. [**Pyception**](https://youtu.be/C1wiOTkA44Y)
7 changes: 7 additions & 0 deletions product/marketting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Marketting

1. [**Why your marketing isn't working**](https://jvullinghs.medium.com/this-is-why-your-marketing-isnt-working-55e761b3e05e) **- the 3 elements of your marketing foundation:**
   1. **Brand**
   2. **Positioning**
   3. **Messaging**
2. [**Inbound vs Outbound marketing**](https://blog.hubspot.com/blog/tabid/6307/bid/2989/inbound-marketing-vs-outbound-marketing.aspx) **by HubSpot**