Merge pull request JohnSnowLabs#150 from JohnSnowLabs/jsl-lib-docs

Jsl lib docs
rohitn · Oct 5, 2022 · 3e61040
2 parents 6b4fd08 + 41de46d

Showing 12 changed files with 595 additions and 86 deletions.
diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
@@ -19,19 +19,27 @@ header:
   - title: Spellbook
     url:    /docs/en/spellbook
     key: tutorial_notebooks
-
   - title: '<span style="color: #FF8A00;"><i class="fab fa-github fa-2x"></i></span>'
     url: https://github.com/JohnSnowLabs/nlu
   - title: '<span style="color: #FF8A00;"><i class="fab fa-slack-hash fa-2x"></i></span>'
     url: https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA
 docs-en:
-  - title:     NLU
+  - title:     John Snow Labs
     children:
       - title:  Installation
         url:    /docs/en/install
-      - title:  Usage
+      - title:  Starting a Spark Session
+        url:    /docs/en/start-a-sparksession
+      - title:  John Snow Labs Usage & Overview
+        url:    /docs/en/import-structure
+      - title:  Settings & Cache Folder
+        url:    /docs/en/john-snow-labs-home
+
+  - title:     NLU
+    children:
+      - title:  NLU Usage
         url:    /docs/en/concepts
-      - title:  General Examples
+      - title:  General NLU Examples
         url:    /docs/en/examples
       - title:  NLU for Healthcare
         url:    /docs/en/nlu_for_healthcare

diff --git a/docs/assets/images/jsl_lib/install/access_token1.png b/docs/assets/images/jsl_lib/install/access_token1.png
diff --git a/docs/assets/images/jsl_lib/install/databricks_access_token.png b/docs/assets/images/jsl_lib/install/databricks_access_token.png
diff --git a/docs/assets/images/jsl_lib/install/install_button_colab.png b/docs/assets/images/jsl_lib/install/install_button_colab.png
diff --git a/docs/assets/images/jsl_lib/install/install_logs_colab.png b/docs/assets/images/jsl_lib/install/install_logs_colab.png
diff --git a/docs/assets/images/jsl_lib/install/install_pop_up.png b/docs/assets/images/jsl_lib/install/install_pop_up.png
diff --git a/docs/assets/images/jsl_lib/start/start.png b/docs/assets/images/jsl_lib/start/start.png
diff --git a/docs/en/install.md b/docs/en/install.md
diff --git a/docs/en/jsl_home.md b/docs/en/jsl_home.md
@@ -0,0 +1,61 @@
+---
+layout: docs
+seotitle: NLU | John Snow Labs
+title: John Snow Labs Configurations
+permalink: /docs/en/john-snow-labs-home
+key: docs-install
+modify_date: "2020-05-26"
+header: true
+---
+
+<div class="main-docs" markdown="1">
+
+
+
+## Installed Library Version Settings
+Each version of the John Snow Labs library comes with a **hardcoded set of versions** for every product of the John Snow Labs company.
+It will not accept **library secrets** for **versions that do not match these settings**.
+This prevents you from installing **outdated** or **new but not yet thoroughly tested** libraries, or, you might say, from shooting yourself in the foot.
+
+
+You can work around this protection mechanism by updating the `jsl.settings` module to configure which versions of the licensed and open source libraries should be installed and accepted.
+
+```python
+# Example: configure the version of every library (the values below are placeholders)
+from johnsnowlabs import *
+jsl.settings.raw_version_jsl_lib='1.2.3'
+jsl.settings.raw_version_nlp='1.2.3.rc1'
+jsl.settings.raw_version_medical='1.2.3rc2'
+jsl.settings.raw_version_secret_medical='1.2.3.a3'
+jsl.settings.raw_version_secret_ocr='1.2.3.abc'
+jsl.settings.raw_version_ocr='1.2.3.abc'
+jsl.settings.raw_version_nlu='1.2.3.abc'
+jsl.settings.raw_version_pyspark='1.2.3.abc'
+jsl.settings.raw_version_nlp_display='1.2.3.abc'
+```
+
+
+## John Snow Labs Home Cache Folder
+The John Snow Labs library maintains a home folder in `~/.johnsnowlabs`, which contains all your licenses, the Java JARs and the Python wheels needed to install and run any feature.
+Each directory also contains an `info.json` file with details about Spark compatibility, hardware targets and the versions of its files.
+
+
+```text
+~/.johnsnowlabs/
+   ├─ licenses/
+   │  ├─ info.json
+   │  ├─ license1.json
+   │  ├─ license2.json
+   ├─ java_installs/
+   │  ├─ info.json
+   │  ├─ app1.jar
+   │  ├─ app2.jar
+   ├─ py_installs/
+   │  ├─ info.json
+   │  ├─ app1.tar.gz
+   │  ├─ app2.tar.gz
+   ├─ info.json
+
+```
+</div>
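Because the cache folder described above consists of plain files and JSON, it can be inspected with the Python standard library. The sketch below is a hypothetical helper, not part of the `johnsnowlabs` API. It assumes the layout shown in the tree: it lists the files in each subfolder and loads each `info.json` without assuming anything about its schema. The root folder is a parameter because these pages refer to the home folder both as `~/.johnsnowlabs` and as `~/.jsl_home`.

```python
import json
from pathlib import Path


def summarize_jsl_home(root="~/.johnsnowlabs"):
    """Hypothetical helper: map each subfolder of the home folder to its files
    and the parsed contents of its info.json (None if it has no info.json)."""
    home = Path(root).expanduser()
    if not home.is_dir():
        # Nothing has been installed or cached yet
        return {}
    summary = {}
    # Only direct subfolders, e.g. licenses/, java_installs/, py_installs/
    for folder in sorted(p for p in home.iterdir() if p.is_dir()):
        files = sorted(
            f.name for f in folder.iterdir() if f.is_file() and f.name != "info.json"
        )
        info_path = folder / "info.json"
        # Parse info.json as generic JSON; its schema is not assumed here
        info = json.loads(info_path.read_text()) if info_path.is_file() else None
        summary[folder.name] = {"files": files, "info": info}
    return summary
```

For example, `summarize_jsl_home()["licenses"]["files"]` would list the cached license files, if that folder exists.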
diff --git a/docs/en/jsl_lib_imports.md b/docs/en/jsl_lib_imports.md
@@ -0,0 +1,100 @@
+---
+layout: docs
+seotitle: NLU | John Snow Labs
+title: John Snow Labs Usage & Overview
+permalink: /docs/en/import-structure
+key: docs-install
+modify_date: "2020-05-26"
+header: true
+---
+
+<div class="main-docs" markdown="1">
+
+The John Snow Labs Python library gives you a clean and easy way to structure your Python projects.
+The very first line of a project should be:
+```python
+from johnsnowlabs import *
+```
+This imports all installed licensed and open source Python modules of John Snow Labs products, as well as
+many handy utility imports.
+
+
+The following functions, classes and modules will be available in the global namespace:
+
+## The **nlp** Module
+The `nlp` module contains classes and methods from [Spark NLP](https://nlp.johnsnowlabs.com/docs/en/quickstart), like `nlp.BertForSequenceClassification` and `nlp.map_annotations()`:
+- `nlp.AnnotatorName` via Spark NLP [Annotators](https://nlp.johnsnowlabs.com/docs/en/annotators) and [Transformers](https://nlp.johnsnowlabs.com/docs/en/transformers), e.g. `nlp.BertForSequenceClassification`
+- Spark NLP [Helper Functions](https://nlp.johnsnowlabs.com/docs/en/auxiliary), e.g. `nlp.map_annotations()`
+- `nlp.F` via `import pyspark.sql.functions as F` under the hood
+- `nlp.T` via `import pyspark.sql.types as T` under the hood
+- `nlp.SQL` via `import pyspark.sql as SQL` under the hood
+- `nlp.ML` via `from pyspark import ml as ML` under the hood
+- For the full list of imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/nlp.py)
+
+
+## The **jsl** Module
+
+The `jsl` module provides the following methods:
+- `jsl.install()` for installing John Snow Labs libraries and managing your licenses, [more info here](https://nlu.johnsnowlabs.com/docs/en/install)
+- `jsl.load()` for predicting with any of the 10k+ pretrained models in 1 line of code, or for training new ones, using the [nlu.load() method](https://nlu.johnsnowlabs.com/) under the hood
+- `jsl.start()` for starting a Spark Session with access to all features you have credentials for, [more info here](https://nlu.johnsnowlabs.com/docs/en/start-a-sparksession)
+- `jsl.viz()` for visualizing predictions of any of the 10k+ pretrained models, using [nlu.viz()](https://nlu.johnsnowlabs.com/docs/en/viz_examples) under the hood
+- `jsl.viz_streamlit()` and the other `jsl.viz_streamlit_xyz` methods for using any of the 10k+ pretrained models in 0 lines of code with an [interactive Streamlit GUI and reusable, stackable Streamlit components](https://nlu.johnsnowlabs.com/docs/en/streamlit_viz_examples)
+- `jsl.to_pretty_df()` for predicting on raw strings with a Spark pipeline and getting a nicely structured Pandas DataFrame, using [nlu.to_pretty_df()](https://nlu.johnsnowlabs.com/docs/en/utils_for_spark_nlp) under the hood
+
+
+## The **viz** Module
+
+The `viz` module contains classes from [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display):
+- `viz.NerVisualizer` for visualizing the predictions of NER-based Spark pipelines
+- `viz.DependencyParserVisualizer` for visualizing the predictions of DependencyParser-based Spark pipelines
+- `viz.RelationExtractionVisualizer` for visualizing the predictions of RelationExtraction-based Spark pipelines
+- `viz.EntityResolverVisualizer` for visualizing the predictions of EntityResolver-based Spark pipelines
+- `viz.AssertionVisualizer` for visualizing the predictions of Assertion-based Spark pipelines
+
+
+## The **ocr** Module
+
+The `ocr` module contains annotator classes and methods from [Spark OCR](https://nlp.johnsnowlabs.com/docs/en/ocr), like `ocr.VisualDocumentClassifier` and `ocr.helpful_method()`:
+- [Pipeline Components](https://nlp.johnsnowlabs.com/docs/en/ocr_pipeline_components), e.g. `ocr.ImageToPdf`
+- [Table Recognizers](https://nlp.johnsnowlabs.com/docs/en/ocr_table_recognition), e.g. `ocr.ImageTableDetector`
+- [Visual Document Understanding](https://nlp.johnsnowlabs.com/docs/en/ocr_visual_document_understanding), e.g. `ocr.VisualDocumentClassifier`
+- [Object Detectors](https://nlp.johnsnowlabs.com/docs/en/ocr_object_detection), e.g. `ocr.ImageHandwrittenDetector`
+- [Enums, Structures and Helpers](https://nlp.johnsnowlabs.com/docs/en/ocr_structures), e.g. `ocr.Color`
+- For the full list of imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/ocr.py)
+
+## The **medical** Module
+
+
+The `medical` module contains annotator classes and methods from [Spark NLP for Medicine](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `medical.RelationExtractionDL` and `medical.profile()`:
+- [Medical Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `medical.DeIdentification`
+- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `medical.AnnotationToolJsonReader`
+- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `medical.NerDLEvaluation`
+- **NOTE:** Every class whose name starts with `Medical` is available with the `Medical` prefix omitted, e.g. `medical.NerModel` maps to `sparknlp_jsl.annotator.MedicalNerModel`.
+  - This is achieved via `from sparknlp_jsl.annotator import MedicalNerModel as NerModel` under the hood.
+- For the full list of imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/medical.py)
+
+## The **legal** Module
+
+The `legal` module contains annotator classes and methods from [Spark NLP for Legal](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `legal.RelationExtractionDL` and `legal.profile()`:
+- [Legal Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `legal.DeIdentification`
+- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `legal.AnnotationToolJsonReader`
+- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `legal.NerDLEvaluation`
+- **NOTE:** Every class whose name starts with `Legal` is available with the `Legal` prefix omitted, e.g. `legal.NerModel` maps to `sparknlp_jsl.annotator.LegalNerModel`.
+  - This is achieved via `from sparknlp_jsl.annotator import LegalNerModel as NerModel` under the hood.
+- For the full list of imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/legal.py)
+
+
+## The **finance** Module
+
+
+The `finance` module contains annotator classes and methods from [Spark NLP for Finance](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `finance.RelationExtractionDL` and `finance.profile()`:
+- [Finance Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `finance.DeIdentification`
+- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `finance.AnnotationToolJsonReader`
+- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `finance.NerDLEvaluation`
+- **NOTE:** Every class whose name starts with `Finance` is available with the `Finance` prefix omitted, e.g. `finance.NerModel` maps to `sparknlp_jsl.annotator.FinanceNerModel`.
+  - This is achieved via `from sparknlp_jsl.annotator import FinanceNerModel as NerModel` under the hood.
+- For the full list of imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/finance.py)
+</div>
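The prefix rule from the **NOTE** entries above (`medical.NerModel` maps to `MedicalNerModel`, `legal.NerModel` to `LegalNerModel`, and so on) can be illustrated with a small standard-library sketch. According to the docs, the real modules use explicit `from sparknlp_jsl.annotator import MedicalNerModel as NerModel` imports. This sketch instead builds the aliases dynamically from a stand-in module, purely to show the mapping. `strip_prefix_namespace` and the fake module are hypothetical and are not part of `johnsnowlabs` or `sparknlp_jsl`.

```python
import types


def _has_prefix(name, prefix):
    # Require an upper-case letter right after the prefix ("Medical" + "NerModel"),
    # so a name like "Legalize" is not treated as prefixed.
    return name.startswith(prefix) and name[len(prefix):][:1].isupper()


def strip_prefix_namespace(source, prefix):
    """Hypothetical sketch: expose the public names of `source`, with names like
    f"{prefix}NerModel" exposed as "NerModel" (prefix omitted)."""
    ns = types.SimpleNamespace()
    public = [n for n in dir(source) if not n.startswith("_")]
    # First copy names without the prefix unchanged ...
    for name in public:
        if not _has_prefix(name, prefix):
            setattr(ns, name, getattr(source, name))
    # ... then add prefixed names under their short alias, so the prefixed
    # class wins if a short name already exists.
    for name in public:
        if _has_prefix(name, prefix):
            setattr(ns, name[len(prefix):], getattr(source, name))
    return ns


# Stand-in for a licensed annotator module (hypothetical contents).
fake_annotator = types.ModuleType("fake_annotator")
fake_annotator.MedicalNerModel = type("MedicalNerModel", (), {})
fake_annotator.DeIdentification = type("DeIdentification", (), {})

medical = strip_prefix_namespace(fake_annotator, "Medical")
print(medical.NerModel.__name__)  # MedicalNerModel
```

Explicit `import ... as ...` statements, as the docs describe, make every alias visible in the source and to IDEs. The dynamic version above only exists to make the naming rule concrete.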
diff --git a/docs/en/predict_api.md b/docs/en/predict_api.md
@@ -220,6 +220,14 @@ nlu.load('sentiment').predict(text_df[['tweet','tweet_location']])
 
 ## Supported data types
 NLU supports all common Python data types and formats:
+- Pandas DataFrames
+- Spark DataFrames
+- Modin with Dask backend
+- Modin with Ray backend
+- 1-D NumPy arrays of strings
+- Strings
+- Arrays of strings
+
 
 </div><div class="h3-box" markdown="1">
 

diff --git a/docs/en/start_sparkseession.md b/docs/en/start_sparkseession.md
@@ -0,0 +1,73 @@
+---
+layout: docs
+seotitle: NLU | John Snow Labs
+title: Starting a Spark Session
+permalink: /docs/en/start-a-sparksession
+key: docs-install
+modify_date: "2020-05-26"
+header: true
+---
+
+<div class="main-docs" markdown="1">
+
+To use most features, you must first start a Spark Session with `jsl.start()`.
+This launches a [Java Virtual Machine (JVM)](https://en.wikipedia.org/wiki/Java_virtual_machine) process on your machine
+with all the John Snow Labs and Spark [Scala/Java libraries (JARs)](https://de.wikipedia.org/wiki/Java_Archive) you have access to loaded into memory.
+
+The `jsl.start()` method loads all JARs for which credentials are provided, downloading and caching any missing ones in `~/.jsl_home/java_installs`.
+If you installed via `jsl.install()`, you can most likely **skip the rest of this page**, since your secrets have been cached in `~/.jsl_home` and will be re-used.
+If you **disabled license caching** while installing, or if you want to **tweak the settings of your Spark Session**, read on.
+
+The output of `jsl.start()` tells you which JARs are loaded and the versions of all relevant libraries.
+![Output of jsl.start()](/assets/images/jsl_lib/start/start.png)
+
+
+
+## Authorization Flow Parameters 
+Most of the authorization flows and parameters of `jsl.install()` are supported.
+See the [detailed docs here](https://nlu.johnsnowlabs.com/docs/en/install#authorization-flows-overview).
+
+| Parameter           | Description                                                                                                                                                                                                          | Example                                          | Default |
+|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|---------|
+| *(none)*            | Load the license automatically via one of the **Auto-Detection Mechanisms**                                                                                                                                           | `jsl.start()`                                    | -       |
+| `browser_login`     | Browser-based authorization: a button to click in notebooks, a browser pop-up otherwise                                                                                                                              | `jsl.start(browser_login=True)`                  | `False` |
+| `access_token`      | Visit [my.johnsnowlabs.com](https://my.johnsnowlabs.com/) to extract a token, which you can provide to enable license access. See the [Access Token Example](http://nlu.johnsnowlabs.com/docs/en/install#via-access-token) | `jsl.start(access_token='myToken')`              | `None`  |
+| `secrets_file`      | Path to a JSON license file whose keys are defined by the [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables)                                 | `jsl.start(secrets_file='path/to/license.json')` | `None`  |
+| `store_in_jsl_home` | Whether to cache new licenses in `~/.jsl_home`; set to `False` to disable caching                                                                                                                                    | `jsl.start(store_in_jsl_home=False)`             | `True`  |
+| `license_number`    | Which license to use if you have access to multiple licenses, either cached locally or loaded from [my.johnsnowlabs.com](https://my.johnsnowlabs.com/)                                                                | `jsl.start(license_number=5)`                    | `0`     |
+
+
+### Manually Specify License Parameters
+Each parameter below corresponds to a variable from the [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables); any of them can be omitted.
+
+| Parameter               | Description                            |
+|-------------------------|----------------------------------------|
+| `aws_access_key`        | Corresponds to `AWS_ACCESS_KEY_ID`     |
+| `aws_key_id`            | Corresponds to `AWS_SECRET_ACCESS_KEY` |
+| `enterprise_nlp_secret` | Corresponds to `HC_SECRET`             |
+| `ocr_secret`            | Corresponds to `OCR_SECRET`            |
+| `hc_license`            | Corresponds to `HC_LICENSE`            |
+| `ocr_license`           | Corresponds to `OCR_LICENSE`           |
+| `fin_license`           | Corresponds to `JSL_FINANCE_LICENSE`   |
+| `leg_license`           | Corresponds to `JSL_LEGAL_LICENSE`     |
+
+## Spark Session Parameters
+These parameters configure how your Spark Session is started.
+See [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) for a comprehensive overview of all Spark settings.
+
+| Parameter            | Default    | Description                                                                                                                                                        | Example                                                                     |
+|----------------------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
+| `spark_conf`         | `None`     | Dictionary Key/Value pairs of [Spark Configurations](https://spark.apache.org/docs/latest/configuration.html) for the Spark Session                                | `jsl.start(spark_conf={'spark.executor.memory':'6g'})`                      |
+| `master_url`         | `local[*]` | URL to Spark Cluster master                                                                                                                                        | `jsl.start(master_url='spark://my.master')`                                 |
+| `jar_paths`          | `None`     | List of paths to JARs which should be loaded into the Spark Session                                                                                                | `jsl.start(jar_paths=['my/jar_folder/jar1.jar','my/jar_folder/jar2.jar'])` |
+| `exclude_nlp`        | `False`    | Whether to exclude the Spark NLP JAR from the session. The JAR is always loaded if available, unless set to `True`.                                                | `jsl.start(exclude_nlp=True)`                                               |
+| `exclude_healthcare` | `False`    | Whether to exclude the licensed NLP JAR for Legal, Finance or Healthcare. The JAR is always loaded if available using your provided license, unless set to `True`. | `jsl.start(exclude_healthcare=True)`                                        |
+| `exclude_ocr`        | `False`    | Whether to exclude the licensed OCR JAR. The JAR is always loaded if available using your provided license, unless set to `True`.                                  | `jsl.start(exclude_ocr=True)`                                               |
+| `hardware_target`    | `cpu`      | Hardware the JARs should be optimized for. Valid values are `gpu`, `cpu`, `m1` and `aarch`.                                                                        | `jsl.start(hardware_target='m1')`                                           |
+| `model_cache_folder` | `None`     | Folder that models are downloaded to when using `model.pretrained()`                                                                                               | `jsl.start(model_cache_folder='path/to/model_cache')`                       |
+
+
+
+
+
+</div>
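The `secrets_file` parameter of `jsl.start()` described above expects a JSON file whose keys follow the License Variable Overview. Below is a minimal sketch that writes such a file with the Python standard library. It assumes the keys are exactly the variable names listed in the license parameter table above; that list may not be exhaustive, and the License Variable Overview remains the authoritative reference. `write_secrets_file` and `KNOWN_LICENSE_KEYS` are hypothetical names, not part of `johnsnowlabs`.

```python
import json
from pathlib import Path

# Variable names taken from the license parameter table above (assumed complete
# enough for this sketch; see the License Variable Overview for the full list).
KNOWN_LICENSE_KEYS = {
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "HC_SECRET",
    "OCR_SECRET",
    "HC_LICENSE",
    "OCR_LICENSE",
    "JSL_LEGAL_LICENSE",
    "JSL_FINANCE_LICENSE",
}


def write_secrets_file(path, **secrets):
    """Hypothetical helper: write license variables to a JSON file, skipping
    values that are None and rejecting unknown (e.g. misspelled) keys."""
    unknown = set(secrets) - KNOWN_LICENSE_KEYS
    if unknown:
        # Fail before writing anything, so no half-valid file is left behind
        raise ValueError(f"Unknown license variables: {sorted(unknown)}")
    payload = {key: value for key, value in secrets.items() if value is not None}
    path = Path(path).expanduser()
    path.write_text(json.dumps(payload, indent=2))
    return path
```

The resulting file can then be passed as `jsl.start(secrets_file='path/to/license.json')`. Rejecting unknown keys up front catches typos that would otherwise surface only as a failed license lookup.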