Merge pull request JohnSnowLabs#150 from JohnSnowLabs/jsl-lib-docs
Jsl lib docs
C-K-Loan committed Oct 5, 2022
2 parents 6b4fd08 + 41de46d commit 3e61040
Showing 12 changed files with 595 additions and 86 deletions.
16 changes: 12 additions & 4 deletions docs/_data/navigation.yml
Expand Up @@ -19,19 +19,27 @@ header:
- title: Spellbook
url: /docs/en/spellbook
key: tutorial_notebooks

- title: '<span style="color: #FF8A00;"><i class="fab fa-github fa-2x"></i></span>'
url: https://github.com/JohnSnowLabs/nlu
- title: '<span style="color: #FF8A00;"><i class="fab fa-slack-hash fa-2x"></i></span>'
url: https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA
docs-en:
- title: NLU
- title: John Snow Labs
children:
- title: Installation
url: /docs/en/install
- title: Usage
- title: Starting a Spark Session
url: /docs/en/start-a-sparksession
- title: John Snow Labs Usage & Overview
url: /docs/en/import-structure
- title: Settings & Cache Folder
url: /docs/en/john-snow-labs-home

- title: NLU
children:
- title: NLU Usage
url: /docs/en/concepts
- title: General Examples
- title: General NLU Examples
url: /docs/en/examples
- title: NLU for Healthcare
url: /docs/en/nlu_for_healthcare
Binary file added docs/assets/images/jsl_lib/start/start.png
423 changes: 341 additions & 82 deletions docs/en/install.md

Large diffs are not rendered by default.

61 changes: 61 additions & 0 deletions docs/en/jsl_home.md
@@ -0,0 +1,61 @@
---
layout: docs
seotitle: NLU | John Snow Labs
title: John Snow Labs Configurations
permalink: /docs/en/john-snow-labs-home
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">



## Installed Library Version Settings
Each version of the John Snow Labs library comes with a **hardcoded set of versions** for every product of the John Snow Labs company.
It will not accept **library secrets** that correspond to **versions that do not match the settings**.
This prevents you from installing **outdated** or **new but not thoroughly tested** libraries, in other words from shooting yourself in the foot.


You can work around this protection mechanism by configuring which versions of the licensed and open source libraries should be installed and accepted.
To do so, update the `jsl.settings` module.

```python
# Example of all library versions that can be configured (the version strings below are placeholders)
from johnsnowlabs import *
jsl.settings.raw_version_jsl_lib='1.2.3'
jsl.settings.raw_version_nlp='1.2.3.rc1'
jsl.settings.raw_version_medical='1.2.3rc2'
jsl.settings.raw_version_secret_medical='1.2.3.a3'
jsl.settings.raw_version_secret_ocr='1.2.3.abc'
jsl.settings.raw_version_ocr='1.2.3.abc'
jsl.settings.raw_version_nlu='1.2.3.abc'
jsl.settings.raw_version_pyspark='1.2.3.abc'
jsl.settings.raw_version_nlp_display='1.2.3.abc'
```


## John Snow Labs Home Cache Folder
The John Snow Labs library maintains a home folder at `~/.johnsnowlabs`, which contains all the licenses, Java jars, and Python wheels needed to install and run any feature.
Additionally, each directory has an `info.json` file that tells you more about Spark compatibility, hardware targets, and the versions of the files.


```shell
~/.johnsnowlabs/
├─ licenses/
│ ├─ info.json
│ ├─ license1.json
│ ├─ license2.json
├─ java_installs/
│ ├─ info.json
│ ├─ app1.jar
│ ├─ app2.jar
├─ py_installs/
│ ├─ info.json
│ ├─ app1.tar.gz
│ ├─ app2.tar.gz
├─ info.json

```
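The layout above can be inspected with a few lines of standard-library Python. This is only a sketch under the assumption that the folder looks like the tree shown here; `summarize_home` is a hypothetical helper, not part of the John Snow Labs library, and since the fields inside `info.json` are not described on this page, the file is simply parsed and returned as-is.

```python
import json
from pathlib import Path


def summarize_home(home: Path) -> dict:
    """Map each sub-folder name to its file names and its parsed info.json (if any).

    Hypothetical helper for illustration; it only assumes the folder layout shown above.
    """
    summary = {}
    if not home.is_dir():
        # Nothing has been installed or cached yet.
        return summary
    for folder in sorted(p for p in home.iterdir() if p.is_dir()):
        info_path = folder / "info.json"
        info = None
        if info_path.is_file():
            info = json.loads(info_path.read_text(encoding="utf-8"))
        # List every file in the folder except the info.json itself.
        files = sorted(
            f.name for f in folder.iterdir() if f.is_file() and f.name != "info.json"
        )
        summary[folder.name] = {"files": files, "info": info}
    return summary


if __name__ == "__main__":
    for name, entry in summarize_home(Path.home() / ".johnsnowlabs").items():
        print(name, entry["files"])
```

Running it prints one line per sub-folder (for example `licenses`, `java_installs`, `py_installs`) with the cached files it contains, and prints nothing if the home folder does not exist yet.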
</div>
100 changes: 100 additions & 0 deletions docs/en/jsl_lib_imports.md
@@ -0,0 +1,100 @@
---
layout: docs
seotitle: NLU | John Snow Labs
title: John Snow Labs Usage & Overview
permalink: /docs/en/import-structure
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">

The John Snow Labs Python library gives you a clean and easy way to structure your Python projects.
The very first line of a project should be:
```python
from johnsnowlabs import *
```
This imports all licensed and open source Python modules installed from other John Snow Labs Products, as well as
many handy utility imports.


The following functions, classes and modules will be available in the global namespace:

## The **nlp** Module
`nlp` module with classes and methods from [Spark NLP](https://nlp.johnsnowlabs.com/docs/en/quickstart) like `nlp.BertForSequenceClassification` and `nlp.map_annotations()`
- `nlp.AnnotatorName` via Spark NLP [Annotators](https://nlp.johnsnowlabs.com/docs/en/annotators) and [Transformers](https://nlp.johnsnowlabs.com/docs/en/transformers), e.g. `nlp.BertForSequenceClassification`
- Spark NLP [Helper Functions](https://nlp.johnsnowlabs.com/docs/en/auxiliary), e.g. `nlp.map_annotations()`
- `nlp.F` via `import pyspark.sql.functions as F` under the hood
- `nlp.T` via `import pyspark.sql.types as T` under the hood
- `nlp.SQL` via `import pyspark.sql as SQL` under the hood
- `nlp.ML` via `from pyspark import ml as ML` under the hood
- To see all the imports see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/nlp.py)


## The **jsl** Module

`jsl` module with the following methods
- `jsl.install()` for installing John Snow Labs libraries and managing your licenses, [more info here](https://nlu.johnsnowlabs.com/docs/en/install)
- `jsl.load()` for predicting with any of the 10k+ pretrained models in 1 line of code or training new ones, using the [nlu.load() method](https://nlu.johnsnowlabs.com/) under the hood
- `jsl.start()` for starting a Spark Session with access to features, [more info here](https://nlu.johnsnowlabs.com/docs/en/start-a-sparksession)
- `jsl.viz()` for visualizing predictions with any of the 10k+ pretrained models using [nlu.viz()](https://nlu.johnsnowlabs.com/docs/en/viz_examples) under the hood
- `jsl.viz_streamlit()` and other `jsl.viz_streamlit_xyz` methods for using any of the 10k+ pretrained models in 0 lines of code with an [interactive Streamlit GUI and re-usable and stackable Streamlit Components](https://nlu.johnsnowlabs.com/docs/en/streamlit_viz_examples)
- `jsl.to_pretty_df()` for predicting on raw strings and getting a nicely structured Pandas DF from a Spark Pipeline using [nlu.to_pretty_df()](https://nlu.johnsnowlabs.com/docs/en/utils_for_spark_nlp) under the hood


## The **viz** Module

`viz` module with classes from [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display)
- `viz.NerVisualizer` for visualizing prediction outputs of Ner based Spark Pipelines
- `viz.DependencyParserVisualizer` for visualizing prediction outputs of DependencyParser based Spark Pipelines
- `viz.RelationExtractionVisualizer` for visualizing prediction outputs of RelationExtraction based Spark Pipelines
- `viz.EntityResolverVisualizer` for visualizing prediction outputs of EntityResolver based Spark Pipelines
- `viz.AssertionVisualizer` for visualizing prediction outputs of Assertion based Spark Pipelines


## The **ocr** Module

`ocr` module with annotator classes and methods from [Spark OCR](https://nlp.johnsnowlabs.com/docs/en/ocr) like `ocr.VisualDocumentClassifier` and `ocr.helpful_method()`
- [Pipeline Components](https://nlp.johnsnowlabs.com/docs/en/ocr_pipeline_components), e.g. `ocr.ImageToPdf`
- [Table Recognizers](https://nlp.johnsnowlabs.com/docs/en/ocr_table_recognition), e.g. `ocr.ImageTableDetector`
- [Visual Document Understanding](https://nlp.johnsnowlabs.com/docs/en/ocr_visual_document_understanding), e.g. `ocr.VisualDocumentClassifier`
- [Object detectors](https://nlp.johnsnowlabs.com/docs/en/ocr_object_detection), e.g. `ocr.ImageHandwrittenDetector`
- [Enums, Structures and helpers](https://nlp.johnsnowlabs.com/docs/en/ocr_structures), e.g. `ocr.Color`
- To see all the imports see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/ocr.py)

## The **medical** Module


`medical` module with annotator classes and methods from [Spark NLP for Medicine](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators) like `medical.RelationExtractionDL` and `medical.profile()`
- [Medical Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `medical.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `medical.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `medical.NerDLEvaluation`
- **NOTE:** Any class which has `Medical` in its name is available, but the `Medical` prefix has been omitted. For example, `medical.NerModel` maps to `sparknlp_jsl.annotator.MedicalNerModel`
- This is achieved via `from sparknlp_jsl.annotator import MedicalNerModel as NerModel` under the hood.
- To see all the imports see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/medical.py)
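The prefix-dropping naming scheme described in the note above can be illustrated with stand-in classes. This is only a sketch of the idea: the two classes below are local dummies rather than the real `sparknlp_jsl` annotators, `build_namespace` is a hypothetical helper, and the library itself is described as using plain `import ... as ...` statements rather than this dynamic approach. The same pattern applies to the `legal` and `finance` modules.

```python
from types import SimpleNamespace


# Stand-in classes for illustration only; the real ones live in sparknlp_jsl.annotator.
class MedicalNerModel:
    pass


class DeIdentification:
    pass


def build_namespace(classes, prefix):
    """Expose each class under its own name with `prefix` stripped, if present."""
    exposed = {}
    for cls in classes:
        name = cls.__name__
        if name.startswith(prefix) and len(name) > len(prefix):
            name = name[len(prefix):]
        exposed[name] = cls
    return SimpleNamespace(**exposed)


# `medical` here is a local stand-in for the module of the same name.
medical = build_namespace([MedicalNerModel, DeIdentification], prefix="Medical")

print(medical.NerModel is MedicalNerModel)            # prefixed class, exposed without prefix
print(medical.DeIdentification is DeIdentification)   # unprefixed class, exposed unchanged
```

The effect is that `medical.NerModel` and the original `MedicalNerModel` refer to the same class object, so code written against either name behaves identically.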

## The **legal** Module

`legal` module with annotator classes and methods from [Spark NLP for Legal](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators) like `legal.RelationExtractionDL` and `legal.profile()`
- [Legal Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `legal.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `legal.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `legal.NerDLEvaluation`
- **NOTE:** Any class which has `Legal` in its name is available, but the `Legal` prefix has been omitted. For example, `legal.NerModel` maps to `sparknlp_jsl.annotator.LegalNerModel`
- This is achieved via `from sparknlp_jsl.annotator import LegalNerModel as NerModel` under the hood.
- To see all the imports see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/legal.py)


## The **finance** Module


`finance` module with annotator classes and methods from [Spark NLP for Finance](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators) like `finance.RelationExtractionDL` and `finance.profile()`
- [Finance Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `finance.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `finance.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `finance.NerDLEvaluation`
- **NOTE:** Any class which has `Finance` in its name is available, but the `Finance` prefix has been omitted. For example, `finance.NerModel` maps to `sparknlp_jsl.annotator.FinanceNerModel`
- This is achieved via `from sparknlp_jsl.annotator import FinanceNerModel as NerModel` under the hood.
- To see all the imports see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/finance.py)
</div>
8 changes: 8 additions & 0 deletions docs/en/predict_api.md
Expand Up @@ -220,6 +220,14 @@ nlu.load('sentiment').predict(text_df[['tweet','tweet_location']])

## Supported data types
NLU supports all of the common Python data types and formats:
- Pandas Dataframes
- Spark Dataframes
- Modin with Dask backend
- Modin with Ray backend
- 1-D Numpy arrays of Strings
- Strings
- Arrays of Strings


</div><div class="h3-box" markdown="1">

73 changes: 73 additions & 0 deletions docs/en/start_sparkseession.md
@@ -0,0 +1,73 @@
---
layout: docs
seotitle: NLU | John Snow Labs
title: Starting a Spark Session
permalink: /docs/en/start-a-sparksession
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">

To use most features you must first start a Spark Session with `jsl.start()`.
This will launch a [Java Virtual Machine (JVM)](https://en.wikipedia.org/wiki/Java_virtual_machine) process on your machine,
with all of the John Snow Labs and Spark [Scala/Java libraries (JARs)](https://de.wikipedia.org/wiki/Java_Archive) you have access to loaded into memory.

The `jsl.start()` method downloads, loads, and caches into `~/.jsl_home/java_installs` any missing jars for which credentials are provided.
If you have installed via `jsl.install()` you can most likely **skip the rest of this page**, since your secrets have been cached in `~/.jsl_home` and will be re-used.
If you **disabled license caching** while installing, or if you want to **tweak settings of your Spark session**, continue reading this section.

The output of running `jsl.start()` tells you which jars are loaded and the versions of all relevant libraries.
![access_token1.png](/assets/images/jsl_lib/start/start.png)



## Authorization Flow Parameters
Most of the authorization flows and parameters of `jsl.install()` are supported.
Review the detailed [docs here](https://nlu.johnsnowlabs.com/docs/en/install#authorization-flows-overview).

| Parameter | Description | Example | Default |
|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|---------|
| `None` | Load license automatically via one of the **Auto-Detection Mechanisms** | `jsl.start()` | `False` |
| `browser_login` | Browser based authorization, Button to click on Notebooks and Browser Pop-Up otherwise. | `jsl.start(browser_login=True)` | `False` |
| `access_token`      | Visit [my.johnsnowlabs.com](https://my.johnsnowlabs.com/) to extract a token which you can provide to enable license access. See [Access Token Example](http://nlu.johnsnowlabs.com/docs/en/install#via-access-token) | `jsl.start(access_token='myToken')`              | `None`  |
| `secrets_file` | Define JSON license file with keys defined by [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables) and provide file path | `jsl.start(secrets_file='path/to/license.json')` | `None` |
| `store_in_jsl_home` | Whether to cache new licenses in `~/.jsl_home`. Set to `False` to disable caching                                                                                                                                    | `jsl.start(store_in_jsl_home=False)`             | `True`  |
| `license_number` | Specify which license to use, if you have access to multiple locally cached or are loading one from [my.jsl.com](https://my.johnsnowlabs.com/) | `jsl.start(license_number=5)` | `0` |


### Manually specify License Parameters
These can be omitted according to the [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables)

| Parameter | Description |
|-------------------------|----------------------------------------|
| `aws_access_key`        | Corresponds to `AWS_SECRET_ACCESS_KEY` |
| `aws_key_id`            | Corresponds to `AWS_ACCESS_KEY_ID`     |
| `enterprise_nlp_secret` | Corresponds to `HC_SECRET` |
| `ocr_secret` | Corresponds to `OCR_SECRET` |
| `hc_license` | Corresponds to `HC_LICENSE` |
| `ocr_license` | Corresponds to `OCR_LICENSE` |
| `fin_license`           | Corresponds to `JSL_FINANCE_LICENSE`   |
| `leg_license`           | Corresponds to `JSL_LEGAL_LICENSE`     |
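
The variable names in the table above are also the keys expected in a JSON license file for the `secrets_file` parameter. As a rough sketch, such a file could be written with the standard library as shown below. `write_secrets_file` is a hypothetical helper, the values are placeholders, and which keys are actually required for a given license is defined by the License Variable Overview linked above, not by this example.

```python
import json
from pathlib import Path

# Variable names taken from the table above.
LICENSE_KEYS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "HC_SECRET",
    "OCR_SECRET",
    "HC_LICENSE",
    "OCR_LICENSE",
    "JSL_LEGAL_LICENSE",
    "JSL_FINANCE_LICENSE",
]


def write_secrets_file(path, **values):
    """Write the given license variables to a JSON file and return its path.

    Hypothetical helper: it only rejects keys that are not listed in the table above.
    """
    unknown = sorted(set(values) - set(LICENSE_KEYS))
    if unknown:
        raise ValueError(f"Unknown license variables: {unknown}")
    target = Path(path)
    target.write_text(json.dumps(values, indent=2), encoding="utf-8")
    return target
```

A file produced this way, for example `write_secrets_file('path/to/license.json', HC_SECRET='...', HC_LICENSE='...')`, could then be passed as `jsl.start(secrets_file='path/to/license.json')`.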

## Spark Session Parameters
These parameters configure how your Spark Session is started up.
See [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) for a comprehensive overview of all Spark settings.

| Parameter | Default | Description | Example |
|----------------------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| `spark_conf` | `None` | Dictionary Key/Value pairs of [Spark Configurations](https://spark.apache.org/docs/latest/configuration.html) for the Spark Session | `jsl.start(spark_conf={'spark.executor.memory':'6g'})` |
| `master_url` | `local[*]` | URL to Spark Cluster master | `jsl.start(master_url='spark://my.master')` |
| `jar_paths` | `None` | List of paths to jars which should be loaded into the Spark Session | `jsl.start(jar_paths=['my/jar_folder/jar1.zip','my/jar_folder/jar2.zip'] )` |
| `exclude_nlp`        | `False`    | Whether to exclude the Spark NLP jar from the session. The jar is always loaded if available, unless this is set to `True`.                                          | `jsl.start(exclude_nlp=True)`                                               |
| `exclude_healthcare` | `False`    | Whether to exclude the licensed NLP jar for Legal, Finance or Healthcare. The jar is always loaded if available using your provided license, unless this is set to `True`. | `jsl.start(exclude_healthcare=True)`                                        |
| `exclude_ocr`        | `False`    | Whether to exclude the licensed OCR jar for Legal, Finance or Healthcare. The jar is always loaded if available using your provided license, unless this is set to `True`. | `jsl.start(exclude_ocr=True)`                                               |
| `hardware_target` | `cpu` | Specify for which hardware Jar should be optimized. Valid values are `gpu`,`cpu`,`m1`,`aarch` | `jsl.start(hardware_target='m1')` |
| `model_cache_folder` | `None`     | Specify where models should be downloaded to when using `model.pretrained()`                                                                                       | `jsl.start(model_cache_folder='path/to/model_cache')`                       |
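
Several of the parameters above are usually combined in one call. The sketch below collects them into a keyword dictionary first so the combination can be checked before a session is started. `build_start_kwargs` is a hypothetical helper and not part of the library; the parameter names and the valid `hardware_target` values are taken from the table above.

```python
# Valid values for `hardware_target`, as listed in the table above.
VALID_HARDWARE_TARGETS = ("gpu", "cpu", "m1", "aarch")


def build_start_kwargs(executor_memory="6g", hardware_target="cpu", exclude_ocr=False):
    """Assemble keyword arguments for jsl.start() (hypothetical helper)."""
    if hardware_target not in VALID_HARDWARE_TARGETS:
        raise ValueError(
            f"hardware_target must be one of {VALID_HARDWARE_TARGETS}, got {hardware_target!r}"
        )
    return {
        # Any Spark configuration key/value pairs go into `spark_conf`.
        "spark_conf": {"spark.executor.memory": executor_memory},
        # Default master URL from the table above: a local session using all cores.
        "master_url": "local[*]",
        "hardware_target": hardware_target,
        "exclude_ocr": exclude_ocr,
    }


print(build_start_kwargs(executor_memory="8g", exclude_ocr=True))
```

With the library installed, the result would be passed on as `jsl.start(**build_start_kwargs())` after `from johnsnowlabs import *`.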





</div>
