forked from JohnSnowLabs/nlu
Merge pull request JohnSnowLabs#150 from JohnSnowLabs/jsl-lib-docs
Jsl lib docs
Showing 12 changed files with 595 additions and 86 deletions.

---
layout: docs
seotitle: NLU | John Snow Labs
title: John Snow Labs Configurations
permalink: /docs/en/john-snow-labs-home
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">

## Installed Library Version Settings
Each version of the John Snow Labs library comes with a **hardcoded set of versions** for every John Snow Labs product.
It will not accept **library secrets** that correspond to **versions which do not match these settings**.
This prevents you from installing **outdated** or **new but not thoroughly tested** libraries, or, you might say, from shooting yourself in the foot.

You can work around this protection mechanism by updating the `jsl.settings` module to configure which versions of the licensed and open source libraries should be installed and accepted.

```python
# Example: configure the versions of all libraries.
# The version strings are placeholders; replace them with the versions to install and accept.
from johnsnowlabs import *

jsl.settings.raw_version_jsl_lib = '1.2.3'
jsl.settings.raw_version_nlp = '1.2.3.rc1'
jsl.settings.raw_version_medical = '1.2.3rc2'
jsl.settings.raw_version_secret_medical = '1.2.3.a3'
jsl.settings.raw_version_secret_ocr = '1.2.3.abc'
jsl.settings.raw_version_ocr = '1.2.3.abc'
jsl.settings.raw_version_nlu = '1.2.3.abc'
jsl.settings.raw_version_pyspark = '1.2.3.abc'
jsl.settings.raw_version_nlp_display = '1.2.3.abc'
```
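
Conceptually, the protection described above boils down to comparing the version a secret belongs to with the pinned version for that product. The sketch below is hypothetical and is not the `johnsnowlabs` implementation: it assumes a secret starts with its library version followed by a hyphen (for example `1.2.3rc2-abcdef`), reuses placeholder versions from the snippet above, and the names `PINNED_VERSIONS`, `secret_version` and `is_secret_accepted` are illustrative only.

```python
# Hypothetical sketch of a version-pinning check (not the johnsnowlabs implementation).
# Assumption: a secret starts with the version it belongs to, followed by "-".

PINNED_VERSIONS = {
    "medical": "1.2.3rc2",
    "ocr": "1.2.3.abc",
}


def secret_version(secret: str) -> str:
    """Return the version prefix of a secret such as '1.2.3rc2-abcdef'."""
    return secret.split("-", 1)[0]


def is_secret_accepted(product: str, secret: str, pinned: dict = PINNED_VERSIONS) -> bool:
    """Accept a secret only if its version exactly matches the pinned version."""
    expected = pinned.get(product)
    return expected is not None and secret_version(secret) == expected


print(is_secret_accepted("medical", "1.2.3rc2-abcdef"))  # True
print(is_secret_accepted("medical", "1.1.0-abcdef"))     # False
```

An exact-match comparison rejects both older and newer versions, which mirrors the "outdated or new but not thoroughly tested" behaviour described above.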

## John Snow Labs Home Cache Folder
The John Snow Labs library maintains a home folder in `~/.johnsnowlabs`, which contains all your licenses, plus the Jars for Java and Wheels for Python needed to install and run any feature.
Additionally, each directory has an `info.json` file with details about Spark compatibility, hardware targets and the versions of the files.

```shell
~/.johnsnowlabs/
├─ licenses/
│ ├─ info.json
│ ├─ license1.json
│ ├─ license2.json
├─ java_installs/
│ ├─ info.json
│ ├─ app1.jar
│ ├─ app2.jar
├─ py_installs/
│ ├─ info.json
│ ├─ app1.tar.gz
│ ├─ app2.tar.gz
├─ info.json
```
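
To inspect what has been cached, a short script can collect every `info.json` below the home folder. This is a sketch: the `collect_info_files` helper is hypothetical, and because the schema of `info.json` is not documented here, each file is simply loaded as plain JSON.

```python
import json
from pathlib import Path


def collect_info_files(home=None):
    """Map each info.json below `home` (as a relative POSIX path) to its parsed JSON.

    `home` defaults to ~/.johnsnowlabs. No particular schema is assumed for
    info.json; every file is loaded as plain JSON.
    """
    home = Path(home) if home is not None else Path.home() / ".johnsnowlabs"
    infos = {}
    if not home.is_dir():
        return infos  # nothing has been cached yet
    for info_path in sorted(home.rglob("info.json")):
        with info_path.open(encoding="utf-8") as f:
            infos[info_path.relative_to(home).as_posix()] = json.load(f)
    return infos


if __name__ == "__main__":
    for rel_path, content in collect_info_files().items():
        print(rel_path, "->", content)
```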
</div>

---
layout: docs
seotitle: NLU | John Snow Labs
title: John Snow Labs Usage & Overview
permalink: /docs/en/import-structure
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">

The John Snow Labs Python library gives you a clean and easy way to structure your Python projects.
The very first line of a project should be:
```python
from johnsnowlabs import *
```
This imports all licensed and open source Python modules installed from other John Snow Labs products, as well as
many handy utility imports.
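
Licensed libraries are only importable if you installed them, so a catch-all import has to tolerate missing packages. The sketch below shows one way to do that with `importlib`; it is a hypothetical illustration rather than code from the `johnsnowlabs` source, and `import_available` as well as the second module name are placeholders.

```python
import importlib


def import_available(module_names):
    """Import every module that can be imported and skip the ones that cannot.

    Returns a dict mapping module name -> imported module.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # e.g. a licensed library that was never installed
            continue
    return loaded


# "json" is always available; the second name stands in for an uninstalled licensed library.
modules = import_available(["json", "some_licensed_library_not_installed"])
print(sorted(modules))
```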

The following functions, classes and modules will be available in the global namespace.

## The **nlp** Module

`nlp` module with classes and methods from [Spark NLP](https://nlp.johnsnowlabs.com/docs/en/quickstart), like `nlp.BertForSequenceClassification` and `nlp.map_annotations()`
- `nlp.AnnotatorName` via Spark NLP [Annotators](https://nlp.johnsnowlabs.com/docs/en/annotators) and [Transformers](https://nlp.johnsnowlabs.com/docs/en/transformers), e.g. `nlp.BertForSequenceClassification`
- Spark NLP [Helper Functions](https://nlp.johnsnowlabs.com/docs/en/auxiliary), e.g. `nlp.map_annotations()`
- `nlp.F` via `import pyspark.sql.functions as F` under the hood
- `nlp.T` via `import pyspark.sql.types as T` under the hood
- `nlp.SQL` via `import pyspark.sql as SQL` under the hood
- `nlp.ML` via `from pyspark import ml as ML` under the hood
- To see all the imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/nlp.py)

## The **jsl** Module

`jsl` module with the following methods
- `jsl.install()` for installing John Snow Labs libraries and managing your licenses, [more info here](https://nlu.johnsnowlabs.com/docs/en/install)
- `jsl.load()` for predicting with any of the 10k+ pretrained models in 1 line of code or training new ones, using the [nlu.load() method](https://nlu.johnsnowlabs.com/) under the hood
- `jsl.start()` for starting a Spark Session with access to features, [more info here](https://nlu.johnsnowlabs.com/docs/en/start-a-sparksession)
- `jsl.viz()` for visualizing predictions with any of the 10k+ pretrained models, using [nlu.viz()](https://nlu.johnsnowlabs.com/docs/en/viz_examples) under the hood
- `jsl.viz_streamlit()` and the other `jsl.viz_streamlit_xyz` methods for using any of the 10k+ pretrained models in 0 lines of code with an [interactive Streamlit GUI and re-usable, stackable Streamlit Components](https://nlu.johnsnowlabs.com/docs/en/streamlit_viz_examples)
- `jsl.to_pretty_df()` for predicting on raw strings and getting a nicely structured Pandas DataFrame from a Spark Pipeline, using [nlu.to_pretty_df()](https://nlu.johnsnowlabs.com/docs/en/utils_for_spark_nlp) under the hood

## The **viz** Module

`viz` module with classes from [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display)
- `viz.NerVisualizer` for visualizing prediction outputs of NER-based Spark Pipelines
- `viz.DependencyParserVisualizer` for visualizing prediction outputs of DependencyParser-based Spark Pipelines
- `viz.RelationExtractionVisualizer` for visualizing prediction outputs of RelationExtraction-based Spark Pipelines
- `viz.EntityResolverVisualizer` for visualizing prediction outputs of EntityResolver-based Spark Pipelines
- `viz.AssertionVisualizer` for visualizing prediction outputs of Assertion-based Spark Pipelines

## The **ocr** Module

`ocr` module with annotator classes and methods from [Spark OCR](https://nlp.johnsnowlabs.com/docs/en/ocr), like `ocr.VisualDocumentClassifier` and `ocr.helpful_method()`
- [Pipeline Components](https://nlp.johnsnowlabs.com/docs/en/ocr_pipeline_components), e.g. `ocr.ImageToPdf`
- [Table Recognizers](https://nlp.johnsnowlabs.com/docs/en/ocr_table_recognition), e.g. `ocr.ImageTableDetector`
- [Visual Document Understanding](https://nlp.johnsnowlabs.com/docs/en/ocr_visual_document_understanding), e.g. `ocr.VisualDocumentClassifier`
- [Object detectors](https://nlp.johnsnowlabs.com/docs/en/ocr_object_detection), e.g. `ocr.ImageHandwrittenDetector`
- [Enums, Structures and helpers](https://nlp.johnsnowlabs.com/docs/en/ocr_structures), e.g. `ocr.Color`
- To see all the imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/ocr.py)

## The **medical** Module

`medical` module with annotator classes and methods from [Spark NLP for Medicine](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `medical.RelationExtractionDL` and `medical.profile()`
- [Medical Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `medical.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `medical.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `medical.NerDLEvaluation`
- **NOTE:** Any class which has `Medical` in its name is available, but the `Medical` prefix has been omitted. For example, `medical.NerModel` maps to `sparknlp_jsl.annotator.MedicalNerModel`
  - This is achieved via `from sparknlp_jsl.annotator import MedicalNerModel as NerModel` under the hood.
- To see all the imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/medical.py)
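
The effect of the prefix rule can be illustrated with a small stand-in. The real module uses explicit import aliases like the one quoted above; the loop below is only a hypothetical sketch that derives the same short names automatically, and `MedicalNerModel` and `DeIdentification` here are dummy placeholder classes, not the `sparknlp_jsl` ones. The same idea applies to the `Legal` and `Finance` prefixes of the modules that follow.

```python
from types import SimpleNamespace


# Dummy stand-ins; the real classes live in sparknlp_jsl.annotator.
class MedicalNerModel:
    pass


class DeIdentification:
    pass


def strip_prefix(prefix, classes):
    """Return a namespace exposing each class under its name without `prefix`.

    Classes whose names do not start with `prefix` keep their original name.
    """
    namespace = {}
    for cls in classes:
        name = cls.__name__
        if name.startswith(prefix) and len(name) > len(prefix):
            name = name[len(prefix):]
        namespace[name] = cls
    return SimpleNamespace(**namespace)


medical = strip_prefix("Medical", [MedicalNerModel, DeIdentification])
print(medical.NerModel is MedicalNerModel)           # True
print(medical.DeIdentification is DeIdentification)  # True
```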

## The **legal** Module

`legal` module with annotator classes and methods from [Spark NLP for Legal](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `legal.RelationExtractionDL` and `legal.profile()`
- [Legal Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `legal.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `legal.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `legal.NerDLEvaluation`
- **NOTE:** Any class which has `Legal` in its name is available, but the `Legal` prefix has been omitted. For example, `legal.NerModel` maps to `sparknlp_jsl.annotator.LegalNerModel`
  - This is achieved via `from sparknlp_jsl.annotator import LegalNerModel as NerModel` under the hood.
- To see all the imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/legal.py)

## The **finance** Module

`finance` module with annotator classes and methods from [Spark NLP for Finance](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), like `finance.RelationExtractionDL` and `finance.profile()`
- [Finance Annotators](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators), e.g. `finance.DeIdentification`
- [Training Methods](https://nlp.johnsnowlabs.com/docs/en/licensed_training), e.g. `finance.AnnotationToolJsonReader`
- [Evaluation Methods](https://nlp.johnsnowlabs.com/docs/en/evaluation), e.g. `finance.NerDLEvaluation`
- **NOTE:** Any class which has `Finance` in its name is available, but the `Finance` prefix has been omitted. For example, `finance.NerModel` maps to `sparknlp_jsl.annotator.FinanceNerModel`
  - This is achieved via `from sparknlp_jsl.annotator import FinanceNerModel as NerModel` under the hood.
- To see all the imports, see [the source](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/johnsnowlabs/finance.py)
</div>

---
layout: docs
seotitle: NLU | John Snow Labs
title: Starting a Spark Session
permalink: /docs/en/start-a-sparksession
key: docs-install
modify_date: "2020-05-26"
header: true
---

<div class="main-docs" markdown="1">

To use most features, you must first start a Spark Session with `jsl.start()`.
This launches a [Java Virtual Machine (JVM)](https://en.wikipedia.org/wiki/Java_virtual_machine) process on your machine,
which has all the John Snow Labs and Spark [Scala/Java libraries (JARs)](https://de.wikipedia.org/wiki/Java_Archive) you have access to loaded into memory.

The `jsl.start()` method loads all jars for which credentials are provided, downloading and caching any missing ones into `~/.jsl_home/java_installs`.
If you have installed via `jsl.install()`, you can most likely **skip the rest of this page**, since your secrets have been cached in `~/.jsl_home` and will be re-used.
If you **disabled license caching** while installing, or if you want to **tweak your Spark Session settings**, read on.

The output of `jsl.start()` tells you which jars are loaded and the versions of all relevant libraries.
![jsl.start() output](/assets/images/jsl_lib/start/start.png)

## Authorization Flow Parameters
Most of the authorization flows and parameters of `jsl.install()` are supported.
Review the detailed [docs here](https://nlu.johnsnowlabs.com/docs/en/install#authorization-flows-overview).

| Parameter | Description | Example | Default |
|---------------------|-------------|---------|---------|
| `None` | Load license automatically via one of the **Auto-Detection Mechanisms** | `jsl.start()` | `False` |
| `browser_login` | Browser-based authorization: a button to click on in notebooks, a browser pop-up otherwise. | `jsl.start(browser_login=True)` | `False` |
| `access_token` | Visit [my.johnsnowlabs.com](https://my.johnsnowlabs.com/) to extract a token, which you can provide to enable license access. See the [Access Token Example](http://nlu.johnsnowlabs.com/docs/en/install#via-access-token) | `jsl.start(access_token='myToken')` | `None` |
| `secrets_file` | Define a JSON license file with the keys defined in the [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables) and provide its file path | `jsl.start(secrets_file='path/to/license.json')` | `None` |
| `store_in_jsl_home` | Disable caching of new licenses to `~/.jsl_home` | `jsl.start(store_in_jsl_home=False)` | `True` |
| `license_number` | Specify which license to use if you have access to multiple licenses, either cached locally or loaded from [my.johnsnowlabs.com](https://my.johnsnowlabs.com/) | `jsl.start(license_number=5)` | `0` |

### Manually specify License Parameters
Which of these parameters can be omitted is described in the [License Variable Overview](https://nlu.johnsnowlabs.com/docs/en/install#license-variables-names-for-json-and-os-variables).

| Parameter | Description |
|-------------------------|----------------------------------------|
| `aws_access_key` | Corresponds to `AWS_ACCESS_KEY_ID` |
| `aws_key_id` | Corresponds to `AWS_SECRET_ACCESS_KEY` |
| `enterprise_nlp_secret` | Corresponds to `HC_SECRET` |
| `ocr_secret` | Corresponds to `OCR_SECRET` |
| `hc_license` | Corresponds to `HC_LICENSE` |
| `ocr_license` | Corresponds to `OCR_LICENSE` |
| `fin_license` | Corresponds to `JSL_FINANCE_LICENSE` |
| `leg_license` | Corresponds to `JSL_LEGAL_LICENSE` |
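
If you keep these values in environment variables, they can be forwarded to `jsl.start()` by parameter name. The helper below is a hypothetical sketch, not part of the `johnsnowlabs` API: it covers only a subset of the table above, skips variables that are unset or empty so those parameters keep their defaults, and returns a plain dict so the actual `jsl.start(**kwargs)` call stays in your own code.

```python
import os

# Subset of the parameter -> variable mapping from the table above.
PARAM_TO_ENV = {
    "aws_access_key": "AWS_ACCESS_KEY_ID",
    "aws_key_id": "AWS_SECRET_ACCESS_KEY",
    "enterprise_nlp_secret": "HC_SECRET",
    "ocr_secret": "OCR_SECRET",
    "hc_license": "HC_LICENSE",
    "ocr_license": "OCR_LICENSE",
}


def license_kwargs_from_env(env=None):
    """Collect license parameters for jsl.start() from environment variables.

    Unset or empty variables are left out, so the matching parameters are omitted.
    """
    env = os.environ if env is None else env
    return {param: env[var] for param, var in PARAM_TO_ENV.items() if env.get(var)}


# Usage (requires the johnsnowlabs package and valid credentials):
# from johnsnowlabs import *
# jsl.start(**license_kwargs_from_env())
```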

## Spark Session Parameters
These parameters configure how your Spark Session is started.
See [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) for a comprehensive overview of all Spark settings.

| Parameter | Default | Description | Example |
|----------------------|------------|-------------|---------|
| `spark_conf` | `None` | Dictionary of key/value pairs of [Spark Configurations](https://spark.apache.org/docs/latest/configuration.html) for the Spark Session | `jsl.start(spark_conf={'spark.executor.memory':'6g'})` |
| `master_url` | `local[*]` | URL of the Spark cluster master | `jsl.start(master_url='spark://my.master')` |
| `jar_paths` | `None` | List of paths to jars which should be loaded into the Spark Session | `jsl.start(jar_paths=['my/jar_folder/jar1.zip','my/jar_folder/jar2.zip'])` |
| `exclude_nlp` | `False` | Whether to exclude the Spark NLP jar from the session. The jar is always loaded if available, unless this is set to `True`. | `jsl.start(exclude_nlp=True)` |
| `exclude_healthcare` | `False` | Whether to exclude the licensed NLP jar for Legal, Finance or Healthcare. The jar is always loaded if available with your provided license, unless this is set to `True`. | `jsl.start(exclude_healthcare=True)` |
| `exclude_ocr` | `False` | Whether to exclude the licensed OCR jar. The jar is always loaded if available with your provided license, unless this is set to `True`. | `jsl.start(exclude_ocr=True)` |
| `hardware_target` | `cpu` | Specify which hardware the jar should be optimized for. Valid values are `gpu`, `cpu`, `m1`, `aarch` | `jsl.start(hardware_target='m1')` |
| `model_cache_folder` | `None` | Specify where models should be downloaded to when using `model.pretrained()` | `jsl.start(model_cache_folder='path/to/cache_folder')` |
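
The keys accepted by `spark_conf` are ordinary Spark properties, the same ones you would pass to `spark-submit --conf key=value`. As a point of comparison only, the sketch below renders such a dictionary, together with a master URL, as `spark-submit` arguments; it does not show how `jsl.start()` applies these settings internally, and `to_spark_submit_args` is a hypothetical helper.

```python
import shlex


def to_spark_submit_args(spark_conf, master_url="local[*]"):
    """Render a spark_conf dict and master URL as spark-submit arguments."""
    args = ["--master", master_url]
    for key, value in spark_conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


args = to_spark_submit_args({"spark.executor.memory": "6g", "spark.driver.memory": "4g"})
# Quote each argument so the printed command is safe to paste into a shell.
print("spark-submit " + " ".join(shlex.quote(a) for a in args))
```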

</div>