usbrandon/provider-data-tools

pdt - Provider Data Tools

Version: 0.6.1

This repository contains a number of command-line utilities and related code libraries for parsing, creating, and validating provider data. They are:

  • chop-nppes-public - Parse the NPI public data dissemination into flattened files.
  • csv2pjson-public.py - Parse the NPI public data dissemination into ProviderJSON files.
  • validate-pjson - Parse a ProviderJSON document and output errors and warnings as JSON.
  • validate-pjson-dir - Bulk validation of the output of csv2pjson-public.py.
  • create-provider-indexes - Create default MongoDB indexes on ProviderJSON data to support public search on common fields.
  • loadnppes.py - Download the public data, parse it to JSON, and load it into MongoDB in one step.

Please note the utilities csv2json, json2mongo, and jsondir2mongo have been moved from pdt and placed in their own package called jdt. These tools are generic and have utility outside health provider data.

Installation

You can install the tools with pip:

~$ sudo pip install pdt

Note: If you use sudo, the scripts will be installed at the system level and be available to all users. Add --upgrade to the install command above to ensure you fetch the newest version.

chop-nppes-public

To use this script, you first need to fetch the "NPPES Data Dissemination" file. Go to http://nppes.viva-it.com/NPI_Files.html, get the "Full Replacement Monthly" zip file, and unzip it with the tool of your choice.

To run the utility, call it on the command line with one argument, the CSV file to parse:

~$ chop-nppes-public npidata_20050523-20140413.csv

The file name npidata_20050523-20140413.csv will vary depending on the date.

The script may take a few minutes to complete. When it finishes, you will have several new files in your current directory, all still keyed by NPI. These files are described below.

  • _basic.csv - basic demographic information
  • _addresses_flat.csv - one address per line, identified as practice or mailing
  • _identifiers_flat.csv - one identifier per line
  • _licenses_flat.csv - one license per line
  • _taxonomy_flat.csv - one taxonomy code per line, identified as primary or not
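Because each flat file holds one record per line and remains keyed by NPI, a common first step is to regroup the lines by provider. The following is a minimal sketch using only the Python standard library. The column names (npi, address_purpose, city) and the second NPI are made up for illustration; check the header row of your own files for the real names.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_key(csv_file, key_column):
    """Group the rows of a flat CSV file by the value found in key_column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row[key_column]].append(row)
    return dict(grouped)


# Tiny in-memory sample standing in for a *_addresses_flat.csv file.
# The header names below are hypothetical, not taken from pdt.
sample = io.StringIO(
    "npi,address_purpose,city\n"
    "1003819723,practice,Durham\n"
    "1003819723,mailing,Raleigh\n"
    "1234567893,practice,Baltimore\n"
)

addresses = group_rows_by_key(sample, "npi")
for npi, rows in addresses.items():
    print(npi, [r["address_purpose"] for r in rows])
```

To use it on a real file, pass an open file object, for example open(path, newline=""), instead of the in-memory sample.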

csv2pjson.py

Convert the GAO CSV file format to a directory of files in ProviderJSON format.

Usage:

~$ csv2pjson.py [CSV_FILE] [OUTPUT_DIR]

Example:

csv2pjson.py gao-csvfile.csv output

Output:

One file is created per line of the CSV file inside the directory output. Files are fanned out into a directory structure so as not to create millions of files in one directory.

csv2pjson-public.py

Convert the NPPES Public Data Dissemination CSV file format to a directory of files in ProviderJSON format.

Usage:

~$ csv2pjson-public.py [CSV_FILE] [OUTPUT_DIR]

Example:

csv2pjson-public.py public-csvfile.csv output

Output:

One file is created per line of the CSV file inside the directory output. Files are fanned out into a directory structure so as not to create millions of files in one directory.
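The exact fan-out layout used by the scripts is not described here. As an illustration of the general idea only, the sketch below builds a nested path from the leading characters of an identifier, so files are spread across many small directories. The function name, the depth parameter, and the layout itself are assumptions, not the layout pdt actually produces.

```python
import os


def fanned_out_path(output_dir, identifier, depth=3):
    """Return a nested file path built from the first `depth` characters
    of `identifier` (a hypothetical layout, one directory per character)."""
    parts = list(identifier[:depth])
    return os.path.join(output_dir, *parts, identifier + ".json")


# With depth=3, the file for "1003819723" lands three directories deep.
print(fanned_out_path("output", "1003819723"))
```

With ten possible digits per level, a depth of three gives up to 1,000 leaf directories, which keeps each directory to a manageable size even for millions of records.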

validate-pjson

Validate a ProviderJSON document for compliance with a create or update request. Errors and warnings are written to stdout as JSON.

Usage:

validate-pjson [ProviderJSON] [update|create]

Example:

validate-pjson 1003819723.json update

Example Output:

{
"errors": [
    "authorized_official_telephone_number must be in XXX-XXX-XXXX format.", 
    "EIN is required for a type-2 organization provider."
], 
"warnings": [
    "Enumeration date is generated by CMS. The provided value will be ignored.", 
    "Last updated date is generated by CMS. The provided value will be ignored.", 
    "status is determined by CMS. The provided value will be ignored."
]
}
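Since the report is plain JSON with an "errors" list and a "warnings" list, it is easy to post-process. The sketch below summarizes a report captured as text. The function name and the summary keys are illustrative choices, and treating a document with no errors as valid is an assumption about how you want to interpret the report.

```python
import json


def summarize_validation(report_text):
    """Summarize a validate-pjson style report given as a JSON string."""
    report = json.loads(report_text)
    errors = report.get("errors", [])
    warnings = report.get("warnings", [])
    return {
        "valid": not errors,  # assumption: warnings alone do not fail a document
        "error_count": len(errors),
        "warning_count": len(warnings),
    }


# The example output shown above, captured as a string.
sample_report = """
{
"errors": [
    "authorized_official_telephone_number must be in XXX-XXX-XXXX format.",
    "EIN is required for a type-2 organization provider."
],
"warnings": [
    "Enumeration date is generated by CMS. The provided value will be ignored.",
    "Last updated date is generated by CMS. The provided value will be ignored.",
    "status is determined by CMS. The provided value will be ignored."
]
}
"""

summary = summarize_validation(sample_report)
print(summary)
```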

loadnppes.py

The script loadnppes.py combines several of the pdt utilities for automatic setup. It will download the public data, parse it to ProviderJSON, and load it into MongoDB in one step. Note that this script requires unzip and wget to be installed.
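Before starting a long download, it can help to confirm that the required external tools are on your PATH. The sketch below uses shutil.which from the Python standard library. The helper name missing_tools is hypothetical and is not part of pdt.

```python
import shutil


def missing_tools(tools=("unzip", "wget")):
    """Return the names of the given command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install before running loadnppes.py: " + ", ".join(missing))
else:
    print("unzip and wget are available.")
```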

Usage:

loadnppes.py [PROCESS_FULL Y/N] [DOWNLOAD_FROM_PUBLIC_FILE Y/N]

Example:

loadnppes.py y y 

Example Output:

Downloading http://nppes.viva-it.com/NPPES_Data_Dissemination_March_2015.zip
--2015-04-13 14:14:57--  http://nppes.viva-it.com/NPPES_Data_Dissemination_March_2015.zip
Resolving nppes.viva-it.com (nppes.viva-it.com)... 68.142.118.4, 68.142.118.254
Connecting to nppes.viva-it.com (nppes.viva-it.com)|68.142.118.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 514406694 (491M) [application/zip]
Saving to: `NPPES_Data_Dissemination_March_2015.zip'

0% [                                       ] 2,691,064   58.1K/s  eta 3h 38m 
.
.
.

About

Tools for working with CMS Health Provider Data
