LEADS

LEADS is a Lazy Exploratory Analysis Data Summarizer.

Writing the same boilerplate exploratory analysis code in a Jupyter notebook or Excel spreadsheet for each new dataset is tedious. LEADS automates the generation of a consistent, comprehensive, and human-readable exploratory analysis report so that you can quickly become familiar with a dataset. The generated PDF report includes the features listed below.

LEADS currently accepts .csv, .tsv, and .parquet files as input and produces reports as .pdf files. Additional report formats, such as Markdown, are planned.

Feature List

Report features:
- Title page.
- Table of contents.
- Page numbers.
- Run metadata.
- Glossary of statistical terms (updated as new features are added).

Report analysis sections:
- Data type analysis:
  - Identification of feature data types.
- Basic dataset information and descriptive statistics:
  - Number of rows and columns.
  - Column names and data types.
  - Min, max, mean, median, standard deviation.
  - Quartiles and interquartile ranges.
  - Skewness and kurtosis.
- Missing value analysis:
  - Count and percentage of missing values per column.
  - Visualization of missing value patterns.
- Distribution analysis:
  - Normality tests (Shapiro-Wilk, Anderson-Darling).
  - Q-Q plots.
- Outlier detection:
  - Z-score method.
  - IQR method.
  - Local outlier factor (LOF).
  - Visualization of outliers.
- Visualizations:
  - Histograms.
  - Box plots.
  - Scatter plots.
  - Correlation heatmaps.
  - Pair plots for multivariate data.
  - Unique value counts for categorical variables.
- Multicollinearity checks:
  - Correlation matrix.
  - Variance inflation factor (VIF).
- Pairwise data exploration:
  - Scatter plot matrix.
  - Correlation analysis.
- Dimensionality reduction:
  - Principal component analysis (PCA).
  - t-SNE visualization.
- Feature importance:
  - For categorical variables: chi-squared test.
  - For numerical target variables: correlation analysis.
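
To make the outlier detection entries above more concrete, here is a minimal Rust sketch of the Z-score and IQR methods. It is an illustration only and is not taken from the LEADS source: the function names (`quantile`, `iqr_outliers`, `zscore_outliers`), the linear-interpolation quantile rule, and the use of the sample standard deviation are all assumptions, and the report may compute these differently. Missing values are assumed to have been removed beforehand.

```rust
/// Linearly interpolated quantile of an ascending-sorted, non-empty slice.
/// `q` is expected to lie in [0, 1]. (Hypothetical helper, not LEADS code.)
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

/// IQR method: indices of values outside [Q1 - k * IQR, Q3 + k * IQR].
/// k = 1.5 is the conventional multiplier.
fn iqr_outliers(values: &[f64], k: f64) -> Vec<usize> {
    if values.is_empty() {
        return Vec::new();
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - k * iqr, q3 + k * iqr);
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| **v < low || **v > high)
        .map(|(i, _)| i)
        .collect()
}

/// Z-score method: indices of values whose absolute z-score exceeds
/// `threshold`, using the sample standard deviation (n - 1 denominator).
fn zscore_outliers(values: &[f64], threshold: f64) -> Vec<usize> {
    let n = values.len();
    if n < 2 {
        return Vec::new();
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        // All values are identical, so nothing can be flagged.
        return Vec::new();
    }
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| ((**v - mean) / std_dev).abs() > threshold)
        .map(|(i, _)| i)
        .collect()
}
```

For the column `[10, 12, 11, 13, 12, 11, 10, 100]`, `iqr_outliers(&data, 1.5)` flags index 7, while `zscore_outliers(&data, 3.0)` flags nothing: with only eight values, the largest possible sample z-score is about 2.47, so a threshold of 3 can never be exceeded. This is one reason the two methods are useful side by side.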
