Starred repositories
A Python library to support running data quality rules while the Spark job is running ⚡
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
Open, Multi-modal Catalog for Data & AI
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data P…
Apache DataFusion Comet Spark Accelerator
OpenTofu lets you declaratively manage your cloud infrastructure.
pyspark methods to enhance developer productivity 📣 👯 🎉
Work with your web service, database, and streaming schemas in a single format.
Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code
Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline
🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
A curated list of useful resources for gRPC
Pre-trained models and language resources for Natural Language Processing in Polish
ERP beyond your fridge - Grocy is a web-based self-hosted groceries & household management solution for your home
A better notebook for Scala (and more)
Command-line program to download videos from YouTube.com and other video sites
Roadmap (mind map) and keywords for students interested in learning NLP
A personal knowledge management and sharing system for VSCode
Clean Code concepts adapted for Java. Based on @ryanmcdermott repository.
sbt / sbt-assembly
Forked from softprops/assembly-sbt
Deploy über-JARs. Restart processes. (port of codahale/assembly-sbt)