Stars
A complete computer science study plan to become a software engineer.
DuckDB is an analytical in-process SQL database management system
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch…
A GPU-powered real-time analytics storage and query engine.
Generic Data Ingestion & Dispersal Library for Hadoop
Uber's cross-platform mobile architecture framework.
This is a repo with links to everything you'd ever want to learn about data engineering
Upserts, Deletes And Incremental Processing on Big Data.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The source code for the book Modern Data Engineering with Apache Spark
This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.
pquintero / docker-images
Forked from oracle/docker-images
Official source for Docker configurations, images, and examples of Dockerfiles for Oracle products and projects
Official source of container configurations, images, and examples for Oracle products and projects
A Spark plugin for reading and writing Excel files
Curated list of resources about Apache Airflow
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
A curated list of docker-compose files prepared for testing data engineering tools, databases, and open source libraries.
Easy to maintain open source documentation websites.
This repository contains the basic definition for the AWS Glue DataCatalog Database
This repository contains the basic definition for the AWS Glue job deployment
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.