Stars

Data

24 repositories

lakeFS - Data version control for your data lake | Git for data

Go 4,358 346 Updated Sep 20, 2024

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Java 984 125 Updated Sep 21, 2024

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Python 1,973 94 Updated Sep 20, 2024

DuckDB is an analytical in-process SQL database management system

C++ 22,836 1,815 Updated Sep 20, 2024

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

C 17,503 879 Updated Sep 20, 2024

Voilà turns Jupyter notebooks into standalone web applications

Python 5,398 501 Updated Sep 2, 2024

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

C++ 8,295 1,156 Updated Sep 18, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 92,515 14,812 Updated Sep 21, 2024

Making data lake work for time series

Python 1,121 60 Updated Aug 21, 2024

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 33,110 5,602 Updated Sep 21, 2024

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Rust 29,341 1,856 Updated Sep 20, 2024

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Scala 283 66 Updated Aug 21, 2024

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

Rust 7,697 727 Updated Sep 21, 2024

Typesafe wrapper for Apache Spark DataFrame API

Scala 136 8 Updated Jul 8, 2024

🔥🔥🔥 AI-driven database tool and SQL client, The hottest GUI client, supporting MySQL, Oracle, PostgreSQL, DB2, SQL Server, DB2, SQLite, H2, ClickHouse, and more.

Java 14,889 1,664 Updated Sep 17, 2024

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

Go 376 44 Updated Sep 21, 2024

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…

TypeScript 5,208 989 Updated Sep 21, 2024

EventStoreDB, the event-native database. Designed for Event Sourcing, Event-Driven, and Microservices architectures

C# 5,240 639 Updated Sep 20, 2024

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir 2,399 159 Updated Jul 30, 2024

A library that provides useful extensions to Apache Spark and PySpark.

Scala 193 26 Updated Sep 13, 2024

A generative AI extension for JupyterLab

Python 3,132 311 Updated Sep 16, 2024

A code-first agent framework for seamlessly planning and executing data analytics tasks.

Python 5,204 661 Updated Sep 14, 2024

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.

Python 20,940 1,244 Updated Sep 21, 2024

Latency Tester for Apache Cassandra

Rust 176 19 Updated Aug 17, 2024