
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era

Python 3,948 478 Updated Aug 31, 2024

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC ac…

Go 22,414 2,257 Updated Sep 27, 2024

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Python 7,008 780 Updated Sep 27, 2024

A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems.

Jupyter Notebook 42 40 Updated Sep 30, 2023

A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.

Python 39 18 Updated Mar 29, 2021

pyspark🍒🥭 is delicious, just eat it!😋😋

Python 772 209 Updated Sep 22, 2022

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Python 20,981 1,119 Updated Sep 27, 2024

Personal AI search copilot, open-source Perplexity

Python 721 69 Updated Jul 2, 2024

🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your da…

Python 16,931 1,850 Updated Sep 26, 2024

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Also downloads files and images to a local hard drive.

Python 1,118 130 Updated Sep 23, 2024

SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

Python 12,601 1,346 Updated Sep 28, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 18,209 1,844 Updated Sep 27, 2024

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text

C++ 2,456 266 Updated Sep 28, 2024

Source-agnostic distributed change data capture system

Java 3,634 732 Updated Sep 28, 2023

Winning system (DAMO-NLP) of the SemEval 2022 MultiCoNER shared task in 10 out of 13 tracks.

Python 176 20 Updated Jan 10, 2023

Parallel computing with task scheduling

Python 12,445 1,698 Updated Sep 28, 2024

A library that provides an embeddable, persistent key-value store for fast storage.

C++ 28,346 6,289 Updated Sep 27, 2024

Distributed reliable key-value store for the most critical data of a distributed system

Go 47,494 9,737 Updated Sep 27, 2024

A Python-based HTML-to-text conversion library, command-line client, and web service.

Python 270 28 Updated Mar 5, 2024

Knowledge extraction from semi-structured web.

Python 12 3 Updated Mar 25, 2024

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 10,116 1,166 Updated Sep 1, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,612 2,501 Updated Aug 28, 2024

Schema-Driven Information Extraction from Heterogeneous Tables

Python 20 2 Updated Apr 1, 2024

Simplified DOM Trees for Transferable Attribute Extraction from the Web

Python 36 7 Updated Sep 27, 2024

Finetune LayoutLM on SROIE dataset using W&B tools

Python 18 1 Updated Dec 2, 2021

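A summary of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness.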

Python 2 Updated Mar 1, 2022

SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval

Python 47 11 Updated Sep 20, 2022

[EMNLP 2021] The baseline code for WebSRC dataset.

HTML 46 9 Updated Aug 27, 2022