Musoq

Musoq brings SQL power to your data, wherever it lives. Query files, directories, CSVs, and more with familiar SQL syntax – no database required.

🌟 Key Features

Versatility: Data sources come as plugins. Visit the Musoq.DataSources repository, where they are all stored.
SQL Syntax Variant: The engine uses a variant of SQL syntax with support for complex queries.
Cross-Platform: Runs on Linux, Windows, and Docker. macOS compatibility is anticipated.
In-place querying without data movement: Query data where it resides, without the need to move or load it into a central data store.
Extensible architecture for custom data sources: Add support for custom data sources through a plugin architecture.

🚀 Quick Start

To try out Musoq, follow the instructions in our CLI repository.

💡 Where To Use It

Musoq can be used in various places, including:

📂 File System Analysis

-- Look for files larger than 1 GB
SELECT 
	FullName 
FROM #os.files('/some/path', true) 
WHERE ToDecimal(Length) / 1024 / 1024 / 1024 > 1

-- Show how much space each extension occupies within a directory
SELECT
    Extension,
    Round(Sum(Length) / 1024 / 1024 / 1024, 1) as SpaceOccupiedInGB,
    Count(Extension) as HowManyFiles
FROM #os.files('/some/directory', true)
GROUP BY Extension
HAVING Round(Sum(Length) / 1024 / 1024 / 1024, 1) > 0
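
For readers less familiar with SQL aggregation, the grouping in the query above can be sketched in Python. This only illustrates the arithmetic (bytes converted to GB by dividing by 1024 three times, rounded to one decimal place); it is not how Musoq executes the query. The function name `space_by_extension` and the in-memory list of (extension, length) pairs are assumptions made for this example.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3  # same as dividing by 1024 three times


def space_by_extension(files):
    """Group (extension, length_in_bytes) pairs like the query above.

    Hypothetical helper for illustration only; returns
    {extension: (space_in_gb, file_count)}.
    """
    total_bytes = defaultdict(int)
    file_count = defaultdict(int)
    for extension, length in files:
        total_bytes[extension] += length  # Sum(Length)
        file_count[extension] += 1        # Count(Extension)

    result = {}
    for extension, size in total_bytes.items():
        space_gb = round(size / BYTES_PER_GB, 1)
        if space_gb > 0:                  # HAVING ... > 0
            result[extension] = (space_gb, file_count[extension])
    return result
```

Extensions whose total rounds to 0.0 GB are left out of the result, which is the role the HAVING clause plays in the query.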

-- Query your images folder, keep only .jpg files and show their EXIF metadata
SELECT
    f.Name,
    m.DirectoryName,
    m.TagName,
    m.Description
FROM #os.files('./Images', false) f CROSS APPLY #os.metadata(f.FullName) m
WHERE f.Extension = '.jpg'

-- Get the first and last 5 bytes of each file, and 10 consecutive bytes at an offset of 5 from the tail
SELECT
	ToHex(Head(5), '|'),
	ToHex(Tail(5), '|'),
	ToHex(GetFileBytes(10, 5), '|')
FROM #os.files('/some/directory', false)

-- Diff between two folders
SELECT 
    (CASE WHEN SourceFile IS NOT NULL 
     THEN SourceFileRelative 
     ELSE DestinationFileRelative 
     END) AS FullName, 
    (CASE WHEN State = 'TheSame' 
     THEN 'The Same' 
     ELSE State 
     END) AS Status 
FROM #os.dirscompare('E:\DiffDirsTests\A', 'E:\DiffDirsTests\B')

-- Compute the SHA-256 hash of files
SELECT
   FullName,
   f.Sha256File()
FROM #os.files('@qfs/', false) f

📦 Archive Exploration

-- Query .csv files stored inside an archive file
table PeopleDetails {
	Name 'System.String',
	Surname 'System.String',
	Age 'System.Int32'
};
couple #separatedvalues.comma with table PeopleDetails as SourceOfPeopleDetails;
with Files as (
	select 
		a.Key as InZipPath
	from #archives.file('./Files/Example2/archive.zip') a
	where 
		a.IsDirectory = false and
		a.Contains(a.Key, '/') = false and 
		a.Key like '%.csv'
)
select 
	f.InZipPath, 
	b.Name, 
	b.Surname, 
	b.Age 
from #archives.file('./Files/Example2/archive.zip') a
inner join Files f on f.InZipPath = a.Key
cross apply SourceOfPeopleDetails(a.GetStreamContent(), true, 0) as b;

🖼️ Image Analysis with AI

-- Describe images using AI
SELECT
    llava.DescribeImage(photo.Base64File()),
    photo.FullName
FROM #os.files('/path/to/directory', false) photo 
INNER JOIN #ollama.models('llava:13b', 0.0) llava ON 1 = 1

-- Count tokens in Markdown and C files
SELECT 
   SUM(gpt.CountTokens(f.GetFileContent())) AS TokensCount 
FROM #os.files('/path/to/directory', true) f 
INNER JOIN #openai.gpt('gpt-4') gpt ON 1 = 1 
WHERE f.Extension IN ('.md', '.c')

-- Extract data from a receipt image
select s.Shop, s.ProductName, s.Price from #stdin.image('OpenAi', 'gpt-4o') s

-- Compute the sentiment of comments
SELECT 
    csv.PostId,
    csv.Comment,
    gpt.Sentiment(csv.Comment) as Sentiment,
    csv.Date
FROM #separatedvalues.csv('/home/somebody/comments_sample.csv', true, 0) csv
INNER JOIN #openai.gpt('gpt-4-1106-preview') gpt on 1 = 1

🔍 SQL-Powered Data Extraction

-- Extract imports from proto file:
-- import "some/some_message_1"
-- and turn them into:
-- some/SomeMessage1
with Events as (
    select
        Replace(
            Replace(
                Line,
                'import "',
                ''
            ),
            '.proto";',
            ''
        ) as Namespace
    from #flat.file('/path/to/file.proto') f
    where
        Length(Line) > 6 and
        Head(Line, 6) = 'import' and
        IndexOf(Line, 'some') <> -1
)
select
    Choose(
        0,
        Split(e.Namespace, '/')
    ) +
    '/' +
    Replace(
        ToTitleCase(
            Choose(
                1,
                Split(e.Namespace, '/')
            )
        ),
        '_',
        ''
    ) as Events
from Events e
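
The string transformation performed by the query above can be followed step by step in a short Python sketch. It is only an illustration of the logic, not Musoq's implementation: the function name `proto_import_to_event` is hypothetical, the sample input line (with a `.proto";` suffix) is an assumption based on the strings the query replaces, and Python's `str.title()` stands in for `ToTitleCase`, so results may differ for other inputs. The sketch also assumes the import path has at least two `/`-separated segments.

```python
def proto_import_to_event(line):
    """Return the transformed name, or None if the line is filtered out.

    Hypothetical helper mirroring the query above, for illustration only.
    """
    # WHERE clause: longer than 6 characters, starts with 'import',
    # and contains 'some'
    if not (len(line) > 6 and line[:6] == "import" and "some" in line):
        return None

    # CTE: strip the 'import "' prefix and the '.proto";' suffix
    namespace = line.replace('import "', "").replace('.proto";', "")

    # Outer select: first segment + '/' + title-cased second segment
    # with underscores removed
    parts = namespace.split("/")
    return parts[0] + "/" + parts[1].title().replace("_", "")


print(proto_import_to_event('import "some/some_message_1.proto";'))
# prints: some/SomeMessage1
```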

-- Count word frequencies within text
with p as (
    select 
        Replace(Replace(ToLowerInvariant(w.Value), '.', ''), ',', '') as Word
    from #flat.file('/some/path/to/text/file.txt') f cross apply f.Split(f.Line, ' ') w
)
select
    Count(p.Word, 1) as AllWordsCount, 
    Count(p.Word) as SpecificWordCount,
    Round(ToDecimal((Count(p.Word) * 100)) / Count(p.Word, 1), 2) as WordFrequencies,
    Word
from p group by p.Word having Count(p.Word) > 1
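
As a rough cross-check of what the query above computes, here is the same calculation sketched in Python. It is not Musoq code: the function name `word_frequencies` is hypothetical, and reading `Count(p.Word, 1)` as the count over all words (rather than over one group) is an assumption inferred from the `AllWordsCount` alias.

```python
from collections import Counter


def word_frequencies(lines):
    """Return {word: (all_words_count, word_count, percentage)}.

    Hypothetical helper mirroring the query above, for illustration only.
    """
    # CTE: split each line on single spaces, lowercase the words,
    # and strip '.' and ','
    words = [
        w.lower().replace(".", "").replace(",", "")
        for line in lines
        for w in line.split(" ")
    ]
    total = len(words)       # assumed meaning of Count(p.Word, 1)
    counts = Counter(words)  # Count(p.Word) for each group
    return {
        word: (total, count, round(count * 100 / total, 2))
        for word, count in counts.items()
        if count > 1         # HAVING Count(p.Word) > 1
    }
```

For the two lines `The cat sat.` and `the cat ran, the end`, the sketch counts 8 words in total and reports `the` at 37.5 percent (3 occurrences) and `cat` at 25.0 percent (2 occurrences); words that occur once are filtered out.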

🤖 AI-Assisted Text Structuring

-- Extract structured data from unstructured text
select s.Who, s.Age from #stdin.text('Ollama', 'llama3.1') s where ToInt32(s.Age) > 26 and ToInt32(s.Age) < 75

🔄 Universal Table Querying

-- Count occurrences of each name in a table with headers
select t.Name, Count(t.Name) from #stdin.table(true) t group by t.Name having Count(t.Name) > 1

🔧 CAN DBC File Analysis

select 
    m.Id, 
    m.Name, 
    m.DLC, 
    m.Transmitter, 
    m.Comment as MessageComment, 
    m.CycleTime,
    s.Name, 
    s.StartBit, 
    s.Length, 
    s.ByteOrder, 
    s.InitialValue, 
    s.Factor, 
    s.IsInteger, 
    s.Offset, 
    s.Minimum, 
    s.Maximum, 
    s.Unit,
    s.Comment as SignalsComment
from #can.messages('@qfs/Model3CAN.dbc') m cross apply m.Signals s

🎬 Watch It Live

🔧 Syntax Features

Musoq supports a rich set of SQL-like features:

Parameterizable sources
Optional query reordering (FROM ... WHERE ... GROUP BY ... HAVING ... SELECT ... SKIP N TAKE N2)
Use of * to select all columns
GROUP BY and HAVING operators
SKIP & TAKE operators
Set operators (UNION, UNION ALL, EXCEPT, INTERSECT)
LIKE / NOT LIKE operator
RLIKE / NOT RLIKE operator (regex)
CONTAINS operator
CTE expressions
IN operator
INNER, LEFT OUTER, RIGHT OUTER JOIN operator
ORDER BY operator
CROSS / OUTER APPLY operator

🛠 Supported Data Sources

Airtable (allows querying tables from Airtable)
Archives (allows treating archives as tables)
CANBus (allows treating CAN .dbc files, and corresponding .csv files that contain records of a CAN bus, as tables)
Docker (allows treating Docker containers, images, etc. as tables)
FlatFile (allows treating flat files as tables)
Json (allows treating JSON files as tables)
Kubernetes (allows treating Kubernetes pods, services, etc. as tables) - experimental
OpenAI (exists mainly to be combined with other plugins to allow fuzzy search by GPT models)
Postgres (allows treating a Postgres database as tables) - experimental
SeparatedValues (allows treating separated values files as tables)
Sqlite (allows treating a SQLite database as tables) - experimental
System (mostly utils; ranges and the dual table reside here)
Time (allows treating time as a table)

🧭 Roadmap

The order is arbitrary. I prioritize whatever is most urgent for my current or upcoming work with Musoq.

Comprehensive documentation
Roslyn data source
Improve runtime efficiency
Parallelize query execution
Recursive CTE
Rework JSON & XML support
Subqueries
More tests & better handling of syntax / runtime exceptions

If you think something might be important for the project to broaden its capabilities, feel free to submit a feature request.

🌱 Project Maturity

Musoq is an evolving project designed primarily for querying and analyzing smaller datasets, with a focus on user-friendly and efficient operations. Here's an overview of its current state:

Primary Use Case: Musoq serves as a tool for ad-hoc data querying and manipulation tasks. It intentionally supports only reads. It excels at handling smaller datasets where its SQL-like syntax can provide more intuitive and efficient data operations.
Innovative SQL Syntax: I introduce new SQL syntax variants to simplify some complex queries and reduce the effort required for specific operations. This approach prioritizes user efficiency and ease of use, even if it means deviating from standard SQL in some cases.
Development Stage: Musoq is in active development, continuously improving its core functionality and expanding its syntax to better serve its primary use case. This sometimes includes the introduction of new syntax features.
Dataset Size: At the current stage, Musoq is best suited for smaller to medium-sized datasets. For very large datasets or big data scenarios, traditional big data tools will be more appropriate.
Real-World Usage: As the project creator, I use Musoq in various workplaces to facilitate my daily tasks and improve my workflow efficiency. It has proven to be a valuable tool in real-world scenarios, helping me perform data operations more effectively across different professional environments.
API and Syntax Stability: The core functionality is stable. While I strive for backwards compatibility, new syntax features may be introduced regularly. These changes are always aimed at improving usability and efficiency.
Project Suitability: Musoq is well-suited for projects that involve data analysis, file system operations, and other tasks typically handled by scripting languages. It's designed to be a reliable and efficient tool for these scenarios, especially where its unique syntax features can simplify complex operations.

I'm committed to improving Musoq within its intended scope, with a particular focus on innovative SQL syntax that makes data querying tasks easier. I welcome feedback, bug reports, and contributions from the community, especially those that align with the goal of simplifying complex data operations through clever syntax innovations.

🏗 Architecture

High-level Overview

Plugins

Musoq offers a plugin API that all sources use. To learn how to implement your own plugin, you should examine how existing plugins are created.

💡 Motivation

I hate loops. Developed out of a need for a versatile tool that could query various data sources with SQL syntax, without those horrible loops, Musoq aims to minimize the effort and time required for data querying and analysis.

📄 License

Musoq is licensed under the MIT License - see the LICENSE file for details.

Note: While Musoq uses SQL-like syntax, it may not be fully SQL compliant. Some differences may appear, and Musoq implements some experimental syntax and behaviors that are not used by traditional database engines; this is intentional.

Puchaczov/Musoq