Skip to content

Helper scripts for expanding GabLeaks for easier analysis

License

Notifications You must be signed in to change notification settings

ddosecrets/gableaks-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

GabLeaks-Analysis

This codebase relates to the GabLeaks release from Distributed Denial of Secrets. While the Gab data itself is limited distribution, this helper code is public.

The original release format is a Postgresql database dump that includes data in four tables: accounts, statuses, gabgroups, and verifications. However, the 'statuses', 'accounts', and 'gabgroups' tables contain a 'data' column where most of the relevant details are stored in JSON.

The mixed JSON / SQL format is not the most convenient for analysis. This script expands the data, creating new statuses_expanded, accounts_expanded, and gabgroups_expanded tables with individual typed fields instead of one big JSON blob. The schemas are as follows:

statuses_expanded

Field Type
id bigint
bookmark_collection_id bigint
card jsonb
content text
created_at timestamp with time zone
emojis jsonb
expires_at timestamp with time zone
favourited boolean
favourites_count bigint
group_ jsonb
has_quote boolean
in_reply_to_account_id bigint
in_reply_to_id bigint
language text
media_attachments jsonb
mentions jsonb
pinnable boolean
pinnable_by_group boolean
plain_markdown text
poll jsonb
quote jsonb
quote_of_id bigint
reblog jsonb
reblogged boolean
reblogs_count bigint
replies_count bigint
revised_at text
rich_content text
sensitive boolean
spoiler_text text
tags jsonb
url text
visibility text

accounts_expanded

Field Type
id bigint
email text
password text
name text
bot boolean
url text
note text
avatar text
emojis jsonb
fields jsonb
header text
is_pro boolean
locked boolean
is_donor boolean
created timestamp with time zone
is_investor boolean
is_verified boolean
display_name text
avatar_static text
header_static text
statuses_count bigint
followers_count bigint
following_count bigint
is_flagged_as_spam boolean

gabgroups_expanded

Field Type
id int
password text
url text
slug jsonb
tags jsonb
title text
created_at timestamp with time zone
is_private boolean
is_visible boolean
description text
is_archived boolean
has_password boolean
member_count int
group_category jsonb
cover_image_url text
description_html text

Usage

First, ensure you already have the dataset loaded into Postgresql. Installing and configuring postgres, creating a database and user, and importing the database contents from a SQL dump, are out of scope for this README.

Second, install dependencies for this project:

pip3 install -r requirements.txt

Or install manually with:

pip3 install psycopg2 tqdm

Finally, launch the script like:

./expand.py <sql_host> <sql_user> <sql_db_name>

For example:

./expand.py localhost postgres gableaks

You'll be prompted for your postgres user's password, and then the script should take care of everything from there. There's a progress bar included - since there are about 39 million statuses, expanding the table took a little over two hours during development tests.

About

Helper scripts for expanding GabLeaks for easier analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages