2-Way Replication to S3 #15

Open
elliotforbes opened this issue Jan 23, 2021 · 8 comments
Labels: enhancement (New feature or request)

Comments

@elliotforbes

Hey, this is awesome!

As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?

This would be excellent for low-load distributed systems: multiple nodes could write to their local SQLite databases, those changes would be pushed up to the S3 bucket, and Litestream would then see them and update the local SQLite databases on all of the other nodes in that system.

elliotforbes changed the title from "2 Way Replication to S3" to "2-Way Replication to S3" on Jan 23, 2021
benbjohnson added the enhancement (New feature or request) label on Jan 23, 2021
@benbjohnson
Owner

Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.

@lambrospetrou

> Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.

Isn't this issue the same as #8?
I was looking for a way to bring changes back to read replicas and found these two issues, but I just want to confirm I'm not misunderstanding what this one is about.

@yanc0

yanc0 commented Apr 1, 2021

@lambrospetrou I think that in this particular issue (#15), your replica would check for updates in S3 (by polling). There are three components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.

In issue #8, your Litestream replica would serve an HTTP endpoint and act directly as a remote WAL receiver. There are only two components: writer -> replica. This architecture allows the replica to apply the WAL almost instantaneously.

But I think the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are just replicas as far as Litestream is concerned.
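
(Litestream already lets you configure more than one replica per database in litestream.yml, since replicas is a list, so the S3 leg and a future "hot replica" would presumably just be two entries in that list.)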

@lambrospetrou

> @lambrospetrou I think that in this particular issue (#15), your replica would check for updates in S3 (by polling). There are three components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.
>
> In issue #8, your Litestream replica would serve an HTTP endpoint and act directly as a remote WAL receiver. There are only two components: writer -> replica. This architecture allows the replica to apply the WAL almost instantaneously.
>
> But I think the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are just replicas as far as Litestream is concerned.

Ah OK, in that case the S3 approach is what I was after as well :)

@andrewchambers

andrewchambers commented Jun 11, 2021

I was thinking these issues are so similar that perhaps #8 really just needs to be a way to notify when a new WAL frame is pushed, so any polling can be made more timely.

@benbjohnson
Owner

@andrewchambers I was thinking that I would just add a polling version of #8. Cloud storage tends to be expensive for downloads, so it's a lot more cost-sensitive. I could add something like SQS to notify, but that feels overly complicated.
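
For context, the SQS route mentioned here would roughly mean turning on S3 event notifications for the replica prefix, pointing them at a queue, and having each read replica long-poll that queue instead of listing the bucket on a timer. A rough boto3 sketch, purely for illustration (the bucket, queue URL/ARN, prefix and region are placeholders, and none of this is something Litestream provides today):

import json
import boto3

# Placeholder resources, for illustration only.
BUCKET_NAME = "my_bucket"
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/litestream-events"
QUEUE_ARN = "arn:aws:sqs:ap-south-1:123456789012:litestream-events"

s3 = boto3.client("s3", region_name="ap-south-1")
sqs = boto3.client("sqs", region_name="ap-south-1")

# One-time setup: publish ObjectCreated events for the replica prefix to the queue.
# (The queue policy must allow s3.amazonaws.com to send messages.)
s3.put_bucket_notification_configuration(
    Bucket=BUCKET_NAME,
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "path/to/my_db.db/"},
            ]}},
        }]
    },
)

# On each read replica: long-poll the queue and restore only when new WAL
# segments actually land, instead of listing the bucket every N seconds.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):
            print("new replica object:", record["s3"]["object"]["key"])
        # ... trigger `litestream restore` here, as in the polling script further down ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

The polling approach avoids all of this at the cost of some replication lag and periodic LIST requests, which is presumably why it feels like the simpler default.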

@andrewchambers

andrewchambers commented Jun 11, 2021

@benbjohnson FWIW, my main use case is to have a website running on a single server, while I have multiple read-only auth gateways which just follow whatever the website has configured.

@tejpochiraju

This would obviously be a pretty nifty addition to Litestream. While the feature is being built, here's the simple Python wrapper I'm using that meets my needs (a read replica for use with Grafana).

import boto3
import subprocess
import os
from datetime import datetime, timezone
from time import sleep

INTERVAL = 60                 # seconds between S3 checks
DB_NAME = "my_db.db"          # database file the reader (e.g. Grafana) opens
TEMP_NAME = "my_db_temp.db"   # restore target, swapped into place when done
S3_KEY = "path/to/my_db.db"   # replica path inside the bucket
BUCKET_NAME = "my_bucket"

s3 = boto3.client('s3', region_name='ap-south-1')

# Start from the Unix epoch (1970-01-01 00:00:00 UTC) so the first pass always syncs.
last_sync = datetime.fromtimestamp(0, tz=timezone.utc)

start_after = ''

while True:
    # Small optimisation to ensure we only restore when S3 has an updated object.
    objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=S3_KEY, StartAfter=start_after)
    contents = objects.get('Contents')
    last_modified = last_sync
    if contents:
        # Find the most recently modified object under the replica prefix.
        for o in contents:
            modified = o['LastModified']
            if modified > last_modified:
                start_after = o['Key']
                last_modified = modified
        if last_modified > last_sync:
            last_sync = last_modified
            print("Sync started with last_modified:", last_sync)
            # Remove any stale temp file from a failed earlier attempt;
            # litestream will not restore over an existing DB.
            if os.path.exists(TEMP_NAME):
                os.remove(TEMP_NAME)
            result = subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                                     "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
            if result.returncode == 0:
                # Atomically swap the freshly restored copy into place.
                os.replace(TEMP_NAME, DB_NAME)
                print("Sync completed")
            else:
                print("litestream restore failed; keeping the existing database")
    sleep(INTERVAL)
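
One caveat with swapping the file in place like this: a reader that already has the old database file open keeps seeing the old data until it reopens the file, since the rename only replaces the directory entry. For readers that open short-lived connections per query this should not matter much in practice.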
