2-Way Replication to S3 #15

Open
elliotforbes opened this issue Jan 23, 2021 · 8 comments
Labels: enhancement (New feature or request)

Comments

@elliotforbes

Hey, this is awesome!

As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?

This would be excellent for low-load distributed systems: multiple nodes could write to their local SQLite databases, those changes would be pushed up to the S3 bucket, and Litestream would then see them and update the local SQLite databases on all of the other nodes in that system.

elliotforbes changed the title from "2 Way Replication to S3" to "2-Way Replication to S3" on Jan 23, 2021
benbjohnson added the enhancement (New feature or request) label on Jan 23, 2021
@benbjohnson
Owner

Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.

@lambrospetrou

> Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.

Isn't this issue the same as #8?
I was looking for a way to bring changes back to read replicas and found these two issues, but I just want to confirm I'm not misunderstanding what this one is about.

@yanc0

yanc0 commented Apr 1, 2021

@lambrospetrou I think that in this particular issue (#15), your replica would check for updates in S3 (by polling). There are three components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.

In issue #8, your Litestream replica would serve an HTTP endpoint and act directly as a remote WAL receiver. There are only two components: writer -> replica. This architecture allows the replica to apply the WAL almost instantaneously.

But I think the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are just replicas as far as Litestream is concerned.
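
(Litestream already lets you configure more than one replica per database in litestream.yml, since replicas is a list, so the S3 leg and a future "hot replica" would presumably just be two entries in that list.)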

@lambrospetrou

> @lambrospetrou I think that in this particular issue (#15), your replica would check for updates in S3 (by polling). There are three components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.
>
> In issue #8, your Litestream replica would serve an HTTP endpoint and act directly as a remote WAL receiver. There are only two components: writer -> replica. This architecture allows the replica to apply the WAL almost instantaneously.
>
> But I think the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are just replicas as far as Litestream is concerned.

Ah OK, in that case the S3 approach is what I was after as well :)

@andrewchambers

andrewchambers commented Jun 11, 2021

I was thinking these issues are so similar that perhaps #8 really just needs to be a way to notify when a new WAL frame is pushed, so any polling can be made more timely.

@benbjohnson
Owner

@andrewchambers I was thinking that I would just add a polling version of #8. Cloud storage tends to be expensive for downloads, so it's a lot more cost-sensitive. I could add something like SQS to notify, but that feels overly complicated.
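
For context, the SQS route mentioned here would roughly mean turning on S3 event notifications for the replica prefix, pointing them at a queue, and having each read replica long-poll that queue instead of listing the bucket on a timer. A rough boto3 sketch, purely for illustration (the bucket, queue URL/ARN, prefix and region are placeholders, and none of this is something Litestream provides today):

import json
import boto3

# Placeholder resources, for illustration only.
BUCKET_NAME = "my_bucket"
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/litestream-events"
QUEUE_ARN = "arn:aws:sqs:ap-south-1:123456789012:litestream-events"

s3 = boto3.client("s3", region_name="ap-south-1")
sqs = boto3.client("sqs", region_name="ap-south-1")

# One-time setup: publish ObjectCreated events for the replica prefix to the queue.
# (The queue policy must allow s3.amazonaws.com to send messages.)
s3.put_bucket_notification_configuration(
    Bucket=BUCKET_NAME,
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "path/to/my_db.db/"},
            ]}},
        }]
    },
)

# On each read replica: long-poll the queue and restore only when new WAL
# segments actually land, instead of listing the bucket every N seconds.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):
            print("new replica object:", record["s3"]["object"]["key"])
        # ... trigger `litestream restore` here, as in the polling script further down ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

The polling approach avoids all of this at the cost of some replication lag and periodic LIST requests, which is presumably why it feels like the simpler default.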

@andrewchambers

andrewchambers commented Jun 11, 2021

@benbjohnson FWIW, my main use case is to have a website running on a single server, while I have multiple read-only auth gateways which just follow whatever the website has configured.

@tejpochiraju

This would obviously be a pretty nifty addition to Litestream. While the feature is being built, here's the simple Python wrapper I'm using that meets my needs (a read replica for use with Grafana).

import boto3
import subprocess
import os
from datetime import datetime, timezone
from time import sleep

INTERVAL = 60                 # seconds between S3 checks
DB_NAME = "my_db.db"          # database file the reader (e.g. Grafana) opens
TEMP_NAME = "my_db_temp.db"   # restore target, swapped into place when done
S3_KEY = "path/to/my_db.db"   # replica path inside the bucket
BUCKET_NAME = "my_bucket"

s3 = boto3.client('s3', region_name='ap-south-1')

# Start from the Unix epoch (1970-01-01 00:00:00 UTC) so the first pass always syncs.
last_sync = datetime.fromtimestamp(0, tz=timezone.utc)

start_after = ''

while True:
    # Small optimisation to ensure we only restore when S3 has an updated object.
    objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=S3_KEY, StartAfter=start_after)
    contents = objects.get('Contents')
    last_modified = last_sync
    if contents:
        # Find the most recently modified object under the replica prefix.
        for o in contents:
            modified = o['LastModified']
            if modified > last_modified:
                start_after = o['Key']
                last_modified = modified
        if last_modified > last_sync:
            last_sync = last_modified
            print("Sync started with last_modified:", last_sync)
            # Remove any stale temp file from a failed earlier attempt;
            # litestream will not restore over an existing DB.
            if os.path.exists(TEMP_NAME):
                os.remove(TEMP_NAME)
            result = subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                                     "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
            if result.returncode == 0:
                # Atomically swap the freshly restored copy into place.
                os.replace(TEMP_NAME, DB_NAME)
                print("Sync completed")
            else:
                print("litestream restore failed; keeping the existing database")
    sleep(INTERVAL)
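
One caveat with swapping the file in place like this: a reader that already has the old database file open keeps seeing the old data until it reopens the file, since the rename only replaces the directory entry. For readers that open short-lived connections per query this should not matter much in practice.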
