Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and complain about use of pgx-pg-sys functions from multiple threads #777

Merged
merged 4 commits into from
Nov 8, 2022

Conversation

thomcc
Copy link
Contributor

@thomcc thomcc commented Oct 18, 2022

Doing this will likely break stuff like #775, although that code is already thread-unsafe (due to our setjmp shim)... That said, it reduces UB from data races (at best) to a panic with a reasonable backtrace.

@thomcc
Copy link
Contributor Author

thomcc commented Oct 18, 2022

It looks like I was mislead by the docs I read -- this should always be fine then.

@thomcc
Copy link
Contributor Author

thomcc commented Oct 18, 2022

Hmm, actually, maybe I need to set an atfork handler to clear this in the child...

pgx-pg-sys/src/submodules/thread_check.rs Outdated Show resolved Hide resolved
unsafe fn pg_guard_ffi_boundary_impl<T, F: FnOnce() -> T>(f: F) -> T {
//! This is the version that uses sigsetjmp and all that, for "normal" Rust/PGX interfaces.
use crate as pg_sys;

// The next code is definitely thread-unsafe (it manipulates statics in an
// unsynchronized manner), so we may as well check here.
super::thread_check::check_active_thread();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we #[cfg] this out of --release builds? Or make it an assert!() or have it behind a feature flag that's disabled by default or something?

We really don't need additional overhead for every call back into Postgres for something that's most likely not going to happen in the typical pgx extension.

Copy link
Contributor Author

@thomcc thomcc Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, well, it's still unsound. I think we could have a feature to disable this check, but by default it feels like a bit of a footgun...

That said I could also just make this be a single load from %fs:0/tpidr_el0, as mentioned in the PR. In which case it should only be a few-cycle load and a (easily predicted) branch in the worst case, likely not much worse than array bounds checks.

I'm more concerned that this might break the fork case, which I need to do some testing with.

Copy link
Contributor

@eeeebbbbrrrr eeeebbbbrrrr Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is that the ideal situation is that it's compiled out of --release builds.

The reasoning here is that basically all pgx extensions are going to be single-threaded, just because Postgres is. If an extension does threading, it'd likely be "accidentally" from some 3rd-party crate like rayon or crossbeam.

Keeping the check for debug builds would, in a perfect world, catch these problems during development and tests.

If we want a feature like pedantic-threadness-checks (or whatever) that can be enabled by the developer for --release builds, that'd be fine. I do see the usefulness of this in a release build for overly-paranoid runtime environments.

EDIT: As an aside, what kind of runtime overhead does #[track_caller] add per call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It seems pretty easy to accidentally end up calling into PGX from multiple threads given that so much of rust assumes thread-safety by default (outside of unsafe)...

EDIT: As an aside, what kind of runtime overhead does #[track_caller] add per call?

Adds a secret argument of an &'static Location<'static>, which gets loaded from static data and passed in. Concretely, less than passing an extra &'static str argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. Then I'd like to see some kind of "benchmark" to see if this shows up.

And I don't really mean anything scientific here. Like, does calling this function (https://github.com/tcdi/pgx/blob/d1ffdb2969f414652d48548341e9d9ff9d02e672/pgx-examples/srf/src/lib.rs#L15) through SQL as something like SELECT srf.generate_series(1, 10000000); show any measurable difference with and without this check?

Where I'm concerned about overhead is in a case like this where a #[pg_extern] function is being called 10s-or-100s-of-millions-of-times by some SQL statement.

Copy link
Contributor Author

@thomcc thomcc Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, a request for a benchmark is totally reasonable. I'll work on that (after I make sure the atfork handling is sufficient in the cross-process case).

I'll re-request review when that's ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the Timescale folks have mentioned setjmp overhead already. My intuition says this would pop up on their radar too. ZomboDB too, for that matter.

If it doesn't turn out to be significant, then great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be significantly cheaper than the call to sigsetjmp, but I'll take the time to ensure that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measured this in a few ways, both as standalone benchmarks, benchmarks that include the rest of the pgx sigsetjmp error-handling dance, and in real queries (SELECT my_generate_series(0, 10000000, 1), with \timing on). I did this on linux and macOS, which reported similar numbers (although Linux numbers were like 10x slower across the board due to the machine being literally the cheapest laptop I could find 3y ago).

The check in this PR takes around 700ps standalone, and increases the average cost the error handling from ~6ns to ~6.5ns in my benchmarks (there's probably around 200ps of overhead from measurement).

When used in a query, these checks had no reliable overhead that I could detect. With or without the thread safety check, SELECT my_generate_series(0, 10000000, 1), reported times between 950ms and 975ms (well, 4.5-5ms on my linux box). I tested this pretty thoroughly (several DBs/extensions), and am confident that it's not a case of the old code remaining in use.

We could get this down to essentially 0-cost1 via the use of inline asm. I have implemented this in the thomcc/threadsafe-asm branch (the relevant code is https://github.com/tcdi/pgx/blob/thomcc/threadsafe-asm/pgx-pg-sys/src/submodules/thread_check.rs#L115-L194). It wouldn't be that hard for me to maintain (I only implemented versions I understood), but is definitely way worse than the version in this PR, so I don't know that I think this is worth it, since the change is not measurable in queries2.

It's definitely your call, though -- I'd be totally fine to pull that into this PR if you'd rather take the optimization.

Footnotes

  1. Small enough to be comparable with black_box(0) on its own (100ps), and to cause no measurable difference when run as part of the other pg error code.

  2. I wouldn't have bothered with this had I realized that, but my initial measurements accidentally compared versions with different compiler flags, so I thought the overhead was much more significant at first.

@thomcc
Copy link
Contributor Author

thomcc commented Oct 29, 2022

Because it got mentioned in #780 (and because I realized I was making a dumb mistake in the macOS reloading issue, and needed to look at something else), I finished this up. See #777 (comment) for a description of the overhead (very small, not measurable in queries) and a link to an alternate approach that is (much) worse to maintain but is basically 0 cost on the platforms we care about.

@workingjubilee
Copy link
Member

I'm gonna need to make sure this works with PL/Rust before we merge this.

@thomcc
Copy link
Contributor Author

thomcc commented Oct 31, 2022

I'm gonna need to make sure this works with PL/Rust before we merge this.

The only call to the check is inside a cfg(not(feature = "postgrestd")) function, but yeah, worth double-checking. (Actually, there might be a lot of unused code warnings if that feature is on...).

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR passed PL/Rust's test suite in tcdi/plrust#105

700 picoseconds is obviously worth the improvement in soundness, and unlike the previous version, this isn't likely to break on no-op thread renaming antics.

pgx-pg-sys/src/submodules/thread_check.rs Outdated Show resolved Hide resolved
Comment on lines +56 to +57
fn nonzero_thread_id() -> NonZeroUsize {
// Returns the `addr()` of a thread local variable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow.
cute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't think the asm! version was cuter? 🐱

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow.

pgx-pg-sys/src/submodules/thread_check.rs Outdated Show resolved Hide resolved
pgx-pg-sys/src/submodules/thread_check.rs Outdated Show resolved Hide resolved
fn init_active_thread(tid: NonZeroUsize) {
match ACTIVE_THREAD.compare_exchange(0, tid.get(), Ordering::Relaxed, Ordering::Relaxed) {
Ok(_) => unsafe {
// We won the race. Register an atfork handler to clear the atomic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Jubilee <46493976+workingjubilee@users.noreply.github.com>
@workingjubilee workingjubilee dismissed eeeebbbbrrrr’s stale review November 8, 2022 00:06

Eric confirmed this was okay in Discord

@eeeebbbbrrrr eeeebbbbrrrr merged commit 3e4b188 into pgcentralfoundation:develop Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants