-
-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and complain about use of pgx-pg-sys
functions from multiple threads
#777
Conversation
It looks like I was mislead by the docs I read -- this should always be fine then. |
Hmm, actually, maybe I need to set an atfork handler to clear this in the child... |
unsafe fn pg_guard_ffi_boundary_impl<T, F: FnOnce() -> T>(f: F) -> T { | ||
//! This is the version that uses sigsetjmp and all that, for "normal" Rust/PGX interfaces. | ||
use crate as pg_sys; | ||
|
||
// The next code is definitely thread-unsafe (it manipulates statics in an | ||
// unsynchronized manner), so we may as well check here. | ||
super::thread_check::check_active_thread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we #[cfg]
this out of --release
builds? Or make it an assert!()
or have it behind a feature flag that's disabled by default or something?
We really don't need additional overhead for every call back into Postgres for something that's most likely not going to happen in the typical pgx extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, well, it's still unsound. I think we could have a feature to disable this check, but by default it feels like a bit of a footgun...
That said I could also just make this be a single load from %fs:0
/tpidr_el0
, as mentioned in the PR. In which case it should only be a few-cycle load and a (easily predicted) branch in the worst case, likely not much worse than array bounds checks.
I'm more concerned that this might break the fork
case, which I need to do some testing with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view is that the ideal situation is that it's compiled out of --release
builds.
The reasoning here is that basically all pgx extensions are going to be single-threaded, just because Postgres is. If an extension does threading, it'd likely be "accidentally" from some 3rd-party crate like rayon or crossbeam.
Keeping the check for debug builds would, in a perfect world, catch these problems during development and tests.
If we want a feature like pedantic-threadness-checks
(or whatever) that can be enabled by the developer for --release
builds, that'd be fine. I do see the usefulness of this in a release build for overly-paranoid runtime environments.
EDIT: As an aside, what kind of runtime overhead does #[track_caller]
add per call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. It seems pretty easy to accidentally end up calling into PGX from multiple threads given that so much of rust assumes thread-safety by default (outside of unsafe)...
EDIT: As an aside, what kind of runtime overhead does #[track_caller] add per call?
Adds a secret argument of an &'static Location<'static>
, which gets loaded from static data and passed in. Concretely, less than passing an extra &'static str
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well. Then I'd like to see some kind of "benchmark" to see if this shows up.
And I don't really mean anything scientific here. Like, does calling this function (https://github.com/tcdi/pgx/blob/d1ffdb2969f414652d48548341e9d9ff9d02e672/pgx-examples/srf/src/lib.rs#L15) through SQL as something like SELECT srf.generate_series(1, 10000000);
show any measurable difference with and without this check?
Where I'm concerned about overhead is in a case like this where a #[pg_extern]
function is being called 10s-or-100s-of-millions-of-times by some SQL statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, a request for a benchmark is totally reasonable. I'll work on that (after I make sure the atfork handling is sufficient in the cross-process case).
I'll re-request review when that's ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the Timescale folks have mentioned setjmp
overhead already. My intuition says this would pop up on their radar too. ZomboDB too, for that matter.
If it doesn't turn out to be significant, then great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be significantly cheaper than the call to sigsetjmp, but I'll take the time to ensure that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I measured this in a few ways, both as standalone benchmarks, benchmarks that include the rest of the pgx sigsetjmp error-handling dance, and in real queries (SELECT my_generate_series(0, 10000000, 1)
, with \timing
on). I did this on linux and macOS, which reported similar numbers (although Linux numbers were like 10x slower across the board due to the machine being literally the cheapest laptop I could find 3y ago).
The check in this PR takes around 700ps standalone, and increases the average cost the error handling from ~6ns to ~6.5ns in my benchmarks (there's probably around 200ps of overhead from measurement).
When used in a query, these checks had no reliable overhead that I could detect. With or without the thread safety check, SELECT my_generate_series(0, 10000000, 1)
, reported times between 950ms and 975ms (well, 4.5-5ms on my linux box). I tested this pretty thoroughly (several DBs/extensions), and am confident that it's not a case of the old code remaining in use.
We could get this down to essentially 0-cost1 via the use of inline asm. I have implemented this in the thomcc/threadsafe-asm
branch (the relevant code is https://github.com/tcdi/pgx/blob/thomcc/threadsafe-asm/pgx-pg-sys/src/submodules/thread_check.rs#L115-L194). It wouldn't be that hard for me to maintain (I only implemented versions I understood), but is definitely way worse than the version in this PR, so I don't know that I think this is worth it, since the change is not measurable in queries2.
It's definitely your call, though -- I'd be totally fine to pull that into this PR if you'd rather take the optimization.
Footnotes
-
Small enough to be comparable with
black_box(0)
on its own (100ps), and to cause no measurable difference when run as part of the other pg error code. ↩ -
I wouldn't have bothered with this had I realized that, but my initial measurements accidentally compared versions with different compiler flags, so I thought the overhead was much more significant at first. ↩
Because it got mentioned in #780 (and because I realized I was making a dumb mistake in the macOS reloading issue, and needed to look at something else), I finished this up. See #777 (comment) for a description of the overhead (very small, not measurable in queries) and a link to an alternate approach that is (much) worse to maintain but is basically 0 cost on the platforms we care about. |
I'm gonna need to make sure this works with PL/Rust before we merge this. |
The only call to the check is inside a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR passed PL/Rust's test suite in tcdi/plrust#105
700 picoseconds is obviously worth the improvement in soundness, and unlike the previous version, this isn't likely to break on no-op thread renaming antics.
fn nonzero_thread_id() -> NonZeroUsize { | ||
// Returns the `addr()` of a thread local variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow.
cute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't think the asm!
version was cuter? 🐱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow.
fn init_active_thread(tid: NonZeroUsize) { | ||
match ACTIVE_THREAD.compare_exchange(0, tid.get(), Ordering::Relaxed, Ordering::Relaxed) { | ||
Ok(_) => unsafe { | ||
// We won the race. Register an atfork handler to clear the atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Jubilee <46493976+workingjubilee@users.noreply.github.com>
Eric confirmed this was okay in Discord
Doing this will likely break stuff like #775, although that code is already thread-unsafe (due to our setjmp shim)... That said, it reduces UB from data races (at best) to a panic with a reasonable backtrace.