-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43535: [C++] support the AWS S3 SSE-C encryption #43601
base: main
Are you sure you want to change the base?
Conversation
|
7e1f200
to
1134b07
Compare
@pitrou @mapleFU @felipecrv can you help to review the PR, Thanks |
I took a quick look and here are some assorted thoughts:
|
Also, for the other readers wondering, I've found a useful (though possibly outdated?) comparison of S3 encryption options here: |
@pitrou Thanks for your review, for #1, i would try to add ut for the change, for #2, yes, actually ONLY the Key is needed, that's why i only expose one SetSSECKey function, and three Get function, and private the new added datamember. |
@pitrou I've submitted another commit to only expose the ssec-key in S3Options, and then calculate the MD5 every time on the fly, can you help to review whether there are other comments? I would try to add the ut if there is no interface change required, Thanks |
bb2e024
to
539fa7c
Compare
@pitrou Can you help to review the PR again?Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a careful around this weekend
/// \param expect_input_key_size, default 32 | ||
/// \return true if the decode and calculate MD5 success, otherwise return false | ||
ARROW_EXPORT | ||
bool CalculateSSECKeyMD5(const std::string& base64_encoded_key, std::string& md5_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind change to Result<std::string>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I would try to refine the return value for the function.
// Decode the Base64-encoded key to get the raw binary key | ||
Aws::Utils::ByteBuffer rawKey = | ||
Aws::Utils::HashingUtils::Base64Decode(base64_encoded_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do this handles error if base64_encoded_key
is not valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://github.com/aws/aws-sdk-cpp/blob/ac2da09e6930e3988d1289717e2df5d4b7408f17/src/aws-cpp-sdk-core/source/utils/base64/Base64.cpp#L91-L121, seems this aws util API didn't validate the input properly,but directly calculate the output size and allocate the buffer, so i add the check to ensure every character is the valid character here.
Meanwhile, I also add the related Unit test for the CalculateSSECKeyMD5 to test the different input value.
|
||
// the key needs to be // 256 bits(32 bytes)according to | ||
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerSideEncryptionCustomerKeys.html#specifying-s3-c-encryption | ||
if (rawKey.GetLength() != expect_input_key_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this not equal to 256?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the expect_input_key_size is exposed via S3Options, so it's possible for this value to be set as any length.
de05328
to
8649388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix style by pre-commit run --show-diff-on-failure --color=always --all
?
ASSERT_FALSE(CalculateSSECKeyMD5("").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("%^H").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("INVALID").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("MTIzNDU2Nzg5").ok()); // invalid, the input key size not match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use ASSERT_RAISES_WITH_MESSAGES()
instead?
@@ -443,6 +443,7 @@ class TestS3FS : public S3TestMixin { | |||
// Most tests will create buckets | |||
options_.allow_bucket_creation = true; | |||
options_.allow_bucket_deletion = true; | |||
options_.sse_customer_key = "1WH9aTJ0+Tn0NLbTMHZn9aCW3Li3ViAdBsoIldPCREw="; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to cover the workflow with sse_customer_key nonempty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we always set sse_custom_key
, we can't test no sse_custom_key
case.
Can we add a test that uses sse_custom_key
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and ideally the SSE-C encryption test would write a file with the encryption key, and check that trying to read it with another key (or with no key at all) fails.
d5db532
to
e2f6ec1
Compare
e2f6ec1
to
f019bc5
Compare
f019bc5
to
954176b
Compare
@kou Can you help to review and help to trigger the CI again? Meanwhile, I've rebased the main branch,thanks |
@kou The reason for the failure is, the minio test server is start with http, while the SSEC-Key would require https. |
I think it's ok to switch the tests to access Minio using HTTPS with a self-signed certificate. |
Just because it was easier :-) No particular reason otherwise. |
sse_customer_key[31] = '\xFA'; // non-ASCII | ||
std::string sse_customer_key_string(sse_customer_key, sizeof(sse_customer_key)); | ||
ASSERT_OK_AND_ASSIGN(auto md5, CalculateSSECustomerKeyMD5(sse_customer_key_string)) | ||
ASSERT_STREQ(md5.c_str(), "97FTa6lj0hE7lshKdBy61g=="); // valid case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASSERT_STREQ(md5.c_str(), "97FTa6lj0hE7lshKdBy61g=="); // valid case | |
ASSERT_EQ(md5, "97FTa6lj0hE7lshKdBy61g=="); // valid case |
We can always use HTTPS Minio. |
FYI: I created https://github.com/apache/arrow-flight-sql-postgresql/blob/main/dev/prepare-tls.sh for other Arrow related project but this is not for Windows... |
There is no need to generate a self-signed cert everytime, it can just be stored in the repository. |
In general, it's not a good idea. (If we don't have another approach, we can use the approach.) |
If someone is silly enough to report a self-signed certificate used for testing purposes as a security vulnerability, then they deserve to be ignored. CPython has stored a ton of fake certificates in its repository for decades and we don't get any complaints: |
Rationale for this change
What changes are included in this PR?
Extend S3Options to support server-side encryption with customer-provided keys (SSE-C).
Are these changes tested?
Yes
Are there any user-facing changes?
yes, the new member sse_customer_key added for S3Option