Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43535: [C++] support the AWS S3 SSE-C encryption #43601

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ripplehang
Copy link

@ripplehang ripplehang commented Aug 7, 2024

Rationale for this change

What changes are included in this PR?

Extend S3Options to support server-side encryption with customer-provided keys (SSE-C).

Are these changes tested?

Yes

Are there any user-facing changes?

yes, the new member sse_customer_key added for S3Option

Copy link

github-actions bot commented Aug 7, 2024

⚠️ GitHub issue #43535 has been automatically assigned in GitHub to PR creator.

@ripplehang
Copy link
Author

@pitrou @mapleFU @felipecrv can you help to review the PR, Thanks

@pitrou
Copy link
Member

pitrou commented Aug 22, 2024

I took a quick look and here are some assorted thoughts:

  • there are no unit tests
  • only the encryption key needs to be on S3Options, not the MD5 and algorithm string (which can be computed on the fly)

@pitrou
Copy link
Member

pitrou commented Aug 22, 2024

Also, for the other readers wondering, I've found a useful (though possibly outdated?) comparison of S3 encryption options here:
https://www.linkedin.com/pulse/delusion-s3-encryption-benefits-ravi-ivaturi/

@ripplehang
Copy link
Author

I took a quick look and here are some assorted thoughts:

@pitrou Thanks for your review, for #1, i would try to add ut for the change, for #2, yes, actually ONLY the Key is needed, that's why i only expose one SetSSECKey function, and three Get function, and private the new added datamember.
do you suggest to only add one datamember for S3Options? but if there is only one memeber, we may needs to calculate everywhere?

@ripplehang
Copy link
Author

@pitrou I've submitted another commit to only expose the ssec-key in S3Options, and then calculate the MD5 every time on the fly, can you help to review whether there are other comments? I would try to add the ut if there is no interface change required, Thanks

@ripplehang ripplehang force-pushed the support_ssec_for_aws_s3 branch 3 times, most recently from bb2e024 to 539fa7c Compare August 27, 2024 06:00
@ripplehang
Copy link
Author

@pitrou Can you help to review the PR again?Thanks

@ripplehang
Copy link
Author

@kou @mapleFU can you help to review the change, Thanks

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a careful around this weekend

/// \param expect_input_key_size, default 32
/// \return true if the decode and calculate MD5 success, otherwise return false
ARROW_EXPORT
bool CalculateSSECKeyMD5(const std::string& base64_encoded_key, std::string& md5_result,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind change to Result<std::string>?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would try to refine the return value for the function.

Comment on lines 278 to 280
// Decode the Base64-encoded key to get the raw binary key
Aws::Utils::ByteBuffer rawKey =
Aws::Utils::HashingUtils::Base64Decode(base64_encoded_key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do this handles error if base64_encoded_key is not valid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://github.com/aws/aws-sdk-cpp/blob/ac2da09e6930e3988d1289717e2df5d4b7408f17/src/aws-cpp-sdk-core/source/utils/base64/Base64.cpp#L91-L121, seems this aws util API didn't validate the input properly,but directly calculate the output size and allocate the buffer, so i add the check to ensure every character is the valid character here.
Meanwhile, I also add the related Unit test for the CalculateSSECKeyMD5 to test the different input value.


// the key needs to be // 256 bits(32 bytes)according to
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerSideEncryptionCustomerKeys.html#specifying-s3-c-encryption
if (rawKey.GetLength() != expect_input_key_size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this not equal to 256?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the expect_input_key_size is exposed via S3Options, so it's possible for this value to be set as any length.

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Aug 30, 2024
@ripplehang ripplehang force-pushed the support_ssec_for_aws_s3 branch 3 times, most recently from de05328 to 8649388 Compare August 30, 2024 12:24
@kou kou changed the title GH-43535:[C++] support the AWS S3 SSE-C encryption GH-43535: [C++] support the AWS S3 SSE-C encryption Aug 31, 2024
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix style by pre-commit run --show-diff-on-failure --color=always --all?

Comment on lines 58 to 61
ASSERT_FALSE(CalculateSSECKeyMD5("").ok()); // invalid base64
ASSERT_FALSE(CalculateSSECKeyMD5("%^H").ok()); // invalid base64
ASSERT_FALSE(CalculateSSECKeyMD5("INVALID").ok()); // invalid base64
ASSERT_FALSE(CalculateSSECKeyMD5("MTIzNDU2Nzg5").ok()); // invalid, the input key size not match
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ASSERT_RAISES_WITH_MESSAGES() instead?

cpp/src/arrow/filesystem/filesystem_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/filesystem_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
@@ -443,6 +443,7 @@ class TestS3FS : public S3TestMixin {
// Most tests will create buckets
options_.allow_bucket_creation = true;
options_.allow_bucket_deletion = true;
options_.sse_customer_key = "1WH9aTJ0+Tn0NLbTMHZn9aCW3Li3ViAdBsoIldPCREw=";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to cover the workflow with sse_customer_key nonempty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always set sse_custom_key, we can't test no sse_custom_key case.
Can we add a test that uses sse_custom_key instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and ideally the SSE-C encryption test would write a file with the encryption key, and check that trying to read it with another key (or with no key at all) fails.

cpp/src/arrow/filesystem/util_internal.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/util_internal.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/util_internal.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/util_internal.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting review Awaiting review and removed awaiting review Awaiting review labels Aug 31, 2024
@lidavidm lidavidm removed their request for review September 20, 2024 08:41
@ripplehang ripplehang requested a review from kou September 20, 2024 09:07
@ripplehang
Copy link
Author

@kou Can you help to review and help to trigger the CI again? Meanwhile, I've rebased the main branch,thanks

@ripplehang
Copy link
Author

Could you rebase on main?

Could you check S3 filesystem related CI failures?

@kou The reason for the failure is, the minio test server is start with http, while the SSEC-Key would require https.
do you know why the minio server is started with http rather than https? is there any concern for the self-signed certificate?
I tend to start the minio server with self-signed certificate for the new test case only, and keep the current http case unchanged, do you think it's acceptable?Thanks

@pitrou
Copy link
Member

pitrou commented Sep 20, 2024

I think it's ok to switch the tests to access Minio using HTTPS with a self-signed certificate.

@pitrou
Copy link
Member

pitrou commented Sep 20, 2024

do you know why the minio server is started with http rather than https?

Just because it was easier :-) No particular reason otherwise.

sse_customer_key[31] = '\xFA'; // non-ASCII
std::string sse_customer_key_string(sse_customer_key, sizeof(sse_customer_key));
ASSERT_OK_AND_ASSIGN(auto md5, CalculateSSECustomerKeyMD5(sse_customer_key_string))
ASSERT_STREQ(md5.c_str(), "97FTa6lj0hE7lshKdBy61g=="); // valid case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ASSERT_STREQ(md5.c_str(), "97FTa6lj0hE7lshKdBy61g=="); // valid case
ASSERT_EQ(md5, "97FTa6lj0hE7lshKdBy61g=="); // valid case

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 20, 2024
@kou
Copy link
Member

kou commented Sep 20, 2024

We can always use HTTPS Minio.
Can we prepare a self-signed certificate with portable way? (Do we need openssl command for it?)

@kou
Copy link
Member

kou commented Sep 20, 2024

FYI: I created https://github.com/apache/arrow-flight-sql-postgresql/blob/main/dev/prepare-tls.sh for other Arrow related project but this is not for Windows...

@pitrou
Copy link
Member

pitrou commented Sep 20, 2024

There is no need to generate a self-signed cert everytime, it can just be stored in the repository.

@kou
Copy link
Member

kou commented Sep 21, 2024

In general, it's not a good idea.
If we add it to this repository, someone may report it a security vulnerability. We need to explain that it's not danger.
FYI: GH-6399

(If we don't have another approach, we can use the approach.)

@pitrou
Copy link
Member

pitrou commented Sep 21, 2024

If someone is silly enough to report a self-signed certificate used for testing purposes as a security vulnerability, then they deserve to be ignored.

CPython has stored a ton of fake certificates in its repository for decades and we don't get any complaints:
https://github.com/python/cpython/tree/main/Lib/test/certdata

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants