@pitrou pitrou commented Jan 14, 2026

DO NOT MERGE until the arrow-testing subrepo is updated with the new regression file

Rationale for this change

When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.

However, it does this by first re-serializing the deserialized footer using Thrift. This has several issues:

  1. it's inefficient
  2. it's not obvious that it will always produce the same Thrift encoding as the original, leading to spurious signature verification failures
  3. if the original footer deserializes to invalid enum values, attempting to serialize it again will lead to undefined behavior

Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).

This PR switches to reusing the original serialized metadata.
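
To illustrate issue 2 with a toy example (this is not the actual Parquet code): Thrift's compact protocol encodes integers as varints, and a lenient varint decoder can accept a padded, non-canonical encoding that a canonical encoder would never emit. The sketch below assumes a hypothetical one-field footer and a hypothetical `ToySign` keyed hash standing in for the real AES-GCM signature. All names are made up for illustration, and `std::vector` is used instead of `std::span` only to keep the sketch buildable with older compilers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stand-in for footer signing: a keyed FNV-1a hash.
// NOT cryptographically secure; the real code uses AES-GCM.
uint64_t ToySign(const Bytes& data, uint64_t key) {
  uint64_t h = 14695981039346656037ULL ^ key;
  for (uint8_t b : data) {
    h ^= b;
    h *= 1099511628211ULL;
  }
  return h;
}

// Lenient varint decoder: accepts redundant continuation bytes.
uint32_t DecodeVarint(const Bytes& in) {
  uint32_t value = 0;
  int shift = 0;
  for (uint8_t b : in) {
    if (shift >= 32) break;  // ignore anything past 32 bits
    value |= static_cast<uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;  // last byte of the varint
    shift += 7;
  }
  return value;
}

// Canonical varint encoder: always emits the shortest encoding.
Bytes EncodeVarint(uint32_t value) {
  Bytes out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
  return out;
}

// Old approach: decode, re-encode, then sign the re-encoded bytes.
bool VerifyByReserializing(const Bytes& original, uint64_t signature,
                           uint64_t key) {
  const Bytes reencoded = EncodeVarint(DecodeVarint(original));
  return ToySign(reencoded, key) == signature;
}

// New approach: sign the bytes exactly as they were read from the file.
bool VerifyOriginalBytes(const Bytes& original, uint64_t signature,
                         uint64_t key) {
  return ToySign(original, key) == signature;
}

// A writer that emitted a padded encoding of 5 and signed those bytes.
void Example() {
  const Bytes padded = {0x85, 0x00};  // decodes to 5, canonical form is {0x05}
  const uint64_t key = 42;
  const uint64_t signature = ToySign(padded, key);
  assert(VerifyOriginalBytes(padded, signature, key));     // succeeds
  assert(!VerifyByReserializing(padded, signature, key));  // spurious failure
}
```

In this toy setting, both byte sequences decode to the same footer, but only the check over the original bytes matches the writer's signature.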

Are these changes tested?

Yes, by existing tests and a new fuzz regression file.

Are there any user-facing changes?

No.

@pitrou pitrou requested review from EnricoMi and adamreeve January 14, 2026 15:24
@pitrou pitrou marked this pull request as ready for review January 14, 2026 15:28
@pitrou pitrou requested a review from wgtmac as a code owner January 14, 2026 15:28
Comment on lines +335 to 336
PARQUET_DEPRECATED("Deprecated in 24.0.0. Use the two-argument overload instead.")
bool VerifySignature(const void* signature);
Member Author

I opted to deprecate this but we might also remove it, as it seems this API is meant for internal use? @adamreeve @EnricoMi

@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Jan 14, 2026
Comment on lines +837 to +838
bool VerifySignature(std::span<const uint8_t> serialized_metadata,
std::span<const uint8_t> signature) {
Member Author

I implemented this as a method of FileMetaData but the only member it uses is the FileDecryptor, so perhaps this should be moved elsewhere (or made static?).

Member Author

pitrou commented Jan 14, 2026

Hmm, it looks like we need to wait for #48819 for the R failures.
