GH-48858: [C++][Parquet] Avoid re-serializing footer for signature verification #48859
DO NOT MERGE until the arrow-testing subrepo is updated with the new regression file
Rationale for this change
When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.
However, it does this by first re-serializing the deserialized footer using Thrift. This has several issues:
Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).
This PR switches to reusing the original serialized metadata.
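As a rough illustration of why the round trip is fragile, here is a toy sketch. It is not the Arrow/Parquet code. `ToyFooter`, `ToySignature`, `Serialize`, `Deserialize` and both `Verify...` functions are hypothetical names, the "footer" is a single varint, and an FNV-1a hash stands in for the real signature computation. The point is only that deserialize-then-serialize need not reproduce the original bytes, whereas checking the original bytes has no such dependency.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the footer signature (FNV-1a, NOT the real scheme).
uint64_t ToySignature(const std::string& bytes) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Hypothetical toy footer: just a row count, encoded as an unsigned varint.
struct ToyFooter {
  uint64_t num_rows;
};

// Decodes a varint. Like many decoders, it accepts non-canonical (padded)
// encodings, so distinct byte strings can decode to the same value.
ToyFooter Deserialize(const std::string& bytes) {
  uint64_t value = 0;
  int shift = 0;
  for (unsigned char c : bytes) {
    if (shift >= 64) break;  // ignore overlong input in this toy
    value |= static_cast<uint64_t>(c & 0x7F) << shift;
    shift += 7;
    if ((c & 0x80) == 0) break;
  }
  return ToyFooter{value};
}

// Always emits the canonical (shortest) varint encoding.
std::string Serialize(const ToyFooter& footer) {
  std::string out;
  uint64_t v = footer.num_rows;
  do {
    unsigned char b = v & 0x7F;
    v >>= 7;
    if (v != 0) b |= 0x80;
    out.push_back(static_cast<char>(b));
  } while (v != 0);
  return out;
}

// Fragile approach: sign whatever the round trip produces.
bool VerifyByReserializing(const std::string& original, uint64_t stored_sig) {
  return ToySignature(Serialize(Deserialize(original))) == stored_sig;
}

// Approach taken in spirit by this PR: sign the bytes that were actually read.
bool VerifyOriginalBytes(const std::string& original, uint64_t stored_sig) {
  return ToySignature(original) == stored_sig;
}
```

With the padded input `std::string("\x85\x00", 2)` (a non-canonical encoding of 5) and a signature computed over those exact bytes, `VerifyOriginalBytes` succeeds while `VerifyByReserializing` fails, because the round trip collapses the input to the single byte `0x05`.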
Are these changes tested?
Yes, by existing tests and a new fuzz regression file.
Are there any user-facing changes?
No.