replace ZipInputStream with ZipFile #10899

Open: wants to merge 1 commit into base branch develop


@jo-pol (Contributor) commented Oct 1, 2024

What this PR does / why we need it:

Zip files downloaded from an ownCloud service are now unpacked into individual files on upload, instead of being ingested as a single zip file.
See the issue for further details.
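For readers less familiar with java.util.zip: ZipInputStream reads the local file headers one after another from the start of the stream, while ZipFile reads the central directory at the end of the archive and therefore needs random access to a file. Archives written by streaming tools can have local headers that a sequential reader handles poorly, which is a plausible reason the ownCloud zips were not unpacked (the issue has the actual details). The sketch below lists entry names both ways for an ordinary zip, where the two approaches agree; ZipListingDemo and writeZip are hypothetical names, not Dataverse code.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

public class ZipListingDemo {

    // Sequential approach: walks the local file headers from the start of the stream.
    static List<String> namesViaZipInputStream(Path zip) {
        List<String> names = new ArrayList<>();
        try (var in = new ZipInputStream(Files.newInputStream(zip), StandardCharsets.UTF_8)) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Random-access approach: reads the central directory at the end of the file.
    static List<String> namesViaZipFile(Path zip) {
        List<String> names = new ArrayList<>();
        try (var zipFile = new ZipFile(zip.toFile(), StandardCharsets.UTF_8)) {
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Hypothetical helper: writes a temporary zip with one small text entry per name.
    static Path writeZip(List<String> names) {
        try {
            Path zip = Files.createTempFile("demo", ".zip");
            try (var out = new ZipOutputStream(Files.newOutputStream(zip))) {
                for (String name : names) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path zip = writeZip(List.of("data/a.csv", "data/b.txt"));
        System.out.println(namesViaZipInputStream(zip)); // [data/a.csv, data/b.txt]
        System.out.println(namesViaZipFile(zip));        // [data/a.csv, data/b.txt]
    }
}
```

Both methods return entries in the order they were written; the difference only shows up for archives whose local headers a sequential reader cannot process.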

Which issue(s) this PR closes:

Closes #10898

Special notes for your reviewer:

The diff is smaller when whitespace changes are ignored:
https://github.com/IQSS/dataverse/compare/develop...DANS-KNAW-jp:dataverse:10898-own-cloud-zips?w=1

Replaced ZipInputStream with ZipFile in:

  • CreateNewDataFilesCommand
  • ShapefileHandler (dropped the ShapefileHandler constructor that takes a FileInputStream, so that ZipFile can be used)
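ZipFile is constructed from a File (or a file name) because it has to seek to the central directory, so a constructor that only receives a FileInputStream cannot use it. A minimal sketch of a File-based handler, assuming nothing about ShapefileHandler's real API: ZipBackedHandler and extensionsByBaseName are hypothetical, and grouping entries by base name merely illustrates reading the whole entry list through ZipFile.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

/**
 * Hypothetical handler, NOT Dataverse's ShapefileHandler. It takes a File rather
 * than a FileInputStream because java.util.zip.ZipFile needs random access to
 * the archive in order to read its central directory.
 */
public class ZipBackedHandler {
    private final File zip;

    public ZipBackedHandler(File zip) {
        this.zip = zip;
    }

    /** Groups non-directory entries by base name, e.g. "roads" -> [dbf, shp, shx]. */
    public Map<String, SortedSet<String>> extensionsByBaseName() {
        Map<String, SortedSet<String>> groups = new TreeMap<>();
        try (var zipFile = new ZipFile(zip, StandardCharsets.UTF_8)) {
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                int dot = name.lastIndexOf('.');
                if (dot < 0) {
                    continue; // no extension, so it cannot belong to a shapefile set
                }
                groups.computeIfAbsent(name.substring(0, dot), k -> new TreeSet<>())
                      .add(name.substring(dot + 1).toLowerCase(Locale.ROOT));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return groups;
    }
}
```

Usage: `new ZipBackedHandler(new File("upload.zip")).extensionsByBaseName()` opens the archive once and closes it before returning.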

Additional changes:

  • For the new way of iterating over zip entries in CreateNewDataFilesCommand, extracted the methods filteredZipEntries, openZipFile, getShortName and isFileToSkip. The extracted code differs slightly from ShapefileHandler.isFileToSkip, but changing that behavior is beyond the scope of the issue.
  • Introduced some unit tests for CreateNewDataFilesCommand (arguably integration tests for the changed classes).
  • To make the new unit test possible (or at least simpler), FileUtil.determineFileType now catches Bag exceptions and returns application/zip instead of throwing. Without catching them, the test would need complex mocking to obtain a BagitFileHandler via CDI just to test the rest.
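The description names the extracted helpers but not their bodies. A minimal sketch of what such helpers could look like, assuming the signatures and skip rules shown here (directories, `__MACOSX/` resource forks, `._` AppleDouble files and `.DS_Store`); the actual CreateNewDataFilesCommand code may differ, and ZipEntryHelpers is a hypothetical class name.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

/**
 * Hypothetical versions of the helpers named in the PR. The method names come
 * from the PR description; the signatures and skip rules are assumptions.
 */
public class ZipEntryHelpers {

    // Opens the archive, decoding entry names with the given charset.
    static ZipFile openZipFile(File file, Charset charset) throws IOException {
        return new ZipFile(file, charset);
    }

    // Returns the part of the entry name after the last '/' ("" for directories).
    static String getShortName(ZipEntry entry) {
        String name = entry.getName();
        return name.substring(name.lastIndexOf('/') + 1);
    }

    // Assumed rules: skip directories plus macOS resource forks and Finder metadata.
    static boolean isFileToSkip(ZipEntry entry) {
        String shortName = getShortName(entry);
        return entry.isDirectory()
                || entry.getName().startsWith("__MACOSX/")
                || shortName.startsWith("._")
                || shortName.equals(".DS_Store");
    }

    // All entries that isFileToSkip does not reject, in central-directory order.
    static List<ZipEntry> filteredZipEntries(ZipFile zipFile) {
        List<ZipEntry> result = new ArrayList<>();
        for (ZipEntry entry : Collections.list(zipFile.entries())) {
            if (!isFileToSkip(entry)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Returning a List (rather than a lazy stream) lets callers iterate the filtered entries more than once while the ZipFile is open.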

Suggestions on how to test this:

See the issue.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Zip files downloaded from an ownCloud service will no longer be ingested as-is.

Additional documentation:

@pdurbin added the label "Size: 10" (a percentage of a sprint, 7 hours) on Oct 1, 2024

if (!zipEntry.isDirectory()) {
try (var zipFile = openZipFile(tempFile, charset)) {
for (var entry : filteredZipEntries(zipFile)) {
@qqmyers (Member) commented on the lines above, Oct 1, 2024:

Can you avoid reopening the zip and getting the entries again, since you already have them from lines 319-320 (if you stay in the same try block)?
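The reviewer's suggestion amounts to opening the archive once and reusing the filtered entry list inside the same try block. A minimal sketch under stated assumptions: SingleOpenUnzip, unzip, the maxEntries check and the flat output directory are all hypothetical and do not reproduce the code around the lines 319-320 the reviewer mentions.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

/**
 * Sketch of "open once, reuse the entries": the ZipFile is opened a single time
 * and its filtered entry list serves both a pre-check and the extraction.
 * maxEntries and the flat target directory are hypothetical.
 */
public class SingleOpenUnzip {

    static List<Path> unzip(File zip, Path targetDir, int maxEntries) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        List<Path> written = new ArrayList<>();
        try (var zipFile = new ZipFile(zip, StandardCharsets.UTF_8)) {
            // Read and filter the entries exactly once ...
            List<ZipEntry> entries = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                if (!entry.isDirectory()) {
                    entries.add(entry);
                }
            }
            // ... use them for an up-front check ...
            if (entries.size() > maxEntries) {
                throw new IOException("too many entries: " + entries.size());
            }
            // ... and reuse the same list for extraction while zipFile is still open.
            for (ZipEntry entry : entries) {
                String name = entry.getName();
                Path target = base.resolve(name.substring(name.lastIndexOf('/') + 1)).normalize();
                if (!base.equals(target.getParent())) {
                    continue; // skip empty, "." or ".." short names that would not land in base
                }
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

Because files are written by short name, entries with the same name in different folders overwrite each other here; a real implementation would need its own policy for that.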

@qqmyers (Member) left a comment:

Looks good to me. I'd suggest adding a one-line release note, e.g. "Unzip during upload now supports more variations of the zip format, including the zip files generated by ownCloud."

@qqmyers added the label "Size: 3" (a percentage of a sprint, 2.1 hours) and removed the label "Size: 10" (a percentage of a sprint, 7 hours) on Oct 1, 2024
@cmbz added the label "FY25 Sprint 7" (2024-09-25 to 2024-10-09) on Oct 2, 2024
Labels: FY25 Sprint 7 (2024-09-25 to 2024-10-09); Size: 3 (a percentage of a sprint, 2.1 hours)
Projects: Status: Ready for QA ⏩
Development: successfully merging this pull request may close these issues:

zip files created with an own cloud service are ingested as is
4 participants