IQSS/8889 - second try to limit file pids management #9721

15 changes: 15 additions & 0 deletions doc/release-notes/8889-2-filepids-in-collections-changes.md
@@ -0,0 +1,15 @@
File PIDs are no longer registered by default: when the :FilePIDsEnabled setting is absent, it is now treated as false.

Installations where file PIDs were enabled by default will have to add the :FilePIDsEnabled = true setting to maintain the existing functionality.

Add step to install:

If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
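
For installations that script their configuration from Java rather than curl, the same call could be issued with the JDK's built-in HTTP client. This is only a sketch, assuming Java 11 or later and an admin API reachable at the same localhost address as in the curl command; the class and method names are illustrative and not part of Dataverse.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class EnableFilePids {

    // Builds the same request as the curl command: a PUT whose body is "true".
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/admin/settings/:FilePIDsEnabled"))
                // curl -d sends this content type by default; it is mirrored here.
                .header("Content-Type", "application/x-www-form-urlencoded")
                .PUT(HttpRequest.BodyPublishers.ofString("true"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the admin API listens on localhost:8080, as in the curl example.
        HttpRequest request = buildRequest("http://localhost:8080");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Keeping request construction in its own method makes it possible to check the method and path without contacting a server.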



It is now possible to allow File PIDs to be enabled/disabled per collection. See the [:AllowEnablingFilePIDsPerCollection](https://guides.dataverse.org/en/latest/installation/config.html#allowenablingfilepidspercollection) section of the Configuration guide for details.

For example, registration of PIDs for files can now be enabled in a specific collection when it is disabled instance-wide, or disabled in specific collections when it is enabled instance-wide.
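
The interaction between the per-collection value and the instance-wide setting can be illustrated with a small model. This is a simplified sketch, not the Dataverse implementation: it assumes that a collection without an explicit value inherits from its nearest ancestor that has one, and that the instance-wide setting applies only when no ancestor up to the root sets a value. `CollectionNode` and its fields are hypothetical stand-ins for the real collection entity.

```java
class FilePidPolicy {

    // Hypothetical stand-in for a collection: filePIDsEnabled is null when
    // no explicit value was set; owner is null for the root collection.
    record CollectionNode(String alias, Boolean filePIDsEnabled, CollectionNode owner) {}

    // Walks from the collection up to the root. The nearest explicit value
    // wins; if none is found, the instance-wide setting decides.
    static boolean isFilePIDsEnabledForCollection(CollectionNode collection,
                                                  boolean instanceWideSetting) {
        for (CollectionNode c = collection; c != null; c = c.owner()) {
            if (c.filePIDsEnabled() != null) {
                return c.filePIDsEnabled();
            }
        }
        return instanceWideSetting;
    }

    public static void main(String[] args) {
        CollectionNode root = new CollectionNode("root", null, null);
        CollectionNode lab = new CollectionNode("lab", true, root);
        CollectionNode project = new CollectionNode("project", null, lab);
        CollectionNode archive = new CollectionNode("archive", null, root);

        // Instance-wide setting is false (the new default).
        System.out.println(isFilePIDsEnabledForCollection(project, false)); // true, inherited from "lab"
        System.out.println(isFilePIDsEnabledForCollection(archive, false)); // false, falls back to the instance
    }
}
```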
7 changes: 5 additions & 2 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -154,6 +154,8 @@ In the following example, the database id of the file is 42::

export FILE_ID=42
curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"

This method will return a FORBIDDEN response if minting of file PIDs is not enabled for the collection the file is in. (Note that it is possible to have file PIDs enabled for a specific collection, even when they are disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

Mint PIDs for all unregistered published files in the specified collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -162,7 +164,8 @@ The following API will register the PIDs for all the yet unregistered published

curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}"

It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well. File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)
It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well.
File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

This API will sleep for 1 second between registration calls by default. A longer sleep interval can be specified with an optional ``sleep=`` parameter::

@@ -171,7 +174,7 @@ This API will sleep for 1 second between registration calls by default. A longer
Mint PIDs for ALL unregistered files in the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will attempt to register the PIDs for all the published files in your instance that do not yet have them::
The following API will attempt to register PIDs for all published files in your instance that do not yet have them. Files in collections that do not allow file PIDs are skipped::

curl http://localhost:8080/api/admin/registerDataFileAll
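
How this endpoint sorts each file into one of its counters can be sketched as a pure function. The sketch follows the branching visible in `registerDataFileAll` in this pull request, but it is a simplified illustration: the `Outcome` enum and the `classify` method are hypothetical names, not part of the Dataverse code base.

```java
class RegistrationOutcome {

    enum Outcome { ALREADY_REGISTERED, SKIPPED, REGISTER, DRAFT }

    // Mirrors the order of checks in the endpoint: an existing identifier
    // comes first, then the collection policy, then publication status.
    static Outcome classify(String identifier,
                            boolean pidsEnabledForCollection,
                            boolean released) {
        if (identifier != null && !identifier.isEmpty()) {
            return Outcome.ALREADY_REGISTERED;
        }
        if (!pidsEnabledForCollection) {
            return Outcome.SKIPPED;
        }
        return released ? Outcome.REGISTER : Outcome.DRAFT;
    }

    public static void main(String[] args) {
        // An unregistered, published file in a collection that disallows file PIDs.
        System.out.println(classify(null, false, true)); // SKIPPED
        // An unregistered, published file in a collection that allows them.
        System.out.println(classify("", true, true));    // REGISTER
    }
}
```

Only files classified as `REGISTER` trigger a registration call, which is also the only branch followed by the one-second sleep in the updated code.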

2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -753,7 +753,7 @@ The following attributes are supported:
* ``name`` Name
* ``description`` Description
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
* ``filePIDsEnabled`` ("true" or "false") Restricted to use by superusers and only when the :ref:`:AllowEnablingFilePIDsPerCollection <:AllowEnablingFilePIDsPerCollection>` setting is true. Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).


Datasets
31 changes: 26 additions & 5 deletions doc/sphinx-guides/source/installation/config.rst
@@ -248,7 +248,7 @@ this provider.
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)

.. _pids-handle-configuration:

@@ -297,7 +297,7 @@ Here are the configuration options for PermaLinks:
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to true)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)

.. _auth-modes:

@@ -2775,14 +2775,35 @@ timestamps.
:FilePIDsEnabled
++++++++++++++++

Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be false. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Review comment (Member):

By default this setting is no longer absent, right?

Reply (Member, Author):

No - a new installation will not have it set, which we now interpret as false. Existing installs which had it not set will now have it set true.


If you don't want to register file-based PIDs for your installation, set:
It is possible to override the installation-wide setting for specific collections; see :ref:`:AllowEnablingFilePIDsPerCollection <:AllowEnablingFilePIDsPerCollection>`. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide, or disabled in specific collections when it is enabled instance-wide. See :ref:`collection-attributes-api` for details.

To enable file-level PIDs for the entire installation::

``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``


If you don't want to register file-based PIDs for your entire installation::

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``

.. _:AllowEnablingFilePIDsPerCollection:

:AllowEnablingFilePIDsPerCollection
+++++++++++++++++++++++++++++++++++

Toggles whether superusers can change the File PIDs policy per collection. By default this setting is absent and Dataverse Software assumes it to be false.

For example, if this setting is true, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.

To enable setting file-level PIDs per collection::

``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:AllowEnablingFilePIDsPerCollection``


When :AllowEnablingFilePIDsPerCollection is true, setting File PIDs to be enabled/disabled for a given collection can be done via the Native API - see :ref:`collection-attributes-api` in the Native API Guide.

It is possible to override the installation-wide setting for specific collections. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.

.. _:IndependentHandleService:

52 changes: 35 additions & 17 deletions src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@@ -1514,6 +1514,9 @@ public Response registerDataFile(@Context ContainerRequestContext crc, @PathPara
User u = getRequestUser(crc);
DataverseRequest r = createDataverseRequest(u);
DataFile df = findDataFileOrDie(id);
if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) {
return forbidden("PIDs are not enabled for this file's collection.");
}
if (df.getIdentifier() == null || df.getIdentifier().isEmpty()) {
execCommand(new RegisterDvObjectCommand(r, df));
} else {
@@ -1537,48 +1540,64 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
Integer alreadyRegistered = 0;
Integer released = 0;
Integer draft = 0;
Integer skipped = 0;
logger.info("Starting to register: analyzing " + count + " files. " + new Date());
logger.info("Only unregistered, published files will be registered.");
User u = null;
try {
u = getRequestAuthenticatedUserOrDie(crc);
} catch (WrappedResponse e1) {
return error(Status.UNAUTHORIZED, "api key required");
}
DataverseRequest r = createDataverseRequest(u);
for (DataFile df : fileService.findAll()) {
try {
if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) {
if (df.isReleased()) {
if(!systemConfig.isFilePIDsEnabledForCollection(df.getOwner().getOwner())) {
skipped++;
if (skipped % 100 == 0) {
logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date());
}
} else if (df.isReleased()) {
released++;
User u = getRequestAuthenticatedUserOrDie(crc);
DataverseRequest r = createDataverseRequest(u);
execCommand(new RegisterDvObjectCommand(r, df));
successes++;
if (successes % 100 == 0) {
logger.info(successes + " of " + count + " files registered successfully. " + new Date());
}
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
} else {
draft++;
logger.info(draft + " of " + count + " files not yet published");
if (draft % 100 == 0) {
logger.info(draft + " of " + count + " files not yet published");
}
}
} else {
alreadyRegistered++;
logger.info(alreadyRegistered + " of " + count + " files are already registered. " + new Date());
if(alreadyRegistered % 100 == 0) {
logger.info(alreadyRegistered + " of " + count + " files are already registered. " + new Date());
}
}
} catch (WrappedResponse ex) {
released++;
logger.info("Failed to register file id: " + df.getId());
Logger.getLogger(Datasets.class.getName()).log(Level.SEVERE, null, ex);
} catch (Exception e) {
logger.info("Unexpected Exception: " + e.getMessage());
}

try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}

}
logger.info("Final Results:");
logger.info(alreadyRegistered + " of " + count + " files were already registered. " + new Date());
logger.info(draft + " of " + count + " files are not yet published. " + new Date());
logger.info(released + " of " + count + " unregistered, published files to register. " + new Date());
logger.info(successes + " of " + released + " unregistered, published files registered successfully. "
+ new Date());
logger.info(skipped + " of " + count + " files not in collections that allow file PIDs. " + new Date());

return ok("Datafile registration complete." + successes + " of " + released
+ " unregistered, published files registered successfully.");
@@ -1633,6 +1652,11 @@ public Response registerDataFilesInCollection(@Context ContainerRequestContext c
if (countSuccesses % 100 == 0) {
logger.info(countSuccesses + " out of " + count + " files registered successfully. " + new Date());
}
try {
Thread.sleep(sleepInterval * 1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
} else {
countDrafts++;
logger.fine(countDrafts + " out of " + count + " files not yet published");
@@ -1648,12 +1672,6 @@ public Response registerDataFilesInCollection(@Context ContainerRequestContext c
} catch (Exception e) {
logger.info("Unexpected Exception: " + e.getMessage());
}

try {
Thread.sleep(sleepInterval * 1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
}

logger.info(countAlreadyRegistered + " out of " + count + " files were already registered. " + new Date());
6 changes: 6 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
@@ -636,6 +636,12 @@ public Response updateAttribute(@Context ContainerRequestContext crc, @PathParam
break;
*/
case "filePIDsEnabled":
if(!user.isSuperuser()) {
return forbidden("You must be a superuser to change this setting");
}
if(!settingsService.isTrueForKey(SettingsServiceBean.Key.AllowEnablingFilePIDsPerCollection, false)) {
return forbidden("Changing File PID policy per collection is not enabled on this server");
}
collection.setFilePIDsEnabled(parseBooleanOrDie(value));
break;
default:
@@ -358,9 +358,8 @@ private DataFile createPackageDataFile(List<File> files) {
dataset.getLatestVersion().getFileMetadatas().add(fmd);
fmd.setDatasetVersion(dataset.getLatestVersion());

String isFilePIDsEnabled = commandEngine.getContext().settings().getValueForKey(SettingsServiceBean.Key.FilePIDsEnabled, "true"); //default value for file PIDs is 'true'
if ("true".contentEquals( isFilePIDsEnabled )) {

if (commandEngine.getContext().systemConfig().isFilePIDsEnabledForCollection(dataset.getOwner())) {

GlobalIdServiceBean idServiceBean = GlobalIdServiceBean.getBean(packageFile.getProtocol(), commandEngine.getContext());
if (packageFile.getIdentifier() == null || packageFile.getIdentifier().isEmpty()) {
packageFile.setIdentifier(idServiceBean.generateDataFileIdentifier(packageFile));
@@ -97,6 +97,10 @@ public Dataverse execute(CommandContext ctxt) throws CommandException {
if (ctxt.dataverses().findByAlias(created.getAlias()) != null) {
throw new IllegalCommandException("A dataverse with alias " + created.getAlias() + " already exists", this);
}

if(created.getFilePIDsEnabled()!=null && !ctxt.settings().isTrueForKey(SettingsServiceBean.Key.AllowEnablingFilePIDsPerCollection, false)) {
throw new IllegalCommandException("File PIDs cannot be enabled per collection", this);
}

// Save the dataverse
Dataverse managedDv = ctxt.dataverses().save(created);
@@ -595,7 +595,11 @@ Whether Harvesting (OAI) service is enabled
* True/false(default) option deciding whether the dataset file table display should include checkboxes
* allowing users to dynamically turn folder and category ordering on/off.
*/
AllowUserManagementOfOrder
AllowUserManagementOfOrder,
/*
* True/false(default) option deciding whether file PIDs can be enabled per collection - using the Dataverse/collection set attribute API call.
*/
AllowEnablingFilePIDsPerCollection
;

@Override
@@ -1011,7 +1011,7 @@ public boolean isFilePIDsEnabledForCollection(Dataverse collection) {
// hasn't been explicitly enabled, therefore we presume that it is
// subject to how the registration is configured for the
// entire instance:
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, true);
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, false);
}
thisCollection = thisCollection.getOwner();
}