
MarkLogic Content Pump (mlcp) and Gradle

Rob Rudin edited this page Sep 20, 2024 · 20 revisions

The MlcpTask class allows you to invoke MarkLogic's Content Pump tool (mlcp) via a Gradle task.

One benefit of using MlcpTask vs JavaExec is that MlcpTask will use the mlHost/(mlUsername or mlRestAdminUsername)/(mlPassword or mlRestAdminPassword) properties by default, which are defined in the mlAppConfig instance that ml-gradle instantiates in Gradle. Another benefit is you don't need to download mlcp and put the executable in your path - you can run this from anywhere, as all of mlcp's libraries are downloaded via Gradle. That's also handy for something like running mlcp on a Jenkins CI server.

MlcpTask also provides task properties for most of mlcp's command-line arguments. These are just syntactic sugar - since MlcpTask extends JavaExec, you can always pass arguments through JavaExec's "args" property.

Note that you don't need MlcpTask to use mlcp. You can use JavaExec directly and configure all of the command-line arguments yourself. In particular, if you are using an MLCP options file to specify arguments for MLCP, the syntactic sugar provided by MlcpTask won't be of any help.
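As a rough sketch of the plain JavaExec approach, the task below runs mlcp with an options file. It assumes the "mlcp" configuration shown in the Example section of this page and Gradle 6.4 or higher (for the mainClass property). The task name and the options file name are hypothetical.

```groovy
// Sketch only: plain JavaExec, so none of the mlHost/mlUsername/mlPassword defaults are applied.
task importWithOptionsFile(type: JavaExec) {
  // Assumes the "mlcp" configuration and dependency from the Example section
  classpath = configurations.mlcp
  // mlcp's entry point class
  mainClass = "com.marklogic.contentpump.ContentPump"
  // The command comes first, followed by the options file.
  // "mlcp-import-options.txt" is a hypothetical file name.
  args = ["IMPORT", "-options_file", "mlcp-import-options.txt"]
}
```

Because JavaExec does not supply any connection details, the host, port, username and password would all need to be set in the options file.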

Recommended Gradle version

The behavior of Gradle's JavaExec task changed between Gradle 6.3 and 6.4 such that if you wish to use MlcpTask in ml-gradle 4.3.2 or higher, you must use at least Gradle 6.4.

If you are using Gradle 7.0 or higher, you must use ml-gradle 4.3.1 or higher.

If you are using ml-gradle 4.2.x or older, it is recommended to use at least Gradle 6, but Gradle 5 and possibly Gradle 4 may work as well.
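If you want the build to fail early on an unsupported Gradle version, one possible guard is sketched below. It is not part of ml-gradle. The 6.4 threshold assumes you are on a recent ml-gradle release, and the error message is illustrative.

```groovy
// Sketch only: place this after the plugins block in build.gradle.
// GradleVersion is comparable, so Groovy's < operator can be used here.
if (org.gradle.util.GradleVersion.current() < org.gradle.util.GradleVersion.version("6.4")) {
  throw new GradleException("MlcpTask requires Gradle 6.4 or higher; found ${gradle.gradleVersion}")
}
```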

Example

Below is an example of using MlcpTask and pulling in the mlcp dependencies - see the mlcp-project build file for a more complete example, which shows both import and export tasks:

plugins {
  id "com.marklogic.ml-gradle" version "5.0.0"
}

repositories {
  mavenCentral()

  // This MarkLogic-specific repository is only needed for older versions of MLCP. If you receive an error that Gradle 
  // cannot download version "1.5.2-marklogic" of the "commons-csv" dependency, then add this. Otherwise, it can be omitted. 
  // maven { url "https://developer.marklogic.com/maven2/" }
}

configurations {
  mlcp {
    // MLCP 11.1.0 and higher requires this modification.
    attributes {
      attribute(TargetJvmEnvironment.TARGET_JVM_ENVIRONMENT_ATTRIBUTE, objects.named(TargetJvmEnvironment.class, TargetJvmEnvironment.STANDARD_JVM))
    }
  }
}

dependencies {
  mlcp "com.marklogic:mlcp:11.3.0"
}

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  output_collections = "my-collection"
  // Can also override the default properties
  // username = "some-other-username"
  // etc...
}

Adding arguments not present in MlcpTask

If you need to pass any arguments to MLCP that are not yet present as parameters in MlcpTask, simply use the args parameter that MlcpTask inherits from Gradle's JavaExec task:

args = ["-ssl_protocol", "TLSv1.2"]

You can specify any number of arguments this way.
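As a sketch of passing several arguments at once, the export task below supplies mlcp's -output_file_path and -collection_filter options through args. They are passed this way purely for illustration, regardless of whether MlcpTask also exposes them as properties. The task name, output path and collection name are hypothetical.

```groovy
// Sketch only: assumes the "mlcp" configuration from the Example section.
task exportSample(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "EXPORT"
  database = "my-database"
  // Each option name is followed by its value as a separate list element
  args = [
    "-output_file_path", "build/export",
    "-collection_filter", "my-collection"
  ]
}
```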

Avoid duplication across MLCP tasks

See Dynamically creating tasks for tips on reducing duplication across many MLCP tasks.
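One way this can look is sketched below: a loop registers one import task per dataset, so the shared settings are written once. The dataset names, directory layout and generated task names are hypothetical, and the "mlcp" configuration from the Example section is assumed.

```groovy
// Sketch only: registers importCustomers and importOrders with shared settings.
["customers", "orders"].each { dataset ->
  tasks.register("import${dataset.capitalize()}", com.marklogic.gradle.task.MlcpTask) {
    classpath = configurations.mlcp
    command = "IMPORT"
    database = "my-database"
    // Only these two values differ between the generated tasks
    input_file_path = "data/${dataset}"
    output_collections = dataset
  }
}
```

Each generated task can then be run on its own, for example with "gradle importCustomers".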

MLCP and logging

When you pull in MLCP as a Gradle dependency, you don't get a default logging configuration file like the one included in the MLCP zip file. As a result, you won't get any logging from MLCP.

You can fix this by adding the following to your build.gradle file:

dependencies {
    mlcp 'com.marklogic:mlcp:11.3.0'
    mlcp 'ch.qos.logback:logback-classic:1.3.14'
    mlcp files('lib')
}

You can then add a logback.xml file to the ./lib directory to configure MLCP logging - for example:

<configuration>
  <statusListener class="ch.qos.logback.core.status.NopStatusListener"/>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="WARN">
    <appender-ref ref="STDOUT"/>
  </root>
  <logger name="com.marklogic" level="INFO" additivity="false">
    <appender-ref ref="STDOUT"/>
  </logger>
</configuration>

Suppressing logging of all MLCP arguments

If you execute an instance of MlcpTask with Gradle's info or debug logging enabled, all of the arguments passed to MLCP - including passwords - will be logged via the JavaExec parent class. To avoid this, choose one of the following options:

  1. Don't use info or debug logging when running an instance of MlcpTask, or any instance of JavaExec where passwords are passed as plaintext.
  2. Use an MLCP options file. In this case, use JavaExec instead of MlcpTask so that you avoid the unwanted behavior where MlcpTask automatically sets a password based on mlRestAdminPassword.

Note that if neither info nor debug logging is enabled, MlcpTask will print all of the non-password arguments passed to it.

Using an MLCP transform

Be aware that MlcpTask defaults to using port 8000. If you specify a transform parameter in your MlcpTask, then you will need to set the "port" parameter to that of your XDBC server or REST server that supports XDBC requests.
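A minimal sketch of such a task follows. The port number, input path and transform module path are hypothetical. The transform is passed through args using mlcp's -transform_module option, which avoids assuming a particular MlcpTask property name.

```groovy
// Sketch only: assumes the "mlcp" configuration from the Example section.
task importWithTransform(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  // Hypothetical port of an XDBC server, or a REST server that supports XDBC requests
  port = 8010
  input_file_path = "data/import"
  // Hypothetical module path
  args = ["-transform_module", "/my-transform.sjs"]
}
```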

Writing log output to MarkLogic

New in ml-gradle 2.6.0 - you can set the logOutputUri parameter to define a URI for mlcp log output to be written to:

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
}

And new in 3.12.0 - you can provide a custom DatabaseClient to control what database the log output is written to (it defaults to mlAppConfig.newDatabaseClient()):

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
  logClient = mlAppConfig.newModulesDatabaseClient() // Just notional - reference or construct any DatabaseClient you want
}

Suppressing Hadoop binary messages on Windows

When running mlcp via Gradle on Windows, you're likely to see the following message logged:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

It reads as an exception, but unless you're using certain Hadoop-based features within mlcp, you can safely ignore this. If MLCP is instead throwing an error later on, you likely should use the MLCP standalone distribution instead of using MlcpTask.

You can also suppress the message by performing the following steps:

  1. Create a dummy lib\bin\winutils.exe file in your project
  2. Add the following to your task of type MlcpTask:

systemProperties = ["hadoop.home.dir" : "$project.rootDir/lib"]
