rjrudin edited this page Aug 15, 2019 · 9 revisions

New in 3.3.0, ml-gradle provides several tasks for easily exporting any set of documents in MarkLogic to either a single file/zip or many files/zips. These tasks make use of the Data Movement SDK and the DMSDK Jobs provided by the ml-javaclient-util library.

The tasks provided by 3.3.0 are:

  • mlExportToFile
  • mlExportToZip
  • mlExportBatchesToDirectory
  • mlExportBatchesToZips

Note that if none of these tasks support your requirements for exporting data, you can likely use the Data Movement SDK by itself to write a program that will meet your requirements.

Each of the tasks for exporting data can be configured via several properties. To see the available properties for any task, just run the task with "-PjobProperties" (no value needed) - for example:

gradle mlExportToFile -PjobProperties

Exporting data to a file

All documents selected by a query can be exported to a single file via mlExportToFile:

gradle mlExportToFile -PexportPath=export.xml -PwhereCollections=example
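If several collections need to be exported, the command above can be generated once per collection. The following is a minimal sketch, not part of ml-gradle: the helper name `print_export_commands`, the `<collection>.xml` file naming, and the collection name "other-example" are all assumptions made for illustration. It only prints the commands so they can be reviewed first, and it assumes collection names contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical helper, not part of ml-gradle: prints one mlExportToFile
# command per collection name so the commands can be reviewed before running.
print_export_commands() {
  for collection in "$@"; do
    # Assumption: each collection is written to <collection>.xml
    # in the current directory.
    printf 'gradle mlExportToFile -PexportPath=%s.xml -PwhereCollections=%s\n' \
      "$collection" "$collection"
  done
}

print_export_commands example other-example
```

Once the printed commands look right, the output can be piped to `sh` to run them.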

This task is similar to the other DMSDK Tasks in that a "where" property is required to specify the documents to export. "whereCollections", "whereUriPattern", and "whereUrisQuery" are the currently supported properties, e.g.:

gradle mlExportToFile -PwhereUriPattern="*.xml"
gradle mlExportToFile -PwhereUrisQuery="cts:element-value-query(xs:QName('hello'), 'world')"

This export capability simply wraps existing DMSDK functionality, specifically the ExportToWriterListener class, so you can set some of that class's properties, e.g.:

gradle mlExportToFile -PrecordPrefix="<wrapper>" -PrecordSuffix="</wrapper>" -PwhereCollections=example

You can also specify content to be written to the beginning and end of the file:

gradle mlExportToFile -PwhereCollections=example -PfileHeader="<results>" -PfileFooter="</results>"

Exporting JSON documents to a file

Starting in version 3.9.0 of ml-gradle, you can write JSON documents as a valid JSON array by using the new "omitLastRecordSuffix" property:

gradle mlExportToFile -PwhereCollections=some-json-documents -PfileHeader="[" -PfileFooter="]" -PrecordSuffix="," -PomitLastRecordSuffix=true

This will result in a comma being written after every JSON document except for the last one, thus resulting in a valid JSON array.
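To make the interplay of "fileHeader", "fileFooter", "recordPrefix", "recordSuffix", and "omitLastRecordSuffix" concrete, here is a small shell sketch that mimics the layout described above. It is not ml-gradle or DMSDK code: the function name `render_export` and its argument order are hypothetical. It reads one document per line from stdin and emits no whitespace between records, which is an assumption about the real output.

```shell
#!/bin/sh
# Hypothetical illustration, NOT ml-gradle code. Reads one document per line
# on stdin and prints header, wrapped records, and footer on a single line.
# Arguments: header footer recordPrefix recordSuffix omitLastRecordSuffix
render_export() {
  header=$1; footer=$2; prefix=$3; suffix=$4; omit_last=$5

  printf '%s' "$header"
  pending=""            # previous record, held back until we know if it is the last
  have_pending=false
  while IFS= read -r doc || [ -n "$doc" ]; do
    if [ "$have_pending" = "true" ]; then
      # Not the last record, so it always gets the suffix.
      printf '%s%s%s' "$prefix" "$pending" "$suffix"
    fi
    pending=$doc
    have_pending=true
  done
  if [ "$have_pending" = "true" ]; then
    if [ "$omit_last" = "true" ]; then
      printf '%s%s' "$prefix" "$pending"            # last record: no suffix
    else
      printf '%s%s%s' "$prefix" "$pending" "$suffix"
    fi
  fi
  printf '%s\n' "$footer"
}

printf '%s\n' '{"id":1}' '{"id":2}' | render_export "[" "]" "" "," true
# prints: [{"id":1},{"id":2}]
```

With the last argument set to false, the same input yields `[{"id":1},{"id":2},]`, which is not valid JSON; that trailing comma is what "omitLastRecordSuffix" avoids.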

Exporting data to CSV

With mlExportToFile, you can reference a REST API transform, which makes it possible to export data as CSV: write a transform that converts each document to the exact CSV that you want (and of course you can load that transform with ml-gradle):

gradle mlExportToFile -Ptransform=my-csv-transform -PwhereCollections=example

You can use ml-gradle to stub out that transform first, setting "transformType" to one of sjs, xqy, or xsl:

gradle mlCreateTransform -PtransformName=my-csv-transform -PtransformType=sjs

Of course, the REST API transform can produce any content that you want.

Exporting data to a zip

All documents selected by a query can be exported to a single zip via mlExportToZip:

gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example

As when exporting to a file, you can also apply a transform to each document:

gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -Ptransform=my-transform

By default, each document's URI is used as-is as the name of its zip entry. The URI can instead be "flattened" - i.e. everything up to and including the last "/" will be dropped:

gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -PflattenUri=true

You can also provide a prefix on each zip entry:

gradle mlExportToZip -PexportPath=export.zip -PwhereCollections=example -PuriPrefix=/my-prefix
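The two entry-name options above can be pictured with a short shell sketch. This is an illustration only, not ml-gradle code: the function `zip_entry_name` and the sample URI are hypothetical, and the order of operations (flatten first, then prepend the prefix) is an assumption rather than documented behavior.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of ml-gradle. It mimics how a zip entry name
# is described as being derived from a document URI.
# Arguments: uri flattenUri uriPrefix
zip_entry_name() {
  uri=$1        # document URI, e.g. /example/docs/doc1.xml
  flatten=$2    # "true" to mimic -PflattenUri=true
  prefix=$3     # value of -PuriPrefix, may be empty

  if [ "$flatten" = "true" ]; then
    # Drop everything up to and including the last "/".
    uri=${uri##*/}
  fi
  # Assumption: the prefix is prepended after any flattening.
  printf '%s%s\n' "$prefix" "$uri"
}

zip_entry_name /example/docs/doc1.xml false ""          # /example/docs/doc1.xml
zip_entry_name /example/docs/doc1.xml true ""           # doc1.xml
zip_entry_name /example/docs/doc1.xml false /my-prefix  # /my-prefix/example/docs/doc1.xml
```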

Exporting batches to a directory

Instead of writing all documents to a single file or zip, you can export each batch as a separate file to a given directory:

gradle mlExportBatchesToDirectory -PexportPath=/path/to/batches -PwhereCollections=example

The "batchSize" property controls the number of documents processed at one time by the task, and thus controls the number of documents written to each file.
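As a rough planning aid, the number of files can be estimated from the document count and the batch size. The sketch below assumes every batch except the last is full; if some batches are only partially filled, the real number of files will be higher, so treat the result as a lower bound. The function name `min_batch_files` and the sample numbers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of ml-gradle: lower bound on the number of
# batch files, assuming every batch except the last one is full.
# Arguments: total document count, batch size (both positive integers)
min_batch_files() {
  total=$1
  batch_size=$2
  # Integer ceiling of total / batch_size.
  echo $(( (total + batch_size - 1) / batch_size ))
}

min_batch_files 1050 100   # 11
```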

The following properties supported by mlExportToFile are also supported by mlExportBatchesToDirectory:

  • fileHeader
  • fileFooter
  • recordPrefix
  • recordSuffix
  • transform

In addition, you can customize the name of each file that's written per batch - "filenamePrefix" defaults to "batch-" and "filenameExtension" defaults to ".xml":

gradle mlExportBatchesToDirectory -PfilenamePrefix=my-batch -PfilenameExtension=.json -PexportPath=/path/to/batches -PwhereCollections=example

Exporting batches to zips

Another option for exporting batches is to write each one to a zip:

gradle mlExportBatchesToZips -PexportPath=/path/to/zips -PwhereCollections=example

The following properties supported by mlExportToZip are also supported by mlExportBatchesToZips:

  • flattenUri
  • transform
  • uriPrefix

And you can set "filenamePrefix" (defaults to "batch-") and "filenameExtension" (defaults to ".zip") just like you can for mlExportBatchesToDirectory:

gradle mlExportBatchesToZips -PfilenamePrefix=my-zip- -PfilenameExtension=.jar -PexportPath=/path/to/zips -PwhereCollections=example