
[HUDI-7563] Add support to drop index using sql #11951

Open · wants to merge 3 commits into base: master
Conversation

@codope (Member) commented Sep 17, 2024

Change Logs

  • Support dropping index through SQL
  • Add tests to drop a secondary index and a functional index (the record level index, RLI, is already covered)
  • Some renames and refactoring

Impact

Users can now drop an index using SQL. A minimal usage sketch is shown below.
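For illustration only, here is a rough sketch of how the feature might be exercised from a Spark job. The table name hudi_table, the index name idx_city, and the session configuration are assumptions for the example rather than something defined in this PR, and the exact DROP INDEX syntax should be checked against the tests added here.

```java
import org.apache.spark.sql.SparkSession;

public class DropIndexExample {
  public static void main(String[] args) {
    // Local Spark session with the Hudi SQL extension enabled; additional
    // Hudi or catalog configs may be needed depending on the environment.
    SparkSession spark = SparkSession.builder()
        .appName("hudi-drop-index-example")
        .master("local[*]")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .getOrCreate();

    // Drop a previously created index on an existing Hudi table; IF EXISTS is
    // expected to make this a no-op when the index is not present.
    spark.sql("DROP INDEX IF EXISTS idx_city ON hudi_table");

    spark.stop();
  }
}
```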

Risk level (write none, low, medium, or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Sep 17, 2024
@nsivabalan nsivabalan self-assigned this Sep 20, 2024
@nsivabalan (Contributor) left a comment

Left a few minor comments.

@nsivabalan (Contributor) left a comment

A few minor nits.

Comment on lines +72 to +76
      public static HoodieSparkIndexClient getInstance(SparkSession sparkSession) {
        if (_instance == null) {
  -       synchronized (HoodieSparkFunctionalIndexClient.class) {
  +       synchronized (HoodieSparkIndexClient.class) {
            if (_instance == null) {
  -           _instance = new HoodieSparkFunctionalIndexClient(sparkSession);
  +           _instance = new HoodieSparkIndexClient(sparkSession);

Contributor comment:

This logic is a bit weird for concurrency control. We should revisit it later.

@codope (Member, Author) replied on Sep 21, 2024:

This is the usual double-checked locking pattern. You are probably concerned about memory visibility: without proper synchronization, another thread could observe a non-null _instance before the object is fully constructed. However, _instance is declared volatile, which ensures that writes to _instance are immediately visible to all threads and prevents reordering of instructions during construction.
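For reference, here is a minimal, self-contained sketch of the double-checked locking pattern being discussed; the class and field names are illustrative, not the actual Hudi code.

```java
public class LazySingleton {

  // volatile is the key to the pattern: it guarantees that a reference
  // published in _instance points to a fully constructed object and that the
  // write is visible to all threads.
  private static volatile LazySingleton _instance;

  private LazySingleton() {
  }

  public static LazySingleton getInstance() {
    if (_instance == null) {                 // first check, without locking
      synchronized (LazySingleton.class) {
        if (_instance == null) {             // second check, under the lock
          _instance = new LazySingleton();
        }
      }
    }
    return _instance;
  }
}
```

Without volatile, a second thread could observe a non-null reference to a partially constructed object, which is exactly the visibility issue raised above.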

Comment on lines +117 to +129
      if (!indexExists(metaClient, indexName)) {
        if (ignoreIfNotExists) {
          return;
        } else {
          throw new HoodieFunctionalIndexException("Index does not exist: " + indexName);
        }
      }

      LOG.info("Dropping index {}", indexName);
      HoodieIndexDefinition indexDefinition = metaClient.getIndexMetadata().get().getIndexDefinitions().get(indexName);
      try (SparkRDDWriteClient writeClient = HoodieCLIUtils.createHoodieWriteClient(
          sparkSession, metaClient.getBasePath().toString(), mapAsScalaImmutableMap(buildWriteConfig(metaClient, indexDefinition)), toScalaOption(Option.empty()))) {
        writeClient.dropIndex(Collections.singletonList(indexName));

Contributor comment:

For future reference, this logic is not really specific to the engine itself, so it could be abstracted into the index client by plugging in the engine-specific write client. A rough sketch of this idea follows.
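To make the suggestion concrete, here is a hypothetical sketch of that shape; BaseIndexClient, IndexWriteClient, and createWriteClient are made-up names for illustration and are not existing Hudi APIs.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical abstraction: the engine-agnostic drop flow lives in the base
// class, and only the write-client construction is delegated to the engine.
interface IndexWriteClient extends AutoCloseable {
  void dropIndex(List<String> indexNames);
}

abstract class BaseIndexClient {

  // Engine-specific hook: a Spark implementation would wrap SparkRDDWriteClient,
  // a Flink implementation its own write client, and so on.
  protected abstract IndexWriteClient createWriteClient(String basePath, String indexName);

  // Shared drop flow (existence check and logging omitted for brevity).
  public void drop(String basePath, String indexName) throws Exception {
    try (IndexWriteClient writeClient = createWriteClient(basePath, indexName)) {
      writeClient.dropIndex(Collections.singletonList(indexName));
    }
  }
}
```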

@codope (Member, Author) replied:

Don't we need the engine-specific write client to call the base API BaseHoodieWriteClient.dropIndex?

        HoodieSparkIndexClient.getInstance(sparkSession).drop(metaClient, indexName, ignoreIfNotExists)
      } catch {
        case _: IllegalArgumentException =>
          SecondaryIndexManager.getInstance().drop(metaClient, indexName, ignoreIfNotExists)

Contributor comment:

Why drop here again?

@codope (Member, Author) replied:

This is legacy code from the incomplete RFC-52. SecondaryIndexManager was introduced in RFC-52, but it is just wrapper code and does not really manage any index underneath. Per the RFC, it is supposed to support indexes built on third-party libraries such as Lucene, but we have not added any such support so far. In my opinion, we should remove all that code; I am keeping it here only to make some tests pass. If you agree, I can take the cleanup as a follow-up.

@hudi-bot

CI report:

Bot commands supported by @hudi-bot:
  • @hudi-bot run azure: re-run the last Azure build
