-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-7563] Add support to drop index using sql #11951
base: master
Are you sure you want to change the base?
Conversation
5fc8434
to
0ea02b9
Compare
...ommon/src/main/java/org/apache/hudi/table/action/index/functional/BaseHoodieIndexClient.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left few minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few minor nits
...ommon/src/main/java/org/apache/hudi/table/action/index/functional/BaseHoodieIndexClient.java
Outdated
Show resolved
Hide resolved
...ommon/src/main/java/org/apache/hudi/table/action/index/functional/BaseHoodieIndexClient.java
Outdated
Show resolved
Hide resolved
public static HoodieSparkIndexClient getInstance(SparkSession sparkSession) { | ||
if (_instance == null) { | ||
synchronized (HoodieSparkFunctionalIndexClient.class) { | ||
synchronized (HoodieSparkIndexClient.class) { | ||
if (_instance == null) { | ||
_instance = new HoodieSparkFunctionalIndexClient(sparkSession); | ||
_instance = new HoodieSparkIndexClient(sparkSession); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is a bit weird for concurrency control. We should revisit it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the usual double-checked locking pattern. Probably you are concerned about memory visibility issues. The _instance
could be non-null and yet not fully constructed when the first if (_instance == null)
check passes, but before the object is fully initialized. However, note that _instance
is declared as volatile
which ensures that any writes to _instance
are visible to all threads immediately and prevents the reordering of instructions during construction.
0ea02b9
to
235756d
Compare
if (!indexExists(metaClient, indexName)) { | ||
if (ignoreIfNotExists) { | ||
return; | ||
} else { | ||
throw new HoodieFunctionalIndexException("Index does not exist: " + indexName); | ||
} | ||
} | ||
|
||
LOG.info("Dropping index {}", indexName); | ||
HoodieIndexDefinition indexDefinition = metaClient.getIndexMetadata().get().getIndexDefinitions().get(indexName); | ||
try (SparkRDDWriteClient writeClient = HoodieCLIUtils.createHoodieWriteClient( | ||
sparkSession, metaClient.getBasePath().toString(), mapAsScalaImmutableMap(buildWriteConfig(metaClient, indexDefinition)), toScalaOption(Option.empty()))) { | ||
writeClient.dropIndex(Collections.singletonList(indexName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, this logic does not have much specifics to engine itself, so it can be abstracted to the index client by plugging in the engine-specific write client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need the engine-specific write client to call the base API BaseHoodieWriteClient.dropIndex
?
HoodieSparkIndexClient.getInstance(sparkSession).drop(metaClient, indexName, ignoreIfNotExists) | ||
} catch { | ||
case _: IllegalArgumentException => | ||
SecondaryIndexManager.getInstance().drop(metaClient, indexName, ignoreIfNotExists) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why drop here again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is legacy code due to incomplete RFC-52. SecondaryIndexManager
was introduced in RFC-52 but it's just a wrapper code and does not really manage any index underneath. From RFC, it's supposed to support index built on third party libraries such as Lucene. But, we have not yet added any support so far. In my opinion, we should remove all that code. Just keeping it here to make some tests pass. If you agree, I can take the cleanup as a followup later.
...asource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala
Outdated
Show resolved
Hide resolved
.../hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala
Outdated
Show resolved
Hide resolved
.../hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala
Outdated
Show resolved
Hide resolved
235756d
to
e8a0631
Compare
Change Logs
Impact
Users can now drop index using SQL
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist