
chore: add a utils method to getColumnReader with SQLConf #360

Open. huaxingao wants to merge 3 commits into main.

Conversation

huaxingao (Contributor)

Which issue does this PR close?

Closes #.

Rationale for this change

Add a Utils method getColumnReader that can be passed a SQLConf, so Iceberg can pass CometConf settings through to the ColumnReader.

What changes are included in this PR?

How are these changes tested?

DataType type, ColumnDescriptor descriptor, int batchSize, SQLConf conf) {
boolean useDecimal128 = (Boolean) CometConf.COMET_USE_DECIMAL_128().get(conf);
boolean useLazyMaterialization = (Boolean) CometConf.COMET_USE_LAZY_MATERIALIZATION().get(conf);
boolean useLegacyDateTimestamp = (Boolean) CometConf.COMET_USE_DECIMAL_128().get(conf);
Contributor

Just making sure... Is it correct to use COMET_USE_DECIMAL_128 for useLegacyDateTimestamp?

huaxingao (Contributor Author)

I meant to have COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP
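
In other words, the intended lookup is presumably the following one-liner (sketched here from the config entry named above, following the same pattern as the other lookups in the diff):

// Read the legacy date/timestamp behavior from its own config entry,
// rather than reusing COMET_USE_DECIMAL_128.
boolean useLegacyDateTimestamp =
    (Boolean) CometConf.COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP().get(conf);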

@andygrove (Member) left a comment

I'm not familiar with this area of code, but replacing a hard-coded constant with a config setting LGTM

@kazuyukitanimura (Contributor) left a comment

LGTM

boolean useDecimal128,
boolean useLazyMaterialization) {
// TODO: support `useLegacyDateTimestamp` for Iceberg
DataType type, ColumnDescriptor descriptor, int batchSize, SQLConf conf) {
Contributor

I assume we do not need the old signature? It is OK to change now since we have not made a release yet, but I'm asking because this is a public method.

huaxingao (Contributor Author)

I don't think anybody uses the old signature yet.

Contributor

This will make it harder to make the column reader Spark agnostic as you're adding one more point of dependency.

Contributor

This will make it harder to make the column reader Spark agnostic as you're adding one more point of dependency.

+1 for this point. Maybe we can accept an options parameter in this class and pass all options from SQLConf instead?

public static ColumnReader getColumnReader(
    DataType type, ColumnDescriptor descriptor, Map<String, String> options) {
...
}

On the iceberg-spark side, we could simply call conf.getAllConfs.asJava and pass the result to this method.
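
For illustration, a minimal sketch of that options-based overload, keeping batchSize as an explicit parameter as in the current signature and delegating to the existing flag-based overload shown elsewhere in this PR; the option key strings below are made up and would need to match the real CometConf key names:

public static ColumnReader getColumnReader(
    DataType type, ColumnDescriptor descriptor, int batchSize, Map<String, String> options) {
  // Hypothetical key strings; in practice these would be the actual CometConf keys.
  boolean useDecimal128 =
      Boolean.parseBoolean(options.getOrDefault("spark.comet.use.decimal128", "false"));
  boolean useLazyMaterialization =
      Boolean.parseBoolean(options.getOrDefault("spark.comet.use.lazyMaterialization", "false"));
  boolean useLegacyDateTimestamp =
      Boolean.parseBoolean(options.getOrDefault("spark.comet.exceptionOnDatetimeRebase", "false"));
  // Delegate to the existing flag-based overload.
  return getColumnReader(
      type, descriptor, batchSize, useDecimal128, useLazyMaterialization, useLegacyDateTimestamp);
}

This would keep the common module free of SQLConf: the iceberg-spark side would only hand over a plain Map<String, String>.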

return getColumnReader(
type, descriptor, batchSize, useDecimal128, useLazyMaterialization, true);
type, descriptor, batchSize, useDecimal128, useLazyMaterialization, useLegacyDateTimestamp);
}

public static ColumnReader getColumnReader(
Member

Yea, maybe you can just replace the old getColumnReader.

@parthchandra (Contributor) left a comment

Just one comment, though not a show stopper.

boolean useDecimal128,
boolean useLazyMaterialization) {
// TODO: support `useLegacyDateTimestamp` for Iceberg
DataType type, ColumnDescriptor descriptor, int batchSize, SQLConf conf) {

This will make it harder to make the column reader Spark agnostic as you're adding one more point of dependency.

@snmvaughan (Contributor) left a comment

LGTM

@huaxingao (Contributor Author) commented May 1, 2024

This will make it harder to make the column reader Spark agnostic as you're adding one more point of dependency.

We need a way for users to configure useLazyMaterialization etc., and the Iceberg folks don't want to parse any of the CometConf in Iceberg code. That's why I need to pass SQLConf instead of useLazyMaterialization etc. I also think passing SQLConf is more flexible: if we need to add more ColumnReader-related CometConf entries in the future, we won't need to change this API again.
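
For example, the Iceberg-side call could then look roughly like this sketch (Utils as the utility class name, and sparkType, descriptor, and batchSize already being in scope, are assumptions here):

// Iceberg only hands over the active Spark session conf (org.apache.spark.sql.internal.SQLConf);
// all CometConf lookups stay inside the Comet utility method.
SQLConf conf = SQLConf.get();
ColumnReader reader =
    Utils.getColumnReader(sparkType, descriptor, batchSize, conf);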

@viirya (Member) commented May 1, 2024

Hmm, I think @parthchandra meant that by adding SQLConf, common will depend on Spark. Does it conflict with the direction to make common not Spark dependent?

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
7 participants