-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Extended schema JSON serde to support collations #3628
base: master
Are you sure you want to change the base?
[Kernel] Extended schema JSON serde to support collations #3628
Conversation
@vkorukanti can you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Add bit more comments on how the collation property is stored. You can look at the method docs in ColumnMapping
(where similar nested level field ids are stored in metadata) for an example docs.
|
||
import java.util.Optional; | ||
|
||
public class CollationIdentifier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc, @since
version and @evolving
tag
import java.util.Optional; | ||
|
||
public class CollationIdentifier { | ||
public static final String PROVIDER_SPARK = "SPARK"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do these constant mean? add some comment?
if (parts.length == 1) { | ||
throw new IllegalArgumentException( | ||
String.format("Invalid collation identifier: %s", identifier)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkArgument(parts.length != 1, String.format("Invalid collation identifier: %s", identifier));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or switch(parts.length) {
case 2:
case 3:
default: throw error
}
return String.format("%s.%s", provider, name); | ||
} | ||
|
||
public static CollationIdentifier fromString(String identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All public APIs are going to show up in API docs. Please add proper javadoc.
this.provider = provider; | ||
this.name = name; | ||
this.version = version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null checks. Objects.requireNonNull
@@ -16,6 +16,7 @@ | |||
package io.delta.kernel.types; | |||
|
|||
import io.delta.kernel.annotation.Evolving; | |||
import io.delta.kernel.expressions.CollationIdentifier; | |||
|
|||
/** | |||
* The data type representing {@code string} type values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the doc to include collation info.
nestedCollatedFields.addAll( | ||
getNestedCollatedFields(((ArrayType) parent).getElementType(), path + ".element")); | ||
} | ||
return nestedCollatedFields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment, why we don't care about the DataType being StructType (basically we store the collation to the nearest structfield).
@@ -130,6 +132,14 @@ class DataTypeJsonSerDeSuite extends AnyFunSuite { | |||
} | |||
} | |||
|
|||
test("parseDataType: types with collated strings") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some negative tests which cause the parser to fail.
case(json, structType) => | ||
assert(parse(json) == structType) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about serialize tests?
|
||
import org.scalatest.funsuite.AnyFunSuite | ||
|
||
class StructTypeSuite extends AnyFunSuite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Why not just add the tests in DataTypeJsonSerDeSuite
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, you can roundtrip and test both serialize and deserialize works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. thanks
Which Delta project/connector is this regarding?
Description
Extended serialization and deserialization to support collations in metadata.
How was this patch tested?
Tests added to
DataTypeJsonSerDe.java
andStructTypeSuite.scala
.Does this PR introduce any user-facing changes?
No.