Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IPC truncation test case for StructArray #2083

Merged
merged 1 commit into from
Jul 15, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions arrow/src/ipc/writer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1784,4 +1784,52 @@ mod tests {

assert_eq!(record_batch_slice, deserialized_batch);
}

#[test]
fn truncate_ipc_struct_array() {
fn create_batch() -> RecordBatch {
let strings: StringArray = [Some("foo"), None, Some("bar"), Some("baz")]
.into_iter()
.collect();
let ints: Int32Array =
[Some(0), Some(2), None, Some(1)].into_iter().collect();

let struct_array = StructArray::from(vec![
(
Field::new("s", DataType::Utf8, true),
Arc::new(strings) as ArrayRef,
),
(
Field::new("c", DataType::Int32, false),
Arc::new(ints) as ArrayRef,
),
]);

let schema = Schema::new(vec![Field::new(
"struct_array",
struct_array.data_type().clone(),
true,
)]);

RecordBatch::try_new(Arc::new(schema), vec![Arc::new(struct_array)]).unwrap()
}

let record_batch = create_batch();
let record_batch_slice = record_batch.slice(1, 2);
Copy link
Contributor

@tustvold tustvold Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently an odd hack in ArrayData::slice for StructArray that pushes down the offset to the child arrays. You will need to manually construct the ArrayData for the StructArray in order to properly test the case of a StructArray with a non-zero offset. See #1750 for more information

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #2085

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayData::Slice contains a special case for StructArray where it recurses the offset into its children. However, it preserves the offset on the parent ArrayData, in order for the validity buffer to work correctly.

I do see there is an special case, but not understand "it preserves the offset on the parent ArrayData".

let new_offset = self.offset + offset;
let new_data = ArrayData {
  ...
  offset: new_offset,
  ...
};

It updates new ArrayData with new offset, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it also applies the offset to the child data, which is the topic of that ticket. We have partially pushed down the offset, which at best is extremely confusing, but is likely just plain incorrect

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are basically concerned about "This makes it unclear if an offset should or should not be applied to the child_array". As the updated offset is applied on both parent array and child arrays, it is somehow obscure that whether we should take parent's offset when working with child arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precisely 👍

let deserialized_batch = deserialize(serialize(&record_batch_slice));

assert!(serialize(&record_batch).len() > serialize(&record_batch_slice).len());

let structs = deserialized_batch
.column(0)
.as_any()
.downcast_ref::<StructArray>()
.unwrap();

assert!(structs.column(0).is_null(0));
assert!(structs.column(0).is_valid(1));
assert!(structs.column(1).is_valid(0));
assert!(structs.column(1).is_null(1));
assert_eq!(record_batch_slice, deserialized_batch);
}
}