Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-6961][VL][feat] Add decimal write support for ArrowWritableColumnVector #6962

Merged
merged 5 commits into from
Aug 29, 2024

Conversation

jinchengchenghh
Copy link
Contributor

No description provided.

Copy link

#6961

@github-actions github-actions bot added the VELOX label Aug 21, 2024
throw new UnsupportedOperationException();
}
// Arrow not need to setNotNull, set the valus is enough.
void setNotNull(int rowId) {}
Copy link
Member

@zhztheplayer zhztheplayer Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason of the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use MutableProjection, it will call MutableColumnarBatchRow, we can construct MutableColumnarBatchRow by ArrowWritableColumnVector. https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java#L286

Copy link
Member

@zhztheplayer zhztheplayer Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is abstract and the method should be overridable by impl classes.

See

as example.

And abstract class ArrowVectorWriter doesn't likely have to be abstract. Could make it an interface as it almost doesn't override anything from the hierarchy. (needs more design)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it has member ValueVector,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class should not be abstract since its subclass will not implement the functions.

Copy link
Member

@zhztheplayer zhztheplayer Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we route putDecimal calls to writer.setDecimal? I am not that familiar with the design here but looks to be a manner we should follow

Copy link
Contributor Author

@jinchengchenghh jinchengchenghh Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writer does not have the setDecimal function, I copied most code from https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L429 but is different in putByteArray, because Arrow DecimalVector does not support put byte array range.
Put the all bytes is final function in Spark.

public abstract int putByteArray(int rowId, byte[] value, int offset, int count);
  public final int putByteArray(int rowId, byte[] value) {
    return putByteArray(rowId, value, 0, value.length);
  }

I will add a test for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will override putByteArray but check if it put all bytes, I will refactor it.

Comment on lines 40 to 48
def checkSchema(schema: StructType): Boolean = {
try {
SparkSchemaUtil.toArrowSchema(schema)
true
} catch {
case _: Exception =>
false
}
}
Copy link
Member

@zhztheplayer zhztheplayer Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find here will throw exception, https://github.com/apache/incubator-gluten/blob/main/gluten-data/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala#L54

But it will not run to here actually, I will drop the change.

@jinchengchenghh
Copy link
Contributor Author

Anymore comments? @zhztheplayer

@zhztheplayer zhztheplayer changed the title [GLUTEN-6961] [VL] [feat] ArrowWritableColumnVector support write to decimal [GLUTEN-6961][VL][feat] ArrowWritableColumnVector support write to decimal Aug 29, 2024
Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinchengchenghh Would you help fill in the PR description? Thanks.

@zhztheplayer
Copy link
Member

Anymore comments? @zhztheplayer

Just some nits, I updated the PR to save some time. Thanks.

@zhztheplayer zhztheplayer changed the title [GLUTEN-6961][VL][feat] ArrowWritableColumnVector support write to decimal [GLUTEN-6961][VL][feat] Add decimal write support for ArrowWritableColumnVector Aug 29, 2024
@zhztheplayer zhztheplayer merged commit e7caab1 into apache:main Aug 29, 2024
44 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_08_29_2024_time.csv log/native_master_08_28_2024_3928dc2526_time.csv difference percentage
q1 13.88 14.02 0.140 101.01%
q2 13.25 13.32 0.077 100.58%
q3 4.36 3.86 -0.491 88.73%
q4 72.35 72.05 -0.301 99.58%
q5 8.05 9.26 1.213 115.07%
q6 2.25 2.36 0.110 104.91%
q7 6.53 6.72 0.188 102.88%
q8 4.87 5.08 0.215 104.43%
q9 24.26 24.18 -0.079 99.67%
q10 10.56 9.64 -0.918 91.30%
q11 39.02 39.48 0.460 101.18%
q12 1.53 1.40 -0.134 91.24%
q13 6.50 6.61 0.118 101.81%
q14a 47.24 46.08 -1.162 97.54%
q14b 43.24 42.22 -1.020 97.64%
q15 2.78 2.78 0.006 100.23%
q16 46.45 46.08 -0.371 99.20%
q17 5.50 5.39 -0.111 97.99%
q18 8.00 6.91 -1.090 86.37%
q19 2.42 2.11 -0.310 87.22%
q20 1.60 1.51 -0.089 94.46%
q21 1.24 1.10 -0.140 88.66%
q22 7.98 7.71 -0.275 96.56%
q23a 104.75 103.66 -1.089 98.96%
q23b 128.72 128.88 0.164 100.13%
q24a 111.28 109.30 -1.977 98.22%
q24b 111.05 107.88 -3.171 97.14%
q25 4.11 4.23 0.114 102.77%
q26 3.02 3.24 0.221 107.31%
q27 4.20 4.02 -0.185 95.60%
q28 33.64 33.79 0.153 100.46%
q29 11.19 11.05 -0.139 98.75%
q30 5.93 4.94 -0.993 83.26%
q31 7.94 7.50 -0.440 94.46%
q32 1.17 1.45 0.285 124.40%
q33 4.30 4.51 0.212 104.93%
q34 3.81 4.11 0.301 107.91%
q35 7.89 8.32 0.432 105.48%
q36 4.73 4.74 0.010 100.22%
q37 4.73 5.87 1.142 124.15%
q38 13.59 14.50 0.911 106.70%
q39a 3.44 3.27 -0.166 95.17%
q39b 3.06 2.83 -0.225 92.65%
q40 3.84 3.95 0.107 102.79%
q41 0.64 0.70 0.062 109.68%
q42 1.02 0.97 -0.056 94.56%
q43 4.84 4.83 -0.003 99.93%
q44 9.99 9.98 -0.011 99.89%
q45 3.23 3.28 0.053 101.63%
q46 3.82 3.74 -0.081 97.89%
q47 18.93 19.36 0.426 102.25%
q48 5.46 5.37 -0.092 98.32%
q49 9.03 9.27 0.243 102.69%
q50 21.58 21.79 0.207 100.96%
q51 9.97 9.73 -0.237 97.63%
q52 1.13 1.06 -0.071 93.69%
q53 2.44 2.44 -0.005 99.78%
q54 3.99 3.98 -0.012 99.70%
q55 1.10 1.10 -0.002 99.81%
q56 4.35 4.15 -0.193 95.55%
q57 11.07 10.86 -0.205 98.15%
q58 2.51 2.39 -0.118 95.31%
q59 11.12 11.44 0.323 102.90%
q60 4.10 4.10 -0.002 99.96%
q61 4.19 4.12 -0.063 98.50%
q62 4.89 4.59 -0.301 93.84%
q63 2.50 2.35 -0.153 93.90%
q64 60.58 62.31 1.723 102.84%
q65 17.29 17.60 0.314 101.82%
q66 4.06 5.74 1.672 141.17%
q67 408.72 385.75 -22.971 94.38%
q68 3.79 3.55 -0.242 93.62%
q69 5.56 5.32 -0.241 95.66%
q70 12.10 11.37 -0.724 94.01%
q71 2.41 2.30 -0.116 95.19%
q72 217.90 212.84 -5.057 97.68%
q73 2.46 2.31 -0.147 94.01%
q74 24.09 24.25 0.156 100.65%
q75 27.30 26.53 -0.769 97.18%
q76 11.60 11.79 0.186 101.60%
q77 2.44 2.32 -0.121 95.04%
q78 50.23 49.67 -0.558 98.89%
q79 4.08 3.94 -0.149 96.36%
q80 12.35 12.33 -0.023 99.82%
q81 4.88 4.98 0.103 102.12%
q82 7.29 7.79 0.500 106.85%
q83 1.69 1.74 0.052 103.06%
q84 2.86 2.82 -0.032 98.89%
q85 7.12 8.36 1.241 117.45%
q86 4.25 4.09 -0.161 96.20%
q87 15.68 14.21 -1.469 90.63%
q88 20.60 21.80 1.203 105.84%
q89 3.72 3.51 -0.205 94.50%
q90 3.07 3.58 0.505 116.45%
q91 2.21 2.43 0.221 110.01%
q92 1.36 1.31 -0.051 96.23%
q93 39.22 39.93 0.707 101.80%
q94 24.32 24.47 0.150 100.62%
q9 89.61 85.89 -3.720 95.85%
q5 2.80 2.65 -0.153 94.54%
q96 17.49 17.68 0.193 101.10%
q97 1.89 2.01 0.119 106.30%
q98 10.03 10.52 0.491 104.90%
q99 10.03 10.52 0.491 104.90%
total 2219.17 2183.21 -35.958 98.38%

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_08_29_2024_time.csv log/native_master_08_28_2024_3928dc2526_time.csv difference percentage
q1 39.06 40.45 1.391 103.56%
q2 30.61 30.31 -0.301 99.02%
q3 53.88 53.48 -0.396 99.26%
q4 41.59 41.72 0.130 100.31%
q5 104.66 102.35 -2.314 97.79%
q6 11.08 12.91 1.836 116.57%
q7 118.69 116.14 -2.552 97.85%
q8 115.51 117.06 1.556 101.35%
q9 168.74 168.17 -0.565 99.67%
q10 64.93 63.46 -1.470 97.74%
q11 27.73 26.85 -0.881 96.82%
q12 31.17 30.31 -0.860 97.24%
q13 52.16 52.28 0.119 100.23%
q14 27.25 25.88 -1.378 94.95%
q15 54.32 56.08 1.760 103.24%
q16 18.16 18.31 0.153 100.84%
q17 130.29 133.77 3.483 102.67%
q18 199.00 199.77 0.763 100.38%
q19 30.88 24.78 -6.099 80.25%
q20 43.14 42.75 -0.396 99.08%
q21 387.80 390.58 2.778 100.72%
q22 17.22 15.25 -1.977 88.52%
total 1767.88 1762.66 -5.220 99.70%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants