Change log

Generated on 2021-09-02

Release 1.2.0

Gazelle Plugin

Features

#394 Support ColumnarArrowEvalPython operator
#368 Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0
#375 Implement a series of datetime functions
#183 Add Date/Timestamp type support
#362 make arrow-unsafe allocator as the default
#343 configurable codegen opt level
#333 Arrow Data Source: CSV format support fix
#223 Add Parquet write support to Arrow data source
#320 Add build option to enable unsafe Arrow allocator
#337 UDF: Add test case for validating basic row-based udf
#326 Update Scala unit test to spark-3.1.1

Performance

#400 Optimize ColumnarToRow Operator in NSE.
#411 enable ccache on C++ code compiling

Bugs Fixed

#358 Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds
#481 JVM heap memory leak on memory leak tracker facilities
#436 Fix for Arrow Data Source test suite
#317 persistent memory cache issue
#382 Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc
#384 ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#370 Failed to get time zone: NoSuchElementException: None.get
#360 Cannot compile master branch.
#341 build failed on v2 with -Phadoop-3.2

PRs

#489 [NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#486 [NSE-475] restore coalescebatches operator before window
#482 [NSE-481] JVM heap memory leak on memory leak tracker facilities
#470 [NSE-469] Lazy Read: Iterator objects are not correctly released
#464 [NSE-460] fix decimal partial sum in 1.2 branch
#439 [NSE-433]Support pre-built Jemalloc
#453 [NSE-254] remove arrow-data-source-common from jar with dependency
#452 [NSE-254]Fix redundant arrow library issue.
#432 [NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true
#426 [NSE-207] Fix aggregate and refresh UT test script
#442 [NSE-254]Issue0410 jar size
#441 [NSE-254]Issue0410 jar size
#440 [NSE-254]Solve the redundant arrow library issue
#437 [NSE-436] Fix for Arrow Data Source test suite
#387 [NSE-383] Release SMJ input data immediately after being used
#423 [NSE-417] fix sort spill on inplace sort
#416 [NSE-207] fix left/right outer join in SMJ
#422 [NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator
#369 [NSE-417] Sort spill support framework
#401 [NSE-400] Optimize ColumnarToRow Operator in NSE.
#413 [NSE-411] adding ccache support
#393 [NSE-207] fix scala unit tests
#407 [NSE-403]Add Dataproc integration section to README
#406 [NSE-404]Modify repo name in documents
#402 [NSE-368]Update emr-6.3.0 support
#395 [NSE-394]Support ColumnarArrowEvalPython operator
#346 [NSE-317]fix columnar cache
#392 [NSE-382]Support GCP Dataproc 2.0
#388 [NSE-382]Fix Hadoop version issue
#385 [NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#374 [NSE-207] fix left anti join and support filter wo/ project
#376 [NSE-375] Implement a series of datetime functions
#373 [NSE-183] fix timestamp in native side
#356 [NSE-207] fix issues found in scala unit tests
#371 [NSE-370] Failed to get time zone: NoSuchElementException: None.get
#347 [NSE-183] Add Date/Timestamp type support
#363 [NSE-362] use arrow-unsafe allocator by default
#361 [NSE-273] Spark shim layer infrastructure
#364 [NSE-360] fix ut compile and travis test
#264 [NSE-207] fix issues found from join unit tests
#344 [NSE-343]allow to config codegen opt level
#342 [NSE-341] fix maven build failure
#324 [NSE-223] Add Parquet write support to Arrow data source
#321 [NSE-320] Add build option to enable unsafe Arrow allocator
#299 [NSE-207] fix unsupported types in aggregate
#338 [NSE-337] UDF: Add test case for validating basic row-based udf
#336 [NSE-333] Arrow Data Source: CSV format support fix
#327 [NSE-326] update scala unit tests to spark-3.1.1

OAP MLlib

Features

#110 Update isOAPEnabled for Kmeans, PCA & ALS
#108 Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#93 [GPU] Add GPU support for PCA
#101 [Release] Add version update scripts and improve scripts for examples
#76 Reorganize Spark version specific code structure
#82 [Tests] Add NaiveBayes test and refactors

Bugs Fixed

#119 [SDLe][Klocwork] Security vulnerabilities found by static code scan
#121 Memory-freeing issue after the training stage when using Intel-MLlib to run PCA and K-means algorithms.
#122 Cannot run K-means and PCA algorithm with oap-mllib on Google Dataproc
#123 [Core] Improve locality handling for native lib loading
#116 Cannot run ALS algorithm with oap-mllib due to the commit "2883d3447d07feb55bf5d4fee8225d74b0b1e2b1"
#114 [Core] Improve native lib loading
#94 Failed to run KMeans workload with oap-mllib in JLSE
#95 Some shared libs are missing in 1.1.1 release
#105 [Core] crash when libfabric version conflict
#98 [SDLe][Klocwork] Security vulnerabilities found by static code scan
#88 [Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#86 [NaiveBayes] Fix isOAPEnabled and add multi-version support

PRs

#124 [ML-123][Core] Improve locality handling for native lib loading
#118 [ML-116] use getOneCCLIPPort and fix lib loading
#115 [ML-114] [Core] Improve native lib loading
#113 [ML-110] Update isOAPEnabled for Kmeans, PCA & ALS
#112 [ML-105][Core] Fix crash when libfabric version conflict
#111 [ML-108] Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#104 [ML-93][GPU] Add GPU support for PCA
#103 [ML-98] [Release] Clean Service.java code
#102 [ML-101] [Release] Add version update scripts and improve scripts for examples
#90 [ML-88][Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#87 [ML-86][NaiveBayes] Fix isOAPEnabled and add multi-version support
#83 [ML-82] [Tests] Add NaiveBayes test and refactors
#75 [ML-53] [CPU] Add Linear & Ridge Regression
#77 [ML-76] Reorganize multiple Spark version support code structure
#68 [ML-55] [CPU] Add Naive Bayes
#64 [ML-42] [PIP] Misc improvements and refactor code
#62 [ML-30][Coding Style] Add code style rules & scripts for Scala, Java and C++

SQL DS Cache

Features

#155 reorg to support profile based multi spark version

Bugs Fixed

#190 The function of vmem-cache and guava-cache should not be associated with arrow.
#181 [SDLe]Vulnerabilities scanned by Snyk

PRs

#182 [SQL-DS-CACHE-181][SDLe]Fix Snyk code scan issues
#191 [SQL-DS-CACHE-190]put plasma detector in separate object to avoid unnecessary dependency of arrow
#189 [SQL-DS-CACHE-188][POAE7-1253] improvement of fallback from plasma cache to simple cache
#157 [SQL-DS-CACHE-155][POAE7-1187]reorg to support profile based multi spark version

PMem Shuffle

Bugs Fixed

#46 Cannot run Terasort with pmem-shuffle of branch-1.2
#43 Rpmp cannot be compiled due to the lack of boost header file.

PRs

#51 [PMEM-SHUFFLE-50] Remove description about download submodules manually since they can be downloaded automatically.
#49 [PMEM-SHUFFLE-48] Fix the bug about mapstatus tracking and add more connections for metastore.
#47 [PMEM-SHUFFLE-46] Fix the bug that off-heap memory is over used in shuffle reduce stage.
#40 [PMEM-SHUFFLE-39] Fix the bug that pmem-shuffle without RPMP fails to pass Terasort benchmark due to latest patch.
#38 [PMEM-SHUFFLE-37] Add start-rpmp.sh and stop-rpmp.sh
#33 [PMEM-SHUFFLE-28]Add RPMP with HA support and integrate it with Spark3.1.1
#27 [PMEM-SHUFFLE] Change artifact name to make it compatible with naming…

Remote Shuffle

Bugs Fixed

#24 Enhance executor memory release

PRs

#25 [REMOTE-SHUFFLE-24] Enhance executor memory release

Release 1.1.1

Native SQL Engine

Features

#304 Upgrade to Arrow 4.0.0
#285 ColumnarWindow: Support Date/Timestamp input in MAX/MIN
#297 Disable incremental compiler in CI
#245 Support columnar rdd cache
#276 Add option to switch Hadoop version
#274 Comment to trigger tpc-h RAM test
#256 CI: do not run ram report for each PR

Bugs Fixed

#325 java.util.ConcurrentModificationException: mutation occurred during iteration
#329 numPartitions are not the same
#318 fix Spark 311 on data source v2
#311 Build reports errors
#302 test on v2 failed due to an exception
#257 different version of slf4j-log4j
#293 Fix BHJ loss if key = 0
#248 arrow dependency must put after arrow installation

PRs

#332 [NSE-325] fix incremental compile issue with 4.5.x scala-maven-plugin
#335 [NSE-329] fix out partitioning in BHJ and SHJ
#328 [NSE-318]check schema before reuse exchange
#307 [NSE-304] Upgrade to Arrow 4.0.0
#312 [NSE-311] Build reports errors
#272 [NSE-273] support spark311
#303 [NSE-302] fix v2 test
#306 [NSE-304] Upgrade to Arrow 4.0.0: Change basic GHA TPC-H test target …
#286 [NSE-285] ColumnarWindow: Support Date input in MAX/MIN
#298 [NSE-297] Disable incremental compiler in GHA CI
#291 [NSE-257] fix multiple slf4j bindings
#294 [NSE-293] fix unsafemap with key = '0'
#233 [NSE-207] fix issues found from aggregate unit tests
#246 [NSE-245]Adding columnar RDD cache support
#289 [NSE-206]Update installation guide and configuration guide.
#277 [NSE-276] Add option to switch Hadoop version
#275 [NSE-274] Comment to trigger tpc-h RAM test
#271 [NSE-196] clean up configs in unit tests
#258 [NSE-257] fix different versions of slf4j-log4j12
#259 [NSE-248] fix arrow dependency order
#249 [NSE-241] fix hashagg result length
#255 [NSE-256] do not run ram report test on each PR

SQL DS Cache

Features

#118 port to Spark 3.1.1

Bugs Fixed

#121 OAP Index creation stuck issue

PRs

#132 Fix SampleBasedStatisticsSuite UnitTest case
#122 [sql-ds-cache-121] Fix Index stuck issues
#119 [SQL-DS-CACHE-118][POAE7-1130] port sql-ds-cache to Spark3.1.1

OAP MLlib

Features

#26 [PIP] Support Spark 3.0.1 / 3.0.2 and upcoming 3.1.1

PRs

#39 [ML-26] Build for different spark version by -Pprofile

PMem Spill

Features

#34 Support vanilla spark 3.1.1

PRs

#41 [PMEM-SPILL-34][POAE7-1119]Port RDD cache to Spark 3.1.1 as separate module

PMem Common

Features

#10 add -mclflushopt flag to enable clflushopt for gcc
#8 use clflushopt instead of clflush

PRs

#11 [PMEM-COMMON-10][POAE7-1010]Add -mclflushopt flag to enable clflushop…
#9 [PMEM-COMMON-8][POAE7-896]use clflush optimize version for clflush

PMem Shuffle

Features

#15 Doesn't work with Spark3.1.1

PRs

#16 [pmem-shuffle-15] Make pmem-shuffle support Spark3.1.1

Remote Shuffle

Features

#18 upgrade to Spark-3.1.1
#11 Support DAOS Object Async API

PRs

#19 [REMOTE-SHUFFLE-18] upgrade to Spark-3.1.1
#14 [REMOTE-SHUFFLE-11] Support DAOS Object Async API

Release 1.1.0

Native SQL Engine

Features

#261 ArrowDataSource: Add S3 Support
#239 Adopt ARROW-7011
#62 Support Arrow's Build from Source and Package dependency library in the jar
#145 Support decimal in columnar window
#31 Decimal data type support
#128 Support Decimal in Aggregate
#130 Support decimal in project
#134 Update input metrics during reading
#120 Columnar window: Reduce peak memory usage and fix performance issues
#108 Add end-to-end test suite against TPC-DS
#68 Adaptive compression select in Shuffle.
#97 optimize null check in codegen sort
#29 Support multiple-key sort without codegen
#75 Support HashAggregate in ColumnarWSCG
#73 improve columnar SMJ
#51 Decimal fallback
#38 Supporting expression as join keys in columnar SMJ
#27 Support REUSE exchange when DPP enabled
#17 ColumnarWSCG further optimization

Performance

#194 Arrow Parameters Update when compiling Arrow
#136 upgrade to arrow 3.0
#103 reduce codegen in multiple-key sort
#90 Refine HashAggregate to do everything in CPP

Bugs Fixed

#278 fix arrow dep in 1.1 branch
#265 TPC-DS Q67 failed with memmove exception in native split code.
#280 CMake version check
#241 TPC-DS q67 failed for XXH3_hashLong_64b_withSecret.constprop.0+0x180
#262 q18 has different digits compared with vanilla spark
#196 clean up options for native sql engine
#224 update 3rd party libs
#227 fix vulnerabilities from Klocwork
#237 Add ARROW_CSV=ON to default C++ build commands
#229 Fix the deprecated code warning in shuffle_split_test
#119 consolidate batch size
#217 TPC-H query20 result not correct when use decimal dataset
#211 IndexOutOfBoundsException during running TPC-DS Q2
#167 Cannot successfully run q.14a.sql and q14b.sql when using double format for TPC-DS workload.
#191 libarrow.so and libgandiva.so not copy into the tmp directory
#179 Unable to find Arrow headers during build
#153 Fix incorrect queries after enabled Decimal
#173 fix the incorrect result of q69
#48 unit tests for c++ are broken
#101 ColumnarWindow: Remove obsolete debug code
#100 Incorrect result in Q45 w/ v2 bhj threshold is 10MB sf500
#81 Some ArrowVectorWriter implementations don't implement the setNulls method
#82 Incorrect result in TPCDS Q72 SF1536
#70 Duplicate IsNull check in codegen sort
#64 Memleak in sort when SMJ is disabled
#58 Issues when running tpcds with DPP enabled and AQE disabled
#52 memory leakage in columnar SMJ
#53 Q24a/Q24b SHJ tail task took about 50 secs in SF1500
#42 reduce columnar sort memory footprint
#40 columnar sort codegen fallback to executor side
#1 columnar whole stage codegen failed due to empty results
#23 TPC-DS Q8 failed due to unsupported operation in columnar sortmergejoin
#22 TPC-DS Q95 failed in columnar wscg
#4 columnar BHJ failed on new memory pool
#5 columnar BHJ failed on partitioned table with prefercolumnar=false

PRs

#288 [NSE-119] clean up on comments
#282 [NSE-280]fix cmake version check
#281 [NSE-280] bump cmake to 3.16
#279 [NSE-278]fix arrow dep in 1.1 branch
#268 [NSE-186] backport to 1.1 branch
#266 [NSE-265] Reserve enough memory before UnsafeAppend in builder
#270 [NSE-261] ArrowDataSource: Add S3 Support
#263 [NSE-262] fix remainder loss in decimal divide
#215 [NSE-196] clean up native sql options
#231 [NSE-176]Arrow install order issue
#242 [NSE-224] update third party code
#240 [NSE-239] Adopt ARROW-7011
#238 [NSE-237] Add ARROW_CSV=ON to default C++ build commands
#230 [NSE-229] Fix the deprecated code warning in shuffle_split_test
#225 [NSE-227]fix issues from codescan
#219 [NSE-217] fix missing decimal check
#212 [NSE-211] IndexOutOfBoundsException during running TPC-DS Q2
#187 [NSE-185] Avoid unnecessary copying when simply projecting on fields
#195 [NSE-194]Turn on several Arrow parameters
#189 [NSE-153] Following NSE-153, optimize fallback conditions for columnar window
#192 [NSE-191]Fix issue0191 for .so file copy to tmp.
#181 [NSE-179]Fix arrow include directory not include when using ARROW_ROOT
#175 [NSE-153] Fix window results
#174 [NSE-173] fix incorrect result of q69
#172 [NSE-62]Fixing issue0062 for package arrow dependencies in jar with refresh2
#171 [NSE-170]improve sort shuffle code
#165 [NSE-161] adding format check
#166 [NSE-130] support decimal round and abs
#164 [NSE-130] fix precision loss in divide w/ decimal type
#159 [NSE-31] fix SMJ divide with decimal
#156 [NSE-130] fix overflow and precision loss
#152 [NSE-86] Merge Arrow Data Source
#154 [NSE-153] Fix incorrect queries after enabled Decimal
#151 [NSE-145] Support decimal in columnar window
#129 [NSE-128]Support Decimal in Aggregate/HashJoin
#131 [NSE-130] support decimal in project
#107 [NSE-136]upgrade to arrow 3.0.0
#135 [NSE-134] Update input metrics during reading
#121 [NSE-120] Columnar window: Reduce peak memory usage and fix performance issues
#112 [NSE-97] optimize null check and refactor sort kernels
#109 [NSE-108] Add end-to-end test suite against TPC-DS
#69 [NSE-68][Shuffle] Adaptive compression select in Shuffle.
#98 [NSE-97] remove isnull when null count is zero
#102 [NSE-101] ColumnarWindow: Remove obsolete debug code
#105 [NSE-100]Fix an incorrect result error when using SHJ in Q45
#91 [NSE-90]Refactor HashAggregateExec and CPP kernels
#79 [NSE-81] add missing setNulls methods in ArrowWritableColumnVector
#44 [NSE-29]adding non-codegen framework for multiple-key sort
#76 [NSE-75]Support ColumnarHashAggregate in ColumnarWSCG
#83 [NSE-82] Fix Q72 SF1536 incorrect result
#72 [NSE-51] add more datatype fallback logic in columnar operators
#60 [NSE-48] fix c++ unit tests
#50 [NSE-45] BHJ memory leak
#74 [NSE-73]using data ref in multiple keys based SMJ
#71 [NSE-70] remove duplicate IsNull check in sort
#65 [NSE-64] fix memleak in sort when SMJ is disabled
#59 [NSE-58]Fix empty input issue when DPP enabled
#7 [OAP-1846][oap-native-sql] add more fallback logic
#57 [NSE-56]ColumnarSMJ: fallback on full outer join
#55 [NSE-52]Columnar SMJ: fix memory leak by closing stream batches properly
#54 [NSE-53]Partial fix Q24a/Q24b tail SHJ task materialization performance issue
#47 [NSE-17]TPCDS Q72 optimization
#39 [NSE-38]ColumnarSMJ: support expression as join keys
#43 [NSE-42] early release sort input
#33 [NSE-32] Use Spark managed spill in columnar shuffle
#41 [NSE-40] fixes driver failing to do sort codegen
#28 [NSE-27]Reuse exchange to optimize DPP performance
#36 [NSE-1]fix columnar wscg on empty recordbatch
#24 [NSE-23]fix columnar SMJ fallback
#26 [NSE-22]Fix w/DPP issue when inside wscg smj both sides are smj
#18 [NSE-17] smjwscg optimization:
#3 [NSE-4]fix columnar BHJ on new memory pool
#6 [NSE-5][SCALA] Fix ColumnarBroadcastExchange didn't fallback issue w/ DPP

SQL DS Cache

Features

#36 HCFS doc for Spark
#38 update Plasma dependency for Plasma-based-cache module
#14 Add HCFS module
#17 replace arrow-plasma dependency for hcfs module

Bugs Fixed

#62 Upgrade hadoop dependencies in HCFS

PRs

#83 [SQL-DS-CACHE-82][SDLe]Upgrade Jetty version
#77 [SQL-DS-CACHE-62][POAE7-984] upgrade hadoop version to 3.3.0
#56 [SQL-DS-CACHE-47]Add plasma native get timeout
#37 [SQL-DS-CACHE-36][POAE7-898]HCFS docs for OAP 1.1
#39 [SQL-DS-CACHE-38][POAE7-892]update Plasma dependency
#18 [SQL-DS-CACHE-17][POAE7-905]replace intel-arrow with apache-arrow v3.0.0
#13 [SQL-DS-CACHE-14][POAE7-847] Port HCFS to OAP
#16 [SQL-DS-CACHE-15][POAE7-869]Refactor original code to make it a sub-module

OAP MLlib

Features

#35 Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#33 Optimize oneCCL port detecting
#28 Use getifaddrs to get host ips for oneCCL kvs
#12 Improve CI and add pseudo cluster testing
#31 Print time duration for each PCA step
#13 Add ALS with new oneCCL APIs
#18 Auto detect KVS port for oneCCL to avoid port conflict
#10 Porting Kmeans and PCA to new oneCCL API

Bugs Fixed

#43 [Release] Error when installing intel-oneapi-dal-devel-2021.1.1 intel-oneapi-tbb-devel-2021.1.1
#46 [Release] Meet hang issue when running PCA algorithm.
#48 [Release] No performance benefit when using Intel-MLlib to run ALS algorithm.
#25 Fix oneCCL KVS port auto detect and improve logging

PRs

#51 [ML-50] Merge #47 and prepare for OAP 1.1
#49 Revert "[ML-41] Revert to old oneCCL and Prepare for OAP 1.1"
#47 [ML-44] [PIP] Update to oneAPI 2021.2 and Rework examples for validation
#40 [ML-41] Revert to old oneCCL and Prepare for OAP 1.1
#36 [ML-35] Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#34 [ML-33] Optimize oneCCL port detecting
#20 [ML-12] Improve CI and add pseudo cluster testing
#32 [ML-31] Print time duration for each PCA step
#14 [ML-13] Add ALS with new oneCCL APIs
#24 [ML-25] Fix oneCCL KVS port auto detect and improve logging
#19 [ML-18] Auto detect KVS port for oneCCL to avoid port conflict

PMem Spill

Bugs Fixed

#22 [SDLe][Snyk]Upgrade Jetty version to fix vulnerability scanned by Snyk
#13 The compiled code failed because the variable name was not changed

PRs

#27 [PMEM-SPILL-22][SDLe]Upgrade Jetty version
#21 [POAE7-961] fix null pointer issue when offheap enabled.
#18 [POAE7-858] disable RDD cache related PMem initialization as default and add PMem related logic in SparkEnv
#19 [PMEM-SPILL-20][POAE7-912]add vanilla SparkEnv.scala for future update
#15 [POAE7-858] port memory extension options to OAP 1.1
#12 Change the variable name so that the passed parameters are correct
#10 Fixing one pmem path on AppDirect mode may cause the pmem initialization path to be empty Path

PMem Shuffle

Features

#7 Enable running in fsdax mode

Bugs Fixed

#10 [pmem-shuffle] There are potential issues reported by Klocwork.

PRs

#13 [PMEM-SHUFFLE-10] Fix potential issues reported by Klocwork for branch 1.1.
#6 [PMEM-SHUFFLE-7] enable fsdax mode in pmem-shuffle

Remote Shuffle

Features

#6 refactor shuffle-daos by abstracting shuffle IO for supporting both synchronous and asynchronous DAOS Object API
#4 check-in remote shuffle based on DAOS Object API

Bugs Fixed

#12 [SDLe][Snyk]Upgrade org.mock-server:mockserver-netty to fix vulnerability scanned by Snyk

PRs

#13 [REMOTE-SHUFFLE-12][SDle][Snyk]Upgrade org.mock-server:mockserver-net…
#5 check-in remote shuffle based on DAOS Object API

Release 1.0.0

Features

#1823 [oap-native-sql][doc] Spark Native SQL Engine installation guide is obsolete and thus broken.
#1545 [oap-data-source][arrow] Add metric: output_batches
#1588 [OAP-CACHE] Make Parquet file splitable
#1337 [oap-cache] Discard OAP data format
#1679 [OAP-CACHE]Remove the code related to reading and writing OAP data format
#1680 [OAP-CACHE]Decouple spark code includes FileFormatDataWriter, FileFormatWriter and OutputWriter
#1846 [oap-native-sql] spark sql unit test
#1811 [OAP-cache]provide one-step starting scripts like plasma-server redis-server
#1519 [oap-native-sql] upgrade cmake
#1873 [oap-native-sql] Columnar shuffle split variable length use UnsafeAppend
#1835 [oap-native-sql] Support ColumnarBHJ to Build and Broadcast HashRelation in driver side
#1848 [OAP-CACHE]Decouple spark code include OneApplicationResource.scala
#1824 [OAP-CACHE]Decouple spark code includes DataSourceScanExec.scala.
#1838 [OAP-CACHE]Decouple spark code includes VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java
#1839 [oap-native-sql] Add prefetch to columnar shuffle split
#1756 [Intel MLlib] Add Kmeans "tolerance" support and test cases
#1818 [OAP-Cache]Make Spark webUI OAP Tab more user friendly
#1831 [oap-native-sql] ColumnarWindow: Support reusing same window spec in multiple functions
#1653 [SQL Data Source Cache]Consistency issue on "enable" and "enabled" configuration
#1765 [oap-native-sql] Support WSCG in nativesql
#1517 [oap-native-sql] implement SortMergeJoin
#1535 [oap-native-sql] Add ColumnarWindowExec
#1654 [oap-native-sql] Columnar shuffle TPCDS enabling
#1700 [oap-native-sql] Support inside join condition project
#1717 [oap-native-sql] support null in columnar literal and subquery
#1704 [oap-native-sql] Add ColumnarUnion and ColumnarExpand
#1647 [oap-native-sql] row to columnar for decimal
#1638 [oap-native-sql] adding full TPC-DS support
#1498 [oap-native-sql] stddev_samp support
#1547 [oap-native-sql] adding metrics for input/output batches

Performance

#1956 [OAP-MLlib]Cannot get 5x performance benefit comparing with vanilla spark.
#1955 [OAP-CACHE] Plasma shows lower performance comparing with vanilla spark.
#2023 [OAP-MLlib] Use oneAPI official release instead of beta versions
#1829 [oap-native-sql] Optimize columnar shuffle and option to use AVX512
#1734 [oap-native-sql] use non-codegen for sort with one key
#1706 [oap-native-sql] Optimize columnar shuffle write

Bugs Fixed

#2054 [OAP-MLlib] Failed to run Intel MLlib after updating the version of oneAPI.
#2012 [SQL Data Source Cache] The task will be suspended when using plasma cache.
#1640 [SQL Data Source Cache] The task will be suspended when using plasma cache and starting 2 executors per worker.
#2028 [OAP-Cache]When using Plasma Spark webUI OAP Tab cache metrics are not right
#1979 [SDLe][native-sql-engine] Issues from Static Code Analysis with Klocwork need to be fixed
#1938 [oap-native-sql] Stability test failed when running TPCH for 10 rounds.
#1924 [OAP-CACHE] Decouple heartbeat message and use conf to determine whether to report locality information
#1937 [rpmem-shuffle] Cannot pass q64.sql of TPC-DS when enable RPmem shuffle.
#1951 [SDLe][PMem-Shuffle]Specify Scala version above 2.12.4 in pom.xml
#1921 [SDLe][rpmem-shuffle] The master branch and branch-1.0-spark-3.0 can't pass BDBA analysis with libsqlitejdbc dependency.
#1743 [oap-native-sql] Error not reported when creating CodeGenerator instance
#1864 [oap-native-sql] hash conflict in hashagg
#1934 [oap-native-sql] backport to 1.0
#1929 [oap-native-sql] memleak in non-codegen aggregate
#1907 [OAP-cache]Cannot find the class of redis-client
#1888 [oap-native-sql] Add hash collision check for all HashJoins and hashAggr
#1903 [oap-native-sql] BHJ related UT fix
#1881 [oap-native-sql] Fix split use avx512
#1742 [oap-native-sql] SortArraysToIndicesKernel: incorrect null ordering with multiple sort keys
#1553 [oap-native-sql] TPCH-Q7 fails in throughput tests
#1854 [oap-native-sql] Fix columnar shuffle file not deleted
#1844 [oap-native-sql] Fix columnar shuffle spilled file not deleted
#1580 [oap-native-sql] Hash Collision in multiple keys scenario
#1754 [Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1815 [oap-native-sql] Memory management: Error on task end if there are unclosed child allocators
#1808 [oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1806 [oap-native-sql] Fix Columnar Shuffle Memory Leak
#1783 [oap-native-sql] ColumnarWindow: Rank() returns wrong result when input row number >= 65536
#1776 [oap-native-sql] memory leakage in native code
#1760 [oap-native-sql] fix columnar sorting on string
#1733 [oap-native-sql]TPCH Q18 memory leakage
#1694 [oap-native-sql] TPC-H q15 failed for ConditionedProbeArraysVisitorImpl MakeResultIterator does not support dependency type other than Batch
#1682 [oap-native-sql] fix aggregate without codegen
#1707 [oap-native-sql] Fix collect batch metric
#1642 [oap-native-sql] Support expression key in Join
#1669 [oap-native-sql] TPCH Q1 results is not correct w/ hashagg codegen off
#1629 [oap-native-sql] clean up building steps
#1602 [oap-native-sql] rework copyfromjar function
#1599 [oap-native-sql] Columnar BHJ fail on TPCH-Q15
#1567 [oap-native-sql] Spark thrift-server does not honor LIBARROW_DIR env
#1541 [oap-native-sql] TreeNode children not replaced by columnar operators

PRs

#2056 [OAP-2054][OAP-MLlib] Fix oneDAL libJavaAPI.so packaging for oneAPI 2021.1 production release
#2039 [OAP-2023][OAP-MLlib] Switch to oneAPI 2021.1.1 official release for OAP 1.0
#2043 [OAP-1981][OAP-CACHE][POAE7-617]fix binary cache core dump issue
#2002 [OAP-2001][oap-native-sql]fix coding style
#2035 [OAP-2028][OAP-cache][POAE7-635] Fix set concurrent access bug
#2037 [OAP-1640][OAP-CACHE][POAE7-593]Fix plasma hang due to threshold
#2036 [OAP-1955][OAP-CACHE][POAE7-660]preferLocation low hit rate fix master branch
#2013 [OAP-CACHE][POAE7-628]port missing commits from branch 0.8/0.9
#2015 [OAP-2016] fix klocwork issues in oap-common/oap-spark
#2022 [OAP-1980][rpmem-shuffle] Fix Klocwork issues for spark3.x version
#2011 [OAP-2010][oap-native-sql] Add abs support in wscg
#1996 [OAP-1998][oap-native-sql] Add support to do numa binding for Columnar Operations
#2004 [OAP-2012][OAP-CACHE][POAE7-635]bug fix: plasma hang - use java thread-safe set
#1988 [OAP-1983][oap-native-sql] Fix Q38 and Q87 when unsafeRow contains null
#1976 [OAP-1983][oap-native-sql] Fix hashCheck performance issue
#1970 [OAP-1947][oap-native-sql][C++] reduce sort kernel memory footprint
#1961 [OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information for branch branch-1.0-spark-3.x
#1982 [OAP-1981][OAP-CACHE][POAE7-617]Bug fix binary docache
#1952 [OAP-1951][PMem-Shuffle][SDLe]Specify Scala version in pom.xml
#1919 [OAP-1918][OAP-CACHE][POAE7-563]bug fix: plasma get an invalid value
#1589 [OAP-1588][OAP-CACHE][POAE7-363] Make Parquet splitable
#1954 [OAP-1884][OAP-dev]Small fix for arrow build in prepare_oap_env.sh.
#1933 [OAP-1934][oap-native-sql]Backport NativeSQL code to 1.0
#1889 [OAP-1888][oap-native-sql]Add hash collision check for all HashJoins and hashAggr
#1904 [OAP-1903][oap-native-sql] Fix Local Mode BHJ related UT fail issue
#1916 [OAP-1846][oap-native-sql] clean up travis test
#1923 [OAP-1921][rpmem-shuffle] For BDBA analysis to exclude unused library
#1890 [OAP-1846][oap-native-sql] add script for running unit test
#1905 [OAP-1813][POAE7-555] [OAP-CACHE] package redis related dependency
#1908 [OAP-1884][OAP-dev]Add cxx-compiler in oap conda recipes for native-sql.
#1901 [OAP-1884][OAP-dev]Add c-compiler in oap conda recipes for native-sql.
#1895 [OAP-1884][OAP-dev] Checkout arrow branch in case arrow in other branch
#1876 [OAP-1875]Generating changelog automatically for new releases
#1812 [OAP-1811][OAP-cache][POAE7-486]add sbin folder
#1882 [OAP-1881][oap-native-sql] Fix split use avx512
#1847 [OAP-1846][oap-native-sql] add unit tests from spark to native sql
#1836 [OAP-1835][oap-native-sql] Support ColumnarBHJ to build and broadcast hashrelation
#1885 [OAP-1884][OAP-dev]Add oap-mllib to parent pom and fix error when git clone oneccl.
#1868 [OAP-1653][OAP-Cache]Modify enabled and enable compatibility check
#1853 [OAP-1852][oap-native-sql] Memory Management: Use Arrow C++ memory po…
#1859 [OAP-1858][OAP-cache][POAE7-518] Decouple FilePartition.scala
#1857 [OAP-1833][oap-native-sql] Fix HashAggr hasNext won't stop issue
#1855 [OAP-1854][oap-native-sql] Fix columnar shuffle file not deleted
#1840 [OAP-1839][oap-native-sql] Add prefetch to columnar shuffle split
#1843 [OAP-1842][OAP-dev]Add arrow conda build action job.
#1849 [OAP-1848][SQL Data Source Cache] Decouple OneApplicationResource.scala
#1837 [OAP-1838][SQL Data Source Cache] Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java.
#1757 [OAP-1756][Intel MLlib] Add Kmeans "tolerance" support and test cases
#1845 [OAP-1844][oap-native-sql] Fix columnar shuffle spilled file not deleted
#1827 [OAP-1818][SQL-Data-Source-Cache]Modify Spark webUI OAP Tab expressio…
#1832 [OAP-1831][oap-native-sql] ColumnarWindow: Support reusing same windo…
#1834 [OAP-1833][oap-native-sql][Scala] fix CoalesceBatchs after HashAgg
#1830 [OAP-1829][oap-native-sql] Optimize columnar shuffle and option to use AVX-512
#1803 [OAP-1751][oap-native-sql]fix sort on TPC-DS
#1755 [OAP-1754][Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1826 [OAP-1825] disable pmemblk test
#1802 [OAP-1653][OAP-Cache]Keep consistency on 'enabled' of OapConf configu…
#1810 [OAP-1771]Fix README for Arrow Data Source
#1816 [OAP-1815][oap-native-sql] Memory management: Error on task end if th…
#1809 [OAP-1808][oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1467 [OAP-1457][oap-native-sql] Reserve Spark off-heap execution memory after buffer allocation
#1807 [OAP-1806][oap-native-sql] Fix Columnar Shuffle Memory Leak
#1788 [OAP-1765][oap-native-sql] Fix for dropped CoalesceBatches before ColumnarBroadcastExchange
#1799 [OAP-CACHE][OAP-1690][POAE7-430] Cache backend fall back detect bug fix branch master
#1744 [OAP-CACHE][OAP-1748][POAE7-462] Enable externalDB to store CacheMetaInfo branch master
#1787 [OAP-1786][oap-native-sql] ColumnarWindow: Avoid unnecessary mem copies
#1773 [POAE7-471]Handle oap-common build issue about PMemKV
#1782 [OAP-1631]Update compile scripts from 0.9
#1785 [OAP-1765][oap-native-sql] Support WSCG for nativesql(PART 2)
#1781 [OAP-1765][oap-native-sql] fix codegen for SMJ and HashAgg
#1775 [OAP-1776][oap-native-sql]fix sort memleak
#1766 [OAP-1765][oap-native-sql] Support WSCG for nativesql and use non-codegen join for remainings
#1774 [OAP-1631]Add prepare_oap_env.sh.
#1769 [OAP-1768][POAE7-163][OAP-SPARK] Integrate block manager with chunk api
#1763 [OAP-1759][oap-native-sql] ColumnarWindow: Add execution metrics
#1656 [OAP-1517][oap-native-sql] Improve SortMergeJoin Part2
#1761 [oap-native-sql] quick fix sort on string by fallback to row
#1536 [OAP-1535][oap-native-sql] Add ColumnarWindowExec
#1735 [OAP-1734][oap-native-sql]use non-codegen for sort with single key
#1747 [OAP-1741][rpmem-shuffle]To make java side load native library from jar directly
#1725 [OAP-1727][POAE7-358] Spark integration: Memory Spill to PMem
#1738 [OAP-1733][oap-native-sql][Scala] fix mem leak
#1701 [OAP-1700][oap-native-sql] support join-inside condition project
#1736 [oap-1727][POAE7-358] Add native spark files for memory spill module
#1719 [oap-common][POAE7-347]Stream API for PMem storage store
#1723 [OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format
#1716 [OAP-1717][oap-native-sql] support null in columnar literal and subquery
#1713 [OAP-1712] [OAP-SPARK] Remove file change list from dev directory
#1711 [OAP-1694][oap-native-sql][Scala] fix hash join w/ empty batch
#1710 [OAP-1706][oap-native-sql] Optimize shuffle write
#1705 [OAP-1704][oap-native-sql] Support ColumnarUnion and ColumnarExpand
#1683 [OAP-1682][oap-native-sql] fix aggregate without codegen
#1708 [OAP-1707][oap-native-sql] Fix collect batch metric
#1675 [OAP-1651][oap-native-sql] Adding fallback rules for join/shuffle
#1674 [OAP-1673][oap-native-sql] Adding native double round function
#1632 [OAP-1631][Doc] Add Commit Message Requirements
#1672 [OAP-1610][Intel-MLlib]Upgrade the mahout-hdfs to version 14.1
#1641 [OAP-1651][OAP-1642][oap-native-sql] support TPCDS w/ AQE
#1670 [OAP-1669][oap-native-sql] use distinct ordinal list
#1655 [OAP-1654][oap-native-sql]Columnar shuffle tpcds enabling
#1630 [OAP-1629][oap-native-sql] clean up building scripts
#1601 [OAP-1602][oap-native-sql][Java] fix extract resource from jar
#1639 [OAP-1638][oap-native-sql] tpcds enabling (part2)
#1586 [OAP-1587][oap-native-sql] tpcds enabling (part1)
#1600 [oap-1599][oap-native-sql][Scala] fix broadcasthashjoin
#1555 [OAP-1541][oap-native-sql] TreeNode children not replaced by columnar…
#1546 [OAP-1547][oap-native-sql][Scala] Adding metrics for input/output batches
#1472 [OAP-1466] [RDD Cache] [POAE-354] Initialize pmem with AppDirect and KMemDax mode in block manager

Release 0.8.4

Features

#1865 [OAP-CACHE]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1813 [OAP-cache] package redis client jar into oap-cache

Bugs Fixed

#2044 [OAP-CACHE] Build error due to synchronizedSet on branch 0.8
#2027 [oap-shuffle] Should load native library from jar directly
#1981 [OAP-CACHE] Error running q32 binary cache
#1980 [SDLe][RPMem-Shuffle]Issues from Static Code Analysis with Klocwork need to be fixed
#1918 [OAP-CACHE] Plasma throw exception:get an invalid value- branch 0.8

PRs

#2045 [OAP-2044][OAP-CACHE]bug fix: build error due to synchronizedSet
#2031 [OAP-1955][OAP-CACHE][POAE7-667]preferLocation low hit rate fix branch 0.8
#2029 [OAP-2027][rpmem-shuffle] Load native libraries from jar
#2018 [OAP-1980][SDLe][rpmem-shuffle] Fix potential risk issues reported by Klocwork
#1920 [OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information
#1949 [OAP-1948][rpmem-shuffle] Fix several vulnerabilities reported by BDBA
#1900 [OAP-1680][OAP-CACHE] Decouple FileFormatDataWriter, FileFormatWriter and OutputWriter
#1899 [OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format (#1723)
#1897 [OAP-1884][OAP-dev] Update memkind version and copy arrow plasma jar to conda package build path
#1883 [OAP-1568][OAP-CACHE] Cleanup Oap data format read/write related test cases
#1863 [OAP-1865][SQL Data Source Cache]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1841 [OAP-1579][OAP-cache]Fix web UI to show cache size
#1814 [OAP-cache][OAP-1813][POAE7-481]package redis client related dependency
#1790 [OAP-CACHE][OAP-1690][POAE7-430] Cache backend fallback bugfix
#1740 [OAP-CACHE][OAP-1748][POAE7-453]Enable externalDB to store CacheMetaInfo branch 0.8
#1731 [OAP-CACHE] [OAP-1730] [POAE-428] Add OAP cache runtime enable