Change log

Generated on 2021-09-02

Release 1.2.0

Gazelle Plugin

Features


#394	Support ColumnarArrowEvalPython operator
#368	Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0
#375	Implement a series of datetime functions
#183	Add Date/Timestamp type support
#362	make arrow-unsafe allocator as the default
#343	configurable codegen opt level
#333	Arrow Data Source: CSV format support fix
#223	Add Parquet write support to Arrow data source
#320	Add build option to enable unsafe Arrow allocator
#337	UDF: Add test case for validating basic row-based udf
#326	Update Scala unit test to spark-3.1.1

Performance


#400	Optimize ColumnarToRow Operator in NSE.
#411	enable ccache on C++ code compiling

Bugs Fixed


#358	Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds
#481	JVM heap memory leak on memory leak tracker facilities
#436	Fix for Arrow Data Source test suite
#317	persistent memory cache issue
#382	Hadoop version conflict when using gazelle_plugin on Google Cloud Dataproc
#384	ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#370	Failed to get time zone: NoSuchElementException: None.get
#360	Cannot compile master branch.
#341	build failed on v2 with -Phadoop-3.2

PRs


#489	[NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#486	[NSE-475] restore coalescebatches operator before window
#482	[NSE-481] JVM heap memory leak on memory leak tracker facilities
#470	[NSE-469] Lazy Read: Iterator objects are not correctly released
#464	[NSE-460] fix decimal partial sum in 1.2 branch
#439	[NSE-433]Support pre-built Jemalloc
#453	[NSE-254] remove arrow-data-source-common from jar with dependency
#452	[NSE-254]Fix redundant arrow library issue.
#432	[NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true
#426	[NSE-207] Fix aggregate and refresh UT test script
#442	[NSE-254]Issue0410 jar size
#441	[NSE-254]Issue0410 jar size
#440	[NSE-254]Solve the redundant arrow library issue
#437	[NSE-436] Fix for Arrow Data Source test suite
#387	[NSE-383] Release SMJ input data immediately after being used
#423	[NSE-417] fix sort spill on inplace sort
#416	[NSE-207] fix left/right outer join in SMJ
#422	[NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator
#369	[NSE-417] Sort spill support framework
#401	[NSE-400] Optimize ColumnarToRow Operator in NSE.
#413	[NSE-411] adding ccache support
#393	[NSE-207] fix scala unit tests
#407	[NSE-403]Add Dataproc integration section to README
#406	[NSE-404]Modify repo name in documents
#402	[NSE-368]Update emr-6.3.0 support
#395	[NSE-394]Support ColumnarArrowEvalPython operator
#346	[NSE-317]fix columnar cache
#392	[NSE-382]Support GCP Dataproc 2.0
#388	[NSE-382]Fix Hadoop version issue
#385	[NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#374	[NSE-207] fix left anti join and support filter wo/ project
#376	[NSE-375] Implement a series of datetime functions
#373	[NSE-183] fix timestamp in native side
#356	[NSE-207] fix issues found in scala unit tests
#371	[NSE-370] Failed to get time zone: NoSuchElementException: None.get
#347	[NSE-183] Add Date/Timestamp type support
#363	[NSE-362] use arrow-unsafe allocator by default
#361	[NSE-273] Spark shim layer infrastructure
#364	[NSE-360] fix ut compile and travis test
#264	[NSE-207] fix issues found from join unit tests
#344	[NSE-343]allow to config codegen opt level
#342	[NSE-341] fix maven build failure
#324	[NSE-223] Add Parquet write support to Arrow data source
#321	[NSE-320] Add build option to enable unsafe Arrow allocator
#299	[NSE-207] fix unsupported types in aggregate
#338	[NSE-337] UDF: Add test case for validating basic row-based udf
#336	[NSE-333] Arrow Data Source: CSV format support fix
#327	[NSE-326] update scala unit tests to spark-3.1.1

OAP MLlib

Features


#110	Update isOAPEnabled for Kmeans, PCA & ALS
#108	Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#93	[GPU] Add GPU support for PCA
#101	[Release] Add version update scripts and improve scripts for examples
#76	Reorganize Spark version specific code structure
#82	[Tests] Add NaiveBayes test and refactors

Bugs Fixed


#119	[SDLe][Klocwork] Security vulnerabilities found by static code scan
#121	Memory-freeing issue after the training stage when using Intel-MLlib to run PCA and K-means algorithms.
#122	Cannot run K-means and PCA algorithm with oap-mllib on Google Dataproc
#123	[Core] Improve locality handling for native lib loading
#116	Cannot run ALS algorithm with oap-mllib thanks to the commit "2883d3447d07feb55bf5d4fee8225d74b0b1e2b1"
#114	[Core] Improve native lib loading
#94	Failed to run KMeans workload with oap-mllib in JLSE
#95	Some shared libs are missing in 1.1.1 release
#105	[Core] crash when libfabric version conflict
#98	[SDLe][Klocwork] Security vulnerabilities found by static code scan
#88	[Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#86	[NaiveBayes] Fix isOAPEnabled and add multi-version support

PRs


#124	[ML-123][Core] Improve locality handling for native lib loading
#118	[ML-116] use getOneCCLIPPort and fix lib loading
#115	[ML-114] [Core] Improve native lib loading
#113	[ML-110] Update isOAPEnabled for Kmeans, PCA & ALS
#112	[ML-105][Core] Fix crash when libfabric version conflict
#111	[ML-108] Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#104	[ML-93][GPU] Add GPU support for PCA
#103	[ML-98] [Release] Clean Service.java code
#102	[ML-101] [Release] Add version update scripts and improve scripts for examples
#90	[ML-88][Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#87	[ML-86][NaiveBayes] Fix isOAPEnabled and add multi-version support
#83	[ML-82] [Tests] Add NaiveBayes test and refactors
#75	[ML-53] [CPU] Add Linear & Ridge Regression
#77	[ML-76] Reorganize multiple Spark version support code structure
#68	[ML-55] [CPU] Add Naive Bayes
#64	[ML-42] [PIP] Misc improvements and refactor code
#62	[ML-30][Coding Style] Add code style rules & scripts for Scala, Java and C++

SQL DS Cache

Features


#155	reorg to support profile based multi spark version

Bugs Fixed


#190	The function of vmem-cache and guava-cache should not be associated with arrow.
#181	[SDLe]Vulnerabilities scanned by Snyk

PRs


#182	[SQL-DS-CACHE-181][SDLe]Fix Snyk code scan issues
#191	[SQL-DS-CACHE-190]put plasma detector in separate object to avoid unnecessary dependency of arrow
#189	[SQL-DS-CACHE-188][POAE7-1253] improvement of fallback from plasma cache to simple cache
#157	[SQL-DS-CACHE-155][POAE7-1187]reorg to support profile based multi spark version

PMem Shuffle

Bugs Fixed


#46	Cannot run Terasort with pmem-shuffle of branch-1.2
#43	Rpmp cannot be compiled due to the lack of boost header file.

PRs


#51	[PMEM-SHUFFLE-50] Remove description about download submodules manually since they can be downloaded automatically.
#49	[PMEM-SHUFFLE-48] Fix the bug about mapstatus tracking and add more connections for metastore.
#47	[PMEM-SHUFFLE-46] Fix the bug that off-heap memory is over used in shuffle reduce stage.
#40	[PMEM-SHUFFLE-39] Fix the bug that pmem-shuffle without RPMP fails to pass Terasort benchmark due to latest patch.
#38	[PMEM-SHUFFLE-37] Add start-rpmp.sh and stop-rpmp.sh
#33	[PMEM-SHUFFLE-28]Add RPMP with HA support and integrate it with Spark3.1.1
#27	[PMEM-SHUFFLE] Change artifact name to make it compatible with naming…

Remote Shuffle

Bugs Fixed


#24	Enhance executor memory release

PRs


#25	[REMOTE-SHUFFLE-24] Enhance executor memory release

Release 1.1.1

Native SQL Engine

Features


#304	Upgrade to Arrow 4.0.0
#285	ColumnarWindow: Support Date/Timestamp input in MAX/MIN
#297	Disable incremental compiler in CI
#245	Support columnar rdd cache
#276	Add option to switch Hadoop version
#274	Comment to trigger tpc-h RAM test
#256	CI: do not run ram report for each PR

Bugs Fixed


#325	java.util.ConcurrentModificationException: mutation occurred during iteration
#329	numPartitions are not the same
#318	fix Spark 311 on data source v2
#311	Build reports errors
#302	test on v2 failed due to an exception
#257	different version of slf4j-log4j
#293	Fix BHJ loss if key = 0
#248	arrow dependency must put after arrow installation

PRs


#332	[NSE-325] fix incremental compile issue with 4.5.x scala-maven-plugin
#335	[NSE-329] fix out partitioning in BHJ and SHJ
#328	[NSE-318]check schema before reuse exchange
#307	[NSE-304] Upgrade to Arrow 4.0.0
#312	[NSE-311] Build reports errors
#272	[NSE-273] support spark311
#303	[NSE-302] fix v2 test
#306	[NSE-304] Upgrade to Arrow 4.0.0: Change basic GHA TPC-H test target …
#286	[NSE-285] ColumnarWindow: Support Date input in MAX/MIN
#298	[NSE-297] Disable incremental compiler in GHA CI
#291	[NSE-257] fix multiple slf4j bindings
#294	[NSE-293] fix unsafemap with key = '0'
#233	[NSE-207] fix issues found from aggregate unit tests
#246	[NSE-245]Adding columnar RDD cache support
#289	[NSE-206]Update installation guide and configuration guide.
#277	[NSE-276] Add option to switch Hadoop version
#275	[NSE-274] Comment to trigger tpc-h RAM test
#271	[NSE-196] clean up configs in unit tests
#258	[NSE-257] fix different versions of slf4j-log4j12
#259	[NSE-248] fix arrow dependency order
#249	[NSE-241] fix hashagg result length
#255	[NSE-256] do not run ram report test on each PR

SQL DS Cache

Features


#118	port to Spark 3.1.1

Bugs Fixed


#121	OAP Index creation stuck issue

PRs


#132	Fix SampleBasedStatisticsSuite UnitTest case
#122	[sql-ds-cache-121] Fix Index stuck issues
#119	[SQL-DS-CACHE-118][POAE7-1130] port sql-ds-cache to Spark3.1.1

OAP MLlib

Features


#26	[PIP] Support Spark 3.0.1 / 3.0.2 and upcoming 3.1.1

PRs


#39	[ML-26] Build for different spark version by -Pprofile

PMem Spill

Features


#34	Support vanilla spark 3.1.1

PRs


#41	[PMEM-SPILL-34][POAE7-1119]Port RDD cache to Spark 3.1.1 as separate module

PMem Common

Features


#10	add -mclflushopt flag to enable clflushopt for gcc
#8	use clflushopt instead of clflush

PRs


#11	[PMEM-COMMON-10][POAE7-1010]Add -mclflushopt flag to enable clflushop…
#9	[PMEM-COMMON-8][POAE7-896]use clflush optimize version for clflush

PMem Shuffle

Features


#15	Doesn't work with Spark3.1.1

PRs


#16	[pmem-shuffle-15] Make pmem-shuffle support Spark3.1.1

Remote Shuffle

Features


#18	upgrade to Spark-3.1.1
#11	Support DAOS Object Async API

PRs


#19	[REMOTE-SHUFFLE-18] upgrade to Spark-3.1.1
#14	[REMOTE-SHUFFLE-11] Support DAOS Object Async API

Release 1.1.0

Native SQL Engine

Features


#261	ArrowDataSource: Add S3 Support
#239	Adopt ARROW-7011
#62	Support Arrow's Build from Source and Package dependency library in the jar
#145	Support decimal in columnar window
#31	Decimal data type support
#128	Support Decimal in Aggregate
#130	Support decimal in project
#134	Update input metrics during reading
#120	Columnar window: Reduce peak memory usage and fix performance issues
#108	Add end-to-end test suite against TPC-DS
#68	Adaptive compression select in Shuffle.
#97	optimize null check in codegen sort
#29	Support multiple-key sort without codegen
#75	Support HashAggregate in ColumnarWSCG
#73	improve columnar SMJ
#51	Decimal fallback
#38	Supporting expression as join keys in columnar SMJ
#27	Support REUSE exchange when DPP enabled
#17	ColumnarWSCG further optimization

Performance


#194	Arrow Parameters Update when compiling Arrow
#136	upgrade to arrow 3.0
#103	reduce codegen in multiple-key sort
#90	Refine HashAggregate to do everything in CPP

Bugs Fixed


#278	fix arrow dep in 1.1 branch
#265	TPC-DS Q67 failed with memmove exception in native split code.
#280	CMake version check
#241	TPC-DS q67 failed for XXH3_hashLong_64b_withSecret.constprop.0+0x180
#262	q18 has different digits compared with vanilla spark
#196	clean up options for native sql engine
#224	update 3rd party libs
#227	fix vulnerabilities from Klocwork
#237	Add ARROW_CSV=ON to default C++ build commands
#229	Fix the deprecated code warning in shuffle_split_test
#119	consolidate batch size
#217	TPC-H query20 result not correct when use decimal dataset
#211	IndexOutOfBoundsException during running TPC-DS Q2
#167	Cannot successfully run q14a.sql and q14b.sql when using double format for TPC-DS workload.
#191	libarrow.so and libgandiva.so not copy into the tmp directory
#179	Unable to find Arrow headers during build
#153	Fix incorrect queries after enabled Decimal
#173	fix the incorrect result of q69
#48	unit tests for c++ are broken
#101	ColumnarWindow: Remove obsolete debug code
#100	Incorrect result in Q45 w/ v2 bhj threshold is 10MB sf500
#81	Some ArrowVectorWriter implementations don't implement the setNulls method
#82	Incorrect result in TPCDS Q72 SF1536
#70	Duplicate IsNull check in codegen sort
#64	Memleak in sort when SMJ is disabled
#58	Issues when running tpcds with DPP enabled and AQE disabled
#52	memory leakage in columnar SMJ
#53	Q24a/Q24b SHJ tail task took about 50 secs in SF1500
#42	reduce columnar sort memory footprint
#40	columnar sort codegen fallback to executor side
#1	columnar whole stage codegen failed due to empty results
#23	TPC-DS Q8 failed due to unsupported operation in columnar sortmergejoin
#22	TPC-DS Q95 failed in columnar wscg
#4	columnar BHJ failed on new memory pool
#5	columnar BHJ failed on partitioned table with prefercolumnar=false

PRs


#288	[NSE-119] clean up on comments
#282	[NSE-280]fix cmake version check
#281	[NSE-280] bump cmake to 3.16
#279	[NSE-278]fix arrow dep in 1.1 branch
#268	[NSE-186] backport to 1.1 branch
#266	[NSE-265] Reserve enough memory before UnsafeAppend in builder
#270	[NSE-261] ArrowDataSource: Add S3 Support
#263	[NSE-262] fix remainder loss in decimal divide
#215	[NSE-196] clean up native sql options
#231	[NSE-176]Arrow install order issue
#242	[NSE-224] update third party code
#240	[NSE-239] Adopt ARROW-7011
#238	[NSE-237] Add ARROW_CSV=ON to default C++ build commands
#230	[NSE-229] Fix the deprecated code warning in shuffle_split_test
#225	[NSE-227]fix issues from codescan
#219	[NSE-217] fix missing decimal check
#212	[NSE-211] IndexOutOfBoundsException during running TPC-DS Q2
#187	[NSE-185] Avoid unnecessary copying when simply projecting on fields
#195	[NSE-194]Turn on several Arrow parameters
#189	[NSE-153] Following NSE-153, optimize fallback conditions for columnar window
#192	[NSE-191]Fix issue0191 for .so file copy to tmp.
#181	[NSE-179]Fix arrow include directory not include when using ARROW_ROOT
#175	[NSE-153] Fix window results
#174	[NSE-173] fix incorrect result of q69
#172	[NSE-62]Fixing issue0062 for package arrow dependencies in jar with refresh2
#171	[NSE-170]improve sort shuffle code
#165	[NSE-161] adding format check
#166	[NSE-130] support decimal round and abs
#164	[NSE-130] fix precision loss in divide w/ decimal type
#159	[NSE-31] fix SMJ divide with decimal
#156	[NSE-130] fix overflow and precision loss
#152	[NSE-86] Merge Arrow Data Source
#154	[NSE-153] Fix incorrect queries after enabled Decimal
#151	[NSE-145] Support decimal in columnar window
#129	[NSE-128]Support Decimal in Aggregate/HashJoin
#131	[NSE-130] support decimal in project
#107	[NSE-136]upgrade to arrow 3.0.0
#135	[NSE-134] Update input metrics during reading
#121	[NSE-120] Columnar window: Reduce peak memory usage and fix performance issues
#112	[NSE-97] optimize null check and refactor sort kernels
#109	[NSE-108] Add end-to-end test suite against TPC-DS
#69	[NSE-68][Shuffle] Adaptive compression select in Shuffle.
#98	[NSE-97] remove isnull when null count is zero
#102	[NSE-101] ColumnarWindow: Remove obsolete debug code
#105	[NSE-100]Fix an incorrect result error when using SHJ in Q45
#91	[NSE-90]Refactor HashAggregateExec and CPP kernels
#79	[NSE-81] add missing setNulls methods in ArrowWritableColumnVector
#44	[NSE-29]adding non-codegen framework for multiple-key sort
#76	[NSE-75]Support ColumnarHashAggregate in ColumnarWSCG
#83	[NSE-82] Fix Q72 SF1536 incorrect result
#72	[NSE-51] add more datatype fallback logic in columnar operators
#60	[NSE-48] fix c++ unit tests
#50	[NSE-45] BHJ memory leak
#74	[NSE-73]using data ref in multiple keys based SMJ
#71	[NSE-70] remove duplicate IsNull check in sort
#65	[NSE-64] fix memleak in sort when SMJ is disabled
#59	[NSE-58]Fix empty input issue when DPP enabled
#7	[OAP-1846][oap-native-sql] add more fallback logic
#57	[NSE-56]ColumnarSMJ: fallback on full outer join
#55	[NSE-52]Columnar SMJ: fix memory leak by closing stream batches properly
#54	[NSE-53]Partial fix Q24a/Q24b tail SHJ task materialization performance issue
#47	[NSE-17]TPCDS Q72 optimization
#39	[NSE-38]ColumnarSMJ: support expression as join keys
#43	[NSE-42] early release sort input
#33	[NSE-32] Use Spark managed spill in columnar shuffle
#41	[NSE-40] fixes driver failing to do sort codegen
#28	[NSE-27]Reuse exchange to optimize DPP performance
#36	[NSE-1]fix columnar wscg on empty recordbatch
#24	[NSE-23]fix columnar SMJ fallback
#26	[NSE-22]Fix w/DPP issue when inside wscg smj both sides are smj
#18	[NSE-17] smjwscg optimization:
#3	[NSE-4]fix columnar BHJ on new memory pool
#6	[NSE-5][SCALA] Fix ColumnarBroadcastExchange didn't fallback issue w/ DPP

SQL DS Cache

Features


#36	HCFS doc for Spark
#38	update Plasma dependency for Plasma-based-cache module
#14	Add HCFS module
#17	replace arrow-plasma dependency for hcfs module

Bugs Fixed


#62	Upgrade hadoop dependencies in HCFS

PRs


#83	[SQL-DS-CACHE-82][SDLe]Upgrade Jetty version
#77	[SQL-DS-CACHE-62][POAE7-984] upgrade hadoop version to 3.3.0
#56	[SQL-DS-CACHE-47]Add plasma native get timeout
#37	[SQL-DS-CACHE-36][POAE7-898]HCFS docs for OAP 1.1
#39	[SQL-DS-CACHE-38][POAE7-892]update Plasma dependency
#18	[SQL-DS-CACHE-17][POAE7-905]replace intel-arrow with apache-arrow v3.0.0
#13	[SQL-DS-CACHE-14][POAE7-847] Port HCFS to OAP
#16	[SQL-DS-CACHE-15][POAE7-869]Refactor original code to make it a sub-module

OAP MLlib

Features


#35	Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#33	Optimize oneCCL port detecting
#28	Use getifaddrs to get host ips for oneCCL kvs
#12	Improve CI and add pseudo cluster testing
#31	Print time duration for each PCA step
#13	Add ALS with new oneCCL APIs
#18	Auto detect KVS port for oneCCL to avoid port conflict
#10	Porting Kmeans and PCA to new oneCCL API

Bugs Fixed


#43	[Release] Error when installing intel-oneapi-dal-devel-2021.1.1 intel-oneapi-tbb-devel-2021.1.1
#46	[Release] Meet hang issue when running PCA algorithm.
#48	[Release] No performance benefit when using Intel-MLlib to run ALS algorithm.
#25	Fix oneCCL KVS port auto detect and improve logging

PRs


#51	[ML-50] Merge #47 and prepare for OAP 1.1
#49	Revert "[ML-41] Revert to old oneCCL and Prepare for OAP 1.1"
#47	[ML-44] [PIP] Update to oneAPI 2021.2 and Rework examples for validation
#40	[ML-41] Revert to old oneCCL and Prepare for OAP 1.1
#36	[ML-35] Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#34	[ML-33] Optimize oneCCL port detecting
#20	[ML-12] Improve CI and add pseudo cluster testing
#32	[ML-31] Print time duration for each PCA step
#14	[ML-13] Add ALS with new oneCCL APIs
#24	[ML-25] Fix oneCCL KVS port auto detect and improve logging
#19	[ML-18] Auto detect KVS port for oneCCL to avoid port conflict

PMem Spill

Bugs Fixed


#22	[SDLe][Snyk]Upgrade Jetty version to fix vulnerability scanned by Snyk
#13	The compiled code failed because the variable name was not changed

PRs


#27	[PMEM-SPILL-22][SDLe]Upgrade Jetty version
#21	[POAE7-961] fix null pointer issue when offheap enabled.
#18	[POAE7-858] disable RDD cache related PMem initialization as default and add PMem related logic in SparkEnv
#19	[PMEM-SPILL-20][POAE7-912]add vanilla SparkEnv.scala for future update
#15	[POAE7-858] port memory extension options to OAP 1.1
#12	Change the variable name so that the passed parameters are correct
#10	Fixing one pmem path on AppDirect mode may cause the pmem initialization path to be empty Path

PMem Shuffle

Features


#7	Enable running in fsdax mode

Bugs Fixed


#10	[pmem-shuffle] There are potential issues reported by Klocwork.

PRs


#13	[PMEM-SHUFFLE-10] Fix potential issues reported by Klocwork for branch 1.1.
#6	[PMEM-SHUFFLE-7] enable fsdax mode in pmem-shuffle

Remote Shuffle

Features


#6	refactor shuffle-daos by abstracting shuffle IO for supporting both synchronous and asynchronous DAOS Object API
#4	check-in remote shuffle based on DAOS Object API

Bugs Fixed


#12	[SDLe][Snyk]Upgrade org.mock-server:mockserver-netty to fix vulnerability scanned by Snyk

PRs


#13	[REMOTE-SHUFFLE-12][SDLe][Snyk]Upgrade org.mock-server:mockserver-net…
#5	check-in remote shuffle based on DAOS Object API

Release 1.0.0

Features


#1823	[oap-native-sql][doc] Spark Native SQL Engine installation guide is obsolete and thus broken.
#1545	[oap-data-source][arrow] Add metric: output_batches
#1588	[OAP-CACHE] Make Parquet file splitable
#1337	[oap-cache] Discard OAP data format
#1679	[OAP-CACHE]Remove the code related to reading and writing OAP data format
#1680	[OAP-CACHE]Decouple spark code includes FileFormatDataWriter, FileFormatWriter and OutputWriter
#1846	[oap-native-sql] spark sql unit test
#1811	[OAP-cache]provide one-step starting scripts like plasma-server redis-server
#1519	[oap-native-sql] upgrade cmake
#1873	[oap-native-sql] Columnar shuffle split variable length use UnsafeAppend
#1835	[oap-native-sql] Support ColumnarBHJ to Build and Broadcast HashRelation in driver side
#1848	[OAP-CACHE]Decouple spark code include OneApplicationResource.scala
#1824	[OAP-CACHE]Decouple spark code includes DataSourceScanExec.scala.
#1838	[OAP-CACHE]Decouple spark code includes VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java
#1839	[oap-native-sql] Add prefetch to columnar shuffle split
#1756	[Intel MLlib] Add Kmeans "tolerance" support and test cases
#1818	[OAP-Cache]Make Spark webUI OAP Tab more user friendly
#1831	[oap-native-sql] ColumnarWindow: Support reusing same window spec in multiple functions
#1653	[SQL Data Source Cache]Consistency issue on "enable" and "enabled" configuration
#1765	[oap-native-sql] Support WSCG in nativesql
#1517	[oap-native-sql] implement SortMergeJoin
#1535	[oap-native-sql] Add ColumnarWindowExec
#1654	[oap-native-sql] Columnar shuffle TPCDS enabling
#1700	[oap-native-sql] Support inside join condition project
#1717	[oap-native-sql] support null in columnar literal and subquery
#1704	[oap-native-sql] Add ColumnarUnion and ColumnarExpand
#1647	[oap-native-sql] row to columnar for decimal
#1638	[oap-native-sql] adding full TPC-DS support
#1498	[oap-native-sql] stddev_samp support
#1547	[oap-native-sql] adding metrics for input/output batches

Performance


#1956	[OAP-MLlib]Cannot get 5x performance benefit comparing with vanilla spark.
#1955	[OAP-CACHE] Plasma shows lower performance comparing with vanilla spark.
#2023	[OAP-MLlib] Use oneAPI official release instead of beta versions
#1829	[oap-native-sql] Optimize columnar shuffle and option to use AVX512
#1734	[oap-native-sql] use non-codegen for sort with one key
#1706	[oap-native-sql] Optimize columnar shuffle write

Bugs Fixed


#2054	[OAP-MLlib] Failed to run Intel MLlib after updating the version of oneAPI.
#2012	[SQL Data Source Cache] The task will be suspended when using plasma cache.
#1640	[SQL Data Source Cache] The task will be suspended when using plasma cache and starting 2 executors per worker.
#2028	[OAP-Cache]When using Plasma Spark webUI OAP Tab cache metrics are not right
#1979	[SDLe][native-sql-engine] Issues from Static Code Analysis with Klocwork need to be fixed
#1938	[oap-native-sql] Stability test failed when running TPCH for 10 rounds.
#1924	[OAP-CACHE] Decouple heartbeat message and use conf to determine whether to report locality information
#1937	[rpmem-shuffle] Cannot pass q64.sql of TPC-DS when enable RPmem shuffle.
#1951	[SDLe][PMem-Shuffle]Specify Scala version above 2.12.4 in pom.xml
#1921	[SDLe][rpmem-shuffle] The master branch and branch-1.0-spark-3.0 can't pass BDBA analysis with libsqlitejdbc dependency.
#1743	[oap-native-sql] Error not reported when creating CodeGenerator instance
#1864	[oap-native-sql] hash conflict in hashagg
#1934	[oap-native-sql] backport to 1.0
#1929	[oap-native-sql] memleak in non-codegen aggregate
#1907	[OAP-cache]Cannot find the class of redis-client
#1888	[oap-native-sql] Add hash collision check for all HashJoins and hashAggr
#1903	[oap-native-sql] BHJ related UT fix
#1881	[oap-native-sql] Fix split use avx512
#1742	[oap-native-sql] SortArraysToIndicesKernel: incorrect null ordering with multiple sort keys
#1553	[oap-native-sql] TPCH-Q7 fails in throughput tests
#1854	[oap-native-sql] Fix columnar shuffle file not deleted
#1844	[oap-native-sql] Fix columnar shuffle spilled file not deleted
#1580	[oap-native-sql] Hash Collision in multiple keys scenario
#1754	[Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1815	[oap-native-sql] Memory management: Error on task end if there are unclosed child allocators
#1808	[oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1806	[oap-native-sql] Fix Columnar Shuffle Memory Leak
#1783	[oap-native-sql] ColumnarWindow: Rank() returns wrong result when input row number >= 65536
#1776	[oap-native-sql] memory leakage in native code
#1760	[oap-native-sql] fix columnar sorting on string
#1733	[oap-native-sql]TPCH Q18 memory leakage
#1694	[oap-native-sql] TPC-H q15 failed for ConditionedProbeArraysVisitorImpl MakeResultIterator does not support dependency type other than Batch
#1682	[oap-native-sql] fix aggregate without codegen
#1707	[oap-native-sql] Fix collect batch metric
#1642	[oap-native-sql] Support expression key in Join
#1669	[oap-native-sql] TPCH Q1 results is not correct w/ hashagg codegen off
#1629	[oap-native-sql] clean up building steps
#1602	[oap-native-sql] rework copyfromjar function
#1599	[oap-native-sql] Columnar BHJ fail on TPCH-Q15
#1567	[oap-native-sql] Spark thrift-server does not honor LIBARROW_DIR env
#1541	[oap-native-sql] TreeNode children not replaced by columnar operators

PRs


#2056	[OAP-2054][OAP-MLlib] Fix oneDAL libJavaAPI.so packaging for oneAPI 2021.1 production release
#2039	[OAP-2023][OAP-MLlib] Switch to oneAPI 2021.1.1 official release for OAP 1.0
#2043	[OAP-1981][OAP-CACHE][POAE7-617]fix binary cache core dump issue
#2002	[OAP-2001][oap-native-sql]fix coding style
#2035	[OAP-2028][OAP-cache][POAE7-635] Fix set concurrent access bug
#2037	[OAP-1640][OAP-CACHE][POAE7-593]Fix plasma hang due to threshold
#2036	[OAP-1955][OAP-CACHE][POAE7-660]preferLocation low hit rate fix master branch
#2013	[OAP-CACHE][POAE7-628]port missing commits from branch 0.8/0.9
#2015	[OAP-2016] fix klocwork issues in oap-common/oap-spark
#2022	[OAP-1980][rpmem-shuffle] Fix Klocwork issues for spark3.x version
#2011	[OAP-2010][oap-native-sql] Add abs support in wscg
#1996	[OAP-1998][oap-native-sql] Add support to do numa binding for Columnar Operations
#2004	[OAP-2012][OAP-CACHE][POAE7-635]bug fix: plasma hang - use java thread-safe set
#1988	[OAP-1983][oap-native-sql] Fix Q38 and Q87 when unsafeRow contains null
#1976	[OAP-1983][oap-native-sql] Fix hashCheck performance issue
#1970	[OAP-1947][oap-native-sql][C++] reduce sort kernel memory footprint
#1961	[OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information for branch branch-1.0-spark-3.x
#1982	[OAP-1981][OAP-CACHE][POAE7-617]Bug fix binary docache
#1952	[OAP-1951][PMem-Shuffle][SDLe]Specify Scala version in pom.xml
#1919	[OAP-1918][OAP-CACHE][POAE7-563]bug fix: plasma get an invalid value
#1589	[OAP-1588][OAP-CACHE][POAE7-363] Make Parquet splitable
#1954	[OAP-1884][OAP-dev]Small fix for arrow build in prepare_oap_env.sh.
#1933	[OAP-1934][oap-native-sql]Backport NativeSQL code to 1.0
#1889	[OAP-1888][oap-native-sql]Add hash collision check for all HashJoins and hashAggr
#1904	[OAP-1903][oap-native-sql] Fix Local Mode BHJ related UT fail issue
#1916	[OAP-1846][oap-native-sql] clean up travis test
#1923	[OAP-1921][rpmem-shuffle] For BDBA analysis to exclude unused library
#1890	[OAP-1846][oap-native-sql] add script for running unit test
#1905	[OAP-1813][POAE7-555] [OAP-CACHE] package redis related dependency
#1908	[OAP-1884][OAP-dev]Add cxx-compiler in oap conda recipes for native-sql.
#1901	[OAP-1884][OAP-dev]Add c-compiler in oap conda recipes for native-sql.
#1895	[OAP-1884][OAP-dev] Checkout arrow branch in case arrow in other branch
#1876	[OAP-1875]Generating changelog automatically for new releases
#1812	[OAP-1811][OAP-cache][POAE7-486]add sbin folder
#1882	[OAP-1881][oap-native-sql] Fix split use avx512
#1847	[OAP-1846][oap-native-sql] add unit tests from spark to native sql
#1836	[OAP-1835][oap-native-sql] Support ColumnarBHJ to build and broadcast hashrelation
#1885	[OAP-1884][OAP-dev]Add oap-mllib to parent pom and fix error when git clone oneccl.
#1868	[OAP-1653][OAP-Cache]Modify enabled and enable compatibility check
#1853	[OAP-1852][oap-native-sql] Memory Management: Use Arrow C++ memory po…
#1859	[OAP-1858][OAP-cache][POAE7-518] Decouple FilePartition.scala
#1857	[OAP-1833][oap-native-sql] Fix HashAggr hasNext won't stop issue
#1855	[OAP-1854][oap-native-sql] Fix columnar shuffle file not deleted
#1840	[OAP-1839][oap-native-sql] Add prefetch to columnar shuffle split
#1843	[OAP-1842][OAP-dev]Add arrow conda build action job.
#1849	[OAP-1848][SQL Data Source Cache] Decouple OneApplicationResource.scala
#1837	[OAP-1838][SQL Data Source Cache] Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java.
#1757	[OAP-1756][Intel MLlib] Add Kmeans "tolerance" support and test cases
#1845	[OAP-1844][oap-native-sql] Fix columnar shuffle spilled file not deleted
#1827	[OAP-1818][SQL-Data-Source-Cache]Modify Spark webUI OAP Tab expressio…
#1832	[OAP-1831][oap-native-sql] ColumnarWindow: Support reusing same windo…
#1834	[OAP-1833][oap-native-sql][Scala] fix CoalesceBatchs after HashAgg
#1830	[OAP-1829][oap-native-sql] Optimize columnar shuffle and option to use AVX-512
#1803	[OAP-1751][oap-native-sql]fix sort on TPC-DS
#1755	[OAP-1754][Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1826	[OAP-1825] disable pmemblk test
#1802	[OAP-1653][OAP-Cache]Keep consistency on 'enabled' of OapConf configu…
#1810	[OAP-1771]Fix README for Arrow Data Source
#1816	[OAP-1815][oap-native-sql] Memory management: Error on task end if th…
#1809	[OAP-1808][oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1467	[OAP-1457][oap-native-sql] Reserve Spark off-heap execution memory after buffer allocation
#1807	[OAP-1806][oap-native-sql] Fix Columnar Shuffle Memory Leak
#1788	[OAP-1765][oap-native-sql] Fix for dropped CoalesceBatches before ColumnarBroadcastExchange
#1799	[OAP-CACHE][OAP-1690][POAE7-430] Cache backend fall back detect bug fix branch master
#1744	[OAP-CACHE][OAP-1748][POAE7-462] Enable externalDB to store CacheMetaInfo branch master
#1787	[OAP-1786][oap-native-sql] ColumnarWindow: Avoid unnecessary mem copies
#1773	[POAE7-471]Handle oap-common build issue about PMemKV
#1782	[OAP-1631]Update compile scripts from 0.9
#1785	[OAP-1765][oap-native-sql] Support WSCG for nativesql(PART 2)
#1781	[OAP-1765][oap-native-sql] fix codegen for SMJ and HashAgg
#1775	[OAP-1776][oap-native-sql]fix sort memleak
#1766	[OAP-1765][oap-native-sql] Support WSCG for nativesql and use non-codegen join for remainings
#1774	[OAP-1631]Add prepare_oap_env.sh.
#1769	[OAP-1768][POAE7-163][OAP-SPARK] Integrate block manager with chunk api
#1763	[OAP-1759][oap-native-sql] ColumnarWindow: Add execution metrics
#1656	[OAP-1517][oap-native-sql] Improve SortMergeJoin Part2
#1761	[oap-native-sql] quick fix sort on string by fallback to row
#1536	[OAP-1535][oap-native-sql] Add ColumnarWindowExec
#1735	[OAP-1734][oap-native-sql]use non-codegen for sort with single key
#1747	[OAP-1741][rpmem-shuffle]To make java side load native library from jar directly
#1725	[OAP-1727][POAE7-358] Spark integration: Memory Spill to PMem
#1738	[OAP-1733][oap-native-sql][Scala] fix mem leak
#1701	[OAP-1700][oap-native-sql] support join-inside condition project
#1736	[oap-1727][POAE7-358] Add native spark files for memory spill module
#1719	[oap-common][POAE7-347]Stream API for PMem storage store
#1723	[OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format
#1716	[OAP-1717][oap-native-sql] support null in columnar literal and subquery
#1713	[OAP-1712] [OAP-SPARK] Remove file change list from dev directory
#1711	[OAP-1694][oap-native-sql][Scala] fix hash join w/ empty batch
#1710	[OAP-1706][oap-native-sql] Optimize shuffle write
#1705	[OAP-1704][oap-native-sql] Support ColumnarUnion and ColumnarExpand
#1683	[OAP-1682][oap-native-sql] fix aggregate without codegen
#1708	[OAP-1707][oap-native-sql] Fix collect batch metric
#1675	[OAP-1651][oap-native-sql] Adding fallback rules for join/shuffle
#1674	[OAP-1673][oap-native-sql] Adding native double round function
#1632	[OAP-1631][Doc] Add Commit Message Requirements
#1672	[OAP-1610][Intel-MLlib]Upgrade the mahout-hdfs to version 14.1
#1641	[OAP-1651][OAP-1642][oap-native-sql] support TPCDS w/ AQE
#1670	[OAP-1669][oap-native-sql] use distinct ordinal list
#1655	[OAP-1654][oap-native-sql]Columnar shuffle tpcds enabling
#1630	[OAP-1629][oap-native-sql] clean up building scripts
#1601	[OAP-1602][oap-native-sql][Java] fix extract resource from jar
#1639	[OAP-1638][oap-native-sql] tpcds enabling (part2)
#1586	[OAP-1587][oap-native-sql] tpcds enabling (part1)
#1600	[oap-1599][oap-native-sql][Scala] fix broadcasthashjoin
#1555	[OAP-1541][oap-native-sql] TreeNode children not replaced by columnar…
#1546	[OAP-1547][oap-native-sql][Scala] Adding metrics for input/output batches
#1472	[OAP-1466] [RDD Cache] [POAE-354] Initialize pmem with AppDirect and KMemDax mode in block manager

Release 0.8.4

Features


#1865	[OAP-CACHE]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1813	[OAP-cache] package redis client jar into oap-cache

Bugs Fixed


#2044	[OAP-CACHE] Build error due to synchronizedSet on branch 0.8
#2027	[oap-shuffle] Should load native library from jar directly
#1981	[OAP-CACHE] Error running q32 binary cache
#1980	[SDLe][RPMem-Shuffle]Issues from Static Code Analysis with Klocwork need to be fixed
#1918	[OAP-CACHE] Plasma throw exception:get an invalid value- branch 0.8

PRs


#2045	[OAP-2044][OAP-CACHE]bug fix: build error due to synchronizedSet
#2031	[OAP-1955][OAP-CACHE][POAE7-667]preferLocation low hit rate fix branch 0.8
#2029	[OAP-2027][rpmem-shuffle] Load native libraries from jar
#2018	[OAP-1980][SDLe][rpmem-shuffle] Fix potential risk issues reported by Klocwork
#1920	[OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information
#1949	[OAP-1948][rpmem-shuffle] Fix several vulnerabilities reported by BDBA
#1900	[OAP-1680][OAP-CACHE] Decouple FileFormatDataWriter, FileFormatWriter and OutputWriter
#1899	[OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format (#1723)
#1897	[OAP-1884][OAP-dev] Update memkind version and copy arrow plasma jar to conda package build path
#1883	[OAP-1568][OAP-CACHE] Cleanup Oap data format read/write related test cases
#1863	[OAP-1865][SQL Data Source Cache]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1841	[OAP-1579][OAP-cache]Fix web UI to show cache size
#1814	[OAP-cache][OAP-1813][POAE7-481]package redis client related dependency
#1790	[OAP-CACHE][OAP-1690][POAE7-430] Cache backend fallback bugfix
#1740	[OAP-CACHE][OAP-1748][POAE7-453]Enable externalDB to store CacheMetaInfo branch 0.8
#1731	[OAP-CACHE] [OAP-1730] [POAE-428] Add OAP cache runtime enable
