CDAP-21027 : upgrading hadoop version to 3.3.6 #15648
Conversation
Force-pushed: 1f7fd86 → 716f859
Force-pushed: c3bc42e → 7448b11
Force-pushed: 161d9a5 → 53a1eaa
@@ -172,6 +172,7 @@ public void apply() throws IOException {
       // run the partition writer m/r with this output partition time
       Map<String, String> arguments = new HashMap<>();
       arguments.put(OUTPUT_PARTITION_KEY, Long.toString(now));
+      arguments.put("system.mapreduce.mapreduce.fileoutputcommitter.algorithm.version", "1");
Add a comment here explaining why this property is set. We also need to document this in case a customer needs to override it.
Done.
Regarding documentation, I can add it to the release notes, but this change only applies to MapReduce programs, which are already deprecated. I don't think we use MapReduce outside of tests anymore.
For Spark, we already support custom arguments.
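To make the `system.mapreduce.` argument above concrete, here is a minimal sketch of the prefix convention it relies on: runtime arguments starting with `system.mapreduce.` are forwarded into the job's Hadoop configuration with the prefix stripped, which is how the committer algorithm version reaches the job. The helper below is a hypothetical stand-in for illustration, not CDAP's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class CommitterArgs {

  // Hypothetical helper mirroring the convention used in the PR:
  // arguments prefixed with "system.mapreduce." become Hadoop
  // Configuration entries with the prefix removed.
  static Map<String, String> toJobConf(Map<String, String> runtimeArgs) {
    String prefix = "system.mapreduce.";
    Map<String, String> jobConf = new HashMap<>();
    for (Map.Entry<String, String> e : runtimeArgs.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        jobConf.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return jobConf;
  }

  public static void main(String[] args) {
    Map<String, String> runtimeArgs = new HashMap<>();
    runtimeArgs.put(
        "system.mapreduce.mapreduce.fileoutputcommitter.algorithm.version", "1");
    // prints {mapreduce.fileoutputcommitter.algorithm.version=1}
    System.out.println(toJobConf(runtimeArgs));
  }
}
```

Under this reading, a customer who needs a different committer algorithm could pass the same prefixed argument with another value.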
...app-fabric/src/main/java/io/cdap/cdap/internal/app/runtime/batch/MapReduceProgramRunner.java (resolved)
Force-pushed: 1f69734 → dedf0dd
...ts/src/test/java/io/cdap/cdap/internal/app/runtime/batch/DynamicPartitionerWithAvroTest.java (resolved)
cdap-runtime-ext-dataproc/pom.xml (outdated diff)
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <scope>compile</scope>
Why do we need to include it here and then exclude it with `HadoopClassExcluder`? Can we just exclude it from the app file altogether?
This was an experiment to completely remove jackson from CDAP or make it provided.
Removing jackson completely causes class-not-found issues inside app-fabric during provisioning.
Adding this dependency helped resolve the issue.
I think this is no longer required; I will remove it and test.
[Edit]: Tested on an image; it works fine after removing.
Regarding "exclude it from the app file", I will discuss with you offline.
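For reference, the "make it provided" variant discussed in this thread would look roughly like the sketch below (hypothetical fragment; it assumes the jackson-databind version is managed in the parent pom and that the runtime classpath supplies jackson):

```xml
<!-- hypothetical sketch: rely on the runtime to supply jackson
     instead of bundling it with the module -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <scope>provided</scope>
</dependency>
```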
cdap-security/pom.xml (outdated diff)
@@ -89,22 +89,32 @@
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-server</artifactId>
    <version>${jetty9.version}</version>
I think it's the same in the parent pom. Why do we need to override it?
removing it.
cdap-security/pom.xml (outdated diff)
<dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>javax.servlet-api</artifactId>
  <version>3.1.0</version>
Why do we need to override? The parent pom has `<servlet.api.version>3.0.1</servlet.api.version>`. Can we just change it to 3.1.0 over there?
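If we take that route, the change reduces to a one-line property bump in the parent pom (sketch; assumes every module resolves the servlet API version from this shared property rather than pinning its own):

```xml
<!-- parent pom <properties>: bump the shared servlet API version -->
<servlet.api.version>3.1.0</servlet.api.version>
```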
cdap-security/pom.xml (outdated diff)
@@ -172,6 +182,12 @@
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>commons-logging</groupId>
We should use `jcl-over-slf4j` instead of `commons-logging`.
When running `UGIProviderTest`, it fails with:

```
java.lang.NoClassDefFoundError: org/apache/commons/logging/impl/Log4JLogger
	at org.apache.hadoop.hdfs.server.common.MetricsLoggerTask.makeMetricsLoggerAsync(MetricsLoggerTask.java:154)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.startMetricsLogger(NameNode.java:852)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:805)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1033)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1008)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1782)
	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1390)
```

Seems like `MiniDFSCluster` requires this lib.
Please address this comment
I would try to disable metrics by setting `dfs.namenode.metrics.logger.period.seconds` to -1. That code is weird and refers to log4j directly. Also, commons-logging would collide with jcl-over-slf4j, so logging would be all over the place.
Thanks, this worked.
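For reference, the workaround agreed on above can be expressed as a test-only HDFS setting (sketch; the property name and the -1 value come from the suggestion in this thread, where a non-positive value disables the periodic metrics logger):

```xml
<!-- test hdfs-site.xml / MiniDFSCluster Configuration: disable the
     periodic NameNode metrics logger so the test no longer needs
     commons-logging's Log4JLogger class -->
<property>
  <name>dfs.namenode.metrics.logger.period.seconds</name>
  <value>-1</value>
</property>
```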
Force-pushed: 005cb34 → fa0b321
Force-pushed: fa0b321 → 5b75e69
As part of resolving vulnerabilities, we are upgrading the Hadoop version.
Main upgrade:
hadoop 2.6.5 → 3.3.6
Other affected upgrades:
jetty 8.1.15.v20140411 → 9.4.51.v20230217
mockito 1.10.19 → 3.9.0
powermock 1.7.4 → 2.0.9
avro 1.8.2 → 1.11.0
Tested pipelines with some core plugins: Avro file, joins, Wrangler, GCS, BigQuery.
Tested with draft branches: