Skip to content

TEZ-4689: Introduce Node abstraction for DAGAppMaster instead of sepagrate NodeManager-related fields#461

Open
Aggarwal-Raghav wants to merge 4 commits intoapache:masterfrom
Aggarwal-Raghav:TEZ-4689
Open

TEZ-4689: Introduce Node abstraction for DAGAppMaster instead of sepagrate NodeManager-related fields#461
Aggarwal-Raghav wants to merge 4 commits intoapache:masterfrom
Aggarwal-Raghav:TEZ-4689

Conversation

@Aggarwal-Raghav
Copy link
Contributor

No description provided.

private final int nmPort;
private final int nmHttpPort;
private String nmHost;
private int nmPort;
Copy link
Contributor Author

@Aggarwal-Raghav Aggarwal-Raghav Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nmPort is not used anywhere but still keeping getAppNMPort() public api

if (!isLocal) {
this.nmHost = nodeContext.getNodeHostString();
int nmHttpPort = Integer.parseInt(nodeContext.getNodeHttpPortString());
this.containerLogs =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.containerLogs is making use of nmHttpPort and nmHost, moved it from constructor to serviceInit() and made nmHttpPort as local variable

DAGProtos.ConfigurationProto confProto = amExtensions.loadConfigurationProto();
TezUtilsInternal.addUserSpecifiedTezConfiguration(conf, confProto.getConfKeyValuesList());

NodeContext nodeContext = new YarnNodeManagerContext();
Copy link
Contributor Author

@Aggarwal-Raghav Aggarwal-Raghav Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to wrap in if statement for checking if framework is yarn. As the YarnNodeManagerContext is using supplier it won't be evaluated immediately and evaluation will happen only it !isLocal.
NOTE: in tez-am docker image tez.local.mode = true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assume that tez.local.mode is going to become false with this whole initiative eventually, when reaching the optimal state of separate AM and Task containers

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@Aggarwal-Raghav Aggarwal-Raghav marked this pull request as ready for review February 28, 2026 17:13
@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

// Install the tez class loader, which can be used add new resources
TezClassLoader.setupTezClassLoader();
Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
final String pid = System.getenv().get("JVM_PID");
Copy link
Contributor

@abstractdog abstractdog Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I like using ProcessHandle instead of JVM_PID, I'm not sure if this is 100% correct
I'm concerned by the fact that JVM_PID is used in yarn too, see in hadoop code, e.g. https://github.com/apache/hadoop/blob/f49c49aa083b9324ae432902f94a7aaf98f543ea/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java#L126

also, not 100% sure if JVM_PID belongs to the actual Java process id or the launch_container.sh shell script provided by yarn: even if the shell script must be the parent of the jvm process, they are not the same

Copy link
Contributor Author

@Aggarwal-Raghav Aggarwal-Raghav Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll investigate on this and get back to you on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i tested by putting both

LOG.info("######");
      LOG.info("System.getenv().get(\"JVM_PID\") value is: {}", old_pid);
      LOG.info("ProcessHandle.current().pid() value is : {}", pid);
      LOG.info("######");

the JVM_PID belongs to launch_container.sh and
ProcessHandler one is DagAppMaster

Screenshot confirming the same.
Screenshot 2026-03-04 at 11 52 14 PM
Screenshot 2026-03-04 at 11 49 46 PM

Will revert this change. Thanks for pointing this!!

Comment on lines +1214 to +1219
public int getAppNMPort() {
return nmPort;
return nodeContext.getNodePort();
}

public int getAppNMHttpPort() {
return nmHttpPort;
return nodeContext.getNodeHttpPort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a chance we can remove getAppNMPort and getAppNMHttpPort methods altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought so but they are public api, but its a tez major version change so we can do so. I wanted to discuss the same with you. Let me know.
there are few variables that can be made final and local to method instead of class varaibles in DagAppMaster. I wanted to clean that up and use JDK 21 features like enhanced switch but the change would have been outside the scope of PR. I can create a separte PR for code hygiene?

Copy link
Contributor

@abstractdog abstractdog Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code hygiene patches are always welcome
IMO you can remove these methods here, and do more in separate PRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will update PR shortly and raise separate JIRA/PR for code hygiene. Thanks

@tez-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 10m 59s master passed
+1 💚 compile 0m 43s master passed
+1 💚 checkstyle 1m 6s master passed
+1 💚 javadoc 0m 36s master passed
+0 🆗 spotbugs 2m 22s tez-dag in master has 749 extant spotbugs warnings.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 codespell 0m 51s No new issues.
+1 💚 compile 0m 30s the patch passed
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 10s /buildtool-patch-checkstyle-tez-dag.txt The patch fails to run checkstyle in tez-dag
+1 💚 javadoc 0m 13s the patch passed
+1 💚 spotbugs 1m 44s the patch passed
_ Other Tests _
+1 💚 unit 5m 58s tez-dag in the patch passed.
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
27m 37s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/5/artifact/out/Dockerfile
GITHUB PR #461
Optional Tests dupname asflicense javac javadoc unit spotbugs checkstyle codespell detsecrets compile
uname Linux 3e7149cc127d 5.15.0-160-generic #170-Ubuntu SMP Wed Oct 1 10:06:56 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/.yetus/personality.sh
git revision master / b882a0c
Default Java Ubuntu-21.0.10+7-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/5/testReport/
Max. process+thread count 262 (vs. ulimit of 5500)
modules C: tez-dag U: tez-dag
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/5/console
versions git=2.43.0 maven=3.8.7 spotbugs=4.9.3 codespell=2.4.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

This message was automatically generated.

@abstractdog
Copy link
Contributor

abstractdog commented Mar 6, 2026

hm, how is this possible:
https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/5/artifact/out/buildtool-patch-checkstyle-tez-dag.txt

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.1.1:checkstyle (default-cli) on project tez-dag: An error has occurred in Checkstyle report generation. Failed during checkstyle configuration: Exception was thrown while processing /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java: IllegalStateException occurred while parsing file /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java. /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java:18:8: unexpected token: sealed -> [Help 1]

I thought sealed interfaces are present since JDK17, so that must not be a problem...however, it compiled successfully, but the checkstyle:checkstyle target was the one that failed, I hope the same can be reproduced locally

@Aggarwal-Raghav
Copy link
Contributor Author

hm, how is this possible: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/5/artifact/out/buildtool-patch-checkstyle-tez-dag.txt

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.1.1:checkstyle (default-cli) on project tez-dag: An error has occurred in Checkstyle report generation. Failed during checkstyle configuration: Exception was thrown while processing /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java: IllegalStateException occurred while parsing file /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java. /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/tez-dag/src/main/java/org/apache/tez/dag/app/NodeContext.java:18:8: unexpected token: sealed -> [Help 1]

I thought sealed interfaces are present since JDK17, so that must not be a problem...however, it compiled successfully, but the checkstyle:checkstyle target was the one that failed, I hope the same can be reproduced locally

I'm doing some investigation on checkstyle issue as part of #462 . but using checkstyle 3.6.0 and removing puppycrawler explicit dependency, then its working.

NodeContext nodeContext =
(frameworkService instanceof YarnServerFrameworkService)
? new YarnNodeManagerContext()
: null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of an instanceof check we should let ServerFrameworkService interface handle this: the advantage of separate framework service implementations is to eliminate these type checks (the ZkStandaloneServerFrameworkService can simply return null or a LocalNodeContext with dummy values)

}

@Override
public String getNodeHostString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be nodeHost? I believe "String" doesn't add any value here, it could be eliminated from variable names as well as interface

Comment on lines +29 to +31
private final Supplier<String> nodeHostStringSupplier;
private final Supplier<Integer> nodePortStringSupplier;
private final Supplier<Integer> nodeHttpPortStringSupplier;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also remove "Supplier" suffix from the variable names: by looking at the variables, it should be clear that they are Suppliers

@abstractdog
Copy link
Contributor

abstractdog commented Mar 6, 2026

@Aggarwal-Raghav : left some minor comments, but this looks good so far

@Aggarwal-Raghav
Copy link
Contributor Author

@Aggarwal-Raghav : left some minor comments, but this looks good so far

Thanks! Will update PR I'm some time.

@tez-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 11m 5s master passed
+1 💚 compile 0m 44s master passed
+1 💚 checkstyle 1m 2s master passed
+1 💚 javadoc 0m 37s master passed
+0 🆗 spotbugs 2m 22s tez-dag in master has 749 extant spotbugs warnings.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 codespell 0m 52s No new issues.
+1 💚 compile 0m 32s the patch passed
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 11s /buildtool-patch-checkstyle-tez-dag.txt The patch fails to run checkstyle in tez-dag
+1 💚 javadoc 0m 13s the patch passed
+1 💚 spotbugs 1m 41s the patch passed
_ Other Tests _
+1 💚 unit 6m 2s tez-dag in the patch passed.
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
27m 41s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/6/artifact/out/Dockerfile
GITHUB PR #461
Optional Tests dupname asflicense javac javadoc unit spotbugs checkstyle codespell detsecrets compile
uname Linux b39160e6b376 5.15.0-160-generic #170-Ubuntu SMP Wed Oct 1 10:06:56 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-461/src/.yetus/personality.sh
git revision master / 86675af
Default Java Ubuntu-21.0.10+7-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/6/testReport/
Max. process+thread count 217 (vs. ulimit of 5500)
modules C: tez-dag U: tez-dag
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-461/6/console
versions git=2.43.0 maven=3.8.7 spotbugs=4.9.3 codespell=2.4.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants