jcmd error when pulling Java metrics when not running as the Cassandra user #123

Open
joelsdc opened this issue Apr 25, 2022 · 2 comments

@joelsdc
Contributor

joelsdc commented Apr 25, 2022

ds-collector v2.0.2:

I've noticed the following error:

	executing `jcmd 8890 VM.system_properties > java_system_properties.txt`… com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
	at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
	at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
	at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
	at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
	at sun.tools.jcmd.JCmd.main(JCmd.java:131)
failed
	executing `jcmd 8890 VM.command_line > java_command_line.txt`… com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
	at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
	at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
	at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
	at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
	at sun.tools.jcmd.JCmd.main(JCmd.java:131)
failed

The issue here is that jcmd is being run by a different user than the one owning the process. In this case, root is running the collector and cassandra is running the service, so jcmd should be run by cassandra rather than root.

As a workaround I have added sudo -u cassandra to the jcmd entries here:

https://github.com/datastax/diagnostic-collection/blob/master/ds-collector/rust-commands/collect-info.rs#L919-L940

This workaround is ugly at best. 😂

As the collector already handles finding the Cassandra PID to run jcmd, a better approach would be to run something like `ps -o user= -p${cassandra_pid}` once we have the `${cassandra_pid}`, to get the specific user running Cassandra, and then run a proper `sudo -u ${cassandra_pid_owner} jcmd ...` so the command doesn't fail.
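To make the idea above concrete, here is a minimal sketch in Rust. It is not the collector's actual code: the function names (`parse_ps_owner`, `process_owner`, `jcmd_argv`) are hypothetical, and it assumes `ps`, `sudo` and `jcmd` are on the PATH and that the user running the collector is allowed to `sudo -u` to the process owner.

```rust
use std::io;
use std::process::Command;

/// Extract the user name from the output of `ps -o user= -p <pid>`.
/// Returns None when the output is empty (for example, the PID no longer exists).
/// Hypothetical helper, not part of ds-collector.
fn parse_ps_owner(ps_stdout: &str) -> Option<String> {
    let owner = ps_stdout.trim();
    if owner.is_empty() {
        None
    } else {
        Some(owner.to_string())
    }
}

/// Ask `ps` which user owns `pid` (assumes a `ps` that supports `-o user=`).
fn process_owner(pid: u32) -> io::Result<Option<String>> {
    let pid_arg = pid.to_string();
    let output = Command::new("ps")
        .arg("-o")
        .arg("user=")
        .arg("-p")
        .arg(&pid_arg)
        .output()?;
    Ok(parse_ps_owner(&String::from_utf8_lossy(&output.stdout)))
}

/// Build the argv for a jcmd call that runs as the process owner,
/// i.e. `sudo -u <owner> jcmd <pid> <subcommand>`.
fn jcmd_argv(pid: u32, subcommand: &str, owner: &str) -> Vec<String> {
    let mut argv: Vec<String> = Vec::new();
    argv.push("sudo".to_string());
    argv.push("-u".to_string());
    argv.push(owner.to_string());
    argv.push("jcmd".to_string());
    argv.push(pid.to_string());
    argv.push(subcommand.to_string());
    argv
}
```

With something like this, the owner would be resolved once per node via `process_owner(cassandra_pid)` and reused for both the `VM.system_properties` and `VM.command_line` calls. A real change would probably also skip the `sudo` prefix when the collector already runs as that user.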

I'm not sure what the best approach code-wise is for this one. I think the changes belong more on the Rust side of the collector, and I get lost there.

@joelsdc
Contributor Author

joelsdc commented Apr 25, 2022

FYI: The hacky workaround works. Now it's a matter of adding it properly instead of hardcoding it.

...
	executing `sudo -u cassandra jcmd 8890 VM.system_properties > java_system_properties.txt`… OK
	executing `sudo -u cassandra jcmd 8890 VM.command_line > java_command_line.txt`… OK
...

@michaelsembwever
Member

michaelsembwever commented Apr 25, 2022

The jcmd command is optional. ref: https://github.com/datastax/diagnostic-collection/blob/master/ds-collector/rust-commands/collect-info.rs#L924

So I'm not yet convinced of the need for a quick hack.

I agree something better needs to be done here.
I think what we want is a `run_user` option in collector.conf, and the first thing the script does on each node is `su - <run_user>`.

Having hardcoded the SSH user as the user that runs the script on each node has bitten us as a limitation a few times already.
Such a `run_user` approach would solve this, I believe …?
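As a rough illustration of the `run_user` idea, the sketch below builds the command line used to start the script on a node. It is an assumption-laden sketch, not the collector's code: `node_argv` is a hypothetical name, how `run_user` would be read from collector.conf is left out, and it assumes the SSH user is allowed to `su` to `run_user`.

```rust
/// Build the argv that runs `script` on a node.
/// When a `run_user` is configured, switch to it first with `su - <run_user> -c <script>`;
/// otherwise run the script as the connecting (SSH) user via `sh -c`.
/// Hypothetical helper, not part of ds-collector.
fn node_argv(run_user: Option<&str>, script: &str) -> Vec<String> {
    match run_user {
        Some(user) => vec![
            "su".to_string(),
            "-".to_string(),
            user.to_string(),
            "-c".to_string(),
            script.to_string(),
        ],
        None => vec!["sh".to_string(), "-c".to_string(), script.to_string()],
    }
}
```

If `run_user` were set to cassandra, every command the script runs on the node, jcmd included, would execute as the process owner, so no per-command `sudo -u` would be needed.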
