add systemd unit pid matching to procstat #3459

Merged
merged 1 commit into from
Nov 13, 2017

phemmer
Contributor

@phemmer phemmer commented Nov 10, 2017

This adds the ability to identify the process to monitor by asking systemd for its PID. It uses the "Main PID" value shown in systemctl status. If the "children" parameter is true, it grabs all the PIDs within the cgroup that systemd creates for the service.
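As a rough illustration of the lookup described above (not the code in this PR), the main PID can be read from `systemctl show --property MainPID <unit>`, which prints a `MainPID=<n>` line. The helper names `parseMainPID` and `mainPID` are hypothetical, and treating `MainPID=0` as "no running main process" is a choice made for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseMainPID extracts the MainPID property from `systemctl show` output.
// Hypothetical helper: it returns an error if the property is missing,
// malformed, or zero (systemd reports MainPID=0 when the unit has no
// running main process).
func parseMainPID(out string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		kv := strings.SplitN(sc.Text(), "=", 2)
		if len(kv) != 2 || kv[0] != "MainPID" {
			continue
		}
		pid, err := strconv.Atoi(strings.TrimSpace(kv[1]))
		if err != nil {
			return 0, fmt.Errorf("invalid MainPID %q: %v", kv[1], err)
		}
		if pid == 0 {
			return 0, fmt.Errorf("unit has no main process")
		}
		return pid, nil
	}
	return 0, fmt.Errorf("MainPID property not found")
}

// mainPID asks systemd for the main PID of a unit (hypothetical helper).
func mainPID(unit string) (int, error) {
	out, err := exec.Command("systemctl", "show", "--property", "MainPID", unit).Output()
	if err != nil {
		return 0, err
	}
	return parseMainPID(string(out))
}
```

Keeping the parsing separate from the `systemctl` call makes the parsing testable on hosts without systemd.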

I didn't want to add the "children" parameter for systemd alone, so I also added it to the existing process matchers. It is a little heavy, though, as it calls pgrep recursively until it reaches the end of the chain.
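To show where the cost comes from, here is a minimal sketch of collecting all descendants of a process with one child lookup per process. It is not the PR's implementation: it walks the tree iteratively with a queue rather than recursing, and `childLookup`, `pgrepChildren`, and `descendants` are hypothetical names. The only external behavior assumed is that `pgrep -P <pid>` prints the direct children one per line and exits with status 1 when nothing matches.

```go
package main

import (
	"errors"
	"os/exec"
	"strconv"
	"strings"
)

// childLookup returns the direct children of a PID.
type childLookup func(pid int) ([]int, error)

// pgrepChildren lists direct children via `pgrep -P <pid>`.
// pgrep exits with status 1 when nothing matches; that is treated
// as "no children" rather than as an error.
func pgrepChildren(pid int) ([]int, error) {
	out, err := exec.Command("pgrep", "-P", strconv.Itoa(pid)).Output()
	if err != nil {
		var ee *exec.ExitError
		if errors.As(err, &ee) && ee.ExitCode() == 1 {
			return nil, nil
		}
		return nil, err
	}
	var pids []int
	for _, f := range strings.Fields(string(out)) {
		p, err := strconv.Atoi(f)
		if err != nil {
			return nil, err
		}
		pids = append(pids, p)
	}
	return pids, nil
}

// descendants walks the process tree below root breadth-first.
// It performs one lookup per process in the tree, which is the
// per-gather cost discussed in this thread.
func descendants(root int, lookup childLookup) ([]int, error) {
	var all []int
	seen := map[int]bool{root: true}
	queue := []int{root}
	for len(queue) > 0 {
		pid := queue[0]
		queue = queue[1:]
		kids, err := lookup(pid)
		if err != nil {
			return nil, err
		}
		for _, k := range kids {
			if seen[k] {
				continue
			}
			seen[k] = true
			all = append(all, k)
			queue = append(queue, k)
		}
	}
	return all, nil
}
```

Calling `descendants(pid, pgrepChildren)` spawns one `pgrep` process per node in the tree on every call.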

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Closes #3439

@phemmer
Contributor Author

phemmer commented Nov 10, 2017

Here's what it looks like running against the systemd unit splunk.service with children = true:

> procstat,process_name=splunkd,host=fll2aspi01stg,systemd_unit=splunk.service cpu_time_nice=0,cpu_time_irq=0,memory_locked=0i,cpu_time_user=36.86,cpu_time_soft_irq=0,cpu_time_steal=0,memory_rss=2789376i,memory_vms=78925824i,involuntary_context_switches=275i,voluntary_context_switches=2230469i,cpu_time_system=452.82,cpu_time_idle=0,cpu_time_iowait=0,cpu_time_stolen=0,cpu_usage=0,memory_stack=0i,num_threads=1i,cpu_time_guest=0,cpu_time_guest_nice=0,memory_swap=0i,memory_data=0i,pid=6203i 1510350845000000000
> procstat,systemd_unit=splunk.service,process_name=mongod,host=fll2aspi01stg voluntary_context_switches=68841693i,cpu_time_idle=0,cpu_time_nice=0,cpu_time_guest=0,cpu_usage=0,memory_vms=1340375040i,cpu_time_user=1039.12,cpu_time_system=835.64,cpu_time_soft_irq=0,cpu_time_steal=0,cpu_time_stolen=0,pid=6223i,num_threads=57i,involuntary_context_switches=301i,cpu_time_guest_nice=0,memory_rss=106991616i,memory_stack=0i,memory_locked=0i,cpu_time_iowait=0,cpu_time_irq=0,memory_swap=0i,memory_data=0i 1510350845000000000
> procstat,systemd_unit=splunk.service,process_name=python,host=fll2aspi01stg cpu_time_soft_irq=0,cpu_time_stolen=0,memory_vms=1926037504i,memory_data=0i,memory_stack=0i,num_threads=26i,voluntary_context_switches=6960164i,cpu_time_system=275,pid=6366i,cpu_time_iowait=0,cpu_time_guest=0,cpu_time_steal=0,memory_rss=11415552i,memory_swap=0i,memory_locked=0i,involuntary_context_switches=72i,cpu_time_idle=0,cpu_time_nice=0,cpu_usage=1.9832773504692827,cpu_time_user=894.46,cpu_time_irq=0,cpu_time_guest_nice=0 1510350845000000000
> procstat,systemd_unit=splunk.service,process_name=splunkd,host=fll2aspi01stg cpu_time_idle=0,cpu_time_stolen=0,cpu_usage=0,memory_rss=35835904i,memory_swap=0i,voluntary_context_switches=2i,cpu_time_system=419.54,cpu_time_iowait=0,cpu_time_soft_irq=0,cpu_time_steal=0,memory_stack=0i,involuntary_context_switches=7i,cpu_time_nice=0,cpu_time_guest_nice=0,memory_vms=106008576i,memory_data=0i,memory_locked=0i,pid=6413i,num_threads=8i,cpu_time_user=271.14,cpu_time_irq=0,cpu_time_guest=0 1510350845000000000
> procstat,host=fll2aspi01stg,systemd_unit=splunk.service,process_name=splunkd num_threads=3i,cpu_time_nice=0,cpu_time_irq=0,cpu_time_soft_irq=0,memory_rss=38146048i,memory_stack=0i,cpu_time_system=0.12,cpu_time_idle=0,cpu_time_guest=0,memory_vms=3905605632i,pid=12982i,involuntary_context_switches=6i,cpu_time_user=0.91,cpu_time_guest_nice=0,cpu_usage=0,memory_locked=0i,voluntary_context_switches=35i,cpu_time_iowait=0,cpu_time_steal=0,cpu_time_stolen=0,memory_swap=0i,memory_data=0i 1510350845000000000
> procstat,systemd_unit=splunk.service,process_name=splunkd,host=fll2aspi01stg cpu_time_system=0,cpu_time_nice=0,cpu_time_guest_nice=0,memory_rss=5894144i,memory_vms=78925824i,memory_locked=0i,num_threads=1i,cpu_time_user=0,cpu_time_iowait=0,cpu_time_steal=0,memory_swap=0i,voluntary_context_switches=1i,involuntary_context_switches=0i,memory_stack=0i,cpu_time_idle=0,cpu_time_stolen=0,cpu_time_soft_irq=0,cpu_time_guest=0,cpu_usage=0,memory_data=0i,pid=12983i,cpu_time_irq=0 1510350845000000000
> procstat,systemd_unit=splunk.service,process_name=splunkd,host=fll2aspi01stg cpu_time_iowait=0,voluntary_context_switches=719811i,involuntary_context_switches=15i,cpu_time_user=41119.52,cpu_time_idle=0,cpu_time_soft_irq=0,cpu_time_guest=0,cpu_time_guest_nice=0,cpu_usage=7.855250252621229,memory_vms=1570865152i,pid=6197i,cpu_time_system=24431.61,cpu_time_steal=0,num_threads=80i,cpu_time_nice=0,cpu_time_irq=0,cpu_time_stolen=0,memory_rss=305315840i,memory_swap=0i,memory_data=0i,memory_stack=0i,memory_locked=0i 1510350845000000000

Edit: Hrm, this does raise an issue: we have multiple points that would overwrite each other. Several of the processes share the same process_name, so the tag combination is not unique. We could use the same index pool idea I mentioned over on #3436 to avoid the cardinality issue of using the PID as a tag.
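The details of the index pool from #3436 are not spelled out in this thread, so the following is only one plausible reading of it: each live PID gets the lowest free small integer, and the integer is released for reuse when the process goes away. Tagging points with that index would keep the tag set unique per process while bounding cardinality by the number of concurrently running processes. The `indexPool` type and its methods are hypothetical.

```go
package main

import "sort"

// indexPool hands out small, reusable integers to PIDs (hypothetical type).
type indexPool struct {
	byPID map[int]int  // pid -> assigned index
	used  map[int]bool // indices currently in use
}

func newIndexPool() *indexPool {
	return &indexPool{byPID: map[int]int{}, used: map[int]bool{}}
}

// update syncs the pool with the currently live PIDs and returns a copy
// of the pid -> index mapping. Indices of vanished PIDs are released
// first, then new PIDs receive the lowest free index, in ascending PID
// order so the assignment is deterministic.
func (p *indexPool) update(live []int) map[int]int {
	alive := make(map[int]bool, len(live))
	for _, pid := range live {
		alive[pid] = true
	}
	for pid, idx := range p.byPID {
		if !alive[pid] {
			delete(p.byPID, pid)
			delete(p.used, idx)
		}
	}
	sorted := append([]int(nil), live...)
	sort.Ints(sorted)
	for _, pid := range sorted {
		if _, ok := p.byPID[pid]; ok {
			continue
		}
		idx := 0
		for p.used[idx] {
			idx++
		}
		p.byPID[pid] = idx
		p.used[idx] = true
	}
	out := make(map[int]int, len(p.byPID))
	for pid, idx := range p.byPID {
		out[pid] = idx
	}
	return out
}
```

With this scheme a restarted process may inherit the index of the process it replaced, which is the reuse trade-off mentioned later in the thread.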

@danielnelson
Contributor

What about adding this as a cgroup option? I guess it would need to be specified as cgroup = systemd/system.slice/ssh.service, but the upside is you could use it with any cgroup.

I'm not sure about the recursive pgrep calls; that seems like too much work since it is run on every gather. But I can imagine the cgroup or PGID setting, or some other ways of selecting groups.

I think the ID pool could be useful here, people are always free to use pid_tag = true in addition if they are concerned about the reuse.

@phemmer
Contributor Author

phemmer commented Nov 10, 2017

Well, if we're going to go the cgroup route, what about changing the pidfile matcher to support reading multiple PIDs (delimited by whitespace)? Then you could just specify the path to the cgroup's cgroup.procs.
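A minimal sketch of that suggestion, assuming the file holds PIDs separated by any whitespace (a cgroup's cgroup.procs lists one PID per line, and a classic pidfile holds a single PID, so both fit). `parsePIDs` and `readPIDFile` are hypothetical names, not functions from the plugin.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePIDs parses whitespace-delimited PIDs. This covers both a
// single-PID pidfile and a cgroup.procs file with one PID per line.
func parsePIDs(data string) ([]int, error) {
	fields := strings.Fields(data)
	pids := make([]int, 0, len(fields))
	for _, f := range fields {
		pid, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid PID %q: %v", f, err)
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

// readPIDFile reads the file at path and parses every PID in it.
func readPIDFile(path string) ([]int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parsePIDs(string(data))
}
```

An empty file yields an empty slice rather than an error, which matches a cgroup that currently has no processes.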

@danielnelson
Contributor

The implementation could be essentially that, but I think in the config it may be more discoverable as a cgroup option, since I wouldn't guess that a pidfile option supports this.

@phemmer
Contributor Author

phemmer commented Nov 13, 2017

PR updated to split apart the systemd and cgroup matching. The systemd matcher now returns only the main PID; the cgroup matcher returns all PIDs in the cgroup.
Also removed the children PIDs stuff.

@danielnelson danielnelson added this to the 1.5.0 milestone Nov 13, 2017
@danielnelson danielnelson merged commit 6ee6d55 into influxdata:master Nov 13, 2017
@phemmer
Contributor Author

phemmer commented Nov 13, 2017

Wasn't expecting it to be merged just yet, has no tests :-)
I'll create a follow up PR to add some tests.
