Switch from tcpdump to ~~ss~~ nstat for connection checks #417

Merged
merged 2 commits into from
Oct 25, 2021

Conversation

solacelost
Contributor

@solacelost solacelost commented Oct 5, 2021

This changes from tcpdumping with a 3-second timeout while looking for packets to checking ~~active UDP sessions being tracked~~ incoming UDP datagrams tracked over a modifiable 3-second window, with a default idle threshold of 30 datagrams in that window.

This works completely unprivileged and I can confirm it working on rootless podman with SELinux enforcing on a Fedora system. Also tested on Debian 11 with rootful podman and bridge networking by @Tremolo4 with a non-root PUID.

fixes #413

@solacelost
Contributor Author

Also, a side note: implementations may differ based on the kernel or something else I haven't thought of, but I see UDP connections falling off of the ss list about 30 seconds after a client disconnects.

lloesche
lloesche previously approved these changes Oct 5, 2021
Owner

@lloesche lloesche left a comment

lgtm, thank you! 🚢

@lloesche
Owner

lloesche commented Oct 5, 2021

The CI check failed with the following errors:

#37 3.203 In /usr/local/etc/valheim/common line 143:
#37 3.203         if [ $(ss -u | grep -F "$SERVER_PORT" | wc -l) -gt 0 ]; then
#37 3.203              ^-- SC2046: Quote this to prevent word splitting.
#37 3.203                        ^-- SC2126: Consider using grep -c instead of grep|wc -l.

Essentially it would like to see

if [ "$(ss -u | grep -cF "$SERVER_PORT")" -gt 0 ]; then

@solacelost
Contributor Author

Fixed. I should have run shellcheck. | wc -l is just second nature to me :)

Owner

@lloesche lloesche left a comment

It is still complaining about

#37 2.621 In /usr/local/etc/valheim/common line 143:
#37 2.621         if [ $(ss -u | grep -cF "$SERVER_PORT") -gt 0 ]; then
#37 2.621              ^-- SC2046: Quote this to prevent word splitting.

The $() must be quoted. See my original comment above.
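
For reference, the shape shellcheck is asking for can be sketched as below. `count_port_lines` is a hypothetical helper, not a function from this PR; it reads `ss`-style output on stdin so it can be tried without a live socket table. Note that a fixed-string match on the port number also matches longer numbers that contain it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count stdin lines containing the fixed string $1.
# grep -c still prints 0 when nothing matches but exits non-zero,
# hence the "|| true" so callers under `set -e` are not aborted.
count_port_lines() {
    grep -cF -- "$1" || true
}

SERVER_PORT=2456

# Quoting "$(...)" keeps the result a single word for [ ... ] (SC2046),
# and grep -c replaces the grep | wc -l pipeline (SC2126).
if [ "$(printf '%s\n' '0 0 10.0.2.100:40022 10.0.2.100:2456' | count_port_lines "$SERVER_PORT")" -gt 0 ]; then
    echo "port $SERVER_PORT seen"
else
    echo "port $SERVER_PORT not seen"
fi
```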

@Tremolo4
Contributor

I tested this and it doesn't seem to work for me 🤔. Server is permanently idle according to my debug logging (debug prints in the if-else).
Even when I start a shell in the container and run ss manually, the only thing I see is a TCP connection to an IP from Valve (Steam).
tcpdump does show my UDP packets to the server while I'm on it. ss -u shows just an empty table.

Any ideas?

@solacelost
Contributor Author

solacelost commented Oct 11, 2021

I'm unsure how it behaves in Docker with the default bridge networking there. As I mentioned, I'm running with rootless podman which means using slirp4netns networking. This is what I see:

[valheim@games ~]$ podman exec -it valheim2 bash
root@78e4cf38f876:/# ss -u # nobody is connected right now
Recv-Q                    Send-Q                                        Local Address:Port                                         Peer Address:Port
root@78e4cf38f876:/# ss -u # this is with one user logged on
Recv-Q                    Send-Q                                       Local Address:Port                                          Peer Address:Port
0                         0                                               10.0.2.100:40022                                           10.0.2.100:2456

(eta: I don't have a host with a normal docker-ce or moby installation handy to check - though I'm interested to hear if other people see the same behavior)

edit2: Also, to make sure that an unprivileged user inside the container could pull the UDP socket information, I manually added a non-root user inside the container and ran this:

[valheim@games ~]$ podman exec -it -u valheim2 valheim2 grep valheim2 /etc/passwd
valheim2:x:1000:1000::/home/valheim2:/bin/sh
[valheim@games ~]$ podman exec -it -u valheim2 valheim2 ss -u
Recv-Q                                                Send-Q                                                                                                Local Address:Port                                                                                                  Peer Address:Port
0                                                     0                                                                                                        10.0.2.100:36149                                                                                                   10.0.2.100:2456

And that behavior is despite the fact that, in my implementation, I haven't changed the PUID for valheim away from 0, so this is a non-root user inside the userns able to see the UDP sockets of the root user inside the userns.

@Tremolo4
Contributor

I'm using docker-ce as distributed with Debian 11 with default bridge networking as you mentioned.

I've tried all constellations of non-root and root-users with ss -u and ss -ua inside and outside of the docker container. Nothing changes in the table after I connect to the valheim server (as the first and only user).

It's interesting that both the local address and the peer address are the same for you. I guess it has something to do with how the podman networking works.

@TheHades
Contributor

UDP is stateless. I think that is the reason why we don't get any information out of ss?

@solacelost
Contributor Author

UDP is stateless, but the tracking I'm seeing in ss is unrelated to connection state in the way that a TCP session would be tracked. I noted that the connections dropped off of my ss output after a fairly strict 30 seconds. It's possible that this is related to the nftables implementation I have on my host system, outside of the container. If you're using an iptables-based interface, it should be hitting similar netfilter interfaces in the kernel to affect the UDP tracking.

I just double-checked and see that when I run sudo sysctl net.netfilter.nf_conntrack_udp_timeout I get the output net.netfilter.nf_conntrack_udp_timeout = 30.

Could you check the value on your host system, @Tremolo4 ?
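
The same value is exposed under `/proc/sys`, so a small sketch like the following can report it and cope with hosts where the conntrack module is not loaded. `conntrack_udp_timeout` is a hypothetical helper name, and whether the file is readable without sudo on a given host is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the conntrack UDP timeout in seconds, or
# "unavailable" if the file cannot be read (e.g. nf_conntrack not loaded).
# The optional path argument exists only so the function can be tried
# against a test file.
conntrack_udp_timeout() {
    local path="${1:-/proc/sys/net/netfilter/nf_conntrack_udp_timeout}"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "unavailable"
    fi
}

echo "nf_conntrack_udp_timeout: $(conntrack_udp_timeout)"
```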

@Tremolo4
Contributor

> Could you check the value on your host system, @Tremolo4 ?

I also get net.netfilter.nf_conntrack_udp_timeout = 30. Have you tried changing it and seeing whether that changes the timeout in ss output? Just to confirm that nf_conntrack is the basis for ss output (seems likely to me).

I've tested ss -u with another UDP gameserver that runs on my host, but not within a docker container. Nothing shows up there either, unless I do ss -ul (or ss -ua). Then I can see the server's sockets:

State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
UNCONN 0      0            0.0.0.0:7777       0.0.0.0:*
UNCONN 0      0            0.0.0.0:7778       0.0.0.0:*
...

While on the server I can sometimes see the Recv-Q and Send-Q values going up. Note that I can also see the Valheim sockets there.

UNCONN 0      0            0.0.0.0:2458       0.0.0.0:*
UNCONN 0      0            0.0.0.0:2459       0.0.0.0:*

In the docker container they also show up, like this (regardless of whether someone was on the server in the last 30 seconds):

root@43708c1e164d:/# ss -ul
State     Recv-Q    Send-Q       Local Address:Port        Peer Address:Port
UNCONN    0         0                  0.0.0.0:2459             0.0.0.0:*
UNCONN    0         0                        *:2458                   *:*

@solacelost My guess is the way your container networking works causes ss to "incorrectly" interpret the valheim server sockets as "outgoing" UDP traffic. In my case, they are correctly identified as "incoming", listening server sockets. So in your case that gives the advantage of being able to track activity due to the 30 second timeout for "outgoing" "connections", but in my case they are always there in the listen socket list (ss -ul).
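
To make the distinction concrete, here is a minimal sketch that pulls the bound ports out of `ss -ul`-style output. `listening_udp_ports` is a hypothetical helper; it only shows that the server has its sockets bound, which is why it says nothing about whether a player is connected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the local port of every UNCONN row in
# `ss -ul`-style output read from stdin (column 4 is Local Address:Port).
# Splitting on ":" and taking the last piece also copes with "*:2458".
listening_udp_ports() {
    awk '$1 == "UNCONN" { n = split($4, parts, ":"); print parts[n] }'
}

# Sample rows copied from the container output above; on a live system one
# would pipe `ss -ul` into the function instead.
printf '%s\n' \
    'State     Recv-Q    Send-Q       Local Address:Port        Peer Address:Port' \
    'UNCONN    0         0                  0.0.0.0:2459             0.0.0.0:*' \
    'UNCONN    0         0                        *:2458                   *:*' \
    | listening_udp_ports
```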

@solacelost
Contributor Author

solacelost commented Oct 14, 2021

Alright - I think I have an alternative that should work. @Tremolo4 mind testing this one out before I make another commit (which I will definitely run shellcheck on 😇)? Make sure you have iproute2 installed in your container instance, the same package that provides ss. Run the following from your host:

RUNTIME=docker # here I was using podman
EXEC_UID=1000 # not named UID, which is read-only in bash; this should work set to any arbitrary UID, even if the user doesn't exist
CONTAINER_INSTANCE_ID=valheim # this should be the name or container ID of your Valheim container

# This should print the number of UDP datagrams received by the network namespace since it was last checked
datagrams() {
    "$RUNTIME" exec --user "$EXEC_UID" "$CONTAINER_INSTANCE_ID" nstat | \
          awk '/^UdpInDatagrams/{print $2}'
}
datagrams_print() {
    dg_count=$(datagrams)
    echo "${dg_count:-0} datagrams received"
}

# This is to print the timestamp to the nearest centisecond to make it easy to see the passage of time
ts() {
    date +%s.%N | head -c 13
    echo
}

# This throws away the starting statistics
datagrams &>/dev/null

# This is enough for you to see some traffic with a little bit of arbitrary spacing in between
ts
datagrams_print
ts
datagrams_print
sleep 0.5
ts
datagrams_print
ts

This mock-up is designed to be run from your host and should demonstrate that this method will work for you on Docker on Debian with bridge networking and normal rootful docker with an unprivileged user in the container, while also working for me on Podman on Fedora with slirp4netns networking and rootless podman and either rootless or root users inside the container (because for me the majority of the caps are taken away at the runtime level). Here's my output from the above when nobody is connected to a server:

1634183087.67
0 datagrams received
1634183087.90
0 datagrams received
1634183088.53
0 datagrams received
1634183088.66

and with two users connected:

1634183116.55
21 datagrams received
1634183116.68
20 datagrams received
1634183117.31
103 datagrams received
1634183117.44

If I'm right, and this works for you, I'll patch up the PR to measure the number of datagrams received over a three-second period and have server_is_idle return that number as its exit status (capped at 255, thanks to shell behavior), so 0 means idle and anything greater than 0 means not idle.
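
One caveat with returning a count as an exit status: bash reduces the status modulo 256, so a raw count of exactly 256 would come back as 0 and look idle. A minimal sketch of clamping first, with `return_clamped` as a hypothetical helper rather than anything from this PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the datagram count as an exit status,
# clamped to 255 so that large counts cannot wrap around to 0.
return_clamped() {
    local count="$1"
    if [ "$count" -gt 255 ]; then
        count=255
    fi
    return "$count"
}

# For comparison: an unclamped return of 256 wraps to status 0 in bash.
wraps() { return 256; }

wraps && echo "unclamped 256 -> status $?"
return_clamped 256 || echo "clamped 256 -> status $?"
return_clamped 0 && echo "clamped 0 -> status $?"
```

Printing the count on stdout and comparing it numerically in the caller would avoid the 8-bit limit altogether; which of the two fits better here is a design choice for the PR.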

Edited to add one important fact: I'm running two copies of this image right now. The above 0 datagrams received and <not 0> datagrams received messages were on the same box, two separate container instances, one non-root user running them both, and taken while both users were logged into one of them. That is, the number of datagrams received was confirmed to be confined to the network namespace - although not strictly confined to the Valheim server process. If you have a way of confirming this on your end as well (active incoming UDP traffic of any sort should work honestly) then I'd appreciate it.
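
As a cross-check on the per-namespace behavior, the same counter can be read straight from `/proc/net/snmp`, which is scoped to the network namespace of the reading process. The sketch below assumes the usual two-line `Udp:` layout (a header row followed by a value row); `udp_in_datagrams` and `datagrams_in_window` are hypothetical names. Unlike nstat, it keeps no history file, so the delta is computed explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cumulative UdpInDatagrams counter.
# The first "Udp:" row is the header (used to locate the InDatagrams
# column); the second "Udp:" row carries the values.
udp_in_datagrams() {
    local snmp_file="${1:-/proc/net/snmp}"
    awk '$1 == "Udp:" {
             if (!col) { for (i = 2; i <= NF; i++) if ($i == "InDatagrams") col = i }
             else      { print $col; exit }
         }' "$snmp_file"
}

# Hypothetical helper: datagrams received during a window (default 3 s).
datagrams_in_window() {
    local window="${1:-3}" before after
    before=$(udp_in_datagrams)
    sleep "$window"
    after=$(udp_in_datagrams)
    echo $(( after - before ))
}

# Only touch the real file where it exists (Linux).
if [ -r /proc/net/snmp ]; then
    echo "UdpInDatagrams so far: $(udp_in_datagrams)"
fi
```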

@Tremolo4
Contributor

Tremolo4 commented Oct 14, 2021

That seems to work for me too, thanks a lot!

I've confirmed that it does not pick up any other UDP traffic outside of the valheim container, not even from another valheim container.

One problem is that this method can't distinguish port numbers, so we can't check for game traffic on port 2456 only. I've been running the server in public mode to make the idle check work, so I very regularly get queries on port 2457 from people refreshing the server list. They get picked up by tcpdump -n udp and also by nstat. I've turned public mode off just now and there are still some queries to port 2457, but a lot fewer and without a response from the server of course. I guess these are from people whose cached server browser list still contains my server or some other server monitoring tools.

The other non-player UDP packets I see in tcpdump are DNS queries from steamcmd whenever the update check runs.

Since even a single player on the server causes a lot more packets per second than this, I suggest we use a threshold to determine whether it's player traffic or not. Something like 30 packets per 3 seconds should be good. With just me on the server I'm getting around 100 - 400 packets per 3 seconds, depending on whether I move around or not etc. (using docker exec -it <container_name> timeout 3 tcpdump -n udp and port 2456).

Unless of course you know a way to check only a specific UDP port :D

@solacelost
Contributor Author

I can't think of a way to get per-socket statistics without enabling conntrack on the host, which goes pretty well outside of the bounds of what we should be requiring with a container.

@lloesche how do you feel about the use of 30 UDP datagrams within a 3-second period being the default threshold to identify an actively connected player, with an environment variable defined to tune it per-instance if necessary? I'll do up this implementation now and you can review shortly.
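
A rough sketch of what that threshold check could look like follows. `IDLE_THRESHOLD` and `is_idle` are illustrative names only, not necessarily what the PR ends up using, and treating a count exactly at the threshold as active is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical environment variable with the proposed default of 30.
IDLE_THRESHOLD="${IDLE_THRESHOLD:-30}"

# Succeeds (status 0) when the datagram count for the window is below the
# threshold, i.e. when the server looks idle.
is_idle() {
    local count="$1" threshold="${2:-$IDLE_THRESHOLD}"
    [ "$count" -lt "$threshold" ]
}

# Counts in the range mentioned above: stray queries vs. one active player.
for count in 0 12 30 250; do
    if is_idle "$count"; then
        echo "$count datagrams: idle"
    else
        echo "$count datagrams: active"
    fi
done
```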

@solacelost
Contributor Author

$ shellcheck -axs bash -e SC2034 common defaults
$ echo $?
0

😇

@solacelost solacelost changed the title from "Switch from tcpdump to ss for connection checks" to "Switch from tcpdump to ~~ss~~ nstat for connection checks" Oct 14, 2021
@solacelost
Contributor Author

Just made one other tweak because my .vimrc automatically prunes lots of whitespace when I save and I didn't catch that it slipped into the documentation commit, so I split the commit up into two chunks. Up to you if you'd like to have the idle detection documented, we can revert or I can drop it from the tree.

@solacelost
Contributor Author

solacelost commented Oct 14, 2021

Annnd a rebase! Apologies for the GHA spam - I did not expect it to be quite so bad as it is. I'm not used to GHA when I'm not the approver.

@lloesche
Owner

lloesche commented Oct 17, 2021

@solacelost thank you, looks good! But please restore the trailing double spaces in the README. They are Markdown syntax to enforce linefeeds.

@solacelost
Contributor Author

Any update? I'm already running this version here successfully. Just hope the PRs can get cleaned up here so I can shift back to your image :)

@lloesche lloesche merged commit 565942d into lloesche:main Oct 25, 2021
Development

Successfully merging this pull request may close these issues.

Idle check fails (says always idle) when SERVER_PUBLIC = false and PUID != 0
4 participants