Page MenuHomePhabricator

control-port-filter-proxy sd_notify support
Closed, ResolvedPublic

Description

In T273#3852, @nrgaway wrote (edited):

There are a few deficiencies we may want to consider for tor-controlport-filter for the future:
1. cpfp.py should daemonize itself and write the PID to the proper location (done)

  1. /usr/lib/tor-controlport-filter should implement a watchdog (sd_notify) function

The watchdog feature would be nice to have, which would restart the service if it appeared to be running, but became unresponsive.


python-daemon looks interesting. In Debian:

with sd-notify:
https://packages.debian.org/sid/python3-cotyledon

python3-sdnotify:

python3-systemd:
https://packages.debian.org/stretch/python3-systemd


TODO:

Details

Impact
Normal

Event Timeline

Patrick created this task.Apr 24 2015, 6:54 AM
Patrick updated the task description. (Show Details)
Patrick raised the priority of this task from to Normal.
Patrick set Impact to Normal.
Patrick added subscribers: Patrick, troubadour, nrgaway.
Patrick updated the task description. (Show Details)Apr 24 2015, 6:57 AM

The daemonized control-port-filter-python is nearly completed (with some tricks for wheezy).

Remarks and suggestions.

I will create a new repository, suggested name control-port-filter-daemon. Instead of cpfp.py, the file name could be cpfpd without the .py extension. Since it's a daemon, perhaps a better path would be /usr/bin. I don't know in systemd, but in init.d, it would look like

case "$1" in
  start)
    echo "Starting control port filter" 
    cpfpd start
    ;;
  stop)
    echo "Stopping control port filter"
    cpfpd stop
    ;;
  restart)
     "Restarting control port filter"
    cpfpd restart
    ;;
esac

Which leads to the first issue. There is no status in python-daemon. It could probably be implemented in a non standard way. To be explored.

Also, when the daemon is running, starting it again outputs a timeout error because the lock file (pid.lock) exists, it seems. Looks like a bug, the daemon is still running.

Quoting @nrgaway

cpfp.py should daemonize itself and write the PID to the proper location

What is the proper location for a daemon pid?

The daemonized control-port-filter-python is nearly completed (with some tricks for wheezy).

I guess this is sufficient for Whonix 11 / jessie.

I will create a new repository, suggested name control-port-filter-daemon.

I don't think this should become a separate package. Never saw something like this. Like apache-server and apache-daemon, doesn't exist.

Just installing control-port-filter-python without the -daemon package, what's the use case for that? Anyone up to that could rather just disable the systemd unit and have the same effect.

Instead of cpfp.py, the file name could be cpfpd without the .py extension. Since it's a daemon, perhaps a better path would be /usr/bin.

I guess /usr/sbin as per https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard.

Which leads to the first issue. There is no status in python-daemon. It could probably be implemented in a non standard way. To be explored.

Hm. Do you know any existing python daemon(s) so we can do a standard thing?

What is the proper location for a daemon pid?

/var/run/control-port-filter-python.pid? Looks like most packages do it that way.

I thought it's working like this: you somehow import a library into a python script, that does the listening to events. Systemd starts the binary. Once the system shuts down or [manually] or so, systemd sends a signal to the daemon. It receives it and acts accordingly.

The python-daemon package is more like separate binary in between?

The daemonized control-port-filter-python is nearly completed (with some tricks for wheezy).

Remarks and suggestions.

I will create a new repository, suggested name control-port-filter-daemon. Instead of cpfp.py, the file name could be cpfpd without the .py extension. Since it's a daemon, perhaps a better path would be /usr/bin. I don't know in systemd, but in init.d, it would look like

case "$1" in
   start)
     echo "Starting control port filter" 
     cpfpd start
     ;;
   stop)
     echo "Stopping control port filter"
     cpfpd stop
     ;;
   restart)
      "Restarting control port filter"
     cpfpd restart
     ;;
 esac

Which leads to the first issue. There is no status in python-daemon. It could probably be implemented in a non standard way. To be explored.

We will not need a status function for systemd; sysv init can also implement following procedure. status will be based on PID and if PID is running, status will return with success when using systemctl status control-port-filter-daemon.

If you also implement the watchdog function (sd_notify), systemd will automatically restart the daemon if the watchdog has not heard from daemon in the specified time period which indicates the daemon process is most likely locked up.

Also, when the daemon is running, starting it again outputs a timeout error because the lock file (pid.lock) exists, it seems. Looks like a bug, the daemon is still running.

Quoting @nrgaway

cpfp.py should daemonize itself and write the PID to the proper location

What is the proper location for a daemon pid?

Either:
/var/run/control-port-filter-daemon.pid

  • OR -

/var/run/control-port-filter-daemon/pid

If a /var/run/control-port-filter-daemon directory is not required for other related files, then it saves having to create such a directory by placing the pid in directly in /var/run.

whonixcheck would also need to be update to use new location for pid

troubadour added a comment.EditedApr 28 2015, 8:42 PM
In T274#4071, @Patrick wrote:

I don't think this should become a separate package. Never saw something like this. Like apache-server and apache-daemon, doesn't exist.

Just installing control-port-filter-python without the -daemon package, what's the use case for that? Anyone up to that could rather just disable the systemd unit and have the same effect.

I thought it's working like this: you somehow import a library into a python script, that does the listening to events. Systemd starts the binary. Once the system shuts down or [manually] or so, systemd sends a signal to the daemon. It receives it and acts accordingly.

The python-daemon package is more like separate binary in between?

I should have mentioned that the daemon and control-port-filter are in a single package (single file). Instead of importing control-port-filter in the daemon, it's in the same script. Same behavior, but self-contained.

Created a temporary repository. You can see the code at https://github.com/troubadoour/cpfpd/blob/master/cpfpd

Was thinking of a new repository because of the major code change and the different path. And the name of the package would become control-port-filter-daemon.

The pid file should be in /var/run/control-port-filter-daemon/, becuse the daemon creates three files:

host.MainThread.nnnnn
pid
pid.lock
In T274#4073, @nrgaway wrote:

We will not need a status function for systemd; sysv init can also implement following procedure. status will be based on PID and if PID is running, status will return with success when using systemctl status control-port-filter-daemon.

That's good news. It's probably why it's not implemented in python-daemon.

During upgrades we have more trouble when we deprecate packages (ex: control-port-filter-python), want them uninstalled in favor of new packages (ex: control-port-filter-daemon). The control-port-filter-> control-port-filter-python transition has cost quite some effort.

I see no reason why we should abstain from huge refactoring/moving around files in existing packages. We can keep control-port-filter-python as is (perhaps with minor maintenance fixes if really urgent) in Whonix 10 and have a major revision in Whonix 11.

Sounds good.

Yes, nothing urgent. This change is not intended for Whonix 10, say we have a good base and we can fine tune the script before Whonix 11 release. I guess some small issues might solve themselves after it's tested in a jessie based Whonix. We could change the tag, make tit Whonix 11.

Yes.

(I just didn't add the Whonix 11 tag, because I didn't intent to do it myself for Whonix 11. Was more like adding it to the pile. But if you plan on doing stuff for a specific release tag, please feel free to add the tag and/or to claim the task.)

(I'll start merging your code and testing it on my system shortly after you're done as usually so I am not very concerned about missing issues.)

Pushed some changes in control-port-filter-python.

The last one includes python-daemon in debian/control.

Replaced usr/lib/control-port-filter-python/cpfp.py with usr/sbin/cpfpdin https://github.com/troubadoour/control-port-filter-python/commit/d7d5839ce50b7dc9c3d64cb84b644bda63ef6051

New etc/init.d/control-port-filter-python in https://github.com/troubadoour/control-port-filter-python/commit/a7875d45b79a600c40ebb1ccfbed05a78c9d80ca

New lib/systemd/control-port-filter-python.service in https://github.com/troubadoour/control-port-filter-python/commit/e5a376f03aa80810af0078e4dd5222eba5b36b6f

It 's tested in Whonix 10.0.0.5.5. A strange issue (to me). the daemon dose not start at boot, but it runs after

sudo service control-port-filter-python start

stop, restart, status work.

An update to cpfpd.

Pushed a few commits. The relevant ones to some issues.

https://github.com/troubadoour/control-port-filter-python/commit/9226b2923034e011a994941d37578e6a93720e3f
In ControlPortFilter() init(), replace /dev/tty with /dev/null for stdout and stderr. /dev/tty is unknown when the service is started by systemd.

https://github.com/troubadoour/control-port-filter-python/commit/5a06efa8cad9c23734963740400784b376a8ec82
The service does not start when user and group are set to debian-tor. Commented out the settings in control-port-filter-python.service.

Now the daemon starts, but is only activating, it does not reach the active state and is restarted after the TimeoutSec period. Investigating.

Other commits:
https://github.com/troubadoour/control-port-filter-python/commit/f0eedc86739fb2524be35fe29fa92dbc30f9e80a
Create /var/run/control-port-filter-python directory, needed because three files are written by the daemon.

https://github.com/troubadoour/control-port-filter-python/commit/baf8eaa1bb891343bd6ac6aa5fc51e2f7673f833
Copyright (lintian warning).

Still trying to get the man page without warning.

I think using /bin/touch in systemd unit files might work, but it's non-ideal. systemd units aren't shell scripts, and therefore fast. Forking to touch is "slow". The canonical way looks like using /usr/lib/tmpfiles.d/ - http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html.

https://github.com/troubadoour/control-port-filter-python/commit/9226b2923034e011a994941d37578e6a93720e3f
In ControlPortFilter() init(), replace /dev/tty with /dev/null for stdout and stderr. /dev/tty is unknown when the service is started by systemd.

That works and if we don't need /dev/tty, that's fine. Otherwise we could tell systemd to wait until /dev/tty is available by figuring out on which dependency to add.

Other commits:
https://github.com/troubadoour/control-port-filter-python/commit/f0eedc86739fb2524be35fe29fa92dbc30f9e80a
Create /var/run/control-port-filter-python directory, needed because three files are written by the daemon.

That's fine for ports and running as root. But otherwise I think it's up to systemd creating these files as owned by user debian:tor and running as user debian:tor.

Still trying to get the man page without warning.

override_dh_install:
make manpages
dh_installman $(CURDIR)/debian/tmp-man/*

Even though you haven't requested review yet, I took the liberty to merge your latest changes. :) Needed to man page creation, because that fixed a lintian warning, that otherwise would make builds from master inconvenient. Added a small fix to the man page on top to fix the last lintian warning:
https://github.com/Whonix/control-port-filter-python/commit/d6c62e9625ae41ceeb1e15f501b0513b1c6c9ca8

An issue. When running sudo -u debian-tor /usr/sbin/cpfpd start while /var/run/control-port-filter-python does not exist, cpfpd will exit 0. When it fails to write into /var/run/control-port-filter-python it would be great to have it exit non-zero. Fixing that could help with debugging the systemd unit file.

Apart from that, running sudo -u debian-tor /usr/sbin/cpfpd start and sudo -u debian-tor /usr/sbin/cpfpd stop works for me.

As an example package we could look into a package from Debian: apt-get download bcfg2-server. It depends on python-daemon and comes with a systemd unit file.

sudo -u debian-tor /usr/sbin/cpfpd start and sudo -u debian-tor /usr/sbin/cpfpd stop works for me too. The problem is when the daemon is run with systemd.

sudo systemctl start control-port-filter-python works but you have to ctrl-c to get back the terminal. sudo systemctl stop control-port-filter-python also work, and here the the output of sudo systemctl status control-port-filter-python:

● control-port-filter-python.service - Control Port Filter Proxy
   Loaded: loaded (/lib/systemd/system/control-port-filter-python.service; enabled)
   Active: activating (start) since Thu 2015-05-14 19:59:31 UTC; 4s ago
  Process: 19753 ExecStartPre=/bin/chown --recursive debian-tor:debian-tor /var/log/%p.log (code=exited, status=0/SUCCESS)
  Process: 19750 ExecStartPre=/bin/touch /var/log/%p.log (code=exited, status=0/SUCCESS)
  Process: 19748 ExecStartPre=/bin/chown --recursive debian-tor:debian-tor /var/run/%p (code=exited, status=0/SUCCESS)
  Process: 19745 ExecStartPre=/bin/mkdir -p /run/%p (code=exited, status=0/SUCCESS)
  Process: 19741 ExecStartPre=/usr/lib/anon-shared-helper-scripts/torsocks-remove-ld-preload (code=exited, status=0/SUCCESS)
  Control: 19756 (cpfpd)
   CGroup: /system.slice/control-0.port-filter-python.service
           └─19756 /usr/bin/python /usr/sbin/cpfpd start

The service is activating instead of running, and is restarted regularly at the timeout interval set in the unit file. It looks like the process is not detached. Looking into the bcfg2-server code, there might be something useful in there (thanks for that one, by the way).

troubadour added a comment.EditedMay 14 2015, 10:41 PM

In the unit file, changing Type = forking to Type = simple solves this issue.
https://github.com/troubadoour/control-port-filter-python/commit/9f80d19f778a5d6d047b1fe24212905f8b003c28

Nice. Merged. Glad you figured out! Works well. I'll add some commits on the systemd unit file on top.

systemd unit: removed "ExecStartPre = /usr/lib/anon-shared-helper-scripts/torsocks-remove-ld-preload" - Doesn't work with systemd and also not required. - https://phabricator.whonix.org/T274:
https://github.com/Whonix/control-port-filter-python/commit/6f6734588381a4129e4209b4bee86ba5e1fe0bdb

systemd unit file: replaced ExecStartPre touch/mkdir stuff with usr/lib/tmpfiles.d mechanism - https://phabricator.whonix.org/T274:
https://github.com/Whonix/control-port-filter-python/commit/78c0a49b0450152daee1446892d4de1607525e4c

systemd unit: added 'SuccessExitStatus = 143' so when systemd stopped the service it will consider it terminated with status success rather than failure - https://phabricator.whonix.org/T274:
https://github.com/Whonix/control-port-filter-python/commit/11e2cb5fc61a341ec8eff8bbb8086b577e6387b6

Patrick changed the task status from Open to Review.May 15 2015, 1:27 AM

Anything left to do here?

Well, you could go for perfection. Implement sd_notify. Some pointers (dunno if they lead somewhere).

But from my perspective for now, that would be just for the sake of it. For learning. As an intellectual challenge. I don't see an actual issue that would be solved. Perhaps @nrgaway has some argument in favor of sd_notify? We would open a follow up task if needed.

Otherwise, unless I missed something or you think something... This should be closeable or moveable to Whonix 12?

sd_notify would be used if you want to have watchdog monitor the service. If the service does not respond within X amount of seconds (IE did not send a notification it was still active within this time period), then systemd would then restart the service.

Since this seems to be a critical service, I would suggest implementing it for Whonix 11.

I leave that up to @troubadour if he even wants to implement this and @Patrick to decide the level of importance.

I am also assuming Whonix 11 is at least more than 4-6 weeks away from release. If you have a shorter release frame in mind, please let me know as I would need to shift my coding efforts around. My goal was to complete build of Whonix 11 this weekend; then shift to Qubes Pre-configuration for a few weeks, then back to streamlining the Whonix 11 build

Difficult question. I am debating with myself. The normal case should be no daemons crashing. If they do crash. something is very wrong that we should apply a-real-fix (tm) to, no? On the other hand, If the service was crashing due to some hypothetical parsing issue, I would actually prefer the affected user noticing and reporting an issue. An automatically restarted service with no one noticing because of the auto restart would be bad. The daemon is very stable at this preset. How much code would be added by implementing the sd_notify protocol? A reason against would be if it's getting complexer so we actually understand it less. If it's just a few lines, that doesn't apply.

I am putting sd_notify in the pipeline, without guarantee that it will be completed before Whonix 11 release.

On the other hand, systemd restarts the service after`TimeoutSec` if it does not receive a proper notification. It happens in one situation at least, when using Type = forking in the unit file (cpfpd is running in that configuration, but systemd does not seem aware of it).

As Patrick suggests, we could move it to Whonix 12 and open a follow up task when required.

Moved import gevent where it should be (beginning of the script).
https://github.com/troubadoour/control-port-filter-python/commit/a608e24cf84978b0e9ba43f2b9ddd060b0b6e7fa

That should be it for the daemon part. There will be some PEP8 / pylint compliance adjustments, will use T243.

In T274#4522, @Patrick wrote:

Difficult question. I am debating with myself. The normal case should be no daemons crashing. If they do crash. something is very wrong that we should apply a-real-fix (tm) to, no? On the other hand, If the service was crashing due to some hypothetical parsing issue, I would actually prefer the affected user noticing and reporting an issue. An automatically restarted service with no one noticing because of the auto restart would be bad. The daemon is very stable at this preset. How much code would be added by implementing the sd_notify protocol? A reason against would be if it's getting complexer so we actually understand it less. If it's just a few lines, that doesn't apply.

Daemons can crash for many reasons, many of which have nothing to do with the daemon itself. There are kernel issues, memory, cpu, races, so many things that are not in a daemons control and that's why watchdog exists

Having consensus on that one... It might follow... systemd 'Restart=' and sd_notify for sdwdate and whonixcheck: T311

Patrick renamed this task from control-port-filter-proxy better daemon support to control-port-filter-proxy better daemon support, sd_notify support.May 23 2015, 6:43 PM
Patrick updated the task description. (Show Details)
Patrick changed the task status from Review to Open.
Patrick edited projects, added Whonix 12; removed Whonix 11.
Patrick removed troubadour as the assignee of this task.Sep 23 2016, 1:58 AM
Patrick updated the task description. (Show Details)
Patrick updated the task description. (Show Details)Sep 23 2016, 2:07 AM
Patrick renamed this task from control-port-filter-proxy better daemon support, sd_notify support to control-port-filter-proxy sd_notify support.Sep 23 2016, 2:10 AM
Patrick updated the task description. (Show Details)
Patrick updated the task description. (Show Details)Sep 23 2016, 9:55 PM
Patrick updated the task description. (Show Details)Sep 24 2016, 3:07 PM
Patrick updated the task description. (Show Details)Feb 15 2017, 3:16 PM
Patrick updated the task description. (Show Details)
Patrick updated the task description. (Show Details)
Patrick updated the task description. (Show Details)Feb 15 2017, 3:19 PM
Patrick updated the task description. (Show Details)Feb 15 2017, 3:31 PM
Patrick assigned this task to joysn1980.Feb 21 2017, 7:00 PM
Patrick changed the task status from Open to Review.EditedFeb 21 2017, 7:10 PM

Great work by @joysn1980 with the initial implementation.

Merged. And added a few minor commits on top.



Tests:

"sudo service tor-controlport-filter status" contains

Status: "Serving Thread started"
Status: "Serving Thread active. count: 12 / 10000"
Status: "Serving Thread active. count: 55 / 10000"
etc.

Now, if n.notify("WATCHDOG=1") gets commented out (to simulate what would happen), systemd correctly detects the stale daemon and restarts it.

Feb 21 17:48:26 host systemd[1]: Started Tor control port filter proxy.
Feb 21 17:48:36 host systemd[1]: tor-controlport-filter.service: Watchdog timeout (limit 10s)!
Feb 21 17:48:36 host systemd[1]: tor-controlport-filter.service: Killing process 5419 (tor-controlport) with signal SIGABRT.
Feb 21 17:48:36 host systemd[1]: tor-controlport-filter.service: Main process exited, code=killed, status=6/ABRT
Feb 21 17:48:36 host systemd[1]: tor-controlport-filter.service: Unit entered failed state.
Feb 21 17:48:36 host systemd[1]: tor-controlport-filter.service: Failed with result 'watchdog'.
Feb 21 17:48:37 host systemd[1]: tor-controlport-filter.service: Service hold-off time over, scheduling restart.
Feb 21 17:48:37 host systemd[1]: Stopped Tor control port filter proxy.
Feb 21 17:48:37 host systemd[1]: Starting Tor control port filter proxy...

Similarly, if n.notify("READY=1") gets commented out to simulate a daemon that fails to start, systemd rightly never considers the daemon started and tries starting it again.


I am not sure if it was the best idea to move n.notify("READY=1") to def service_actions(self):. By adding n.notify("READY=1") just before server.serve_forever() I was not sure if a race condition might happen. Currently not the very most pretty solution to keep calling n.notify("READY=1") but it works great.

Also arbitrarily chosen 10 for WatchdogSec=10. No idea if that is a good value or if a lower/higher one should be used.

tor-controlport-filter has been renamed to onion-grater by upstream Tails.

https://git-tails.immerda.ch/onion-grater
(web interface is broken, but git clone works)


https://github.com/Whonix/onion-grater


sd-notify patch pull request sent to upstream Tails:
https://mailman.boum.org/pipermail/tails-dev/2017-March/011327.html

Patrick closed this task as Resolved.Mar 7 2018, 1:19 AM