
Tor systemd startup issue
Closed, ResolvedPublic

Description

In T233#3269, @Patrick wrote:
In T233#3268, @nrgaway wrote:

I have a problem on initial run of whonix-setup-wizard where Tor is not restarting. Will get back on that once I can find the problem.

I bumped into this issue when creating Debian jessie based Whonix builds. My speculation is that it's a bug in Tor and/or Tor's systemd unit / sysvinit script. I think it only happens on first boot, so for debugging it would make sense to take a snapshot at the point where Tor gets reloaded etc. I am quite certain the issue is not directly caused by whonixsetup or whonix-setup-wizard. For debugging purposes, I think it would be easiest to test the Tor enable logic in a simple, minimal shell script. Scripts that can be manually run for testing:

Fixing the likely systemd-related issue requires making those scripts compatible with systemd.

Details

Impact
High

Event Timeline

nrgaway created this task.Mar 25 2015, 5:31 AM
nrgaway raised the priority of this task from to High.
nrgaway updated the task description. (Show Details)
nrgaway added projects: Qubes, Whonix 10.
nrgaway added subscribers: nrgaway, Patrick, WhonixQubes and 2 others.
Patrick updated the task description. (Show Details)Mar 25 2015, 3:10 PM
Patrick renamed this task from systemd compatibility to Tor systemd startup issue.Mar 25 2015, 3:28 PM
Patrick added a project: systemd.

After first start of Whonix-Gateway, just ignore whonix-setup-wizard and open a terminal.

Here are instructions on how to reproduce this issue. I am attempting to write generic instructions so the bug can be reproduced on Whonix-Gateway as well as on plain Debian jessie.

## 1) Stop Tor.
service tor stop

## 2) Append 'DisableNetwork 1' to /etc/tor/torrc.

## 3) Start Tor with Tor networking disabled.
service tor start

## 4) Check Tor's status.
service tor status

## 5) All fine until now.

## 6) Set 'DisableNetwork 0' in /etc/tor/torrc

## 7) Reload Tor to enable Tor networking.
service tor reload

## 8) Check Tor's status.
service tor status

## 9) Result:
## Tor service no longer running.
## Expected result:
## Tor running with networking enabled.

I was able to reproduce this issue multiple times in a row on a jessie [and therefore also systemd] based Whonix-Gateway 10.0.0.4.3-developers-only, but not yet on a plain Debian jessie [and therefore also systemd] based system.
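
The steps above could be wrapped in a single script, roughly like this. It is only a minimal sketch, not one of the existing Whonix scripts: the function names set_disable_network and reproduce are made up for illustration, GNU sed is assumed, and reproduce has to be run as root.

```shell
#!/bin/bash
# Sketch only. Helper names are hypothetical, not taken from any Whonix package.

# Set 'DisableNetwork <value>' in the given torrc, appending the line if absent.
set_disable_network() {
    local torrc="$1" value="$2"
    if grep -q '^DisableNetwork ' "$torrc"; then
        # GNU sed in-place edit (assumption: Debian's sed).
        sed -i "s/^DisableNetwork .*/DisableNetwork $value/" "$torrc"
    else
        printf 'DisableNetwork %s\n' "$value" >> "$torrc"
    fi
}

# Walk through steps 1) to 8) and report whether Tor survived the reload.
reproduce() {
    local torrc="${1:-/etc/tor/torrc}"
    service tor stop
    set_disable_network "$torrc" 1
    service tor start
    service tor status || echo "unexpected: Tor not running after start"
    set_disable_network "$torrc" 0
    service tor reload
    if service tor status; then
        echo "OK: Tor still running after reload"
    else
        echo "BUG: Tor no longer running after reload"
    fi
}

# Nothing runs automatically. As root, call: reproduce
```

Keeping the torrc edit in its own function means it can be tried on a scratch copy of the file before touching /etc/tor/torrc.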

@nrgaway, do you think you can get to the bottom of this one?

Yes, I created that python-sh package that I was going to use to write a shutdown, restart script; one you can use from the shell scripts as well.

I plan to also maybe write a systemd system unit file if none are available for tor.

The proper syntax for systemd status/start/stop/restart is something like this:
systemctl start tor
systemctl is-active tor

It would be trivial to use restart rather than reload in existing scripts. That would fix this issue. But it comes with disadvantages. For one, this obscure Whonix-only bug would remain.

Using reload rather than restart has advantages.

  • reload is more correct
  • gives us a chance to understand this obscure Whonix-only bug
  • faster (think: a zillion hidden services and/or listeners in data centers)
  • doesn't needlessly restart Tor for users who re-run the wizard, i.e. doesn't interrupt active Tor internet and control port connections
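
For scripts that need a working Tor in the meantime, one option would be to keep reload as the default and only fall back to restart when Tor is no longer active afterwards. The following is a rough sketch under assumptions: the function name and the SYSTEMCTL override variable are hypothetical, and the check runs immediately after the reload, so it would miss a Tor process that dies a moment later.

```shell
#!/bin/bash
# Sketch only. SYSTEMCTL is a hypothetical hook so a stub can be used for dry runs.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Prefer reload; restart only if the service is not active after the reload.
reload_tor_with_fallback() {
    "$SYSTEMCTL" reload tor || true
    if "$SYSTEMCTL" is-active --quiet tor; then
        echo "reloaded"
    else
        "$SYSTEMCTL" restart tor && echo "restarted"
    fi
}
```

This only works around the symptom. It does not explain why reload stops Tor in the first place.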

The proper syntax for systemd status/start/stop/restart is something like this:

Yes, but it doesn't matter here. service seems to be fully compatible and doing the same. While using the systemctl equivalents, I can reproduce the same issue.

I plan to also maybe write a systemd system unit file if none are available for tor.

There is none in jessie or sid. Please submit it to upstream, Debian and/or TPO. That ensures high quality and compatibility, and reduces Whonix-only issues.

If you want to ship it before Debian stretch, the correct package to add it to is:
https://github.com/Whonix/anon-gw-anonymizer-config

[Once that was done, I would add the proper config-package-dev config on top, so there won't be conflicts as soon as upstream ships the systemd unit.]

Maybe it's a sysvinit [script] issue and/or systemd compatibility issue. Maybe a systemd unit would make proper reload work.

Does this necessarily have to be finished for Whonix 10? @nrgaway

(Asking, because I would like to release Whonix 10 rather sooner than later, as per: https://www.whonix.org/forum/index.php/topic,1071.)

I was hoping so. Can you give me a day or 2 to see if I can fix it within that time frame?

Sorry, these messages sometimes end up in my spam folder.

No, I have been unable to complete testing. Last week I had a bad system crash and it took 2 days to recover my file system. Since then my system has crashed 2 more times in similar fashion when attempting to do a template build. I am in the process of trying to determine the cause, since SSD smartctl tests pass, as do scrubbing and balancing my file system.

I only have one functional computer to do these builds and cannot afford another at the moment.

I suppose the best thing for you is not to list this as a blocker. I can always work around the issue, or we can fix it in 10.1 if appropriate.

Patrick edited projects, added qubes-whonix 10; removed Whonix 10.Apr 14 2015, 5:34 PM

Yes, this one really would be better off with a Qubes specific patch until Whonix 10.1 or Whonix 11.

I have written a tor systemd unit file that seems to be working. I have added it to the qubes-whonix package.
https://github.com/nrgaway/qubes-whonix/blob/master/etc/systemd/system/tor.service

Systemd unit files should make probing the status more reliable with commands such as 'systemctl is-active tor' and 'systemctl is-enabled tor'.
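
As a rough illustration of such probing from a script, here is a small helper that prints both states on one line. It is a sketch under assumptions: unit_summary and the SYSTEMCTL override variable are hypothetical names, not part of qubes-whonix.

```shell
#!/bin/bash
# Sketch only. SYSTEMCTL is a hypothetical hook so a stub can be used for dry runs.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print a one-line summary of a unit's active and enabled state.
unit_summary() {
    local unit="$1" active enabled
    # Both subcommands print the state on stdout and also encode it in the
    # exit status, so '|| true' keeps a non-zero status from being fatal.
    active="$("$SYSTEMCTL" is-active "$unit" 2>/dev/null || true)"
    enabled="$("$SYSTEMCTL" is-enabled "$unit" 2>/dev/null || true)"
    printf '%s: active=%s enabled=%s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
}

# Example call: unit_summary tor
```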

I still need to test further since I do receive some network errors like sock connection refused. I am thinking it could be related to the watchdog setting. I have also started on a hardened unit file, for once the initial one looks stable: https://github.com/nrgaway/qubes-whonix/blob/master/tests/wip/tor.service

At the same time I have also created a whonixcheck unit file, which is not in use currently but did seem to work when testing with it: https://github.com/nrgaway/qubes-whonix/blob/master/tests/wip/whonixcheck.service The main reason I am not using it is that it had a status conditional that would return a different result based on a file being present or not, and I am not sure if that is possible to do with systemd. It may require a separate 'oneshot' unit file to report that? Anyway, we can discuss this further in a separate issue as well if you want me to continue working on it.

I initially also glanced at the sdwdate startup script but would need more documentation as to the expected operation. If you want me to attempt to create the systemd unit file for that, just start a new topic with some documentation of its use so we can discuss further.
Note that both unit files have corresponding configuration files in the /etc/tmpfiles.d directory to create the /var/run directories.

@MemoryLost Help is always welcome :) How would you like to help?

Yes, please help with systemd stuff.

Can you please commit the whonixcheck and other systemd files to their corresponding packages, i.e. whonixcheck etc.? Now that Whonix 11 is jessie-only, that should be easier.

The Tor systemd file belongs in:
https://github.com/Whonix/anon-gw-anonymizer-config

For sdwdate, a standard systemd file also suffices. Forget about the non-standard restartnd, restartndclean actions. This has to be somehow re-done in the timesync package itself.

Main reason I am not using it is it had a status conditional that would return a different result based on a file being present or not and I am not sure if that is possible to do with systemd and may require a separate 'oneshot' unit file to report that?

You most likely mean do_status from /etc/init.d/whonixcheck, /var/run/whonixcheck/whonixcheck_done. No. Just forget about this. What I cooked up there is fancy, non-standard and not what either the sysvinit or the systemd status command is for. A simple, standards-conforming "service running or not" check would be sufficient. Nothing fancy.

The original issue of this ticket is solved for now as of 11.0.0.1.7-developers-only. The Tor daemon is automatically started on first boot.

Will leave this open until T316 is sorted out and then check again.

Patrick closed this task as Resolved.May 20 2015, 7:49 PM
Patrick claimed this task.

This is fixed, but there is a similar outstanding issue. Created T320 for it.