Page MenuHomePhabricator

sdwdate onions checker enhancements required
Closed, ResolvedPublic

Description

To be able to review T647, sdwdate's unit test needs to be improved.

https://github.com/Whonix/sdwdate/blob/master/usr/share/sdwdate/unit_test

  • don't fetch all onions at once (because now we have so many that the connection might not succeeded due to overloading Tor or the local internet connection)
  • add human readable date to the output or diff to local clock (to check if they are within +/- 1 second accuracy)
  • add url comment to output (to be able to easily verify all the proofs)
  • rename onion tester
  • check for duplicates
  • parse urls with # in url

Details

Impact
Normal

Event Timeline

Patrick created this task.Mar 15 2017, 2:35 PM

maybe curl and grep the addresses in comments for a quick scan of onion address?

don't fetch all onions at once (because now we have so many that the connection might not succeeded due to overloading Tor or the local internet connection)

You want this to always be true (and not just with the test), correct? The code for that is here:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/python3/dist-packages/sdwdate/remote_times.py
I don't know python well, but it appears that all processes for each url are started at the same time.

add human readable date to the output or diff to local clock (to check if they are within +/- 1 second accuracy)

You want the unix epoch timestamp (1470432423) to be changed everywhere or just the test? Both times or just utc? I believe the code is here:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/python3/dist-packages/sdwdate/remote_times.py#L59
If you want UTC time stamps:

import datetime

And changing that line I linked to:

time_stamp = datetime.datetime.utcfromtimestamp(int(reply.strip())).strftime('%Y-%m-%d %H:%M:%S')
unix_times.append.time_stamp

Using seconds is also trivial. Just tell me what you want it to look like.

add url comment to output (to be able to easily verify all the proofs)

Doesn't the output say "url blahablahblah.onion" ?

don't fetch all onions at once (because now we have so many that the connection might not succeeded due to overloading Tor or the local internet connection)

You want this to always be true (and not just with the test), correct?

No, not correct.

The code for that is here:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/python3/dist-packages/sdwdate/remote_times.py
I don't know python well, but it appears that all processes for each url are started at the same time.

That's good as is.

Fetching 3 at one in sdwdate is fine.

Fetching 50+ at once in sdwdate unit test is a broken test.

This test is supposed to be called sdwdate onion checker.

add human readable date to the output or diff to local clock (to check if they are within +/- 1 second accuracy)

You want the unix epoch timestamp (1470432423) to be changed everywhere or just the test?

The test needs to output UTC time stamps as well. Otherwise the human eye cannot parse the test.

Both times or just utc?

Good question. Probably unixtime don't hurt. Good to have. Does not need too much space in the unit test output either.

Just tell me what you want it to look like.

The current output of sdwdate (when I write sdwdate singular I mean like sdwdate running on users systems) is good as is. Any improvement suggestions should go to separate tickets.

add url comment to output (to be able to easily verify all the proofs)

Doesn't the output say "url blahablahblah.onion" ?

It shows blahablahblah.onion but it does not show the clearnet url. Should show it. Probably the only/main file to be modified:

https://github.com/Whonix/sdwdate/blob/master/usr/share/sdwdate/unit_test

Fetching 50+ at once in sdwdate unit test is a broken test.

Fixed. https://github.com/Whonix/sdwdate/commit/d76d95ecf8713be09ee68753c47b9529fe45a56d
The proper solution would be to change the get_remote method to throttle itself. But my cludge works fine for the test.

The test needs to output UTC time stamps

Done. https://github.com/Whonix/sdwdate/commit/b52e260187829480cc8c317785ac9c6bfab10347

does not show the clearnet url. Should show it.

If I understand it right, first config.py needs to be changed to return not just the onion string, but a tuple (or two item list) with the clearnet and onion. Then remote_times.py/get_remote needs to change "remotes[i]" to "remotes[i[0]]" (or something like that) when it needs the onion address. (The remotes[i] lines that are used to display the text of the url can stay the same or maybe need to be formatted to look better).

Please undo https://github.com/Whonix/sdwdate/commit/b52e260187829480cc8c317785ac9c6bfab10347. datetime doesn't belong there. Likely causes a python exception for invalid responses from remote servers.

To review T647 we also need to show the full url comment. But maybe it's better to have a different small script to aid/semi-automate checking that.

JasonJAyalaP added a comment.EditedJun 16 2017, 9:54 PM

Damnit. Sorry. I missed a closing )

Oh i see. There's an abort when socks returns a weird message. The response can't be formatted. Ok let me work on it.

I *think* I fixed it, but it's hard to replicate the error (I've been running unit test over and over).

https://github.com/Whonix/sdwdate/commit/cd2234332234a33a396afcb81a264124acd521ad

Do you want me to rename the file to "onions_tester" or something like that?

No.

(T648#13693 still applies.)

Using the newer 30_default, I found a parsing error that causes the error I was looking for.

https://github.com/anonmos1/sdwdate/blob/94c3ff5e5cdd7393501d3ddb7ce9905400dd92d9/etc/sdwdate.d/30_default.conf#L73

The comment has a # in it (because the url is using a #), which makes sdwdate think it's part of the url.

Now properly handled by the tester:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/python3/dist-packages/sdwdate/remote_times.py#L60

anonymous1 added a comment.EditedJun 18 2017, 7:57 AM

The list should be checked for duplicate addresses too

There were 2 duplicates in the current list:

w6csjytbrl273che.onion#Ljost
w6csjytbrl273che.onion#Filtrala

5r4bjnjug3apqdii.onion#Irpileaks
5r4bjnjug3apqdii.onion#ExpoLeaks

+/- difference

https://github.com/Whonix/sdwdate/commit/e04099d76af709900983d09e7097809846d86603

onion_tester

https://github.com/Whonix/sdwdate/commit/f099c069e12618422622d49a5a7ba05f9959e91d

The comment has a # in it (because the url is using a #), which makes sdwdate think it's part of the url.

I need your experience here, Patrick. The problem appears to be with your regex here:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/python3/dist-packages/sdwdate/config.py#L31
I think you're including everything on the line up to the last first #. Instead, search from the the end (^ i think) to the first # encountered

The list should be checked for duplicate addresses too

Done.

JasonJAyalaP updated the task description. (Show Details)Jun 20 2017, 4:48 AM
JasonJAyalaP added a comment.EditedJun 20 2017, 6:43 AM

At the end of onion_tester, it will report all errors (duplicates, timeouts, bad responses):
https://github.com/Whonix/sdwdate/commit/cad2c1b7d1b5476a55619865b4f25bb588875a2e#diff-e68acbfa3dee096d2c84e3cfe6773f30R59

JasonJAyalaP added a comment.EditedJun 20 2017, 7:21 AM

I think you're including everything on the line up to the last first #. Instead, search from the the end (^ i think) to the first # encountered

No. From end of the line to the *last* # encountered.

Or we can just manually remove all #s from comments. In that case, I think you (Patrick) can close this ticket. They *might* be necessary in some onion urls, but not in any we've used so far (i think).

Please undo modifications to remote_times.py. It doesn't belong there. And some input values could lead to python exceptions (think: malicious servers).

The full parsing of get_time_from_servers happens in check_remote.

https://github.com/Whonix/sdwdate/blob/ca3ffa3d8559bcfaf35f5d294bd933cea20ecaef/usr/bin/sdwdate#L130-L215

It should not be done in remote_times.py because the full checking is done in check_remote so it can be properly logged in live sdwdate on user's machines.

Also linguistically / clean code wise get_time_from_servers should not do checking for remote server replies.

The try / except is crucial, otherwise python exceptions could happen (which might crash sdwdate or at least that iteration without any useful log output, which then is a nightmare with incoming bug reports).

Perhaps check_remote has to be refactored / moved elsewhere so it can be reused from the unit test.

The problem is, we don't have unit testing yet. One needed unit test would be feeding sdwdate with replies from very bad servers. Or at least simulate that by feeding the functions in question with these values. Things such as excessive string length, spaces, special characters, very small numbers, astronomically big numbers and so forth. These tests were done manually by me in past. So until we have unit testing coverage I don't want to mess with remote_times.py in major ways (such as doing the checking there) because it could lead to python exceptions.

Currently the pool syntax is simple:

"url.onion[:port]#comment"

Or for grouped ones:

[
"url.onion[:port]#comment"
"[url.onion[:port]#comment]"
"[url.onion[:port]#comment]"
[...]
]

So currently we are using the first # we encounter in a line a the separator of the url.onion:port from the url comment. So you could python split() with split char # and then use the first item of the resulting list. Then assign all but the first item as comment.

There is a syntax error. if error -> if error:.

sdwdate in current git master as of e9733e1d0f0257c4e2ddab05fd16f856419ecfb1 is broken. sdwdate no longer sets the time because check_remote always thinks the remote is False.

Please undo modifications to remote_times.py. It doesn't belong there.

Wait. The extra message info doesn't belong there?

timestamp = int(msg)
now       = int(time.time())
diff      = timestamp - now
date      = datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
output = date + " / " + str(timestamp) + " difference: " + str(diff)

And when do we check time.now to compare it to the received time? Seconds later in another function?

And some input values could lead to python exceptions (think: malicious servers).
The full parsing of get_time_from_servers happens in check_remote.

check_remote always thinks the remote is False.

Because it only expects the server response without my wonderfully formatted output.

I have to at least return (remote_timestamp, now_timestamp) from get_time_from_servers unless you want the diff to be always wrong by some small amount.

if self.check_remote(self.urls[i], self.returned_values[i], self.get_comment(self.urls[i]), i):

would be something like

if self.check_remote(self.urls[i], self.returned_values[i[0]], self.get_comment(self.urls[i]), i):

And the output formatting has to happen somewhere else.

Perhaps check_remote has to be refactored / moved elsewhere so it can be reused from the unit test.

I think it can work as it is. We just need to pass an accurate time.now through it, and copy+paste my formatting somewhere else.

Is my formatting only needed in unit test? Then I can put it there.

There is a syntax error. if error -> if error:.

Ugh. I have problems getting my setup right. Still fooling around with seamless filetransfer between my gateway test and my dev machine.

The problem is, we don't have unit testing yet.

I'd love to implement this and a whole lot more. I'd like to do some major refactoring of sdwdate. Right now we're passing around separate lists and reconstructing them as needed, hoping they're in sync. It's a large bash script. If we asked someone to audit it, they'd kick me in the nuts. We need objects and data structures that model (native to python) what we're working with. Maybe we can do this next time I'm in the EU? We'll worry about it later. Just dreaming.

Patrick removed Patrick as the assignee of this task.Jun 23 2017, 6:06 PM
JasonJAyalaP closed this task as Resolved.Jun 24 2017, 11:51 PM
JasonJAyalaP assigned this task to Patrick.

Ok. Everything fixed. Onion_checker diff time will be off by a number of sections, but good enough for now. All the enhancements are there.