
url_to_unixtime - extract time stamps from http headers
Closed, ResolvedPublic

Description

Requirements:

  • Not using libcurl (because that's a C binding; we'd gain no security enhancement and could just stick to curl).
  • Force use of TLSv1, not SSL.
  • Download the header only. (Similar to curl's --head.)
  • Features, with example:
/usr/lib/sdwdate/url_to_unixtime \
   --max-time 180 \
   --socks5-hostname 10.152.152.10:9108 \
   --tls true \
   https://check.torproject.org

Expected output, unixtime, example:
1413814230


Bonus:

--max-file-size-bytes 2097152
--user-agent
--verbose

SSL:
Depending on the outcome of this we might not need SSL support.

date_to_unixtime:
The code for date to unixtime is already done:
https://github.com/Whonix/sdwdate/blob/master/usr/lib/sdwdate/date_to_unixtime

python-requests:
Implementing this using the [python-requests](http://docs.python-requests.org) library was trivial, but unfortunately, python-requests does not support socks proxies yet, which is a deal breaker for Whonix.

urllib3:
No socks proxy support yet either.

TODO:
So we have to find some python library that has socks proxy as well as TLSv1 support, that is installable from Debian repository. Does this exist?

Event Timeline

Patrick raised the priority of this task from to Normal.
Patrick updated the task description.
Patrick added subscribers: Patrick, troubadour, HulaHoop.

Might be some progress on this one. It's not really my domain of competence, so...

In Whonix-Gateway, install python-socksipy python-openssl packages.

Try this script.

#!/usr/bin/python

import socks, ssl

## Connect through the local Tor SOCKS proxy, wrap in TLS, print the
## negotiated cipher. Close the TLS wrapper before the raw socket.
s = socks.socksocket()
s.setproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", port=9050)
s.connect(('check.torproject.org', 443))
ss = ssl.wrap_socket(s)
print ss.cipher()
ss.close()
s.close()

s = socks.socksocket()
s.setproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", port=9050)
s.connect(('whonix.org', 443))
ss = ssl.wrap_socket(s)
print ss.cipher()
ss.close()
s.close()
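
For reference, `ssl.wrap_socket` has since been deprecated (and removed in Python 3.12). A rough modern equivalent of the probe above would use an `ssl.SSLContext`; this is a sketch only, assuming PySocks is installed, and the network call itself is defined but not run:

```python
import ssl

def make_tls_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Build a verifying TLS context; the TLSv1_2 floor is an assumption,
    not something the original script enforced."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    return ctx

def print_cipher(host, port=443, proxy=("127.0.0.1", 9050)):
    """Hypothetical helper: connect through a SOCKS5 proxy and print
    the negotiated cipher. Requires the third-party PySocks package."""
    import socks
    s = socks.socksocket()
    s.set_proxy(socks.SOCKS5, proxy[0], proxy[1])
    s.connect((host, port))
    # server_hostname enables SNI and hostname verification.
    with make_tls_context().wrap_socket(s, server_hostname=host) as ss:
        print(ss.cipher())
```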

If that looks sound, we'll have to find a way to get the time from the header. That must be possible, because the shell openssl command returns (in Whonix-Workstation):

$ openssl s_client -connect whonix.org:443     
CONNECTED(00000003)
depth=1 C = FR, O = GANDI SAS, CN = Gandi Standard SSL CA
verify error:num=20:unable to get local issuer certificate
verify return:0
--
Certificate chain
0 s:/OU=Domain Control Validated/OU=Gandi Standard SSL/CN=whonix.org
  i:/C=FR/O=GANDI SAS/CN=Gandi Standard SSL CA
1 s:/C=FR/O=GANDI SAS/CN=Gandi Standard SSL CA
  i:/C=US/ST=UT/L=Salt Lake City/O=The USERTRUST Network/OU=http://www.usertrust.com   
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIE1zCCA7+gAwIBAgIRAKuuBHjHjvVq22Vahgk4QZYwDQYJKoZIhvcNAQEFBQAw
QTELMAkGA1UEBhMCRlIxEjAQBgNVBAoTCUdBTkRJIFNBUzEeMBwGA1UEAxMVR2Fu
~~
A1UW8F3H49PDn/FmBM0qOXhiWY9O0wcyZcOVUiBkw6Phq163lqkeleDlqA==
-----END CERTIFICATE-----
subject=/OU=Domain Control Validated/OU=Gandi Standard SSL/CN=whonix.org
issuer=/C=FR/O=GANDI SAS/CN=Gandi Standard SSL CA
---
No client certificate CA names sent
---
SSL handshake has read 3128 bytes and written 431 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
~~
    Start Time: 1422055124
    Timeout   : 300 (sec)
    Verify return code: 20 (unable to get local issuer certificate)
---
closed

If we pipe a command into it

$ ls | openssl s_client -connect whonix.org:443

the connection closes immediately (no 10-20 second wait).

~~
  Start Time: 1422055553
  Timeout   : 300 (sec)
  Verify return code: 20 (unable to get local issuer certificate)
---
DONE

Will try soon.

If that looks sound, we'll have to find a way to get the time from the header. That must be possible, because the shell openssl command returns (in Whonix-Workstation):

We don't want to extract from SSL. (Because likely unreliable in long term. + Doesn't work for .onion domains.) We want to extract time from http headers.

Similar to this.

curl --head check.torproject.org
curl --silent --head check.torproject.org | grep "Date:"
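
The grep step can be done in Python directly. A minimal sketch of extracting the Date header from a raw response; the response string here is canned for illustration, not fetched:

```python
def extract_date_header(raw_response):
    """Return the value of the Date header from a raw HTTP response,
    or None if no such header is present (case-insensitive match)."""
    for line in raw_response.split("\r\n"):
        if line.lower().startswith("date:"):
            return line.split(":", 1)[1].strip()
    return None

# Canned example response (made up for illustration).
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 02 Feb 2015 19:56:01 GMT\r\n"
    "Connection: close\r\n\r\n"
)
print(extract_date_header(sample))  # Mon, 02 Feb 2015 19:56:01 GMT
```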

Try this script.

Does what I thought it does. :)

I was misled by this:

So we have to find some python library that has socks proxy as well as TLSv1 support,

Perhaps this is more acceptable (actually simpler, since we can use the socket directly).

Script 'socks_socket.py'.

#!/usr/bin/python

import sys, socks

site = sys.argv[1]

s = socks.socksocket()
s.setproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050)
s.connect((site, 80))
s.send('HEAD / HTTP/1.1\r\n\r\n')

data = ''
buf = s.recv(1024)
while len(buf):
    data += buf
    buf = s.recv(1024)
s.close()

print('HTTP header from "%s:":\n\n%s' % (site, data))

These commands have (almost) the same output as curl.

python socks_socket.py "check.torproject.org"
python socks_socket.py "check.torproject.org" | grep "Date:"

It's just an example (no error handling, no timeout...). If you try 'google.com', it hangs indefinitely.

The example above supports socks proxies, but for TLSv1, it looks like we would need python-socksipychain, which is available in jessie, not in wheezy.

We switch to jessie sooner or later anyhow. So that should not be considered a blocker.

If it helps a lot to leave out timeout, then leave out timeout. ;) Because, after thinking again, timeout isn't that important inside the python script. The python script can be called using timeout, which is apparently a reliable tool to enforce timeouts from "the outside" ("from bash"). Maybe that's easiest/best and should be done in any case anyhow.

Do you think you can re-use python-request's code to parse the http header? I.e. to extract the date field from the http header.

date = response.headers.get('date')

Similar to:
https://github.com/Whonix/sdwdate/blob/5a6c1729df8caaaddcabc8caeb2108faca199ca7/usr/lib/sdwdate/url_to_unixtime#L54
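
For the conversion itself, the standard library can also do it without dateutil; a sketch, using a header value that appears later in this thread:

```python
from email.utils import parsedate_to_datetime

def http_date_to_unixtime(date_header):
    """Parse an RFC 1123 HTTP Date header string into a unix timestamp."""
    # parsedate_to_datetime returns a timezone-aware datetime for "GMT".
    return int(parsedate_to_datetime(date_header).timestamp())

print(http_date_to_unixtime("Mon, 02 Feb 2015 19:56:01 GMT"))  # 1422906961
```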

Good news: the last script you posted works while transparent dns and transparent tcp is disabled! So stream isolation is apparently functional.

Curl gets a HTTP/1.1 302 Found response while the script gets a HTTP/1.1 400 Bad Request response.

We switch to jessie sooner or later anyhow. So that should not be considered a blocker.

Yes, I have downloaded the package from Debian (not available from backports), checked with sha256sum. Currently testing; working, even if still rather obscure.

Regarding timeout, if the script hangs, sdwdate won't be able to call it again. Or is the external timeout managing the process?

Do you think you can re-use python-request's code to parse the http header? I.e. to extract the date field from the http header.

That's the next logical step. The script could take the URL as an argument and return unixtime.

Curl gets a HTTP/1.1 302 Found response while the script gets a HTTP/1.1 400 Bad Request response.

Yes, and the location field is missing with the script too. Is that a problem?

Timeout is one line to add:

socket.settimeout(10)  ## or so

Yes, the external timeout command is managing the process. It sends signal SIGTERM after a configurable amount of time, optionally followed (using this) by signal SIGKILL after another configurable amount of time. See also:
http://manpages.debian.org/cgi-bin/man.cgi?&query=timeout
It's a GNU coreutils tool and quite old already. Expected to be very reliable.

The timeout would need to be configurable by commandline if you wish to add it.
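
The same "timeout from the outside" idea can be exercised from Python via subprocess; a small sketch where a sleeping child process stands in for a hanging url_to_unixtime:

```python
import subprocess
import sys

def run_with_timeout(cmd, seconds):
    """Run cmd, killing it if it exceeds the deadline. Returns True on
    normal exit, False if the timeout fired."""
    try:
        subprocess.run(cmd, timeout=seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False

# A child that hangs; the wrapper kills it after half a second.
hang = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hang, 0.5))  # False
```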

Is that a problem?

Not sure. Better to look similar to curl, similar to a usual request, so sysadmins don't get suspicious and add some magic to block it.

It's not easy to handle all the exceptions in Python, and the script may hang on errors other than timeouts. So it actually looks better to manage it externally. Conclusion:

Maybe that's easiest/best and should be done in any case anyhow.

Yes, most likely.

Before starting the next stage (on github, I guess), an update to the script.

#!/usr/bin/python

import sys, socks

site = sys.argv[1]

s = socks.socksocket()
s.setproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050)

try:
    s.connect((site, 80))
except IOError as e:
    print e
    sys.exit(1)
  
s.send('HEAD / HTTP/1.0\r\n\r\n')

data = ''
buf = s.recv(1024)
while len(buf):
    data += buf
    buf = s.recv(1024)
s.close()

print('HTTP header from "%s:":\n\n%s' % (site, data))

Setting aside the exceptions handling, the significant change is the request with 'HTTP/1.0' instead of 'HTTP/1.1'. check.torproject.org returns '200 OK', which looks like an improvement, less likely to be tagged by sysadmins.
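
A likely reason for the earlier 400 Bad Request: HTTP/1.1 requires a Host header, which the bare `HEAD / HTTP/1.1` request omits, while HTTP/1.0 does not require one. A sketch of a request builder that stays on 1.1 but includes the header; the User-Agent value is a made-up placeholder:

```python
def build_head_request(host, user_agent="curl/7.38.0"):
    """Build an HTTP/1.1 HEAD request with the mandatory Host header.
    The default user_agent string is illustrative only."""
    return ("HEAD / HTTP/1.1\r\n"
            "Host: %s\r\n"
            "User-Agent: %s\r\n"
            "Connection: close\r\n\r\n" % (host, user_agent))

print(build_head_request("check.torproject.org"))
```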

Best answer is "very same as a mainstream browser as possible", but that is kinda impossible. Realistic answer is "similar like curl or wget".

Getting this.

HTTP header from "check.torproject.org:":

HTTP/1.1 504 Gateway Time-out
Date: Sun, 25 Jan 2015 16:15:39 GMT
Connection: close

It's kinda impossible too to have a reply consistently similar to curl or wget. Depending on the URL, it can be identical, differ by one line, or the reply can be different ("200 OK" instead of "302 Found").

Have integrated date_to_unixtime.

#lang=
#!/usr/bin/python

import sys, socks
from dateutil.parser import parse

try:
    socket_ip = sys.argv[1]
    socket_port = int(sys.argv[2])
    url = sys.argv[3]
except IndexError as e:
    print >> sys.stderr, "Parsing command line parameter failed. | e: %s" % (e)
    sys.exit(1)

s = socks.socksocket()
s.setproxy(socks.PROXY_TYPE_SOCKS5, socket_ip, socket_port)

try:
    s.connect((url, 80))
except IOError as e:
    print >> sys.stderr, e
    sys.exit(1)
  
s.send('HEAD / HTTP/1.0\r\n\r\n')

data = ''
buf = s.recv(1024)
while len(buf):
    data += buf
    buf = s.recv(1024)
s.close()

date_pos = data.find('Date:') + 6
date = data[date_pos:date_pos + 30].strip()
if date == '':
    print >> sys.stderr, 'Parsing HTTP header date failed.'
    sys.exit(2)

try:
    ## Thanks to:
    ## eumiro
    ## http://stackoverflow.com/a/3894047/2605155
    unixtime = parse(date).strftime('%s')
except ValueError as e:
    print >> sys.stderr, ('Parsing date from server failed. | date: %s \
        | dateutil ValueError: %s' % (date, e))
    sys.exit(3)

#print data
#print date
print "%s" % unixtime

Example:

python url_to_unixtime.py 127.0.0.1 9100 whonix.org

If this is OK, is it necessary to create a separate package? It could be bundled in sdwdate.

Please remove the #lang= (unless that's good for something) and the trailing spaces.

Yeah. Unless we get a feature requests to maintain this in a separate repository, it's fine to keep this in the sdwdate package. /usr/lib/sdwdate/url_to_unixtime?

The #lang= was for syntax highlighting.

Tested with the different pools from /usr/bin/sdwdate. The replies range from 200 OK to 403 Forbidden, which looks OK.

Do you want me to add it to sdwdate?

Some more sanity testing, done:
check if date conversion result is numeric

Done:
check that string length is no bigger than 10

I guess 10 suffices...

date --date @1999999999
Wed May 18 03:33:19 UTC 2033

Just noticed, that file needs a license header. Could you add it please?

Do my changes look good so far?

Can you make data.find case-insensitive please? Some servers in the wild indeed use date:. We could use tolower, but I don't know what provides the best performance. (Imagine faulty or even malicious replies.) If that makes sense at all.

And would it make sense to check the return code of data.find? To abort if it didn't find such a line? I am trying to imagine all sorts of invalid input.

Also would it make sense to check if the size of data is reasonable before processing it further? (Having performance in mind here again if a server replies a super long string so we would abort earlier and waste less processing power.)
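
The three questions above (case-insensitivity, checking find's return code, capping the input size) can be sketched together; the names here are illustrative, not the ones the actual commit ended up using:

```python
MAX_DATA_LENGTH = 1024

def find_http_date(data):
    """Defensively locate the Date header value in raw response data.
    Returns the 29-character date string, or raises ValueError."""
    # Reject oversized (possibly malicious) replies before scanning.
    if len(data) > MAX_DATA_LENGTH:
        raise ValueError("response larger than %d bytes" % MAX_DATA_LENGTH)
    # Case-insensitive search; some servers send "date:" in lowercase.
    pos = data.lower().find("date:")
    if pos == -1:
        raise ValueError("no Date header found")
    value = data[pos + len("date:"):].lstrip()
    return value[:29]

sample = "HTTP/1.1 200 OK\r\ndate: Mon, 02 Feb 2015 19:56:01 GMT\r\n\r\n"
print(find_http_date(sample))  # Mon, 02 Feb 2015 19:56:01 GMT
```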

All the checks should be in the last two commits.

  • max data length = 1024
  • data.find searches for uppercase and lowercase in the header, returns an error if not found.
  • date string length: max length = expected date string length, min length = max length, return an error if too short.
  • your unixtime sanity checks (minor modification).
  • extra: time offset: check the local time / HTTP header time difference, max offset value hard-coded, return an error if outside.

Not sure the last check is relevant, as I have to check which local time python is returning.

Not sure the last check is relevant, as I have to check which local time python is returning.

The last check is overkill. That's something sdwdate should do itself.

http_time = http_time(data)

I am not much of a python coder, but is it a good or common way to have a variable that has the same name as a function?

## "Date:" not found.
print >> sys.stderr, 'Parsing HTTP header date failed.'

Does anything speak against writing data to stderr? In case this happens at some point somewhere and some user reports it, it would be interesting to see what went wrong instead of having to add more debug code then.

if unixtime_sanity_check(unixtime_http):

Wondering if we could simplify the code by either using if not ... (non-ideal) to make the indent shorter, or by just running unixtime_sanity_check(unixtime_http) and leaving it at that - because that function is supposed to exit anyhow if some sanity check fails. (You did it that way with the function http_time already.)
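
The "run it and let it exit" style could look like this; the checks mirror the ones discussed earlier (numeric, at most 10 digits), and the function name is illustrative:

```python
import sys

def unixtime_sanity_check(unixtime_string):
    """Exit if the string is not a plausible unix timestamp; otherwise
    simply return, so callers need no surrounding if/else."""
    if not unixtime_string.isdigit():
        print("unixtime is not numeric: %r" % unixtime_string,
              file=sys.stderr)
        sys.exit(4)
    if len(unixtime_string) > 10:
        print("unixtime string too long: %r" % unixtime_string,
              file=sys.stderr)
        sys.exit(4)

unixtime_sanity_check("1413814230")  # passes silently
```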

Let's say the previous commit was a draft.
In https://github.com/troubadoour/sdwdate/commit/315728bfff5e56055c4927a11c0188d35f402e77

  • offset check removed.
  • three separate sanity check functions (received data, http time, unix time). No if or if not left in the main body of the program (makes sense, thanks).
  • write "data" to stderr if header parsing error.

I am going to add some refactoring on top.

Done for now. Please review and merge.

Manually tested all the functions with bogus input to see if they correctly report and exit if something goes wrong.

One exception remains.

./usr/lib/sdwdate/url_to_unixtime 127.0.0.1 9050 "nonexisting"
Traceback (most recent call last):
  File "./usr/lib/sdwdate/url_to_unixtime", line 81, in <module>
    s.connect((url, 80))
  File "/usr/lib/python2.7/dist-packages/socks.py", line 369, in connect
    self.__negotiatesocks5(destpair[0],destpair[1])
  File "/usr/lib/python2.7/dist-packages/socks.py", line 236, in __negotiatesocks5
    raise Socks5Error(ord(resp[1]),_generalerrors[ord(resp[1])])
TypeError: __init__() takes exactly 2 arguments (3 given)

Can you look into it please?

Solved in https://github.com/troubadoour/sdwdate/commit/3b48433bdf4bcddcf4009ca242408bd32e8088f3.

It returns a pure python exception from socks.py ": __init__() takes exactly 2 arguments (3 given)" which seems to mean that the url was not found. Perhaps we could translate it into something more meaningful.

Also, change main_function() to main(). More pythonic (if we ever want to import the script, main() is required.)

Merged.

Okay. Because that error message sounds more like a python syntax error than url not found.

I would also like to make the port configurable (easy, can do).

Have you experience creating unit tests for python scripts yet? While I am at it, now that everything is inside functions I would be motivated to write unit test that test the functions using valid and invalid input to see if they output as expected. If you get that started, I could finish it. Or you do it. And if you don't know, I can also research the whole thing. I don't really mind.

Done the translation in https://github.com/troubadoour/sdwdate/commit/c34a9332b2fcec89d4d08910e3be05f649b6d919.

I have never created unit tests. A quick research tells me that it might be a bit late for that script (using python-test or mock, to name a few). It makes sense to create the unit tests BEFORE starting to write the code.

url-to-unixtime has been extensively tested for good and bad input, first by me, then you (spotting the bad URL). And I made another round of bogus inputs before pushing the last commit.

Forgot. I saw the ad on SourceForge for creating a test suite with Cucumber. I guess it could be used for any language (bash, python).

The Cucumber test suite is different. It is more like a simulated user who boots up the system, clicks here and there, runs several tests and checks the results. It's more for testing the interaction with the whole system. For example, see:
https://en.wikipedia.org/wiki/Cucumber_%28software%29

Unit tests work on a lower level, testing function input/output and the like. Useful when functions are later changed/refactored/whatever to see if they still work as expected. I agree it's not super important, just nice to have unit tests for this.

One isn't supposed to be a replacement for the other. A unit test is more for us devs who have enough ram, dependencies installed and so forth checking functions. A test suite is more testing if the whole thing is functional or if some interaction breaks it.
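
A unit test along those lines doesn't have to come first; functions can be tested after the fact too. A toy example with Python's built-in unittest module; the function under test is a stand-in, not the real url_to_unixtime code:

```python
import unittest
from email.utils import parsedate_to_datetime

def http_date_to_unixtime(date_header):
    """Stand-in for the conversion step under test."""
    return int(parsedate_to_datetime(date_header).timestamp())

class TestHttpDateToUnixtime(unittest.TestCase):
    def test_valid_date(self):
        self.assertEqual(
            http_date_to_unixtime("Mon, 02 Feb 2015 19:56:01 GMT"),
            1422906961)

    def test_garbage_raises(self):
        # Older Pythons raise TypeError here, newer ones ValueError.
        with self.assertRaises((TypeError, ValueError)):
            http_date_to_unixtime("not a date")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestHttpDateToUnixtime)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```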

  • Made remote port configurable.
  • Added usage example comment on top.

I am wondering, if these comments could be improved?

## max accepted string length.
http_time = data[date_string_start_position:date_string_start_position + 29].strip()
## min string length = max string length.
if http_time_string_length < 29:
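
The comments make more sense once it's spelled out that an RFC 1123 HTTP date has fixed-width fields and is therefore always exactly 29 characters, so the maximum and minimum accepted lengths coincide. A quick check:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Render a known timestamp in the RFC 1123 format HTTP uses for Date
# headers; every field is fixed-width, so the result is always 29 chars.
stamp = datetime(2015, 2, 2, 19, 56, 1, tzinfo=timezone.utc)
http_date = format_datetime(stamp, usegmt=True)
print(http_date)       # Mon, 02 Feb 2015 19:56:01 GMT
print(len(http_date))  # 29
```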

Added verbose output mode. In verbose mode, variables will be written to stderr.

Usage (also as per comment at the top of the script):

usr/lib/sdwdate/url_to_unixtime 127.0.0.1 9050 check.torproject.org 80 true

This is useful to make the script show what the server replied, and if the conversion back and forth is fully functional.

date --date "@$(usr/lib/sdwdate/url_to_unixtime 127.0.0.1 9050 check.torproject.org 80 true)"
data: HTTP/1.1 200 OK
Date: Mon, 02 Feb 2015 19:56:01 GMT
Server: Apache
Last-Modified: Thu, 23 Feb 2012 18:45:14 GMT
ETag: "211-1e8-4b9a60b6ecade"
Accept-Ranges: bytes
Content-Length: 488
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug


http_time: Mon, 02 Feb 2015 19:56:01 GMT
parsed_unixtime: 1422906961
Mon Feb  2 19:56:01 UTC 2015

The last line shows how date converted url_to_unixtime's stdout back to a human readable date. We see that the value of the Date: field exactly matches (besides formatting, but it's the unixtime that matters, because that will be used by sdwdate).

Created follow up tasks...

Hardening:

Implementation:

I think this can be considered resolved. Please feel free to reopen, if you disagree.

It returns a pure python exception from socks.py ": init() takes exactly 2 arguments (3 given)" which seems to mean that the url was not found. Perhaps we could translate it into something more meaningful.

An update for this issue. In jessie, python-pysocks is an upgrade to python-socksipy, which fixes some bugs, amongst them this situation.

Examples, without code modification:

$ ./url_to_unixtime 127.0.0.1 9050 whonix.or 80 true
connect error: 0x04: Host unreachable
$ ./url_to_unixtime 127.0.0.1 9050 whonix+org 80 true
connect error: 0x01: General SOCKS server failure

That looks better.

Good catch. Although that change made it incompatible with wheezy. (Because python-pysocks is not available in wheezy.) Therefore I added a commit on top to make it compatible with wheezy and jessie:
https://github.com/Whonix/sdwdate/commit/b015142d64f31d849991d9d1794110baaab1c19b

At the current rate of progress, and with the few remaining tickets, I think chances are good that Whonix 10 will be ready before jessie becomes the new Debian stable. So Whonix 10 might still be a wheezy based release. (Although hopefully ready to be upgraded to jessie without hassle.) (The related "maybe wait for jessie" ticket: T24)

Patrick changed the task status from Open to Review. Feb 16 2015, 9:20 AM

Introduced a new status "review". Hope you like it. (T178)