Wednesday, August 7, 2013

Pythonect Has New Graphs, Documentation, Tutorial, and More!

About two weeks ago I released a new version of Pythonect (0.6) with new features, documentation, a tutorial, and a (small, but growing) example directory.
I’d like to take this opportunity to discuss the past, present and future of the Pythonect Project.

Nearly 2 years ago I started working on Pythonect with the intention of helping software developers connect the dots and making mashups, rapid prototyping, and the development of scalable distributed applications easy. Pythonect is a new, experimental, general-purpose dataflow programming language based on Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python. The Pythonect interpreter (and reference implementation) is free and open source software written completely in Python, and is available under the BSD 3-Clause license.

Why Pythonect? Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented application comes from the realm of real-time signal processing, e.g. a video signal processor which starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

As with video, many applications can be expressed as a network of different components that are connected by a number of communication channels. The benefits, and perhaps the greatest incentives, of expressing an application this way are scalability and parallelism. The different components in the network can be rearranged to create entirely unique dataflows without necessarily requiring the relationships to be hardcoded. Also, the design and concept of components make it easier to run on distributed systems and parallel processors.
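To make the source, components, destination idea concrete, here is a minimal plain-Python sketch (not Pythonect, and not how Pythonect is implemented). The function names and the toy "frames" are made up for illustration; each stage only knows about the stream it receives, so the stages can be rearranged freely.

```python
# A toy dataflow: source -> brighten -> invert -> sink.
# All names here are hypothetical and exist only for this illustration.

def source(frames):
    """Source: emit each input frame in turn."""
    for frame in frames:
        yield frame


def brighten(stream, amount):
    """Processing component: add `amount` to every pixel, capped at 255."""
    for frame in stream:
        yield [min(255, pixel + amount) for pixel in frame]


def invert(stream):
    """Processing component: invert every pixel."""
    for frame in stream:
        yield [255 - pixel for pixel in frame]


def sink(stream):
    """Destination: collect whatever arrives."""
    return list(stream)


frames = [[0, 100, 250], [10, 20, 30]]
result = sink(invert(brighten(source(frames), 10)))
print(result)
```

Swapping the order of `brighten` and `invert`, or dropping one of them, changes the dataflow without touching the components themselves.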

Here is the canonical "Hello, world" example program in Pythonect:
"Hello, world" -> print
And here is the canonical "Hello, world" multi-threaded example program in Pythonect:
"Hello, world" -> [print, print]
Not to mention that you can go from multi-threaded to multi-processed as easily as:
"Hello, world" -> [print &, print &]
Or remotely call a procedure using XML-RPC:
"Hello, world" -> print@xmlrpc://localhost:8081
The language couldn't possibly be simpler...
Okay, so what's new, you ask? *I was wrong*: it can be simpler, and it is in Pythonect version 0.6 :-)

In Pythonect 0.6.0 I have rewritten the engine and large parts of the backend. Pythonect now uses a graph (NetworkX DiGraph) as its data structure, and it also supports multiple file formats as input. Currently, Pythonect (since version 0.6) supports 3 file formats:
  • *.P2Y (text-based scripting language that aims to combine the quick and intuitive feel of shell scripting with the power of Python)
  • *.DIA (visual programming language enabled by Dia)
  • *.VDX (visual programming language enabled by Microsoft Visio XML)
In other words:


is equal to:
"Hello, world" -> print
And vice versa. You can launch (almost) any graph/diagram editor and save a graph/diagram in *.VDX or *.DIA format, and Pythonect will be able to parse and run it (even if it's gzipped!). Curious to see what a multi-threading/processing graph looks like? See below!


Yup, it's that simple. One node with two edges. The graph above is equal to:
"Hello, world" -> [print, print]
Which is the canonical "Hello, world" multi-threaded example program. Now, another issue that I have addressed in this release is the reduce functionality.
The famous reduce from big data. Let's say that we want to write a program that adds one to every integer input and eventually sums all the results:
[1,2,3] -> _ + 1 -> sum -> print
The above example won't work because Pythonect maps (think MapReduce) each iterable value to its own thread, so the sum function will actually receive 2, 3, 4 separately and not as a list. A workaround for this will be:
sum(`[1,2,3] -> _+1`) -> print
But with the new reduce functionality in Pythonect 0.6, it is as easy as:
[1,2,3] -> _ + 1 -> sum(_!) -> print
By using the _! identifier, the Pythonect interpreter will automatically join all the values (and threads/processes) into a single list and pass it to the Python function without any prerequisites. The same applies when using a graph:


is equal to:
[1,2,3] -> _ + 1 -> sum(_!) -> print
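For readers more at home in plain Python, the map-then-join behavior of `_!` can be approximated with a thread pool. This is only an analogy under my own assumptions, not Pythonect's actual implementation; `add_one` and `map_then_reduce` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def add_one(value):
    """The per-item step, i.e. the `_ + 1` part of the flow."""
    return value + 1


def map_then_reduce(values):
    """Run add_one on each value in its own worker, then join and sum."""
    with ThreadPoolExecutor(max_workers=len(values)) as pool:
        # pool.map returns the results in input order; list() waits for
        # every worker, which plays the role of the join done by `_!`.
        mapped = list(pool.map(add_one, values))
    # sum now receives one list ([2, 3, 4]) instead of three separate values.
    return sum(mapped)


total = map_then_reduce([1, 2, 3])
print(total)  # 9
```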
Now let's talk about the future of Pythonect. Here's a link to the TODO list, where you can find future directions. In a nutshell, more graphs, more Python implementation support, and more Service-oriented architecture (SOA).

Right now, the biggest application of Pythonect (to the best of my knowledge) is my second project, Hackersh. Hacker Shell (hackersh) is a free and open source command-line shell and scripting language designed especially for security testing. It is written in Python and uses Pythonect as its scripting engine. The upcoming release of Hackersh (work in progress!) will also enjoy the Pythonect 0.6 features such as graphs (*.VDX and *.DIA) as scripts and better reduce functionality.

To learn more about Pythonect, please visit its homepage: http://www.pythonect.org and be sure to check out the new documentation at: http://docs.pythonect.org/en/latest/ where you can find an up-to-date tutorial and installation instructions.

That's all for now!

Wednesday, April 3, 2013

Hackersh 0.1 Release Announcement

I am pleased to announce the Official 0.1 launch of Hackersh ("Hacker Shell") - a shell (command interpreter) written in Python with built-in security commands, and out of the box wrappers for various security tools. It uses Pythonect as its scripting engine. Since it's the first release of Hackersh, I'd like to take this opportunity to explain how it works and why you should be using it.

Hackersh is an interactive console for security research and testing. It uses Pythonect as its scripting language. Pythonect is a new, experimental, general-purpose high-level dataflow programming language based on Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python. The combination of the two makes:
"http://localhost" -> url -> nmap -> w3af -> print
return something like this:
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| VULNERABILITY DESCRIPTION                                                    | URL                                                             |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| Cross Site Scripting was found at:                                           | http://localhost:8080/black/vulnerabilities/xss_r/              |
| "http://localhost:8080/black/vulnerabilities/xss_r/", using HTTP method GET. |                                                                 |
| The sent data was:                                                           |                                                                 |
| "name=%3CSCrIPT%3Efake_alert%28%22v3bd%22%29%3C%2FSCrIPT%3E". This           |                                                                 |
| vulnerability affects ALL browsers                                           |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The whole target has no protection (X-Frame-Options header) against          | Undefined                                                       |
| ClickJacking attack                                                          |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| "X-Powered-By" header for this HTTP server is: "PHP/5.3.3-7+squeeze3"        | Undefined                                                       |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The server header for the remote web server is: "Apache/2.2.16 (Debian)"     | Undefined                                                       |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| An error page sent this Apache version: "addressApache/2.2.16 (Debian)       | http://localhost:8080/black/vulnerabilities/xss_r/_vti_inf.html |
| Server at localhost Port 8080/address"                                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server sent a strange HTTP response code: "405" with the      | http://localhost:8080/black/vulnerabilities/xss_r/GeBrG         |
| message: "Method Not Allowed", manual inspection is advised                  |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server sent a strange HTTP reason message: "The HTTP server   | http://localhost:8080/black/login.php                           |
| returned a redirect error that would lead to an infinite loop. The last 30x  |                                                                 |
| error message was: Found" manual inspection is advised                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server has a custom configuration, in which any non existent  | http://localhost:8080/black/vulnerabilities/xss_r/              |
| methods that are invoked are defaulted to GET instead of returning a "Not    |                                                                 |
| Implemented" response                                                        |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The URL: "http://localhost:8080/black/vulnerabilities/xss_r/" sent the       | http://localhost:8080/black/vulnerabilities/xss_r/              |
| cookie: "security=low"                                                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The URL: "http://localhost:8080/black/index.php" sent the cookie:            | http://localhost:8080/black/index.php                           |
| "PHPSESSID=lut893qvd4gdngp1rud5ei8pc2; path=/"                               |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| A cookie matching the cookie fingerprint DB has been found when requesting   | http://localhost:8080/black/index.php                           |
| "http://localhost:8080/black/index.php" . The remote platform is: "PHP"      |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
So, how does it work? As a dataflow programming language, Pythonect treats data as something that originates from a source - it flows through a number of processing components and arrives at a final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented application comes from the realm of real-time signal processing, e.g. a video signal processor which starts with a video input, modifies it through a number of processing components (i.e. video filters), and finally outputs it to a video display.

As with video, penetration testing (and other security domains) can be expressed as a network of different components such as targets, network scanners, web security scanners, etc., connected by a number of communication channels. These components (and more) are provided by Hackersh, and can be either internal (e.g. url is an internal component that converts String to URL) or external (e.g. nmap is a wrapper around the Nmap security scanner). Every Hackersh component (except the Hackersh Root Component) is standardized to accept and return a context. Context is a dict (i.e. associative array) that can be piped through different components, just like text can be piped through different Unix tools (e.g. cat, grep, wc, etc.).
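The "accept a context, return a context" convention can be sketched in plain Python. Everything below is a made-up stand-in: `url`, `fake_scanner`, and `pipe` are hypothetical helpers, and the real Hackersh context layout may well differ.

```python
# Hypothetical stand-ins, for illustration only; not the Hackersh API.

def url(target):
    """Internal-style component: turn a plain string into a context dict."""
    return {"URL": target}


def fake_scanner(context):
    """External-style component: return a new context with extra findings."""
    updated = dict(context)  # leave the incoming context untouched
    updated["PORTS"] = [{"PORT": "8080", "SERVICE": "HTTP"}]
    return updated


def pipe(value, *components):
    """Feed a value through each component in order, like a Unix pipeline."""
    for component in components:
        value = component(value)
    return value


context = pipe("http://localhost", url, fake_scanner)
print(context)
```

Because every component speaks the same dict-in, dict-out protocol, a new component can be dropped anywhere in the chain without the others having to know about it.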

Back to real life examples, here is how you can pass command line arguments to an external Hackersh component (e.g. nmap):
"http://localhost" -> url -> nmap("-sS -P0 -T3") -> w3af -> print
Here is how you can debug a Hackersh component:
"http://localhost" -> url -> nmap("-sS -P0 -T3", debug=True) -> w3af -> print
Please note that this is not a component-specific option as almost every Hackersh component can be debugged this way.

Moving on to more advanced options:
"http://localhost" -> url -> nmap("-sS -P0 -T3") -> [_['PORT'] == '8080' and _['SERVICE'] == 'HTTP'] -> w3af -> print
Support for Metadata is a major strength of Hackersh as it enables potential AI applications to fine-tune their service selection strategy based on service-specific characteristics.
"http://localhost" -> url -> [nmap, pass] -> amap
The script above is an example of a multithreaded application. It scans http://localhost along two parallel paths: nmap followed by amap, and amap alone. The output is:
http://localhost
  +-3306/tcp (MYSQL)
  +-25/tcp (SMTP)
  +-25/tcp (NNTP)
  +-902/tcp (VMWARE-AUTHD)
  +-21/tcp (FTP)
  +-21/tcp (SMTP)
  +-22/tcp (SSH)
  +-22/tcp (SSH-OPENSSH)
  +-80/tcp (HTTP)
  +-80/tcp (HTTP-APACHE-2)
  +-80/tcp (HTTP)
  +-80/tcp (HTTP-APACHE-2)
  +-631/tcp (HTTP)
  +-631/tcp (HTTP-APACHE-2)
  +-631/tcp (HTTP-CUPS)
  +-8080/tcp (HTTP)
  +-631/tcp (SSL)
  +-8080/tcp (HTTP)
  +-8080/tcp (HTTP-APACHE-2)
  +-53/tcp (DNS)
  +-8080/tcp (HTTP-APACHE-2)
  +-2222/tcp (SSH)
  +-2222/tcp (SSH-OPENSSH)
  +-3000/tcp (HTTP)
  +-111/tcp (RPC)
  `-111/tcp (RPC-RPCBIND-V4)
To read more about Pythonect's multi-thread and multi-process capabilities, please visit Pythonect Tutorial: Learn By Example.

This version supports a set of external Hackersh components, as well as the following internal Hackersh components (in alphabetical order):
  • Hostname
  • IPv4_Address
  • IPv4_Range (supports CIDR, Netmask Source-IP Notation, IP Range, etc.)
  • Nslookup
  • Stateful programmatic Web Browser (i.e. Browse, Submit, and Iterate_Links)
  • URL
To familiarize yourself with Pythonect, you should also read my other blog posts about it. Good luck, and May the Force be with you!

Sunday, February 10, 2013

Password Policy: You Are Doing It Wrong (When 2^56 Becomes 2^42)

They say the road to hell is paved with good intentions. This is often the case with non-standard password policies. About a month ago I visited my "favorite airplane company" website, and after successfully logging in with my Frequent Flyer credentials, I was redirected to an Update Password page where I was asked to change my password according to the following criteria:

Please insert an 8 characters password
The 4 first characters need to include at least 2 letters (A-Z)
The last 4 characters must be all digits

At first sight this may seem like a good password policy: an 8-character password that must include at least 2 letters (A-Z) and at least 4 digits. But is it really going to result in a strong password?

The answer is no, and to understand why, it is necessary to understand how a brute-force attack works. A brute-force attack consists of systematically trying all possible passwords until the correct password is found. In the worst case, this would involve traversing the entire search space. Now, the larger the search space, the longer it will take the brute-force attack to cover it. This doesn't guarantee that a given password won't be the 1st or 2nd option in the search space, but statistically speaking, the more options there are, the more password combinations there are to check.
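As a toy illustration of "traversing the entire search space", the sketch below brute-forces a 4-digit code. The `brute_force` helper and the sample secrets are my own illustrative choices, not part of any real tool.

```python
import itertools
import string


def brute_force(secret, alphabet, length):
    """Try every candidate in order; return how many guesses were needed."""
    candidates = itertools.product(alphabet, repeat=length)
    for attempts, candidate in enumerate(candidates, start=1):
        if "".join(candidate) == secret:
            return attempts
    return None  # the secret is not in this search space


digits = string.digits          # "0123456789"
worst_case = len(digits) ** 4   # size of the whole search space
attempts = brute_force("9999", digits, 4)
print(attempts, worst_case)     # 10000 10000
```

Here "9999" happens to be the very last candidate, so the attack has to walk the whole space, while "0000" would fall on the first guess. The size of the space bounds the work, but it says nothing about where a particular password sits inside it.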

The password policy defines the search space. Depending on the password policy, it can define either a global search space (e.g. 8 ASCII characters password) or an individual search space per character (e.g. 8 ASCII characters password, first character must be a digit). The latter is weaker than the former. To demonstrate this, I have developed a small Python script called alphapasswd.py that calculates the search space of a password policy given its formation and a working sample password.
#!/usr/bin/env  python

# Copyright (C) 2013 Itzik Kotler <ik@ikotler.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

import sys
import re
import math


def main():

    try:

        print 'Evaluating Password Formation: "%s" with Sample Password "%s"' % (sys.argv[1], sys.argv[2])

        formation = re.compile(sys.argv[1])

        if not formation.match(sys.argv[2]):

            print 'Sample Password "%s" does not match Password Formation "%s"' % (sys.argv[2], sys.argv[1])

            return 0

        sample_passwd = list(formation.match(sys.argv[2]).group(0))

        print "INPUT: %s" % sample_passwd

        values_per_col = []

        exponent = 0

        # Iterate over each character in the sample password

        for col_idx in xrange(0, len(sample_passwd)):

            total_values_per_col = 0

            # Iterate over all 2^8 byte values (0..255 inclusive)

            for byte in xrange(0, 256):

                old_value = sample_passwd[col_idx]

                sample_passwd[col_idx] = chr(byte)

                # GO / NO GO ?

                if formation.match(''.join(sample_passwd)):

                    total_values_per_col = total_values_per_col + 1

                sample_passwd[col_idx] = old_value

            values_per_col.append(total_values_per_col)

        for col_idx in xrange(0, len(values_per_col)):

            print "PASSWORD BYTE #%d SEARCH SPACE 2^%d (%d)" % (col_idx+1, math.ceil(math.log(values_per_col[col_idx], 2)), values_per_col[col_idx])

            exponent = exponent + math.ceil(math.log(values_per_col[col_idx], 2))

        print "EXPONENT = %d" % exponent

        print "TOTAL = %d = 2^%d" % (2**exponent, exponent)

    except IndexError as e:

        print 'Missing password formation or sample password'
        print 'e.g. %s "[a-zA-Z0-9]{4}" "abcd"' % sys.argv[0]
        print 'Usage: %s <password formation> <sample password>' % sys.argv[0]


if __name__ == "__main__":
    main()
What the script above does is calculate how many bits (as eventually the password will be stored digitally and bit is the smallest unit of measurement used for information storage in computers) are needed to represent each character in the password given the password policy, and then it sums all the bits and outputs the maximum password strength (in bits) that this password policy can yield.
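The per-character bit counting described above can also be checked by hand. The sketch below is written for Python 3 and is separate from alphapasswd.py. It assumes the alphabet sizes implied by the regular expressions used later in this post (52 letters, 10 digits, 10 symbols), and `policy_bits` is a hypothetical helper name.

```python
import math

LETTERS = 52                      # a-z and A-Z
DIGITS = 10                       # 0-9
SYMBOLS = 10                      # ! @ # $ % ^ & * ( )
ANY = LETTERS + DIGITS + SYMBOLS  # 72 choices when a position is unrestricted


def policy_bits(choices_per_position):
    """Sum the bits per position, rounding each up to a whole bit."""
    return sum(math.ceil(math.log2(n)) for n in choices_per_position)


# Two letters, two unrestricted characters, then four digits.
airline = [LETTERS, LETTERS, ANY, ANY] + [DIGITS] * 4
# Eight unrestricted characters over the same alphabet.
unrestricted = [ANY] * 8

airline_bits = policy_bits(airline)
unrestricted_bits = policy_bits(unrestricted)
print(airline_bits, unrestricted_bits, 2 ** (unrestricted_bits - airline_bits))
# 42 56 16384
```

Note that rounding every position up to a whole bit makes both totals upper bounds rather than exact counts of valid passwords.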

Going back to our original question, now that we have alphapasswd.py, it is possible to compare two or more password policies and see which yields a better theoretical password (remember this is not testing against common passwords or obvious mistakes, just testing the search space). The second password policy that I will be using for comparison is very similar to the one in question, but simpler: it's an 8 ASCII characters long password with no restrictions. Now that we have competitors, let's start measuring their search space, starting with the airplane company password policy:
./alphapasswd.py "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{2}[0-9]{4}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{2}[0-9]{4}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #2 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #6 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #7 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #8 SEARCH SPACE 2^4 (10)
EXPONENT = 42
TOTAL = 4398046511104 = 2^42
From this output it is possible to see that, out of an 8 characters long password, bytes #1, #2, #5, #6, #7, and #8 have a smaller search space (generally speaking, the upper bound of an ASCII byte search space is 2^7), and as a result, the maximum search space is 2^42. Now, let's try the second password policy (i.e. 8 ASCII characters long password with no restrictions):
./alphapasswd.py "[a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{8}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{8}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #2 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #6 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #7 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #8 SEARCH SPACE 2^7 (72)
EXPONENT = 56
TOTAL = 72057594037927936 = 2^56
From this output it is possible to see that, out of an 8 characters long password, all the bytes have the maximum search space possible given the upper bound of an ASCII byte search space (i.e. 2^7), and as a result, the maximum search space is 2^56. In other words, the first password policy is 16384 (i.e. 72057594037927936/4398046511104) times weaker than the second password policy. Reviewing the first password policy again, it's obvious that the 4 digits requirement is what limits the search space the most. If so, why did my "favorite airplane company" request it? On the same Update Password page it says (at the bottom) that:

The last four characters in your password (the digits) will serve as your secret code to identify yourself at the Telephone Service Center

And so the mystery is solved: due to a legacy IVR (Interactive Voice Response) system, and the fact that my "favorite airplane company" did not want to separate their Website and IVR credentials, they composed a password policy that is in fact weaker than any 8 ASCII characters long password policy. Now, come to think of it, if I only need to enter 4 digits to log in to the IVR, how are they storing the passwords? They can't be hashed and compared, as the IVR will only accept 4 digits while the password is 8 characters long. Oh well, I guess that's a story for another day.