Wednesday, July 13, 2016

Fuzzing The Kill Chain

Fuzzing is a software testing technique in which you generate a number of random inputs and see how a program handles them. So what does a testing technique have to do with a process such as the Cyber Kill Chain, as developed by Lockheed Martin? Easy! Just as fuzzing software produces resilient software, fuzzing a process will produce a validated process. The Kill Chain describes seven steps that adversaries must complete in order to achieve their goals, but will that always be the case? Can an attacker pull off a successful attack with just one step? Or three? That’s what we’re going to fuzz out ...

(Again, in order to avoid cross-posting between the different blogs, this is just a brief paragraph; a link to the original post is below.)

Continue reading: https://www.safebreach.com/blog/fuzzing-the-kill-chain

Wednesday, December 2, 2015

I See Your True ECHO_REQUEST Patterns (Pinging Data Away)

I've started blogging again! In order to avoid cross-posting between the different blogs, I'll just give a brief paragraph and a link back to the original post. Here we go:

Getting into a network and getting data out of a network are two different challenges. Just because an employee clicked on a malicious link and got hacked doesn’t mean the attacker gets to walk off with PII, financials, source code, etc. In this blog post, we’ll explore the known breach method of using the ICMP protocol for data exfiltration, but with a twist. Instead of showing how to use this breach method with custom-made tools, we’re going to do it using the default and common ping utility, red team style!

Continue reading: http://blog.safebreach.com/2015/12/02/i-see-your-true-echo_request-patterns-pinging-data-away/

Wednesday, August 7, 2013

Pythonect Has New Graphs, Documentation, Tutorial, and More!

About two weeks ago I released a new version of Pythonect (0.6) with new features, documentation, a tutorial, and a (small, but growing) example directory.
I’d like to take this opportunity to discuss the past, present and future of the Pythonect Project.

Nearly 2 years ago I started working on Pythonect with the intention of helping software developers connect the dots and making mashups, rapid prototyping, and the development of scalable distributed applications easy. Pythonect is a new, experimental, general-purpose dataflow programming language based on Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python. The Pythonect interpreter (and reference implementation) is free and open source software written completely in Python, and is available under the BSD 3-Clause license.

Why Pythonect? Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented application comes from the realm of real-time signal processing, e.g. a video signal processor which starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

As with video, many applications can be expressed as a network of different components that are connected by a number of communication channels. The benefits, and perhaps the greatest incentives, of expressing an application this way are scalability and parallelism. The different components in the network can be rearranged to create entirely unique dataflows without necessarily requiring the relationships to be hardcoded. Also, the design and concept of components make it easier to run on distributed systems and parallel processors.
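To make the source, components, destination idea concrete, here is a minimal sketch in plain Python. It is only an illustration of the dataflow concept, not how Pythonect itself is implemented, and every name in it (source, brighten, sharpen, run_pipeline) is made up for the example.

```python
# A plain-Python sketch of a dataflow: source -> components -> destination.
# All names here are hypothetical; this is not Pythonect's implementation.

def source():
    """Emit the raw data items (stand-ins for video frames)."""
    for frame in ["frame-1", "frame-2", "frame-3"]:
        yield frame


def brighten(frames):
    """First processing component: tag each item as brightened."""
    for frame in frames:
        yield frame + "+bright"


def sharpen(frames):
    """Second processing component: tag each item as sharpened."""
    for frame in frames:
        yield frame + "+sharp"


def run_pipeline(src, components):
    """Connect the source to each component in order and collect the output."""
    stream = src
    for component in components:
        stream = component(stream)
    return list(stream)


# The same components can be rearranged without changing their code.
displayed = run_pipeline(source(), [brighten, sharpen])
reordered = run_pipeline(source(), [sharpen, brighten])
print(displayed)
print(reordered)
```

Because each component only knows about the stream it receives, swapping the order (or inserting a new filter) is a change to the list passed to run_pipeline, not to the components themselves.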

Here is the canonical "Hello, world" example program in Pythonect:
"Hello, world" -> print
And here is the canonical "Hello, world" multi-threaded example program in Pythonect:
"Hello, world" -> [print, print]
Not to mention that you can go from multi-threaded to multi-processed as easily as:
"Hello, world" -> [print &, print &]
Or remotely call a procedure using XML-RPC:
"Hello, world" -> print@xmlrpc://localhost:8081
The language couldn't possibly be simpler...
Okay, so what's new, you ask? *I was wrong*, it can be simpler, and it is in Pythonect version 0.6 :-)

In Pythonect 0.6.0 I have rewritten the engine and large parts of the backend. Pythonect now uses a graph (NetworkX DiGraph) as its data structure, and it also supports multiple file formats as input. Currently, Pythonect (since version 0.6) supports 3 file formats:
  • *.P2Y (a text-based scripting language that aims to combine the quick and intuitive feel of shell scripting with the power of Python)
  • *.DIA (visual programming language enabled by Dia)
  • *.VDX (visual programming language enabled by Microsoft Visio XML)
In other words, this diagram:

[Figure: a "Hello, world" node connected by an edge to a print node]

is equal to:
"Hello, world" -> print
And vice versa. You can launch (almost) any graph/diagram editor, save a graph/diagram in *.VDX or *.DIA format, and Pythonect will be able to parse and run it (even if it's gzipped!). Curious to see what a multi-threading/processing graph looks like? See below!

[Figure: a "Hello, world" node with two outgoing edges, each to a print node]

Yup, it's that simple. One node with two edges. The graph above is equal to:
"Hello, world" -> [print, print]
Which is the canonical "Hello, world" multi-threaded example program. Now, another issue that I have addressed in this release is the reduce functionality.
That is, the famous reduce from big data. Let's say that we want to write a program that adds one to every integer input and eventually sums all the results:
[1,2,3] -> _ + 1 -> sum -> print
The above example won't work, because Pythonect maps (think MapReduce) each value of the iterable to its own thread, so the sum function receives 2, 3, and 4 separately rather than as a list. A workaround is:
sum(`[1,2,3] -> _+1`) -> print
But with the new reduce functionality in Pythonect 0.6, it is as easy as:
[1,2,3] -> _ + 1 -> sum(_!) -> print
By using the _! identifier, the Pythonect interpreter will automatically join all the values (and threads/processes) into a single list and pass it to the Python function without any prerequisites. The same applies when using a graph:

[Figure: the flow [1,2,3] -> _ + 1 -> sum(_!) -> print drawn as a graph]

is equal to:
[1,2,3] -> _ + 1 -> sum(_!) -> print
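For readers who think in plain Python, the map-then-join behavior of _! can be approximated as follows. This is a rough analogy using the standard concurrent.futures module, under the assumption that one worker per value is a fair stand-in for Pythonect's per-value threads; the helper name map_then_reduce is hypothetical and not part of Pythonect.

```python
from concurrent.futures import ThreadPoolExecutor


def add_one(value):
    """The mapped step: corresponds to `_ + 1` in the Pythonect script."""
    return value + 1


def map_then_reduce(values, mapper, reducer):
    """Run mapper on each value in its own worker, then join and reduce."""
    with ThreadPoolExecutor(max_workers=len(values)) as pool:
        mapped = list(pool.map(mapper, values))  # results keep input order
    # The joined list is passed to the reducer as ONE argument. Without the
    # join, each worker would call sum(2), sum(3), sum(4) on a bare int,
    # which raises TypeError in Python.
    return reducer(mapped)


total = map_then_reduce([1, 2, 3], add_one, sum)
print(total)
```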
Now let's talk about the future of Pythonect. Here's a link to the TODO list, where you can find future directions. In a nutshell, more graphs, more Python implementation support, and more Service-oriented architecture (SOA).

Right now, the biggest application of Pythonect (to the best of my knowledge) is my second project, Hackersh. Hacker Shell (hackersh) is a free and open source command-line shell and scripting language designed especially for security testing. It is written in Python and uses Pythonect as its scripting engine. The upcoming release of Hackersh (work in progress!) will also enjoy the Pythonect 0.6 features such as graphs (*.VDX and *.DIA) as scripts and a better reduce functionality.

To learn more about Pythonect, please visit its homepage: http://www.pythonect.org and be sure to check out the new documentation at: http://docs.pythonect.org/en/latest/ where you can find an up-to-date tutorial and installation instructions.

That's all for now!

Wednesday, April 3, 2013

Hackersh 0.1 Release Announcement

I am pleased to announce the official 0.1 release of Hackersh ("Hacker Shell") - a shell (command interpreter) written in Python with built-in security commands and out-of-the-box wrappers for various security tools. It uses Pythonect as its scripting engine. Since it's the first release of Hackersh, I'd like to take this opportunity to explain how it works and why you should be using it.

Hackersh is an interactive console for security research and testing. It uses Pythonect as its scripting language. Pythonect is a new, experimental, general-purpose high-level dataflow programming language based on Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python. The combination of the two makes:
"http://localhost" -> url -> nmap -> w3af -> print
return something like this:
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| VULNERABILITY DESCRIPTION                                                    | URL                                                             |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| Cross Site Scripting was found at:                                           | http://localhost:8080/black/vulnerabilities/xss_r/              |
| "http://localhost:8080/black/vulnerabilities/xss_r/", using HTTP method GET. |                                                                 |
| The sent data was:                                                           |                                                                 |
| "name=%3CSCrIPT%3Efake_alert%28%22v3bd%22%29%3C%2FSCrIPT%3E". This           |                                                                 |
| vulnerability affects ALL browsers                                           |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The whole target has no protection (X-Frame-Options header) against          | Undefined                                                       |
| ClickJacking attack                                                          |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| "X-Powered-By" header for this HTTP server is: "PHP/5.3.3-7+squeeze3"        | Undefined                                                       |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The server header for the remote web server is: "Apache/2.2.16 (Debian)"     | Undefined                                                       |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| An error page sent this Apache version: "addressApache/2.2.16 (Debian)       | http://localhost:8080/black/vulnerabilities/xss_r/_vti_inf.html |
| Server at localhost Port 8080/address"                                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server sent a strange HTTP response code: "405" with the      | http://localhost:8080/black/vulnerabilities/xss_r/GeBrG         |
| message: "Method Not Allowed", manual inspection is advised                  |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server sent a strange HTTP reason message: "The HTTP server   | http://localhost:8080/black/login.php                           |
| returned a redirect error that would lead to an infinite loop. The last 30x  |                                                                 |
| error message was: Found" manual inspection is advised                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The remote Web server has a custom configuration, in which any non existent  | http://localhost:8080/black/vulnerabilities/xss_r/              |
| methods that are invoked are defaulted to GET instead of returning a "Not    |                                                                 |
| Implemented" response                                                        |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The URL: "http://localhost:8080/black/vulnerabilities/xss_r/" sent the       | http://localhost:8080/black/vulnerabilities/xss_r/              |
| cookie: "security=low"                                                       |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| The URL: "http://localhost:8080/black/index.php" sent the cookie:            | http://localhost:8080/black/index.php                           |
| "PHPSESSID=lut893qvd4gdngp1rud5ei8pc2; path=/"                               |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
| A cookie matching the cookie fingerprint DB has been found when requesting   | http://localhost:8080/black/index.php                           |
| "http://localhost:8080/black/index.php" . The remote platform is: "PHP"      |                                                                 |
+------------------------------------------------------------------------------+-----------------------------------------------------------------+
So, how does it work? As a dataflow programming language, Pythonect treats data as something that originates from a source - it flows through a number of processing components and arrives at a final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented application comes from the realm of real-time signal processing, e.g. a video signal processor which starts with a video input, modifies it through a number of processing components (i.e. video filters), and finally outputs it to a video display.

As with video, penetration testing (and other security domains) can be expressed as a network of different components such as targets, network scanners, web security scanners, etc., connected by a number of communication channels. These components (and more) are provided by Hackersh, and can be either internal (e.g. url is an internal component that converts String to URL) or external (e.g. nmap is a wrapper around the Nmap security scanner). Every Hackersh component (except the Hackersh Root Component) is standardized to accept and return a context. Context is a dict (i.e. associative array) that can be piped through different components, just like text can be piped through different Unix tools (e.g. cat, grep, wc, etc.).
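The "context in, context out" convention can be sketched in plain Python. The components below are stand-ins invented for illustration (fake_scanner does not scan anything), and the URL key is an assumption; only the idea of passing a dict from one component to the next comes from the description above.

```python
# Hypothetical components: each accepts a context dict and returns a new one.
# These are stand-ins for illustration, not Hackersh's actual components.

def url(text):
    """Root-style component: turn a plain string into a context."""
    return {"URL": text}


def fake_scanner(context):
    """Pretend scanner: adds made-up port/service metadata to the context."""
    result = dict(context)  # copy, so the incoming context is not mutated
    result["PORT"] = "8080"
    result["SERVICE"] = "HTTP"
    return result


def pipe(value, *components):
    """Feed value through the components left to right, like a shell pipe."""
    for component in components:
        value = component(value)
    return value


context = pipe("http://localhost", url, fake_scanner)
print(context)
```

Because every component speaks the same dict-based protocol, any component can follow any other, which is what makes the shell-style chaining possible.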

Back to real-life examples, here is how you can pass command line arguments to an external Hackersh component (e.g. nmap):
"http://localhost" -> url -> nmap("-sS -P0 -T3") -> w3af -> print
Here is how you can debug a Hackersh component:
"http://localhost" -> url -> nmap("-sS -P0 -T3", debug=True) -> w3af -> print
Please note that this is not a component-specific option as almost every Hackersh component can be debugged this way.

Moving on to more advanced options:
"http://localhost" -> url -> nmap("-sS -P0 -T3") -> [_['PORT'] == '8080' and _['SERVICE'] == 'HTTP'] -> w3af -> print
Support for Metadata is a major strength of Hackersh as it enables potential AI applications to fine-tune their service selection strategy based on service-specific characteristics.
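In plain Python terms, and assuming the bracketed expression acts as a filter on each context, the metadata test amounts to something like the sketch below. The contexts are made-up sample data; only the PORT and SERVICE keys are taken from the script.

```python
# Made-up contexts for illustration; the PORT and SERVICE keys mirror the
# filter expression in the script, but the values are not real scan results.
contexts = [
    {"URL": "http://localhost", "PORT": "22", "SERVICE": "SSH"},
    {"URL": "http://localhost", "PORT": "8080", "SERVICE": "HTTP"},
    {"URL": "http://localhost", "PORT": "3306", "SERVICE": "MYSQL"},
]


def is_http_on_8080(context):
    """Same predicate as the bracketed expression in the Pythonect script."""
    return context["PORT"] == "8080" and context["SERVICE"] == "HTTP"


# Only contexts that satisfy the predicate continue to the next component.
selected = [context for context in contexts if is_http_on_8080(context)]
print(selected)
```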
"http://localhost" -> url -> [nmap, pass] -> amap
The script above is an example of a multithreaded application. It scans http://localhost along two parallel paths: nmap followed by amap, and amap alone. The output is:
http://localhost
  +-3306/tcp (MYSQL)
  +-25/tcp (SMTP)
  +-25/tcp (NNTP)
  +-902/tcp (VMWARE-AUTHD)
  +-21/tcp (FTP)
  +-21/tcp (SMTP)
  +-22/tcp (SSH)
  +-22/tcp (SSH-OPENSSH)
  +-80/tcp (HTTP)
  +-80/tcp (HTTP-APACHE-2)
  +-80/tcp (HTTP)
  +-80/tcp (HTTP-APACHE-2)
  +-631/tcp (HTTP)
  +-631/tcp (HTTP-APACHE-2)
  +-631/tcp (HTTP-CUPS)
  +-8080/tcp (HTTP)
  +-631/tcp (SSL)
  +-8080/tcp (HTTP)
  +-8080/tcp (HTTP-APACHE-2)
  +-53/tcp (DNS)
  +-8080/tcp (HTTP-APACHE-2)
  +-2222/tcp (SSH)
  +-2222/tcp (SSH-OPENSSH)
  +-3000/tcp (HTTP)
  +-111/tcp (RPC)
  `-111/tcp (RPC-RPCBIND-V4)
To read more about Pythonect's multi-thread and multi-process capabilities, please visit Pythonect Tutorial: Learn By Example.

External Hackersh components (in alphabetical order) supported in this version include:

The internal Hackersh components (in alphabetical order) supported in this version include:
  • Hostname
  • IPv4_Address
  • IPv4_Range (supports CIDR, Netmask Source-IP Notation, IP Range, etc.)
  • Nslookup
  • Stateful programmatic Web Browser (i.e. Browse, Submit, and Iterate_Links)
  • URL
To familiarize yourself with Pythonect, you should also read these other blog posts:

Make sure you check out these resources as well. Good luck, and May the Force be with you!

Sunday, February 10, 2013

Password Policy: You Are Doing It Wrong (When 2^56 Becomes 2^42)

They say the road to hell is paved with good intentions. This is often the case with non-standard password policies. About a month ago I visited my "favorite airplane company" website, and after successfully logging in with my Frequent Flyer credentials, I was redirected to an Update Password page where I was asked to change my password according to the following criteria:

Please insert an 8 characters password
The 4 first characters need to include at least 2 letters (A-Z)
The last 4 characters must be all digits

At first sight this may seem like a good password policy: an 8-character password that must include at least 2 A-Z letters and at least 4 digits. But is it really going to result in a strong password?

The answer is no, and to understand why, it is necessary to understand how a brute-force attack works. A brute-force attack consists of systematically trying all possible passwords until the correct password is found. In the worst case, this involves traversing the entire search space. The larger the search space, the longer (in run time) it takes the brute-force attack to cover it. This doesn't guarantee that a given password won't be the 1st or 2nd option in the search space, but statistically speaking, the more options there are, the more password combinations there are to check.
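As a quick worked example of how the search space grows, here is a small Python 3 sketch. The alphabet sizes (10 digits, 62 letters and digits) and the rate of one million guesses per second are arbitrary assumptions chosen for illustration, not measurements.

```python
def search_space(alphabet_size, length):
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_seconds(candidates, guesses_per_second):
    """Time to traverse the whole search space at a fixed guessing rate."""
    return candidates / guesses_per_second


# Assumed figures for illustration: 10 digits vs. 62 letters-and-digits,
# and an arbitrary rate of one million guesses per second.
digits_only = search_space(10, 8)
alphanumeric = search_space(62, 8)
print(digits_only, worst_case_seconds(digits_only, 1_000_000))
print(alphanumeric, worst_case_seconds(alphanumeric, 1_000_000))
```

At the assumed rate, the digits-only space is exhausted in 100 seconds, while the alphanumeric space of the same length is more than two million times larger.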

The password policy defines the search space. Depending on the policy, it can define either a global search space (e.g. an 8 ASCII character password) or an individual search space per character (e.g. an 8 ASCII character password whose first character must be a digit). The latter is weaker than the former. To demonstrate this, I have developed a small Python script called alphapasswd.py that calculates the search space of a password policy given its formation and a working sample password.
#!/usr/bin/env python

# Copyright (C) 2013 Itzik Kotler <ik@ikotler.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

import sys
import re
import math


def main():

    try:

        print 'Evaluating Password Formation: "%s" with Sample Password "%s"' % (sys.argv[1], sys.argv[2])

        formation = re.compile(sys.argv[1])

        if not formation.match(sys.argv[2]):

            print 'Sample Password "%s" does not match Password Formation "%s"' % (sys.argv[2], sys.argv[1])

            return 0

        sample_passwd = list(formation.match(sys.argv[2]).group(0))

        print "INPUT: %s" % sample_passwd

        values_per_col = []

        exponent = 0

        # Iterate over each character in the sample password

        for col_idx in xrange(0, len(sample_passwd)):

            total_values_per_col = 0

            # Iterate over all 2^8 possible byte values (0 through 255)

            for byte in xrange(0, 256):

                old_value = sample_passwd[col_idx]

                sample_passwd[col_idx] = chr(byte)

                # GO / NO GO ?

                if formation.match(''.join(sample_passwd)):

                    total_values_per_col = total_values_per_col + 1

                sample_passwd[col_idx] = old_value

            values_per_col.append(total_values_per_col)

        for col_idx in xrange(0, len(values_per_col)):

            print "PASSWORD BYTE #%d SEARCH SPACE 2^%d (%d)" % (col_idx+1, math.ceil(math.log(values_per_col[col_idx], 2)), values_per_col[col_idx])

            exponent = exponent + math.ceil(math.log(values_per_col[col_idx], 2))

        print "EXPONENT = %d" % exponent

        print "TOTAL = %d = 2^%d" % (2**exponent, exponent)

    except IndexError as e:

        print 'Missing password formation or sample password'
        print 'e.g. %s "[a-zA-Z0-9]{4}" "abcd"' % sys.argv[0]
        print 'Usage: %s <password formation> <sample password>' % sys.argv[0]


if __name__ == "__main__":
    main()
What the script above does is calculate how many bits are needed to represent each character in the password given the password policy (the password will eventually be stored digitally, and the bit is the smallest unit of measurement used for information storage in computers). It then sums all the bits and outputs the maximum password strength (in bits) that this password policy can yield.
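Because each position is rounded up to a whole number of bits, the summed figure is an upper bound rather than the exact size of the search space. The following Python 3 sketch re-states the per-position counting (it is a re-implementation for illustration, not alphapasswd.py itself, and it uses fullmatch where the script uses match) and prints both the rounded and the exact figure for the formation from the script's usage message.

```python
import math
import re


def values_per_position(pattern, sample):
    """Count, per position, how many byte values keep the sample matching.

    The sample must match the pattern to begin with, otherwise every count
    would be zero.
    """
    formation = re.compile(pattern)
    chars = list(sample)
    counts = []
    for idx in range(len(chars)):
        original = chars[idx]
        count = 0
        for byte in range(256):
            chars[idx] = chr(byte)
            if formation.fullmatch("".join(chars)):
                count += 1
        chars[idx] = original  # restore before moving to the next position
        counts.append(count)
    return counts


counts = values_per_position("[a-zA-Z0-9]{4}", "abcd")
rounded_bits = sum(math.ceil(math.log2(n)) for n in counts)  # script's method
exact_bits = math.log2(math.prod(counts))                    # no rounding
print(counts, rounded_bits, round(exact_bits, 2))
```

Here each position admits 62 values, so the rounded sum is 24 bits while the exact figure is a little under that. The rounded exponent is still useful for comparing policies, as long as it is read as a ceiling.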

Going back to our original question: now that we have alphapasswd.py, it is possible to compare two or more password policies and see which yields a better theoretical password (remember, this is not testing against common passwords or obvious mistakes, just testing the search space). The second password policy that I will be using for comparison is very similar to the one in question, but simpler: an 8 ASCII character password with no restrictions. Now that we have competitors, let's start measuring their search space, starting with the airplane company password policy:
./alphapasswd.py "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{2}[0-9]{4}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{2}[0-9]{4}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #2 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #6 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #7 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #8 SEARCH SPACE 2^4 (10)
EXPONENT = 42
TOTAL = 4398046511104 = 2^42
From this output it is possible to see that in an 8-character password, bytes #1, #2, #5, #6, #7, and #8 have a smaller search space (generally speaking, the upper bound of an ASCII byte search space is 2^7), and as a result, the maximum search space is 2^42. Now, let's try the second password policy (i.e. an 8 ASCII character password with no restrictions):
./alphapasswd.py "[a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{8}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{8}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #2 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #6 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #7 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #8 SEARCH SPACE 2^7 (72)
EXPONENT = 56
TOTAL = 72057594037927936 = 2^56
From this output it is possible to see that in an 8-character password, all the bytes have the maximum search space possible given the upper bound of an ASCII byte search space (i.e. 2^7), and as a result, the maximum search space is 2^56. In other words, by this measure the first password policy is 16384 (i.e. 72057594037927936/4398046511104) times weaker than the second password policy. Reviewing the first password policy again, it's obvious that the 4 digits requirement is what limits the search space the most. If so, why did my "favorite airplane company" request it? On the same Update Password page it says (at the bottom) that:

The last four characters in your password (the digits) will serve as your secret code to identify yourself at the Telephone Service Center

And so the mystery is solved: due to a legacy IVR (Interactive Voice Response) system, and the fact that my "favorite airplane company" did not want to separate their Website and IVR credentials, they composed a password security policy that is in fact weaker than any 8 ASCII character password policy. Now, come to think of it, if I only need to enter 4 digits to log in to the IVR, how are they storing the passwords? They can't be hashed and compared, as the IVR will only accept 4 digits while the password is 8 characters long. Oh well, I guess that's a story for another day.

Tuesday, December 25, 2012

Scraping LinkedIn Public Profiles for Fun and Profit

Reconnaissance and Information Gathering is a part of almost every penetration testing engagement. Often, the tester will only perform network reconnaissance in an attempt to discover and learn the company's network infrastructure (i.e. IP addresses, domain names, etc.), but there are other types of reconnaissance to conduct, and no, I'm not talking about dumpster diving. Thanks to social networks like LinkedIn, OSINT/WEBINT now yields more information. This information can then be used to help the tester test anything from social engineering to weak passwords.

In this blog post I will show you how to use Pythonect to easily generate potential passwords from LinkedIn public profiles. If you haven't heard about Pythonect yet, it is a new, experimental, general-purpose dataflow programming language based on the Python programming language. Pythonect is most suitable for creating applications that are themselves focused on the "flow" of the data. An application that generates passwords from the public LinkedIn profiles of a given company's employees has a coherent and clear dataflow:

(1) Find all the employees' public LinkedIn profiles
(2) Scrape all the employees' public LinkedIn profiles
(3) Crunch all the data into potential passwords

Now that we have the general concept and high-level overview out of the way, let's dive in to the details.

Finding all the employees' public LinkedIn profiles will be done via Google Custom Search Engine, a free service by Google that allows anyone to create their own search engine. The idea is to create a search engine that, when searching for a given company name, will return all the employees' public LinkedIn profiles. How? When creating a Google Custom Search Engine it's possible to restrict the search results to a specific site (i.e. 'Sites to search'), and we're going to limit ours to: linkedin.com. It's also possible to fine-tune the search results even further, e.g. uk.linkedin.com to find only employees from the United Kingdom.

Access to the newly created Google Custom Search Engine will be made using a free API key obtained from the Google API Console. Why go through the Google API? Because it allows automation (no CAPTCHAs), and it also means that the search-result pages will be returned as JSON (as opposed to HTML). The only catch with using the free API key is that it's limited to 100 queries per day, but it's possible to buy an API key that will not be limited.
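A request to the API is just a URL with a few query parameters. The Python 3 sketch below only builds that URL (it makes no network call); the endpoint and the key, cx, q and start parameters are the ones used by scraper.py in this post, while the helper name build_search_url and the placeholder credentials are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query, start=None):
    """Build a Custom Search request URL with a properly encoded query."""
    params = {"key": api_key, "cx": cx, "q": query}
    if start:
        params["start"] = start  # index of the first result on later pages
    return ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; substitute your own API key and engine CX.
search_url = build_search_url("YOUR_API_KEY", "YOUR_CX", '"at Pythonect"', start=11)
print(search_url)

# Decoding the query string recovers the original search phrase.
print(parse_qs(urlsplit(search_url).query)["q"])
```

Letting urlencode handle the escaping matters here because the search phrases contain quotes and spaces, which are not valid in a raw URL.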

Scraping the profiles is a matter of iterating over the hCards in all the search-result pages, and extracting the employee name from each hCard. What is an hCard? hCard is a microformat for publishing the contact details of people, companies, organizations, and places. hCard is also supported by social networks such as Facebook, Google+, LinkedIn, etc. for exporting public profiles. Google (when indexing) parses hCards and, when relevant, uses them in search-result pages. In other words, when search-result pages include LinkedIn public profiles, they will appear as hCards and can be easily parsed.

Let's see the implementation of the above:
#!/usr/bin/python
#
# Copyright (C) 2012 Itzik Kotler
#
# scraper.py is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# scraper.py is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with scraper.py.  If not, see <http://www.gnu.org/licenses/>.

"""Simple LinkedIn public profiles scraper that uses Google Custom Search"""

import urllib
import simplejson


BASE_URL = "https://www.googleapis.com/customsearch/v1?key=<YOUR GOOGLE API KEY>&cx=<YOUR GOOGLE SEARCH ENGINE CX>"


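def __get_all_hcards_from_query(query, index=0, hcards=None):

    # A mutable default ({}) would be shared across calls, so create the
    # dict here instead

    if hcards is None:

        hcards = {}

    url = query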

    if index != 0:

        url = url + '&start=%d' % (index)

    json = simplejson.loads(urllib.urlopen(url).read())

    if json.has_key('error'):

        print "Stopping at %s due to Error!" % (url)

        print json

    else:

        # Tolerate a response without an 'items' key (e.g. no results)

        for item in json.get('items', []):

            try:

                hcards[item['pagemap']['hcard'][0]['fn']] = item['pagemap']['hcard'][0]['title']

            except KeyError as e:

                pass

        if json['queries'].has_key('nextPage'):

            return __get_all_hcards_from_query(query, json['queries']['nextPage'][0]['startIndex'], hcards)

    return hcards


def get_all_employees_by_company_via_linkedin(company):

    queries = ['"at %s" inurl:"in"', '"at %s" inurl:"pub"']

    result = {}

    for query in queries:

        _query = query % company

        # URL-encode the query, as it contains spaces and quotes

        result.update(__get_all_hcards_from_query(BASE_URL + '&q=' + urllib.quote_plus(_query)))

    return list(result)
Replace <YOUR GOOGLE API KEY> and <YOUR GOOGLE SEARCH ENGINE CX> in the code above with your Google API Key and Google Search Engine CX respectively, save it to a file called scraper.py, and you're ready!
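The script above targets Python 2 (urllib.urlopen and dict.has_key no longer exist in Python 3). If you want to try the extraction step on its own, without an API key or network access, a Python 3 sketch along these lines should do. The sample response is hand-written to mirror the fields scraper.py reads; the names in it are invented, and extract_hcards is a hypothetical helper, not part of scraper.py.

```python
# Hand-written sample shaped like the fields scraper.py reads; the names and
# titles are invented and this is not a captured API response.
sample_response = {
    "items": [
        {"pagemap": {"hcard": [{"fn": "Alice Example", "title": "Engineer at Acme"}]}},
        {"pagemap": {}},  # a result without an hCard is skipped
        {"pagemap": {"hcard": [{"fn": "Bob Example", "title": "Analyst at Acme"}]}},
    ],
    "queries": {},
}


def extract_hcards(response):
    """Map each full name ('fn') to its title, skipping incomplete items."""
    hcards = {}
    for item in response.get("items", []):
        try:
            card = item["pagemap"]["hcard"][0]
            hcards[card["fn"]] = card["title"]
        except (KeyError, IndexError):
            continue
    return hcards


employees = extract_hcards(sample_response)
print(sorted(employees))
```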

To kick-start, here is a simple program in Pythonect (that utilizes the scraper module) that searches for and prints the full names of all the Pythonect company employees:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> print
The output should be:
Itzik Kotler
In my LinkedIn profile, I have listed Pythonect as a company that I work for, and since no one else works there, searching for all the employees of the Pythonect company only brings up my LinkedIn profile.
For demonstration purposes I will keep using this example (i.e. the "Pythonect" company and the "Itzik Kotler" employee), but go ahead and replace Pythonect with other, more popular, company names and see the results.

Now that we have a working skeleton, let's take its output and start crunching it. Keep in mind that every "password generation formula" is merely a guess. The examples below are only a sampling of what can be done. There are, obviously, many more possibilities and you are encouraged to experiment. But first, let's normalize the output, so that it's consistent before any operations are performed on it:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split()))
The normalization procedure is short and simple: convert the string to lowercase and remove any spaces, so the output should now be:
itzikkotler
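
The same normalization step can be tried on its own in plain Python 3. This is only a sketch; the function name normalize is made up for illustration and is not part of Pythonect or the scraper module.

```python
def normalize(full_name):
    """Lowercase a name and drop all whitespace, like the Pythonect expression."""
    return "".join(full_name.split()).lower()


print(normalize("Itzik Kotler"))
```

Because str.split() with no arguments splits on any run of whitespace, tabs and double spaces in a scraped name disappear as well.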
As for data manipulation, out of the box (thanks to the Python Standard Library) we've got itertools and its combinatoric generators. Let's start by applying itertools.product:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split())) -> itertools.product(_, repeat=4) -> print
The code above will generate and print every 4-character combination of the letters in the normalized name: i, t, z, k, o, l, e, r. However, it won't cover passwords that contain uppercase letters. And so, here's a simple and straightforward implementation of a cycle_uppercase function that walks through the input letters and, for each position, yields a copy of the input with that letter in uppercase:
def cycle_uppercase(i):
    s = ''.join(i)
    for idx in xrange(0, len(s)):
        yield s[:idx] + s[idx].upper() + s[idx+1:]
To use it, save it to a file called itertools2.py, and then simply add it to the Pythonect program after the itertools.product(_, repeat=4) block, as follows:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> print
Now, the program will also cover passwords that include a single uppercase letter. Moving on with the data manipulation, sometimes the password might contain symbols that are not found within the scraped data. In this case, it is necessary to build a generator that will take the input and add symbols to it. Here is a short and simple one, implemented as a list comprehension:
[_ + postfix for postfix in ['123','!','$']]
To use it, simply add it to the Pythonect program after the itertools2.cycle_uppercase block, as follows:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> [_ + postfix for postfix in ['123','!','$']] \
                    -> print
The result is that the program now appends the strings '123', '!', and '$' to every generated password, which may or may not increase the chances of guessing the user's password, depending on the password :)
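
To get a feel for what the chained stages produce, here is a rough plain Python 3 equivalent of the last three blocks. It is a sketch, not how Pythonect runs the program: the candidates function and the "ab" seed are made up for illustration, and the nested loops yield results in a fixed order, which the asynchronous Pythonect operators do not promise.

```python
import itertools


def cycle_uppercase(chars):
    """Yield copies of the joined input with exactly one position uppercased."""
    s = "".join(chars)
    for idx in range(len(s)):
        yield s[:idx] + s[idx].upper() + s[idx + 1:]


def candidates(seed, length, postfixes=("123", "!", "$")):
    """Lazily yield every candidate the three chained stages would produce."""
    for combo in itertools.product(seed, repeat=length):
        for variant in cycle_uppercase(combo):
            for postfix in postfixes:
                yield variant + postfix


# Small seed so the output stays readable; "ab" is an arbitrary example.
demo = list(candidates("ab", 2))
print(len(demo))
print(demo[:6])
```

For an 11-character seed such as itzikkotler with repeat=4, the same chain yields 11**4 * 4 * 3 = 175692 candidates, duplicates included, since the repeated letters in the seed are treated as distinct positions.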

To summarize, it's possible to take OSINT/WEBINT data on a given person or company and use it to generate potential passwords, and it's easy to do with Pythonect. There are, of course, many different ways to manipulate the data into passwords and many programs and filters that can be used. In this aspect, Pythonect being a flow-oriented language makes it easy to experiment and research with different modules and programs in a "plug and play" manner.

Monday, September 17, 2012

Fuzzing Like A Boss with Pythonect

In my previous post, Automated Static Malware Analysis with Pythonect, I wrote about how to use Pythonect to automate static malware analysis. In this post I'll describe how to use Pythonect and all of its perks to fuzz file formats, network protocols, and command-line arguments. The examples provided are only a sampling of what can be done. There are, obviously, many more possibilities and you are encouraged to experiment. Before you read this tutorial you should have at least a basic knowledge of fuzz testing, Python and Pythonect (I recommend reading the Pythonect Tutorial: Learn By Example).

Let's see some code!
['A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    -> [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
The code above tries to fuzz the command-line arguments of a *nix command-line tool (e.g. /bin/ping). Let's go line by line and explain what's going on with these 3 lines of code.

The first line defines a list of inputs to try (i.e. ['A', 'a', '0', ...]), the second line defines a list of length parameters (i.e. [256, 512, 1024, ...]), and the last line executes the command-line tool with the generated argument as argv[1] (e.g. /bin/ping "AAAAAA ... 256 times"). In addition, this fuzzer is multi-threaded and uses asynchronous communication. What does that mean? It means that it doesn't wait for a thread to finish before continuing with the loop, and as a result, it's not guaranteed to fuzz in sorted order (i.e. A * 256, A * 512, A * 1024, ...).

You can easily extend the code above to include testing for format string vulnerabilities:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    -> [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
If you want the format string testing inputs to run first (i.e. fuzz in sorted order), change the forward pipe operator from asynchronous to synchronous:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    | [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
If you also want the length parameters to run in sorted order (i.e. '%s' * 256, '%s' * 512, '%s' * 1024, ...), change the 2nd forward pipe operator to synchronous as well:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    | [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        | os.system('/bin/ping ' + _)
Keep in mind that the latter is no longer multi-threaded (because it waits for both the input and the length threads to finish).
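
The ordered versus unordered trade-off can also be sketched in plain Python 3. This is an approximation of the idea, not Pythonect's actual threading model, and every function name here (generate_arguments, run_target, fuzz_in_order, fuzz_concurrently) is made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_arguments(seeds, lengths):
    """Return every seed repeated to every length, in seed-major order."""
    return [seed * n for seed in seeds for n in lengths]


def run_target(command, argument, timeout=5):
    """Run the target once; on POSIX a negative return code means a signal killed it."""
    try:
        proc = subprocess.run([command, argument], stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return None


def fuzz_in_order(command, arguments):
    """Like the '|' version: one run at a time, results in input order."""
    return [(len(arg), run_target(command, arg)) for arg in arguments]


def fuzz_concurrently(command, arguments, workers=4):
    """Like the '->' version: runs overlap, results arrive in completion order."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_target, command, arg): arg for arg in arguments}
        for future in as_completed(futures):
            results.append((len(futures[future]), future.result()))
    return results


# Only build the inputs here; point the fuzz_* functions at a target you own.
arguments = generate_arguments(["%s", "A"], [256, 512])
print([(arg[:2], len(arg)) for arg in arguments])
```

One difference from the Pythonect version is worth noting: os.system hands the whole string to a shell, whereas passing a list to subprocess.run bypasses the shell, so characters such as * or $ reach the target unmodified.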

Moving on, here is an example of a generic file format fuzzer:
open('dana.jpg', 'r').read() \
    -> itertools.permutations \
        -> open('output_' + hex(_.__hash__()) + '.jpg', 'w').write(''.join(_))
The code above reads the content of dana.jpg and passes it to itertools.permutations, which in turn returns dana.jpg-length tuples: all possible orderings of the file's bytes, with no repeated elements.
Each dana.jpg-length tuple is saved into a unique output_ prefixed file. Afterwards, testing the JPEG libraries is as easy as: eog *.jpg or zgv *.jpg
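
An n-byte file has n! orderings, so running the full permutation set to completion is not practical for any real image. Here is a minimal plain Python 3 sketch that caps the number of variants instead; the helper names, the limit parameter, and the four stand-in bytes are assumptions for illustration only.

```python
import itertools


def permuted_variants(data, limit):
    """Return the first `limit` byte-order permutations of `data`."""
    perms = itertools.permutations(data)
    return [bytes(p) for p in itertools.islice(perms, limit)]


def write_variants(variants, prefix="output_", suffix=".jpg"):
    """Write each variant to its own numbered file and return the file names."""
    names = []
    for number, variant in enumerate(variants):
        name = "%s%04d%s" % (prefix, number, suffix)
        with open(name, "wb") as handle:
            handle.write(variant)
        names.append(name)
    return names


# Stand-in bytes instead of a real image (hypothetical input).
sample = b"\xff\xd8\xff\xe0"
variants = permuted_variants(sample, limit=5)
print(len(variants))
```

itertools.permutations emits orderings in lexicographic order of input positions, so the first variants differ from the original only near the end of the file.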

This is another example of a generic file format fuzzer:
open('dana.jpg', 'r').read() \
    -> [list(_) + [os.urandom(1) for n in xrange(0, len(_))]] \
        -> [tuple(random.sample(_, len(_)/2)) for i in xrange(0, len(_)*2)] \
            -> open('output_' + hex(_.__hash__()) + '.jpg', 'w').write(''.join(_))
The code above reads the content of dana.jpg, generates a dana.jpg-length buffer of random bytes, joins the two, and then repeatedly draws random dana.jpg-length samples from the combined buffer (twice as many samples as the combined buffer has bytes).
Each dana.jpg-length chunk is saved into a unique output_ prefixed file. Again, testing the JPEG libraries is as easy as: eog *.jpg or zgv *.jpg
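
The mixing step can be tried in isolation with a short plain Python 3 sketch. The function name random_mix_variants, its count parameter, and the stand-in bytes are made up for illustration; the sampling itself follows the idea described above.

```python
import os
import random


def random_mix_variants(data, count, rng=None):
    """Return `count` samples of len(data) bytes drawn from data plus random bytes."""
    rng = rng or random.Random()
    # Pool holds the original bytes followed by the same number of random bytes.
    pool = list(data) + list(os.urandom(len(data)))
    return [bytes(rng.sample(pool, len(data))) for _ in range(count)]


# Stand-in bytes instead of a real image (hypothetical input).
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF"
variants = random_mix_variants(sample, count=3, rng=random.Random(0))
print([len(v) for v in variants])
```

Since random.sample also shuffles positions, the original header bytes end up anywhere in the output, so expect most variants to exercise a parser's early rejection paths rather than its deeper decoding logic.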

Last but not least, here's a network protocol (FTP) fuzzer:
ftplib.FTP('localhost') \
    -> _.login().startswith('230') \
    -> [_.mkd(s) for s in reduce(lambda x,y: x+y, map(lambda c: [chr(c) * 2**l for l in range(8,13)], xrange(1, 255)))]
The code above uses the ftplib module to connect to an FTP server, logs in anonymously, generates strings from each byte value 1-254 repeated 256, 512, 1024, 2048, and 4096 times, and passes each string as the pathname for MKD.
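
The string-generation part of that one-liner is easier to inspect when pulled apart. Below is a plain Python 3 sketch; mkd_pathnames and fuzz_mkd are hypothetical helper names, and fuzz_mkd assumes it is handed an ftplib.FTP object that is already connected and logged in to a server you are allowed to test.

```python
import ftplib


def mkd_pathnames(first=1, last=254, exponents=range(8, 13)):
    """Return each byte value repeated 2**e times, in the one-liner's order."""
    return [chr(c) * 2 ** e for c in range(first, last + 1) for e in exponents]


def fuzz_mkd(ftp, pathnames):
    """Send MKD for each pathname and record what happened."""
    outcomes = []
    for name in pathnames:
        try:
            ftp.mkd(name)
            outcomes.append((len(name), "ok"))
        except ValueError as error:
            # Assumption: recent Python 3 ftplib rejects CR/LF in commands this way.
            outcomes.append((len(name), repr(error)))
        except ftplib.all_errors as error:
            outcomes.append((len(name), repr(error)))
    return outcomes


names = mkd_pathnames()
print(len(names), len(names[0]), len(names[-1]))
```

Recording the exception instead of stopping lets one run cover the whole input set; a dropped connection shows up as a run of errors starting at the pathname that triggered it.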

Lastly, if you have suggestions on how we can make Pythonect better, head over to Pythonect's GitHub page and create a new ticket or fork. Enjoy the examples and have fun with Pythonect!