Wednesday, August 7, 2013

Pythonect Has New Graphs, Documentation, Tutorial, and More!

About two weeks ago I have released a new version of Pythonect (0.6) with new features, documentation, tutorial, and an (small, but growing) example directory.
I’d like to take this opportunity to discuss the past, present and future of the Pythonect Project.

Nearly 2 years ago I started working on Pythonect with the intention to help software developers to connect the dots and make mashup, rapid prototyping, and developing scalable distributed applications easy. Pythonect is a new, experimental, general-purpose dataflow programming language based on Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python. Pythonect interpreter (and reference implementation) is a free and open source software written completely in Python, and is available under the BSD 3-Clause license.

Why Pythonect? Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented applications comes from the realm of real-time signal processing, e.g. a video signal processor which perhaps starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

As with video, many applications can be expressed as a network of different components that are connected by a number of communication channels. The benefits, and perhaps the greatest incentives, of expressing an application this way is scalability and parallelism. The different components in the network can be maneuvered to create entirely unique dataflows without necessarily requiring the relationship to be hardcoded. Also, the design and concept of components make it easier to run on distributed systems and parallel processors.

Here is the canonical "Hello, world" example program in Pythonect:
"Hello, world" -> print
And here is the canonical "Hello, world" multi-threaded example program in Pythonect:
"Hello, world" -> [print, print]
Not to mention that you can go from multi-threaded to multi-processed as easy as:
"Hello, world" -> [print &, print &]
Or remotely call a procedure using XML-RPC:
"Hello, world" -> print@xmlrpc://localhost:8081
The language couldn't possibly be simpler...
Okay, so what's new you're asking? *I was wrong*, it can be simpler, and it is in Pythonect version 0.6 :-)

In Pythonect 0.6.0 I have re-written the engine and some large parts of the backend. Pythonect is now using graph (NetworkX. DiGraph) as its data structure, and it's also supporting multiple file formats as an input. Currently, Pythonect (since version 0.6) supports 3 file formats:
  • *.P2Y (text-based scripting language aims to combine the quick and intuitive feel of shell scripting, with the power of Python)
  • *.DIA (visual programming language enabled by Dia)
  • *.VDX (visual programming language enabled by Microsoft Visio XML)
In other words:


is equal to:
"Hello, world" -> print
And vice versa. You can launch (almost) any graph/diagram editor and save a graph/diagram as *.VDX or *.DIA format and Pythonet will be able to parse and run it (even if it's gzipped!). Curious to see how a multi-threading/processing graph looks like? See below!


Yup, it's that simple. One node with two edges. The graph above is equal to:
"Hello, world" -> [print, print]
Which is the canonical "Hello, world" multi-threaded example program. Now, another issue that I have addressed in this release is the reduce functionally.
The famous reduce from big data. Let's say that we want to write a program that will add one to every integer input and eventually sum all the results:
[1,2,3] -> _ + 1 -> sum -> print
The above example won't work because Pythonect maps (think MapReduce) each iterable value to its own thread, so the sum function will actually receive 2, 3, 4 separately and not as a list. A workaround for this will be:
sum(`[1,2,3] -> _+1`) -> print
But with the new reduce functionally in Python 0.6, it is as easy as:
[1,2,3] -> _ + 1 -> sum(_!) -> print
By using the _! Identifier, the Pythonect interrupter will automatically join all the values (and threads/processes) into a single list and pass it to the Python function without any prerequisites. The same applies when using a graph:


is equal to:
[1,2,3] -> _ + 1 -> sum(_!) -> print
Now let's talk about the future of Pythonect. Here's a link to the TODO list, where you can find future directions. In a nutshell, more graphs, more Python implementation support, and more Service-oriented architecture (SOA).

Right now, the biggest application of Pythonect (to the best of my knowledge) is my second project, Hackersh. Hacker Shell (hackersh) is a free and open source command-line shell and scripting language designed especially for security testing. It is written in Python and uses Pythonect as its scripting engine. The upcoming release of Hackersh (work in progress!) will also enjoy the Pythonect 0.6 features such as graphs (*.VDX and *.DIA) as scripts and a better reduce functionally.

To learn more about Pythonect, please visit its homepage: http://www.pythonect.org and be sure to check out the new documentation at: http://docs.pythonect.org/en/latest/ where you can find an up-to-date tutorial and installation instructions.

That's all for now!