SoCal Piggies

Monday, February 20, 2006

ANTLR Studio

While browsing through Eclipse plugins for Python and Jython, I came across ANTLR Studio for generating ANTLR lexers and parsers. Even if you aren't that interested in lexers and parsers, it has a great video of a very innovative user interface. The author of ANTLR Studio, Prashant Deva, is only 19 years old!

Friday, November 11, 2005

Ninth SoCal Piggies Meeting

The SoCal Piggies had their ninth meeting at USC (Salvatori Computer Science Center) on November 10th at 7:00 PM. Eight Piggies attended -- Daniel Arbuckle, Steve Williams, Grig Gheorghiu, Diane Trout, Titus Brown, Mark Kohler, Howard Golden and George Bullis.

The first presenter was Daniel Arbuckle, who talked about "Python and Unicode". Daniel started by introducing general Unicode concepts such as the Universal Character Set (UCS) and the Basic Multilingual Plane (BMP). A somewhat curious detail about UCS is that, due to a limitation of the UTF-16 encoding, code points stop at 0x10FFFF (1114111 in decimal), so the high 11 bits of the 4 bytes taken by a Unicode character are never used. That still leaves 1114112 possible code points, which should be plenty for everybody's needs. In any case, the most widely used Unicode characters are the first 65536, conveniently fitting in 2 bytes, which is why many Unicode implementations use only 2 bytes per character (and this includes Python, if you compile it from source and use the default build settings).
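The code point limits are easy to check from the interpreter. This is a minimal sketch that assumes Python 3.3 or later (where strings always cover the full Unicode range) rather than the Python 2 builds discussed at the meeting; the names MAX_CODE_POINT, BMP_SIZE and in_bmp are made up for illustration.

```python
import sys

# Highest valid code point; the ceiling comes from what UTF-16 can address.
MAX_CODE_POINT = 0x10FFFF
# The Basic Multilingual Plane covers code points 0 through 0xFFFF.
BMP_SIZE = 0x10000


def in_bmp(ch):
    """Return True if the single character ch lies in the BMP."""
    return ord(ch) < BMP_SIZE


# Code points run from 0 to MAX_CODE_POINT inclusive.
total_code_points = MAX_CODE_POINT + 1

print(sys.maxunicode == MAX_CODE_POINT)
print(total_code_points)
print(in_bmp("a"), in_bmp(chr(0x1F600)))
```

Characters outside the BMP are the ones that need two 16-bit units (a surrogate pair) in UTF-16.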

In terms of codecs, Daniel talked mainly about UTF-8 and UTF-16, the two most common ones. In short, if you're dealing mainly with Latin characters, you're better off using UTF-8, which will offer a more compact encoding in this case. If your app involves other character sets, such as Chinese or Japanese, you're better off using UTF-16, which will usually use only 2 bytes per character, while UTF-8 will tend to end up using more than 2 bytes in this case.
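The size trade-off can be measured directly. The following is a small sketch in Python 3 (an assumption; the meeting predates it), with a hypothetical helper encoded_sizes and two sample strings chosen only for illustration. It uses the "utf-16-le" codec so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


latin = "hello world"         # 11 ASCII characters
japanese = "こんにちは世界"    # 7 characters, all inside the BMP

print(encoded_sizes(latin))     # (11, 22)
print(encoded_sizes(japanese))  # (21, 14)
```

ASCII text takes one byte per character in UTF-8 and two in UTF-16, while these Japanese characters take three bytes each in UTF-8 and two in UTF-16.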

In Python, you use myunicode.encode(codec_name) to turn a Unicode object myunicode into a byte string containing encoded Unicode characters. To go back from a byte string mystring to a Unicode object, use mystring.decode(codec_name). Here is an example showing how to go back and forth between Unicode and normal byte strings:

u"\ud723abcow".encode("utf-8").decode("utf-8") == u"\ud723abcow"
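For readers on Python 3, where the unicode type has become str and byte strings are bytes, the same round trip might look like the sketch below. The variable names are illustrative only.

```python
text = "\ud723abcow"            # U+D723 is a Hangul syllable, not a surrogate

encoded = text.encode("utf-8")  # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(type(encoded).__name__, len(encoded))
print(decoded == text)

# Decoding with the wrong codec fails loudly instead of silently corrupting.
try:
    encoded.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)
```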

A good strategy for dealing with Unicode in your application is this, in Daniel's words: "decode everything on input, leave it in unicode format while processing, and encode everything on output (preferably using the same codec throughout)". When this is not possible, for example when your code interfaces with a library that sometimes returns Unicode and sometimes returns strings, you can use an adapter function such as this one that Daniel put together:

def adapt_unicode(value, codec):
   if isinstance(value, unicode):
       return value
   try:
       return unicode(value, codec)
   except TypeError:
       return unicode(str(value), 'ascii')
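Daniel's adapter targets Python 2. A rough Python 3 counterpart, offered only as a sketch and not as Daniel's code, could look like this; the name adapt_text and the default codec are assumptions.

```python
def adapt_text(value, codec="utf-8"):
    """Return value as a str, decoding byte strings with the given codec."""
    if isinstance(value, str):
        # Already text: nothing to do.
        return value
    if isinstance(value, (bytes, bytearray)):
        # Encoded bytes: decode them once, at the boundary.
        return bytes(value).decode(codec)
    # Anything else (numbers, None, objects): fall back to its str() form.
    return str(value)


print(adapt_text("plain"))
print(adapt_text(b"caf\xc3\xa9"))
print(adapt_text(42))
```

As with the original, the idea is to normalize at the edge of the untrusted library so the rest of the program only ever sees text.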

The next presenter was Mark Kohler, who showed us some Python code he put together while taking part in the Scrapheap Challenge at OOPSLA '05. Dubbed "A workshop in post-modern programming", the Scrapheap Challenge consisted of solving problems in a relatively short amount of time (60 to 90 minutes), using the Internet as the scrapheap, i.e. reusing as much code as you can find on the Internet. The problems were short enough that they could be solved in this manner, yet complicated enough that solving them from scratch would not be practical in the allotted time.

Mark was proud to report that for one of the 3 problems, a Sudoku puzzle solver, his team (each team consisted of a pair of programmers) produced the only solution among all the competing teams. Theirs was of course a Python-based solution, using a recipe found via Google on the Python Cookbook site. Mark's team used the w3m text-based browser to sanitize the Web page containing the problem, fed the output to grep to filter the lines containing the actual Sudoku grid, munged the grep output via a few lines of custom Python code, and sent the result to the Sudoku solver from the recipe. Nice and fast! We had some fun yesterday making the few custom lines of Python code even more compact by various tricks such as slicing a list while skipping every other character. We could have reduced everything to just one line via a list comprehension, but we stopped short of that.
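To give a flavor of the slicing trick, here is a sketch that is not Mark's actual code. It assumes, purely for illustration, that each grid line arrives as nine cells separated by single spaces, with "." marking an empty cell; the function name parse_grid is hypothetical.

```python
def parse_grid(lines):
    """Turn space-separated grid lines into rows of ints (0 for blanks)."""
    # line[::2] keeps every other character, which drops the separators.
    return [[0 if ch == "." else int(ch) for ch in line[::2]]
            for line in lines]


sample = [
    "5 3 . . 7 . . . .",
    "6 . . 1 9 5 . . .",
]

for row in parse_grid(sample):
    print(row)
```

The nested list comprehension is the "one line" version we stopped short of at the meeting.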

Another fun problem was to build a so-called "integrationometer", a dial that shows how far a local copy of an svn repository has diverged from the repository. Mark's team's solution was to use svn diff to get the number of lines changed between the local copy and the repository, then use PIL to draw a nice semicircle as a dial that showed the percentage of differing lines in red on a green background. Very neat.
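The counting half of the integrationometer can be sketched without PIL. The code below is an illustration rather than the team's solution: it assumes the output of svn diff is already available as a string in unified diff format, and the function names, the sample diff and the total line count are all made up.

```python
def changed_line_count(diff_text):
    """Count added and removed lines in unified diff output."""
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            # File headers. A changed line whose content starts with "--" or
            # "++" would also be skipped; acceptable for a rough dial.
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


def divergence_percent(diff_text, total_lines):
    """Return changed lines as a percentage of total_lines, capped at 100."""
    if total_lines <= 0:
        return 0.0
    return min(100.0, 100.0 * changed_line_count(diff_text) / total_lines)


SAMPLE_DIFF = "\n".join([
    "Index: foo.py",
    "=" * 67,
    "--- foo.py\t(revision 1)",
    "+++ foo.py\t(working copy)",
    "@@ -1,3 +1,3 @@",
    " a",
    "-b",
    "+c",
    " d",
])

print(changed_line_count(SAMPLE_DIFF))
print(divergence_percent(SAMPLE_DIFF, 40))
```

The resulting percentage is what would drive the red portion of the semicircular dial.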

A third problem (actually the first one chronologically) was to provide some way for a user to find questions that had remained unanswered in their email for more than three days. This one was the toughest one to solve according to Mark, and the solution found by the winning team involved some gmail and greasemonkey tricks that made certain assumptions about the Inbox and the Sent mailbox. Generic solutions for parsing mailboxes are not easy to come by; after all, it is not for naught that Zawinski's Law states that "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."

We ended the meeting with an impromptu presentation by Titus Brown, who gave us an update on his progress in writing unit tests for twill, his Web app testing tool. Titus handed out some printouts with some nifty code he put together for testing twill itself by starting a server process that he runs twill scripts against. The server process can be anything; in his case it's a Quixote application that enables him to test form submission and other features of twill. Speaking of submission, Titus also showed us the aptly-named 'collar' application (get it? submission...collar...it's funny, laugh), a Quixote-based Web app he created for submitting and reviewing conference papers. He's currently creating a suite of twill tests that will serve as a regression test harness and will make the application more solid across changes in Quixote, changes in cucumber2 (which is an ORM layer that Titus wrote for talking Pythonically to PostgreSQL databases), etc.

The meeting ran pretty long (we were there until 10 PM or so), but it was well worth it. Many thanks to Daniel, Mark and Titus for presenting and to Daniel again for hosting the meeting. We won't have a regular meeting in December, because many people, including Daniel, will take time off, but Titus proposed we meet for dinner one evening, maybe in Pasadena. This sounds like a great idea and we'll discuss the details on the mailing list.

Here is the tentative agenda for the next regular meeting, in January 2006:

"metakit overview" - Howard Golden
"Testing a Python Web app with Selenium" - Grig Gheorghiu

Monday, October 17, 2005

Eighth SoCal Piggies Meeting

The SoCal Piggies had their eighth meeting at USC (Salvatori Computer Science Center) on October 13th at 7:00 PM. Seven Piggies attended -- Daniel Arbuckle, Diane Trout, Brian Leair, Titus Brown, Grig Gheorghiu, Mark Kohler and Howard Golden.

The first presenter was Brian Leair, who introduced the Python Imaging Library, aka PIL. Brian talked about the main PIL modules such as Image and ImageDraw, and showed snippets of code that load and save images from files, create new images from scratch, use antialiasing, and in general manipulate images in various ways. PIL offers a dazzling array of options when it comes to image manipulation, and it knows how to deal with a zillion file formats. Brian was especially appreciative of the text handling capabilities of PIL. He showed us how easily you can embed text and use TrueType fonts in your images. The presentation ended with a demo of an OpenGL application that Brian wrote that showed a zoomable and rotatable 3D graphic, with the text on the axis being rendered in TrueType fonts via PIL. Words do not do justice to the coolness of the app.

You can see Brian's presentation online or you can download it as a PowerPoint file. Brian also mentioned a Python Cookbook recipe that uses PIL for watermarking images: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362879.

Diane Trout was next, and she presented an overview of matplotlib, a 2D plotting library written in Python, with a high degree of MATLAB compatibility. Diane talked about the main plotting functions and the various backends available in matplotlib, then ran several sample programs distributed with matplotlib (subplots, scatter plots, spectrograms, etc.). By default the graphs are plotted on a canvas that allows you to pan and zoom along the X and Y axes of any plot -- very spectacular. Diane also showed us some code she wrote that uses matplotlib in conjunction with HTML image maps, so that you can click on a data point in the plot and see its coordinates or display other information about it.

Some caveats that Diane mentioned regarding matplotlib:

The GTK backend can cause problems if you run matplotlib commands interactively from the Python shell (basically if you close the GTK canvas you can't go back to the shell, due to the GTK event loop)
matplotlib is under very active development, and things are not always backward compatible, so expect your code to break here and there when upgrading to a new version

You can see Diane's presentation in PDF format.

On the same matplotlib theme, Grig handed out some printouts showing a little module he wrote called sparkplot, which uses matplotlib to create sparklines. Edward Tufte introduced sparklines in a sample chapter of his upcoming book "Beautiful Evidence". In his words, sparklines are "small, high-resolution graphics embedded in a context of words, numbers, images. Sparklines are data-intense, design-simple, word-sized graphics." The sparkplot module allows you to create sparklines as PNG files that can then be seamlessly displayed within your text. For example, here is the Los Angeles Lakers' road to their NBA title in 2002: [sparkline image: lakers2002]

As is always the case, we had lively discussions outside of the 'official' presentations, while munching on some pizza. We covered subjects as varied as TurboGears, how to unit test a testing module, whether the Python community is really less friendly than the Ruby community (as some luminaries such as Martin Fowler are prone to declare), Guido's time machine, and many others. It was fun, so come join us next time!

Many thanks to Brian and Diane for presenting and to Daniel for hosting the meeting.

The next meeting will be at USC again, with the following agenda:

Python and Unicode (Daniel Arbuckle)
metakit overview (Howard Golden)

Wednesday, September 14, 2005

Seventh SoCal Piggies Meeting

The SoCal Piggies had their seventh meeting at USC (Salvatori Computer Science Center) on September 13th at 7:00 PM. Eight Piggies attended -- Daniel Arbuckle, Diane Trout, Grig Gheorghiu, Howard Golden, Manuel Garcia, Mark Kohler, Steve Williams and Titus Brown.

Diane Trout talked about her experiences with Trac, an integrated wiki/issue tracker/source code browser/interface to revision control systems. The chief virtue of Trac is that it's written in Python. The other chief virtue is that it looks good, which may be the reason why it's starting to get traction (pardon the pun) in different open source projects -- and here we have to mention Django, which uses Trac in the Code section of its Web site. The Django repository site is a good example of how attractive and polished a Trac site can be.

Diane also said the setup of Trac was relatively easy.

Trac supports Subversion out of the box, but Darcs can also be integrated into it, as Diane showed us. The guy who integrated Darcs is the same guy who wrote Tailor, a Python package that knows how to migrate changesets among tons of revision control systems.

Another nice thing about Trac is that it allows you to easily link from a bug description into the source code referenced by that bug. For example, to link to the README file, simply include source:README in your bug description. The reverse is also true, i.e. you can reference bug numbers in your code check-in comments -- for example, to reference bug number 23, simply include #23 or ticket:23 in your commit comments. Here is more information on specific Trac Wiki formatting rules.
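To illustrate the two ticket notations, here is a small regular-expression sketch that pulls ticket numbers out of a commit comment. It is not Trac's own parser, only an approximation of the #23 and ticket:23 forms mentioned above; TICKET_RE and ticket_numbers are hypothetical names.

```python
import re

# Matches "#23" or "ticket:23" and captures the number.
TICKET_RE = re.compile(r"(?:#|\bticket:)(\d+)")


def ticket_numbers(message):
    """Return the sorted, de-duplicated ticket numbers found in message."""
    return sorted({int(number) for number in TICKET_RE.findall(message)})


print(ticket_numbers("Fix crash on startup, see #23 and ticket:7 (again #23)"))
```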

Diane also mentioned TestOOB, which is a unit test tool that extends the default unittest module with things such as HTML/XML output and colored console output. The thing that Diane found most useful was that, when run with --debug, testoob drops you into the PDB debugger when it encounters any test failures.

Titus Brown was next, and he talked about why he likes Darcs. Darcs is a distributed revision control system with some solid math theory behind it. All working copies of Darcs are full repositories, and you can easily sync repositories by means of 'push' and 'pull' commands. Patches can be sent from one developer to another via email, and the repositories can very easily be made available in a read-only format by simply making them accessible through the Web, for example by putting them somewhere under the DocumentRoot directory of your Apache web server (in fact, this was one of the main reasons why Titus switched from CVS to Darcs).

One minor Darcs annoyance is the difficulty of installing it from source (installing from a binary is obviously no problem, assuming a binary exists for your distribution). Darcs is written in Haskell, so the Glasgow Haskell Compiler (GHC) is required, and if you wish to build GHC from source, you have a bootstrapping problem, because GHC is itself written in Haskell.

We had some discussion about how nice Darcs would be for people managing projects with users on more than one site, with subtle changes in the configuration at each site. Each site would have its own darcs repository, along with the main one. Patch the main repository or an individual site, and then cherry-pick the changes to distribute to all. Very clean.

Many thanks to Diane and Titus for presenting and to Daniel for hosting the meeting.

The next meeting will be at USC again. Here are some topics for our next meetings:

PIL overview (Brian Leair)
matplotlib tutorial (Diane Trout)
metakit overview (Howard Golden)
twill presentation (Titus Brown)
Selenium tutorial (Grig Gheorghiu)

Friday, August 05, 2005

Review of C++ unit testing frameworks

Not Python related, but pretty interesting nevertheless... Via Len: Exploring the C++ Unit Testing Framework Jungle

Good introductory article on SQLObject

From IBM developerWorks.

Friday, July 22, 2005

Fifth SoCal Piggies meeting

The SoCal Piggies had their fifth meeting at USC on July 21st at 7:00 PM. Six Piggies attended -- Daniel Arbuckle, Diane Trout, Steve Williams, Mark Kohler, Grig Gheorghiu and Brian Leair.

Grig presented an overview of the py library, a collection of modules intended to address several issues with the Python standard library. The py lib's mantra is "No API", which means it aims to be as simple to use and as "pythonic" as possible, while avoiding the F word ("Framework"). Grig talked mostly about py.test, and then briefly presented greenlets, py.xml and py.log. Hopefully he managed to convince the attendees to give the py lib a try.

We obviously also talked about Django, the hottest new thing since sliced bread in Python land. Daniel said he gave it a try, but he found the installation much harder than CherryPy's, so for now he's sticking with the latter. We went around the room in search of topics for future presentations, and fortunately it looks like there's no shortage of subjects: wxWindows, PIL, matplotlib, decorators, sets, and the list goes on.

We also briefly toyed with and solved the first 2 challenges of the Python Challenge, just to see how it feels to code together as a group. It was fun, so maybe we'll dedicate a future meeting to a coding session, be it the Python Challenge or other problems that can be solved in max. 1 hour.

Many thanks to Daniel for hosting the meeting and to Diane for bringing the projector.

The next meeting will be at USC again, with the date tentatively set to August 18th. Topics for next meeting:

Demo of an interactive multi-player game written as a Python CGI application, with some AJAX thrown in (Charlie Hornberger)
Presentation/demo of a commercial wxWindows-based Python application (Steve Williams)

Wednesday, June 29, 2005

PEP 328 - import .relative

Python Imports

PEP 328 has been accepted and may have an impact on any Python packages you've made so far, depending on your import style. This PEP will change the way imports work in a package. By default, all imports will be considered absolute. Previously, if you had a Python package named 'Foo' with modules 'Bar' and 'Fig', you could import 'Fig' from the 'Bar' module by typing 'import Fig'. Once this new feature is implemented in Python 2.5, that statement will look for 'Fig' only in sys.path. They also plan to implement explicit relative imports. With these you will be able to import the 'Fig' module from 'Bar' by typing 'from . import Fig'. For a better example, see Guido's Decision or read the entire PEP.

Other features include importing groups like:


from os import (path,
                sys,
                mkdir)

This allows for easier multi-line imports. Check out the PEP for additional information.

Note that you can use the new features in Python 2.4 using:

  'from __future__ import absolute_import'

In Python 2.5, any import statement that results in a relative import from a module within a package will raise a DeprecationWarning. Python 2.6 will include PEP 328 fully implemented and on by default.
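Here is a self-contained sketch of the leading-dot syntax using the 'Foo', 'Bar' and 'Fig' names from above. It assumes Python 3, where absolute imports are the default behavior, and it writes a throwaway package into a temporary directory purely for demonstration; the helper build_demo_package and the NAME and FIG_NAME attributes are made up.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Write a tiny package Foo with modules Bar and Fig under root."""
    package_dir = os.path.join(root, "Foo")
    os.mkdir(package_dir)
    files = {
        "__init__.py": "",
        "Fig.py": "NAME = 'Foo.Fig'\n",
        # The leading dot asks for Fig from the current package explicitly.
        "Bar.py": "from . import Fig\nFIG_NAME = Fig.NAME\n",
    }
    for filename, source in files.items():
        with open(os.path.join(package_dir, filename), "w") as handle:
            handle.write(source)


root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)          # make the Foo package importable
importlib.invalidate_caches()

bar = importlib.import_module("Foo.Bar")
print(bar.FIG_NAME)
```

Inside Bar.py, a plain 'import Fig' would be treated as absolute and would only succeed if a top-level module named Fig existed on sys.path.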

PEP 328 - Abstract

The import statement has two problems:

Long import statements can be difficult to write, requiring
various contortions to fit Pythonic style guidelines.

Imports can be ambiguous in the face of packages; within a package,
it's not clear whether import foo refers to a module within the
package or some module outside the package. (More precisely, a local
module or package can shadow another hanging directly off
sys.path.)

For the first problem, it is proposed that parentheses be permitted to
enclose multiple names, thus allowing Python's standard mechanisms for
multi-line values to apply. For the second problem, it is proposed that
all import statements be absolute by default (searching sys.path
only) with special syntax (leading dots) for accessing package-relative
imports.

Feedback

Let me know if this post was useful or not, so I'll know whether or not to post something like this again. I hope everyone finds this post to be useful.