Monday, February 20, 2006

ANTLR Studio

While browsing through Eclipse plugins for Python and Jython, I came across ANTLR Studio for generating ANTLR lexers and parsers. Even if you aren't that interested in lexers and parsers, it has a great video of a very innovative user interface. The author of ANTLR Studio, Prashant Deva, is only 19 years old!

Friday, November 11, 2005

Ninth SoCal Piggies Meeting

The SoCal Piggies had their ninth meeting at USC (Salvatori Computer Science Center) on November 10th at 7:00 PM. Eight Piggies attended -- Daniel Arbuckle, Steve Williams, Grig Gheorghiu, Diane Trout, Titus Brown, Mark Kohler, Howard Golden and George Bullis.

The first presenter was Daniel Arbuckle, who talked about "Python and Unicode". Daniel started by introducing general Unicode concepts such as the Universal Character Set (UCS) and the Basic Multilingual Plane (BMP). A somewhat curious detail about UCS is that, due to a limitation of the UTF-16 encoding, the high 11 bits of the 4 bytes taken by a Unicode character are not used, so we end up with 1,114,112 possible code points (0 through 1114111), which should still be plenty for everybody's needs. In any case, the most widely used Unicode characters are the first 65536, which conveniently fit in 2 bytes; this is why many Unicode implementations use only 2 bytes per character (and this includes Python, if you compile it from source and use the default build settings).
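The numbers above are easy to check from the interpreter. This is a minimal sketch, assuming Python 3 (where str is always Unicode and the 2-byte build option no longer exists); the helper names is_in_bmp and utf16_units are made up for illustration.

```python
import sys

# Highest code point reachable through UTF-16 surrogate pairs.
MAX_CODE_POINT = 0x10FFFF
# The Basic Multilingual Plane is the first 65536 code points.
BMP_SIZE = 0x10000


def is_in_bmp(ch):
    """Return True if the single character ch lies in the BMP."""
    return ord(ch) < BMP_SIZE


def utf16_units(ch):
    """Number of 16-bit code units UTF-16 needs for ch (2 means a surrogate pair)."""
    return len(ch.encode("utf-16-le")) // 2


print("largest code point:", sys.maxunicode)
print("bits needed:", MAX_CODE_POINT.bit_length())
print("U+D723 in BMP:", is_in_bmp("\ud723"))
print("U+1D11E in BMP:", is_in_bmp("\U0001D11E"))
```

Characters outside the BMP are the ones that cost UTF-16 a second 16-bit unit, which is what utf16_units reports.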

In terms of codecs, Daniel talked mainly about UTF-8 and UTF-16, the two most common ones. In short, if you're dealing mainly with Latin characters, you're better off using UTF-8, which will offer a more compact encoding in this case. If your app involves other character sets, such as Chinese or Japanese, you're better off using UTF-16, which will usually use only 2 bytes per character, while UTF-8 will tend to end up using more than 2 bytes in this case.
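To make the size trade-off concrete, here is a small sketch that compares the two encodings on a Latin sample and a Japanese sample. It assumes Python 3, and the sample strings and the function name encoded_sizes are just illustrations. The "utf-16-le" codec is used so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


latin = "Python and Unicode"       # ASCII-only text
japanese = "\u65e5\u672c\u8a9e"    # three CJK characters

for sample in (latin, japanese):
    utf8, utf16 = encoded_sizes(sample)
    print(len(sample), "chars:", utf8, "bytes as UTF-8,", utf16, "bytes as UTF-16")
```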

In Python, you use myunicode.encode(codec_name) to turn a Unicode object myunicode into a byte string containing encoded Unicode characters. To go back from a byte string mystring to a Unicode object, use mystring.decode(codec_name). Here is an example showing how to go back and forth between Unicode and normal byte strings:

u"\ud723abcow".encode("utf-8").decode("utf-8") == u"\ud723abcow"

A good strategy of dealing with Unicode in your application is this, in Daniel's words: "decode everything on input, leave it in unicode format while processing, and encode everything on output (preferably using the same codec throughout)". When this is not possible, for example when your code interfaces with a library that sometimes returns Unicode and sometimes returns strings, you can use an adapter function such as this one that Daniel put together:

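def adapt_unicode(value, codec):
    # Already a Unicode object: nothing to do.
    if isinstance(value, unicode):
        return value
    try:
        # Byte string: decode it with the given codec.
        return unicode(value, codec)
    except TypeError:
        # Not a string at all: fall back to its ASCII string form.
        return unicode(str(value), 'ascii')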
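For readers on Python 3, where bytes plays the role of the old byte string and str is the Unicode type, the same idea might look roughly like this. It is a sketch rather than Daniel's code: to_text and shout are hypothetical names, and UTF-8 as the default codec is an assumption.

```python
def to_text(value, codec="utf-8"):
    """Return value as str: pass str through, decode bytes, stringify the rest."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return value.decode(codec)
    return str(value)


def shout(raw_input, codec="utf-8"):
    """Toy pipeline following the decode-early, encode-late policy."""
    text = to_text(raw_input, codec)   # decode everything on input
    result = text.upper()              # work on text while processing
    return result.encode(codec)        # encode everything on output
```

The processing step in the middle never sees encoded bytes, which is the point of the strategy quoted above.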

The next presenter was Mark Kohler, who showed us some Python code he put together while taking part in the Scrapheap Challenge at OOPSLA '05. Dubbed "A workshop in post-modern programming", the Scrapheap Challenge consisted of solving problems in a relatively short amount of time (60 to 90 minutes), using the Internet as the scrapheap, i.e. reusing as much code as you can find on the Internet. The problems were short enough that they could be solved in this manner, yet complicated enough that solving them from scratch would not be practical in the allotted time. Mark was proud to report that for one of the 3 problems, a Sudoku puzzle solver, his team (each team consisted of a pair of programmers) produced the only solution among all the competing teams. Theirs was of course a Python-based solution, using a recipe found via Google on the Python Cookbook site. Mark's team used the w3m text-based browser to sanitize the Web page containing the problem, fed the output to grep to select the lines containing the actual Sudoku grid, munged the grep output via a few lines of custom Python code, and sent the result to the Sudoku solver from the recipe. Nice and fast! We had some fun yesterday making the few custom lines of Python code even more compact by various tricks such as slicing a list while skipping every other character. We could have reduced everything to just one line via a list comprehension, but we stopped short of that :-)
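The "skip every other character" trick is an extended slice with a step of 2. Here is a sketch of the idea, not Mark's actual code; it assumes a hypothetical input format in which each grid row arrives as single-character cells separated by single spaces.

```python
def parse_grid(lines):
    """Turn space-separated grid rows into one flat string of cells."""
    # line[::2] keeps characters 0, 2, 4, ... and so drops the separators.
    return "".join(line.strip()[::2] for line in lines)


rows = [
    "5 3 . . 7 . . . .",
    "6 . . 1 9 5 . . .",
]
flat = parse_grid(rows)
print(flat)
```

The generator expression inside join is the kind of one-liner we stopped short of at the meeting.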

Another fun problem was to build a so-called "integrationometer", a dial that shows how far a local copy of an svn repository has diverged from the repository. Mark's team's solution was to use svn diff to get the number of lines changed between the local copy and the repository, then use PIL to draw a nice semicircle as a dial that showed the percentage of differing lines in red on a green background. Very neat.
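The arithmetic behind such a dial can be sketched without svn or PIL. The code below is an illustration under stated assumptions, not the team's solution: it takes diff text in the unified format as a string, counts added and removed lines, and maps the changed fraction onto a 180-degree semicircle. The function names and the sample diff are made up.

```python
def count_changed_lines(diff_text):
    """Count added/removed lines in unified diff text, ignoring file headers."""
    changed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            # File header lines. Simplification: a removed line whose own
            # content starts with "--" would also be skipped here.
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def dial_angle(changed, total):
    """Map the fraction of changed lines onto a 0-180 degree semicircle."""
    if total <= 0:
        return 0.0
    fraction = min(changed / total, 1.0)
    return 180.0 * fraction


sample_diff = """\
Index: README
===================================================================
--- README\t(revision 10)
+++ README\t(working copy)
@@ -1,3 +1,3 @@
 unchanged
-old line
+new line
 unchanged
"""

print(dial_angle(count_changed_lines(sample_diff), 8))
```

Drawing the dial itself is then a matter of filling a pie slice of that angle in red over a green semicircle.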

A third problem (actually the first one chronologically) was to provide some way for a user to find questions that had remained unanswered in their email for more than three days. This one was the toughest one to solve according to Mark, and the solution found by the winning team involved some gmail and greasemonkey tricks that made certain assumptions about the Inbox and the Sent mailbox. Generic solutions for parsing mailboxes are not easy to come by; after all, it is not for naught that Zawinski's Law states that "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."

We ended the meeting with an impromptu presentation by Titus Brown, who gave us an update on his progress in writing unit tests for twill, his Web app testing tool. Titus handed out some printouts with some nifty code he put together for testing twill itself by starting a server process that he runs twill scripts against. The server process can be anything; in his case it's a Quixote application that enables him to test form submission and other features of twill. Speaking of submission, Titus also showed us the aptly-named 'collar' application (get it? It's funny, laugh), a Quixote-based Web app he created for submitting and reviewing conference papers. He's currently creating a suite of twill tests that will serve as a regression test harness and will make the application more solid across changes in Quixote, changes in cucumber2 (which is an ORM layer that Titus wrote for talking Pythonically to PostgreSQL databases), etc.
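The general pattern (start a throwaway server, run the client against it, shut it down) can be sketched with the standard library alone. This is not Titus's code and uses neither twill nor Quixote: HelloHandler and fetch_from_test_server are hypothetical names, the sketch assumes Python 3, and it runs the server in a background thread rather than a separate process to stay short.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the application under test."""

    def do_GET(self):
        body = b"hello from the test server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch_from_test_server(path="/"):
    """Start the server, request path, stop the server, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return response.status, response.read()
        finally:
            conn.close()
    finally:
        server.shutdown()      # stops serve_forever in the background thread
        server.server_close()
        thread.join()
```

In a real harness the request in the middle would be replaced by whatever client is under test, and the setup and teardown would live in the test fixture.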

The meeting ran pretty long (we were there until 10 PM or so), but it was well worth it. Many thanks to Daniel, Mark and Titus for presenting and to Daniel again for hosting the meeting. We won't have a regular meeting in December, because many people, including Daniel, will take time off, but Titus proposed we meet for dinner one evening, maybe in Pasadena. This sounds like a great idea and we'll discuss the details on the mailing list.

Here is the tentative agenda for the next regular meeting, in January 2006:

  • "metakit overview" - Howard Golden

  • "Testing a Python Web app with Selenium" - Grig Gheorghiu

Monday, October 17, 2005

Eighth SoCal Piggies Meeting

The SoCal Piggies had their eighth meeting at USC (Salvatori Computer Science Center) on October 13th at 7:00 PM. Seven Piggies attended -- Daniel Arbuckle, Diane Trout, Brian Leair, Titus Brown, Grig Gheorghiu, Mark Kohler and Howard Golden.

The first presenter was Brian Leair, who introduced the Python Imaging Library, aka PIL. Brian talked about the main PIL modules such as Image and ImageDraw, and showed snippets of code that load and save images from files, create new images from scratch, use antialiasing, and in general manipulate images in various ways. PIL offers a dazzling array of options when it comes to image manipulation, and it knows how to deal with a zillion file formats. Brian was especially appreciative of the text handling capabilities of PIL. He showed us how easily you can embed text and use TrueType fonts in your images. The presentation ended with a demo of an OpenGL application that Brian wrote that showed a zoomable and rotatable 3D graphic, with the text on the axis being rendered in TrueType fonts via PIL. Words do not do justice to the coolness of the app.
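For a flavor of the Image and ImageDraw modules, here is a minimal sketch, not taken from Brian's slides. It assumes Pillow (the maintained fork of PIL) is installed and Python 3; make_labelled_image is a made-up helper, and it uses the built-in default font so that no font file is needed.

```python
from PIL import Image, ImageDraw, ImageFont


def make_labelled_image(text, size=(200, 80)):
    """Create an RGB image from scratch and draw a line and a text label on it."""
    image = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(image)
    # A baseline near the bottom edge of the image.
    draw.line([(10, size[1] - 10), (size[0] - 10, size[1] - 10)], fill="black")
    # load_default() needs no font file; ImageFont.truetype(path, size) would
    # load a TrueType font instead, given the path to a .ttf file.
    draw.text((10, 10), text, fill="black", font=ImageFont.load_default())
    return image


image = make_labelled_image("SoCal Piggies")
thumbnail = image.resize((100, 40))
```

Calling image.save with a file name ending in .png or .jpg would write the result to disk in that format.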

You can see Brian's presentation online or you can download it as a PowerPoint file. Brian also mentioned a Python Cookbook recipe that uses PIL for watermarking images.

Diane Trout was next, and she presented an overview of matplotlib, a 2D plotting library written in Python, with a high degree of MATLAB compatibility. Diane talked about the main plotting functions and the various backends available in matplotlib, then ran several sample programs distributed with matplotlib (subplots, scatter plots, spectrograms, etc.). By default the graphs are plotted on a canvas that allows you to pan and zoom on the X and Y axis for any combination of the axes that are plotted -- quite spectacular. Diane also showed us some code she wrote that uses matplotlib in conjunction with HTML image maps, so that you can click on a data point in the plot and see its coordinates or display other information about it.
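The image map idea boils down to converting each data point to pixel coordinates and emitting one clickable area per point. The sketch below is not Diane's code and does not call matplotlib at all: it assumes a hypothetical plot whose data area fills the whole image with no margins, and uses a plain linear mapping where real code would ask matplotlib for the pixel positions. The names to_pixel and image_map are made up.

```python
from html import escape


def to_pixel(value, data_min, data_max, pixel_min, pixel_max):
    """Linearly map a data value onto a pixel coordinate."""
    span = data_max - data_min
    return round(pixel_min + (value - data_min) / span * (pixel_max - pixel_min))


def image_map(points, x_range, y_range, width, height, name="plotmap", radius=4):
    """Build an HTML <map> with one circular <area> per (x, y) data point."""
    areas = []
    for x, y in points:
        px = to_pixel(x, x_range[0], x_range[1], 0, width)
        # Image rows grow downward, so the y axis is flipped.
        py = to_pixel(y, y_range[0], y_range[1], height, 0)
        title = escape("(%s, %s)" % (x, y), quote=True)
        areas.append('<area shape="circle" coords="%d,%d,%d" title="%s">'
                     % (px, py, radius, title))
    return '<map name="%s">\n%s\n</map>' % (name, "\n".join(areas))


print(image_map([(0, 0), (5, 10)], (0, 10), (0, 10), 200, 100))
```

The plot image then references the map through the usemap attribute of its img tag, and the title attribute gives a tooltip with the coordinates.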

Some caveats that Diane mentioned regarding matplotlib:

  • The GTK backend can cause problems if you run matplotlib commands interactively from the Python shell (basically if you close the GTK canvas you can't go back to the shell, due to the GTK event loop)

  • matplotlib is under very active development, and things are not always backward compatible, so expect your code to break here and there when upgrading to a new version

You can see Diane's presentation in PDF format.

On the same matplotlib theme, Grig handed out some printouts showing a little module he wrote called sparkplot, which uses matplotlib to create sparklines. Edward Tufte introduced sparklines in a sample chapter of his upcoming book "Beautiful Evidence". In his words, sparklines are "small, high-resolution graphics embedded in a context of words, numbers, images. Sparklines are data-intense, design-simple, word-sized graphics." The sparkplot module allows you to create sparklines as PNG files that can then be seamlessly displayed within your text. For example, here is the Los Angeles Lakers' road to their NBA title in 2002: [image: lakers2002].

As is always the case, we had lively discussions outside of the 'official' presentations, while munching on some pizza. We covered such various subjects as TurboGears, how to unit test a testing module, whether the Python community is really less friendly than the Ruby community (as some luminaries such as Martin Fowler are prone to declare), Guido's time machine, and many others. It was fun, so come join us next time!

Many thanks to Brian and Diane for presenting and to Daniel for hosting the meeting.

The next meeting will be at USC again, with the following agenda:

  • Python and Unicode (Daniel Arbuckle)

  • metakit overview (Howard Golden)

Tuesday, September 20, 2005

Pyrex is back in the game

For any who missed it, Pyrex has finally been updated with full support for Python 2.4. There is much rejoicing.

Wednesday, September 14, 2005

Seventh SoCal Piggies Meeting

The SoCal Piggies had their seventh meeting at USC (Salvatori Computer Science Center) on September 13th at 7:00 PM. Eight Piggies attended -- Daniel Arbuckle, Diane Trout, Grig Gheorghiu, Howard Golden, Manuel Garcia, Mark Kohler, Steve Williams and Titus Brown.

Diane Trout talked about her experiences with Trac, an integrated wiki/issue tracker/source code browser/interface to revision control systems. The chief virtue of Trac is that it's written in Python. The other chief virtue is that it looks good, which may be the reason why it's starting to get traction (pardon the pun) in different open source projects -- and here we have to mention Django, which uses Trac in the Code section of its Web site. The Django repository site is a good example of how attractive and polished a Trac site can be.

Diane also said the setup of Trac was relatively easy.

Trac supports Subversion out of the box, but Darcs can also be integrated into it, as Diane showed us. The guy who integrated Darcs is the same guy who wrote Tailor, a Python package that knows how to migrate changesets among tons of revision control systems.

Another nice thing about Trac is that it allows you to easily link from a bug description into the source code referenced by that bug. For example, to link to the README file, simply include source:README in your bug description. The reverse is also true, i.e. you can reference bug numbers in your code check-in comments -- for example, to reference bug number 23, simply include #23 or ticket:23 in your commit comments. Here is more information on specific Trac Wiki formatting rules.
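To show what these link conventions look like to a program, here is a small sketch that pulls ticket numbers and source paths out of free text. It is not Trac's own parser: the regular expressions are deliberate simplifications of the real link syntax, and the function names are hypothetical.

```python
import re

# Matches "#23" or "ticket:23"; a simplification of Trac's actual link syntax.
TICKET_RE = re.compile(r"(?:#|\bticket:)(\d+)")
# Matches "source:README" style links, up to the next whitespace.
SOURCE_RE = re.compile(r"\bsource:(\S+)")


def ticket_numbers(message):
    """Return the sorted, de-duplicated ticket numbers mentioned in message."""
    return sorted({int(number) for number in TICKET_RE.findall(message)})


def source_paths(text):
    """Return the repository paths referenced with source: links."""
    return SOURCE_RE.findall(text)


print(ticket_numbers("Fix crash, closes #23 and ticket:7; see also #23"))
print(source_paths("See source:README and source:trunk/setup.py for details"))
```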

Diane also mentioned TestOOB, which is a unit test tool that extends the default unittest module with things such as HTML/XML output and colored console output. The thing that Diane found most useful was that, when run with --debug, testoob drops you into the PDB debugger when it encounters any test failures.

Titus Brown was next, and he talked about why he likes Darcs. Darcs is a distributed revision control system with some solid math theory behind it. All working copies of Darcs are full repositories, and you can easily sync repositories by means of 'push' and 'pull' commands. Patches can be sent from one developer to another via email, and the repositories can very easily be made available in a read-only format by simply making them accessible through the Web, for example by putting them somewhere under the DocumentRoot directory of your Apache web server (in fact, this was one of the main reasons why Titus switched from CVS to Darcs).

One minor Darcs annoyance is the difficulty of installing it from source (installing from binary is obviously no problem, assuming a binary exists for your distribution). Darcs is written in Haskell, so the Glasgow Haskell Compiler is required, and if you wish to build the compiler from source, you have a bootstrapping problem, because the compiler is itself written in Haskell.

We had some commentary about how nice Darcs would be for people managing projects with users on more than one site, with subtle changes in the configuration at each site. Each site would have its own darcs repository, along with the main one. Patch the main repository or an individual site, and then cherry-pick the changes to distribute to all. Very clean.

Many thanks to Diane and Titus for presenting and to Daniel for hosting the meeting.

The next meeting will be at USC again. Here are some topics for our next meetings:

  • PIL overview (Brian Leair)

  • matplotlib tutorial (Diane Trout)

  • metakit overview (Howard Golden)

  • twill presentation (Titus Brown)

  • Selenium tutorial (Grig Gheorghiu)

Wednesday, August 24, 2005

I'm baaaaaack...

I'm back.

I spent most of June and July (6 weeks total) at Embryology Boot Camp
in Woods Hole, MA; came back to California after stopping off in NY
and Minnesota to visit parents & relatives; and then went to Antibes,
on the Cote d'Azur / French Riviera, for a week with my in-laws. I've
been back just over a week, and I spent most of that week dealing with
long-neglected tasks and restarting my research.

Embryology Boot Camp

The Embryology course was really amazing. It was an intensive 6 wk
intro to animal development; we covered worms, flies & other
arthropods, four vertebrates (chick, mouse, Xenopus, and zebrafish),
ascidians, sea urchins, several annelids, ctenophores (comb jellies),
cnidarians, clams, other lophotrochozoans, and more clams. The format
was 2-4 hours of lecture a day, interspersed with 6-12 hours of lab
work, 6 days a week; each day usually stretched from 9:30am 'til 2am.
Drinking -- occasionally heavy drinking -- was part of
the course (par for the course? ;), as was late night swimming and
almost weekly parties.

The instructors were all excellent. Each lecture was typically
given by a different professor, and the professors were among the best
in their field. We usually had an hour-long Q&A session after each
lecture, so we could really get into their research. And in general
the lectures were coordinated with the labs, so immediately after hearing
about e.g. chick development, we could go into the labs and work with
chick eggs.

The lab part of the course was stunning: each group of teaching assistants
brought reagents and technique handouts, and were available throughout the
lab period to answer questions and help with techniques. I've never had
so many expert biologists waiting hand and foot on me ;).

Zeiss brought several million dollars worth of microscopes to the
course, too, so I could play with scopes that I simply don't have
daily access to even here at Caltech. They did keep on breaking their
own scopes by swapping out hardware, but that just means that I'm now
somewhat expert at troubleshooting their microscopes!

The best part of the course was the other students. There were 25 students
total, and I got to know the other 24 students really well. It was definitely
a boot-camp atmosphere, with the attendant intimacy and group dynamics.
Good stuff. I expect to be seeing 'em all again over the next few years.

It's hard to really explain how great this course was, but that's enough
gushing for now ;). I'll post some pretty pictures from there over the
next few weeks.

Antibes and the Cote d'Azur

If you've been there, you understand. If you haven't, go. The
Nice/Antibes/Cannes area is fantastic. We stayed at the Three Palms, which I highly recommend.

For-work & for-fun programming

I'm slowly catching up on stuff. There are some twill issues
that need to be dealt with. I'm also working on updating the Cartwheel documentation to
fix the problems that an installer has been reporting. And sloooowly
but surely I'm picking up the pieces of my research from before the
summer: PhD or bust!


p.s. Looks like advogato's domain name has expired. Couldn't figure out any way to renoo it.

Friday, August 05, 2005

Review of C++ unit testing frameworks

Not Python related, but pretty interesting nevertheless... Via Len: Exploring the C++ Unit Testing Framework Jungle

Good introductory article on SQLObject