The Linguist and the Programmer

A history of Perl, Python, and the websites that rely on them.

Col Needham started tracking movies when he was a teenager. In a homegrown electronic movie diary, he’d keep track of the cast, crew, and details of every movie he watched. When he got an email address, he joined Usenet, and found a newsgroup filled with like-minded individuals. The group’s obsession was lists. Lists of movies arranged in different categories. Lists of actors and actresses in common roles. Lists that linked movies together in strange and interesting ways. The members of the group traded lists like kids trade baseball cards, and it became a challenge of sorts to identify the most unusual ways to group together films.

These lists, and the movies that comprised each one, were loosely organized in a database members kept updated. Needham, an amateur programmer at the time, cobbled together a few scripts that made it possible to search through the lists. He called the downloadable collection of scripts the rec.arts.movies movie database, a reference to the name of the Usenet group where it originated. Years later, a new and improved version would go by IMDb.

Rob Hartill, a member of that Usenet group, was the first to decide that what IMDb really needed was an interface for the World Wide Web. He knew that he would need something powerful and flexible enough to parse through a massive movie database that, up until now, had been passed around like a spreadsheet on message boards. He needed a programming language that could sift through all of the text in that database and reformat it in a way that could be displayed on the screen (a process sometimes referred to as data munging). For that, he turned to Perl.

A screenshot of IMDb's homepage in the early 2000s

Perl is a programming language created by Larry Wall in the 1980s. Technically, it’s a scripting language, which is a bit of a semantic distinction but useful here for explaining what sets it apart. A scripting language does not need to be compiled to be run, a common requirement of the early wave of so-called “serious” programming languages. Not compiling means a bit of a performance cost, and errors aren’t checked until the program is run. But what you get is a language that’s easy to experiment with and easy to create with. That makes scripting languages a great fit for green programmers new to the game. And Perl was one of the first major scripting languages to break out.

Larry Wall created Perl after he spent some time at NASA’s Jet Propulsion Laboratory. In college, he studied linguistics, a background that informed how he structured and designed his language. Wall felt that programming should come as naturally as one reads and speaks. The Perl syntax, the structure of its code, is inherently expressive and approachable.

Language being singularly important, Wall made text a first-class citizen. With Perl, it was simple to parse, format, and scan through blobs of text of any size. Over the years, it’s become well known for how it handles regular expressions, which use patterns to match and identify parts of text data. Regular expressions can be a bit confusing, but you can do some truly radical things with them in Perl, and the members of its community swear by its advanced ability to, essentially, manipulate and restructure language. Just as Larry Wall intended.
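
To make the idea concrete, here is a minimal sketch of that kind of pattern matching. It is written in Python rather than Perl, and it is not how the original scripts worked. The line format (a title, a year in parentheses, then a director) and the names `LISTING` and `parse_listing` are assumptions made purely for illustration.

```python
import re

# Hypothetical format: "Title (Year) - Director". An assumption for illustration only.
LISTING = re.compile(
    r"^(?P<title>.+?)\s+\((?P<year>\d{4})\)\s+-\s+(?P<director>.+)$"
)

def parse_listing(line):
    """Return a dict of fields for one line, or None if the line does not match."""
    match = LISTING.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["year"] = int(record["year"])  # convert the captured digits to a number
    return record

# Sample input: two lines in the assumed format and one that is not.
lines = [
    "Blade Runner (1982) - Ridley Scott",
    "not a movie listing",
    "Brazil (1985) - Terry Gilliam",
]

# Keep only the lines the pattern recognizes, then reformat them.
records = [r for r in map(parse_listing, lines) if r is not None]
for record in records:
    print(f"{record['year']}: {record['title']} ({record['director']})")
```

The pattern does the sifting and the loop does the reformatting, which is roughly what data munging amounts to: pull structure out of loose text, then print it back in a new shape.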

The web, when it launched, was essentially one big page of text after another. So even though it came first, Perl was well suited for handling web interfaces. Hartill needed a way to search through an incredibly large database of movies, pull what he needed out, and display it on a screen. Perl was essentially a no-brainer.

Craig Newmark ran into a similar problem when he started emailing a small group of friends a list of local events. As his list of emails grew bigger, Newmark added listings for jobs and things for sale. By the time Craigslist moved to the web, Newmark needed a way to parse through thousands of local listings and organize them into different webpages. Perl was his tool of choice. Newmark used Perl to search through requests and organize them on his site. Fun fact: years after Craigslist became a national sensation, Larry Wall would spend a bit of time employed by the company.

Craigslist’s look when it launched wasn’t really all that different than today’s minimalist design

Perl has, at times, been referred to as the glue that holds the web together. It’s an apt metaphor, and one that highlights its greatest strength. Perl can be used as the backbone of a site, to connect to a database, pull information out, parse it, and spit it out on the screen. That’s how it was used by IMDb and Craigslist. It was famously an integral part of Yahoo. But perhaps more crucially, it doubles as a handy utility language, one that can be used for automation, or the aforementioned data munging, or to toss data from one system over to another. The long tail of Perl on the web is an amalgam of countless sites that use it for just one thing or another, something quick and necessary, and almost always working tirelessly behind the scenes.
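
The backbone pattern described above (connect to a database, pull information out, and spit it out on the screen) can be sketched in a few lines. This version uses Python and an in-memory SQLite database rather than Perl. The table name, columns, sample rows, and the `render_listings` function are all hypothetical and do not reflect how any of these sites were actually built.

```python
import sqlite3
from html import escape

def render_listings(conn, category):
    """Query one category and return its rows as a simple HTML list."""
    rows = conn.execute(
        "SELECT title, city FROM listings WHERE category = ? ORDER BY title",
        (category,),
    ).fetchall()
    items = "\n".join(
        f"  <li>{escape(title)} ({escape(city)})</li>" for title, city in rows
    )
    return f"<ul>\n{items}\n</ul>"

# In-memory database filled with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (title TEXT, city TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Used bike", "San Francisco", "for sale"),
        ("Web designer wanted", "Oakland", "jobs"),
        ("Desk & chair", "Berkeley", "for sale"),
    ],
)

page = render_listings(conn, "for sale")
print(page)
```

The query is parameterized and the text is escaped before it goes into the markup, two habits that matter whenever a script glues a database to a web page.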

Perl has a number of baseline platitudes that both its creator and users of the language adhere to. “Easy things should be easy and hard things should be possible” serves as an unofficial motto for the Perl community. It’s both a clever way of expressing Perl’s guiding philosophy and a unique turn of a phrase that Wall, as a linguist, tends to enjoy. There’s another saying you may come across that’s a bit more concise. “There’s more than one way to do it.” Perl doesn’t prescribe solutions; it leaves those choices up to the programmer.

Now what if you were to flip that on its head and take another approach? For as simple as Perl is, having more than one way to do something can be a bit daunting to the first-time programmer. An alternative approach would be to let the language expose a clear and obvious way to do something. Rather than leave decisions entirely up to the programmer, the language can point you in the right direction. In some cases, it can even make decisions based on assumptions about these best practices. In other words, if you want to do something, “There should be one – and preferably only one – obvious way to do it.” For that, there’s Python.

Python is another scripting language, first developed in 1990, not long after Perl was released. It’s an offshoot of ABC, a Dutch prototyping language developed at CWI, where Guido van Rossum was working when he came up with an idea for a more stable version of the language that could be used for scripting. At the time, van Rossum was reading scripts from the popular television show Monty Python’s Flying Circus, so that’s where he got the name. It would take a few years for him to get something he was fully comfortable with and ready for widespread use. Version 1.0 of Python was released in 1994. Van Rossum served as the semi-official BDFL (Benevolent Dictator for Life) for the project until just last year, a role Larry Wall continues to occupy for Perl.

The first version of Pinterest, pictured above, launched using Python

Because of the slight delay at the beginning of the project, Python caught more of the second wave of web apps in the late ’90s and after the dot-com boom of the early 2000s. Sites like YouTube, Instagram, Pinterest, and perhaps most famously Reddit, still use Python to this day. As more programmers turned to Python as their preferred language, a rivalry heated up.

Python and Perl are, in many ways, very similar. And it’s only because of this similarity that comparisons of their relatively minor differences are so common. Calling those differences minor probably has a few people reading this feeling a bit defensive, but it’s true. Like Perl, Python is an excellent language for parsing, processing, and reformatting large chunks of data. Its regular expression engine isn’t as powerful as Perl’s, but it’s still an above-average implementation. As a scripting language, it deals with dynamic data structures and integrates with databases easily, and it comes with a library of bolted-on extensions that can be imported and used. And both can be used to write quick and dirty scripts, or scaled up to complex, object-oriented applications. The structure and approach of both languages is to get their users into the code and productive as quickly as possible, and to allow for lots of things to be possible. And the projects are both run by longstanding, charismatic, and slightly eccentric personalities.

The rivalry of Perl and Python, captured beautifully by XKCD

There is one driving difference between the two. In many ways, Python is far more rigid than Perl. Its approach is to point programmers to the “right” way of doing things, so that most programmers are reaching for the same solution to common problems. This is taken to the extreme even in the design of the language. In Python, whitespace counts. The number of spaces or tabs at the start of each line defines the code’s hierarchy. A stray space somewhere in there might cause your whole program to fail completely, so everything needs to be spaced just right. That may seem like an odd choice, but listen to Reddit’s co-founder Steve Huffman on why Python is so appealing to him:

Just so I can read it. And it’s awesome because I can see from across the room, looking at their screen, whether their code is good or bad. Because good Python code has a very obvious structure. And that makes my life so much easier

By adding a whitespace constraint, Python actually frees up the language a bit from unnecessary brackets or braces, and it makes code from one programmer to the next look exactly the same. It enforces structure simply by making it a requirement. This is the kind of thing that crops up throughout Python’s implementations. It leads its developers towards more object oriented code and common configurations. But those conventions often come at the cost of flexibility.
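
A small sketch shows what the whitespace rule looks like in practice. The `classify` function and the `compiles` helper are made up for illustration. The two snippets are identical except for one stray space in front of the last line of the second.

```python
# Indentation alone marks which lines belong to the function and to the "if".
GOOD = """
def classify(year):
    if year < 2000:
        return "classic"
    return "modern"
"""

# Same code with one stray space before the final return.
BAD = """
def classify(year):
    if year < 2000:
        return "classic"
     return "modern"
"""

def compiles(source):
    """Return True if Python accepts the source, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False

print(compiles(GOOD))
print(compiles(BAD))
```

The first call prints `True` and the second prints `False`. The stray space leaves the last line at an indentation level that matches no enclosing block, so Python rejects the code before running any of it.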

And that argument, flexibility versus approachability, has been at the heart of the Python and Perl rivalry that’s been going on for a couple of decades now. From time to time, the fires of that rivalry have been stoked. Like when O’Reilly published an ad that seemed to favor Python. Or when Larry Wall makes a quick quip about Python in his conference talks.

That O’Reilly ad, which reads "Perl? Ha, ha, ha. Try Python!"

In truth, the rivalry is, in most cases, in good spirit. There are plenty of Perl programmers that use Python and vice versa. Both are making plenty of headway in fields like natural language processing, where data needs to be stripped and processed quickly and on the fly. They are interesting to me, and to this project, because of how they were adapted to the web. When the web gave us messy blobs of text and rows and rows of databases, two programmers gave us languages that could take those blobs and give us back something beautiful.

This post introduced 4 milestones to the Timeline.

  • Craigslist

    What began as an email list for events and random classifieds launches on its own domain, the eponymous craigslist.org, founded by Craig Newmark and Philip Knowlton. The small list of classifieds would soon expand to city after city, allowing anyone to post their listings with notoriously few restrictions and very little in the way of overt advertising or ornate design.

  • Perl 5

    Larry Wall releases Perl version 5, a complete rewrite of the programming language that was originally released in late 1987. Perl 5 was the first release to feature first-class support for database interfaces, and would soon be used to create an entirely new generation of webpages and web-based applications.

  • Python

    The first stable version of the Python programming language is officially released by Guido van Rossum. Though Python is not strictly a web language, it had references to the web and to HTML in its 1.0.0 release notes, and is often used in web applications to manipulate, process, and format data.

  • IMDb

    Col Needham publishes a few Unix scripts to a Usenet group for browsing and searching through a user-generated index of movie lists subdivided into several categories. He calls it the rec.arts.movies movie database. Years later, Needham and a few others would move the interface online and incorporate officially as IMDb.


Sources