A friend of mine sent me the following email, which I’m just going to include in its entirety, because she’s awesome:
Good morning. This is going to be long-winded but hear me out. Because I think this is cool as fuck. OK for the past year I have been reading a book called Viral Storm by Nathan Wolfe. I say the past year because I started and stopped at least three times. I find the subject interesting but I usually have 3 books in flux and well, I got distracted. In light of the recent Ebola outbreak I decided to pick it back up again because with some of the new viruses running around like Enterovirus D68, I thought this guy’s idea of a mutant Ebola/Flu pandemic prophecy might be coming true. I was skimming back over the first chapters again and came across some ideas I had highlighted. When I read the paragraphs below I remembered thinking, “I want to do this”. I saw the Computer Scientist reference and immediately thought of you. If anyone could do this it would be you.
Here is the passage:
Viruses manage to function with such few genes through a variety of tricks that allow them to maximize the impact of their diminutive genomes. Among the most elegant is a phenomenon called overlapping reading frames . As an analogy, take a poem of around thirteen thousand letters— say, T. S. Eliot’s poem The Waste Land . It has roughly the same number of letters as the Ebola virus has base pairs. When you read The Waste Land , it has meaning, tempo, reference— all of the characteristics we normally expect from literature. In the same way, the genome of the Ebola virus has meaning, with base pair letters making up genes that get translated into the proteins that provide the virus with its capacity to function. If you take the first stanza of The Waste Land , around a thousand letters, and begin to read it starting with the second letter instead and move the first letters of the other words, it’s a disaster. “April is the cruelest month” becomes “Prili sthec rueles tmonth.” Nonsense.
Now imagine that embedded within the stanza was a second poem so that both readings, the one that starts with the first letter and the one that starts with the second letter, lead to fluent comprehensible verses. Now imagine that you took the same stanza and read it backward and that a third hidden stanza emerged from the same letters. This is precisely what viruses can do. A good challenge to poets (or perhaps computer scientists) would be to create such a stanza to see if they could be as creative as natural selection has been with viruses. Viruses with overlapping reading frames use the same string of base pairs to code up to three different proteins, an incredible genomic efficiency, which makes their small genomes pack a much larger punch.
So, the challenge would be to write a poem that made sense using the normal first letter to start, using the second letter to start and to top it off have it make sense backwards.
Boom, there it is. The ultimate viral poem. Now the question is, would it spread?
My initial reaction when I read this was: no way. Of course, that’s my initial reaction when I get any challenge. I immediately say it’s impossible, and then in the background my mind figures out if maybe it isn’t. Obviously the backward part is possible. “A man, a plan, a canal, Panama!” proves that. And I figured the second letter thing was also possible, although I’ve never seen that. But getting both of those at the same time (even with the relaxation that the reversed phrase could be completely different—it didn’t have to be a palindrome) just didn’t seem like something our language can support.
But I could maybe explore the problem a little anyway.
To begin with, I’d need to figure out how to make it workable on a computer. Parts of it are straightforward, but natural language is very hard, so it’s nearly impossible for a computer to recognize whether a series of words actually makes any sense together. Then I thought of predictive poetry.
Predictive poetry is a fun thing to do on your phone. You put in a word. Start with a poetry word. Let’s use Balance. Then the phone suggests three words that you might choose next. I get of, the, and between. Let’s go with between. My next three choices are the, a, and us. I choose a. And so on. My final poem:
Balance between a good time waster and a half hour of sleep
Which actually makes a lot of sense and is probably a decent description of what you are doing right this very minute! In predictive poetry, the computer and human are working together. The computer is suggesting and the human is choosing. As long as you are deliberate, you can create some decent poems this way.
With that idea, I thought perhaps the computer could suggest sentences that meet all the rules, and then I (the poet) could pick the ones that are most meaningful.
First up, how to tackle the programming part? Clearly I need a list of words. I happen to have one, but it’s the list of words that Scrabble-like games accept, which means it has all those weird things like “aa” and “qi” and it’s really, really big (173,528 words, to be precise). Since this is a search problem, having a really big list is going to make things take a long time. Plus I’m going to get lots of answers that include those weird Scrabble words that I don’t even know. So I head to the intertubes and find a list of the 5000 most common words. We’ll start with that.
Now I need to find somewhere to begin with the code. The hardest thing about writing a poem is getting started, so let’s see if the computer can help with that. Can it find a word where, if you drop the first letter, what’s left starts another word? It’s actually a pretty easy search problem. For each word in the list, drop the first letter and look for the remaining letters in the list. If you find anything that starts with those letters, you’re good.
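A minimal Python sketch of that search might look like this. It's an illustration, not the original program: the function name `first_word_candidates` and the six-word sample list are made up, and it does the slow, obvious thing of scanning the whole list for every word.

```python
def first_word_candidates(words):
    """Return the words whose tail (first letter dropped) starts some word in the list."""
    candidates = []
    for word in words:
        tail = word[1:]  # drop the first letter
        if not tail:
            continue  # a one-letter word leaves nothing to match
        # keep the word if any entry in the list begins with the tail
        if any(other.startswith(tail) for other in words):
            candidates.append(word)
    return candidates


# Hypothetical sample list, just to show the idea.
sample = ["elite", "liter", "balance", "era", "raw", "drawer"]
print(first_word_candidates(sample))
```

With this sample, "elite" survives because "lite" starts "liter", and "era" survives because "ra" starts "raw". "Drawer" does not, since nothing in the list starts with "rawer".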
So I wrote that and got a list of potential first words. Lots of them.
Next, let’s figure out whether we’ll be able to end the backward sentence with that first word. This is a little trickier. I took my list of words and made a second list which is just all those words with the letters reversed. Now I have pretty much the same question I just asked: Does my word start something in that list? If it does, then there may exist a backward sentence that ends with my word. If nothing in the reverse-word list starts with my word, I shouldn’t use it, because I won’t be able to end on it.
“Balance,” for example, is not going to work, because there isn’t any word that ends with “ecnalab.” “Bassists,” on the other hand, is good, because “…st sis sab” could potentially end something. The test for whether the reverse is possible is as follows:
Look through all the reversed words. For each word w:
    If test word t is the same length or shorter than w:
        If w starts with t, this is a good word!
    Otherwise (w is shorter than t):
        If the first letters of t match w and the rest of t is a good word, this is a good word!
I did a little thing there at the end. My test for whether a long word is a good word requires breaking it up and testing the latter part for goodness. This is called “recursion” and is something we computer geeks do all the time. It’s a way of breaking down a problem into an easy part and a hard part, and then having the hard part disappear as if by magic.
In case you’re curious, that algorithm in actual code (in a language I like called “Python,” which is named after “Monty Python” and is actually a serious computer language which is super popular right now) looks like this:
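    def good(t):
        # rwords is the list of reversed words built earlier
        len_t = len(t)
        for w in rwords:
            len_w = len(w)
            if len_t <= len_w:
                # t fits inside w: good if w starts with t
                if w[0:len_t] == t:
                    return True
            else:
                # w is shorter: it must match the front of t,
                # and the rest of t must be good too (recursion)
                if t[0:len_w] == w and good(t[len_w:]):
                    return True
        return False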
It turns out that’s a better way to test for dropped-the-first-letter words, too. It can find cases where you drop the first letter, and what’s left starts a couple of different words. “Drawer” for example leads to “raw era” which you wouldn’t find if you were requiring the dropped-letter word to start a single whole word (“rawer”? meh, maybe that actually is a word, although my editor would probably disagree).
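To make that concrete, here is a sketch of one function serving both tests, depending on which word list you hand it. The name `can_chain` and the five-word vocabulary are assumptions for illustration. The logic is the same as `good`, with the word list passed in as a parameter instead of read from a global.

```python
def can_chain(text, vocab):
    """True if text can be covered left to right by whole words from vocab,
    where the final word is allowed to run past the end of text."""
    for w in vocab:
        if not w:
            continue  # an empty entry would recurse forever
        if len(text) <= len(w):
            if w.startswith(text):
                return True
        elif text.startswith(w) and can_chain(text[len(w):], vocab):
            return True
    return False


# Hypothetical mini vocabulary.
words = ["raw", "era", "sab", "sis", "best"]
rwords = [w[::-1] for w in words]  # every word spelled backward

print(can_chain("drawer"[1:], words))  # forward test: "rawer" -> "raw" + "er(a)"
print(can_chain("bassists", rwords))   # reverse test: "...st sis sab"
print(can_chain("balance", rwords))    # nothing ends in "ecnalab"
```

Passing `words` checks the dropped-first-letter reading, and passing `rwords` checks whether a backward sentence could end on the word.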
I ran those two tests against all the words in the 5000 word dictionary and came up with a list of words my “poem” can start with.
Then, I tried pairing each of these first words with a second word, and I ran exactly the same set of tests. This produced pairs of words that I can use to start my poem. For example “elite rats.” We can drop the first letter and we have “liter ats…” and we can end a phrase with “stare tile.” The number of decent word pairs from my list of 5000 most common words was pretty abysmal. I tried to make some poems using these and they lacked poetry.
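The pairing step can be sketched the same way: glue two words together and run both tests on the combined letters. This is a guess at the shape of it, not the original code. `good_pairs` and `can_chain` are illustrative names, the vocabulary is invented, and for simplicity it tries every ordered pair instead of starting from the filtered list of first words.

```python
from itertools import permutations


def can_chain(text, vocab):
    """True if text can be covered by whole vocab words, last word may overrun."""
    for w in vocab:
        if not w:
            continue
        if len(text) <= len(w):
            if w.startswith(text):
                return True
        elif text.startswith(w) and can_chain(text[len(w):], vocab):
            return True
    return False


def good_pairs(words):
    """Return (first, second) pairs that pass the dropped-letter and reverse tests."""
    rwords = [w[::-1] for w in words]
    pairs = []
    for first, second in permutations(words, 2):
        letters = first + second
        forward_ok = can_chain(letters[1:], words)  # reading from the second letter
        reverse_ok = can_chain(letters, rwords)     # a backward phrase can end here
        if forward_ok and reverse_ok:
            pairs.append((first, second))
    return pairs


# Hypothetical vocabulary built around the "elite rats" example.
vocab = ["elite", "rats", "liter", "at", "stare", "tile"]
print(("elite", "rats") in good_pairs(vocab))
```

Trying every ordered pair is quadratic in the size of the list, which is fine for a toy vocabulary and painful for a big one.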
The next step was to try this with the big list of words and see if I found anything better, but my simple search algorithms are too slow on that big list. So I rewrote those algorithms to be much more efficient. I’ll spare you the details (you’re welcome), but it involves sorting the lists and then using more recursive magic to search much more efficiently.
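For the curious, one common way to get that kind of speedup is a binary search over the sorted list, which Python offers through the standard `bisect` module. The sketch below is an assumption about the general approach, not the actual rewrite, and the function names are made up.

```python
from bisect import bisect_left


def has_prefix(sorted_words, prefix):
    """True if any entry in sorted_words starts with prefix (binary search)."""
    i = bisect_left(sorted_words, prefix)
    return i < len(sorted_words) and sorted_words[i].startswith(prefix)


def whole_word_prefixes(sorted_words, text):
    """Yield each proper prefix of text that is itself an entry in sorted_words."""
    for n in range(1, len(text)):
        piece = text[:n]
        i = bisect_left(sorted_words, piece)
        if i < len(sorted_words) and sorted_words[i] == piece:
            yield piece


def can_chain_fast(text, sorted_words):
    """Same test as the slow version, but each lookup is a binary search."""
    if has_prefix(sorted_words, text):
        return True
    return any(
        can_chain_fast(text[len(piece):], sorted_words)
        for piece in whole_word_prefixes(sorted_words, text)
    )


# Hypothetical mini vocabulary; both lists must be sorted before searching.
words = sorted(["raw", "era", "sab", "sis", "best"])
rwords = sorted(w[::-1] for w in words)
print(can_chain_fast("rawer", words), can_chain_fast("bassists", rwords))
```

Each lookup now costs about log2 of the list size in comparisons instead of a pass over all 173,528 words.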
With that done, I could run my program on the big list, and I got tons of word pairs I can use. Too many. I tried just ignoring short words. Going to words that are at least 4 letters long was the sweet spot that gave me a workable number of options.
Only a computer would come up with the suggestion: “abutter ball” (that’s a neighborhood party with dancing, obviously). If you drop the first letter, you have “butterball” (the menu for our neighborhood party) and reversed you have “..l labret tuba” (the musical entertainment at our party). A labret is a lip piercing.
The big list produced plenty of gibberish, but nothing the least bit poetic. Back to the short list! I tuned up my algorithm so that it wouldn’t cheat and use the same words in the second stanza that had been in the first. For example, if your starting pair is “Bassist is” then the next stanza has “Assist is” so you can put any words after that and technically meet the rules. However, the intent is that you should have three different phrases here, so I rejected any reuse of words between the first and second stanza. In fact, I also rejected words that are just “s” tacked onto a word already used.
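A filter like that could be as small as the sketch below. The exact rule the program used isn't spelled out, so treat this as one strict reading of it, with a made-up function name: reject a second stanza if it repeats any first-stanza word, or uses one with an "s" tacked on.

```python
def reuses_words(first_stanza, second_stanza):
    """True if the second stanza repeats a first-stanza word,
    or uses a first-stanza word with an 's' tacked on."""
    seen = set(first_stanza)
    for word in second_stanza:
        if word in seen:
            return True  # same word used again
        if word.endswith("s") and word[:-1] in seen:
            return True  # e.g. "war" already used, now "wars"
    return False


print(reuses_words(["bassist", "is"], ["assist", "is"]))  # the cheat described above
print(reuses_words(["elite", "rats"], ["liter", "at"]))
```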
The last two changes were to let me specify how deep to go into the stanza (not just two words), and to make sure the last word works out cleanly (the above algorithm assumes you’ll be adding more words to the end).
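Ending cleanly means the letters have to split into whole words with nothing left over. A sketch of that stricter check, with a depth limit, might look like this. `segmentations` is an illustrative name and the vocabulary is just the words needed for one example.

```python
def segmentations(text, vocab, max_words):
    """Yield every way to split text into at most max_words whole words from vocab."""
    if not text:
        yield []  # nothing left over: a clean ending
        return
    if max_words == 0:
        return  # ran out of depth with letters remaining
    for w in vocab:
        if w and text.startswith(w):
            for rest in segmentations(text[len(w):], vocab, max_words - 1):
                yield [w] + rest


# Hypothetical vocabulary covering one six-word candidate.
vocab = ["era", "war", "able", "to", "her", "abandon", "raw", "arab", "let",
         "oh", "band", "on", "no", "dna", "bare", "hotel", "bar", "aware"]
letters = "erawarabletoherabandon"

print(next(segmentations(letters, vocab, 6)))
print(next(segmentations(letters[1:], vocab, 7)))
print(next(segmentations(letters[::-1], vocab, 6)))
```

Running it on the letters, the letters minus the first one, and the letters reversed gives the three readings of a candidate.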
With all those changes, I can run it on my list of 5000 common words, and get a pile of 6-word candidates to read. I pick one, add some punctuation, and presto:
Era war: Able to her abandon.
Raw Arab! Let oh’ era band on!
No DNA. Bare hotel bar. Aware.
To net age radar, ego: devil
One tag era: dare god evil?
Live dog era: dare gate not?
Tone butt onto net age. Yeah!
One button tone. Tag! Eye? Ah…
Ha! Eye gate not, not tube, not!
It’s practically fucking Shakespeare, right? Running it with 10 words leads to longer phrases, but you’ll notice that there is some commonality with the previous ones:
Trace butt onto net. One use rifle to her abandon.
Race button tone. Tone user, if let oh era band on.
No DNA. Bare hotel fire. Sue note not. Not tube cart.
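If you don't trust that the three lines really share their letters, a few lines of Python can check. This is a small sketch with made-up helper names: it strips everything but the letters, then compares the second line against the first with its first letter dropped, and the third line against the first reversed.

```python
def letters(line):
    """Lowercase the line and keep only its letters."""
    return "".join(ch for ch in line.lower() if ch.isalpha())


def is_viral_poem(first, second, third):
    """True if second is first minus its first letter, and third is first reversed."""
    base = letters(first)
    return letters(second) == base[1:] and letters(third) == base[::-1]


print(is_viral_poem(
    "Trace butt onto net. One use rifle to her abandon.",
    "Race button tone. Tone user, if let oh era band on.",
    "No DNA. Bare hotel fire. Sue note not. Not tube cart.",
))
```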
Instead of going to longer phrases, the next thing to try is to see if I can take my 5000 common word dictionary and spice it up with some more interesting words from the big Scrabble dictionary. I ran the program over the big dictionary and pulled out interesting words like “abuse” and “danger” and “retarget.” I added those to the little dictionary and ran my program on the new word list.
Unfortunately, that just generated WAY too many options again. I think the key to the small dictionary is that it doesn’t have that many words that reverse well. So you get a reasonable set of options to choose from.
I’ll leave you with a few more gems from my experiments:
Here’s a little story: A congressman approaches a man at a bar with a particularly lewd suggestion. The man ditches his date and goes with the politician:
“Eye slop sexes?”
“Yes!” *lops exes*
*sexes pol’s eye*
How about this dominant professor’s to-do list:
_ Flog sex am
_ Logs exam
_ Maxes golf
Here is a sad tale of a coed who didn’t think about her schedule when she woke up in the morning:
Hoop sex A.M.
Who hasn’t this happened to?
Eye speeds trap
Yes! (peed strap) [the yes was ironic]
Parts. Deeps eye. [read: deep sigh]
How about a poem about Iran in the 1970s:
Elated in Shah sand
Late dins, hahs, and
DNA. Shah snide tale.
Flower go relive devil
Lower gore lived evil
Lived eviler ogre wolf
This one sounds like an NRA slogan:
Gun smarts—not stats!
Unsmart snots tats
Stats tons. Tram snug.
I suspect that if I spent enough time searching the streams of gibberish these programs produce, there probably are some decent poems in there. These examples notwithstanding.
So, expert linguist, did you think we could solve this? Well you know now, wonk.