Implementing XKCD’s Map Reactions: Part 1

Photo by Timo Wielink on Unsplash

Planning the App

The plan is to implement an application with the functionality shown in the below comic strip. So basically the end user can enter a phrase, which should be mapped to a route and displayed on a map.

Frontend

The UI will be really simple and should look like the below pictured mockups. So on the first screen the user can enter a phrase, which will be rendered to a map on the second screen.

Mockups for the map reaction app.

Backend

The backend will provide a very simple REST interface for the frontend server. It will just contain one endpoint which is defined like this:

# REQUEST 
GET /route
parameters:
- phrase: String
# RESPONSE
- 200 OK: Returns a list of places that maps to the given phrase
- 400 Bad Request: No or invalid phrase passed
- 401 Unauthorized: The user is not allowed to make this request (see next section - "Security")
- 404 Not Found: No map for the whole phrase or a word found

Security

Securing the application will be very easy, as the frontend will be accessible for the whole world anyways, so no authentication is necessary there. The backend (AWS Lambda function) will be accessible via an AWS API Gateway, which will be secured with an API key.

Implementing the Backend

Database Setup

Before starting with the backend itself, I created a PostgreSQL database on AWS RDS. Then I downloaded the US GeoNames from here and imported their names, latitude and longitude to the database using the Flyway Maven plugin. To do so I created a Java-based migration that fetches all lines from the file containing the place names and inserts them to the database. Of course the full source for this can be found on Github. Thanks to my friend Matthias for making this whole migration a lot faster.

Phrase to Route Mapping: Basic Idea

After some research about how to match words that sound similar I stumbled across phonetic matching algorithms and decided to elaborate on this approach. A phonetic matching algorithm matches words that sound similar by comparing their phonemes. So basically it tells you if two words sound approximately the same. According to Wikipedia a phoneme /ˈfoʊniːm/ is a unit of sound that distinguishes one word from another in a particular language.

Phrase to Route Mapping: First Try

The oldest and probably most widely known phonetic matching algorithm is Soundex. I stumbled across it on nearly every page I read, so I gave it a try. The fact that Soundex and other phonetic matching algorithms are included in the Apache Commons Codec makes the given problem a lot easier to solve.

Phrase to Route Mapping: Second First Try

Fortunately the Apache Commons Codec library contains the implementations of more phonetic matching algorithms than just Soundex. Therefore I decided to simply test all of them using the method shown above. Now it is the challenge to choose a good set of test words, which should include words that will be used a lot. As this is just a fun project, I’ll use the phrases from the XKCD comic strip for testing and to get a feeling on how good the algorithms work. Please note that this is just a quick and dirty approach! In a professional project I would probably choose some other methodology. A first test run with the word “truly” mapped with all algorithms returned a JSON with more than 20.000 lines. So I adapted my test to just return the number of matched places for the given words and algorithms to find the algorithms that return so many results. Below the number of results per algorithm are pictured.

Performance Optimization

As we got more than 2 million places also the tables for the Nysiis and Soundex encoded places have this size. Further the Beider-Morse table has more than 1 billion entries, as the Beider-Morse algorithm returns multiple phonetic codes for a word. For this reason computing the results for the test phrase took about 8 minutes. The solution for this problem is to add an index to code columns of the tables for the different algorithms and to the name column of the place table like this:

CREATE INDEX ON beider_morse_encoded_places (code); 
CREATE INDEX ON nysiis_encoded_places (code);
CREATE INDEX ON soundex_encoded_places (code);
CREATE INDEX ON places (lower(name));

Possible future improvements

Group matching

If you take a look at the comic strip, which this project is based on, you can see that “to loose” is matched to “Toulouse” there. This means two words have been combined into one to improve matching. This is something we are planning to consider in the future as it could improve the accuracy of the results a lot.

Route optimization

I already mentioned that sometimes a couple of equally good matches are returned. At the moment the first match of those is taken into the result. In the future it would be great if the match that fits best into the route would be chosen instead of the first one.

Dictionary

Adding a dictionary with already matched places could make the whole solution much faster. This seems to be like an easy one, but it must not interfere with the above described features if they are implemented.

TLDR;

Within this blog post I described how I implemented the backend of an application based on this XKCD strip. It is implemented in Kotlin and running inside an AWS Lambda function. I used some phonetic matching algorithms (Beider-Morse, Nysiis and Soundex as fallback) to map a phrase to a list of places that sound similar, and calculated the Levenshtein distance to find the best match. Thanks again to Matthias who implemented some small improvements so far.

Resources

The full source can be found on Github. The Pre-Release v0.1 contains the code that has been written at the time this blog post was published.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Katharina Sick

Katharina Sick

Creative and detail-oriented software developer. Advanced from mobile to backend development and now getting into full stack solutions.