Implementing XKCD’s Map Reactions: Part 1

When I saw the XKCD strip about Reaction Maps my first thought was how much fun it would be communicating like that with my friends. Well, you probably can guess the second thought now. It was about how great it would be to implement something that enables me to do so.

Photo by Timo Wielink on Unsplash

Planning the App

In the following lines I’ll explain a bit more about how the app will be structured. Basically the frontend will be a very dumb client, while the backend holds all the logic and phrase to route mapping.

Frontend

Mockups for the map reaction app.

As I’m currently learning React, I will implement the frontend with it. Because of the simple user interface it seems like a perfect beginner project to me.

Backend

# REQUEST 
GET /route
parameters:
- phrase: String
# RESPONSE
- 200 OK: Returns a list of places that maps to the given phrase
- 400 Bad Request: No or invalid phrase passed
- 401 Unauthorized: The user is not allowed to make this request (see next section - "Security")
- 404 Not Found: No map for the whole phrase or a word found

As the application will probably idle a lot I will deploy it to a serverless function, to be more precise, to an AWS Lambda function implemented in Kotlin. To stay inside the AWS ecosystem, I will use AWS RDS to host a PostgreSQL database containing the names of all places (cities, towns etc) in the United States. As this will be a very small project it will be perfect to fiddle around with technologies that are mostly new to me. That’s one of the biggest reasons I decided to implement it with React and AWS Lambda (amongst other things).

Security

Implementing the Backend

Database Setup

Phrase to Route Mapping: Basic Idea

There are multiple ways to map a sentence to a route. The most obvious approaches probably are mapping it word by word or mapping the whole sentence as one. Mapping the sentence word by word may make it easier to read the result for the end user as he doesn’t have to think about splitting words in his mind. But what if there is no good match for a word or the match would be better if two words are matched together? To describe this case we can even take an example provided in the comic strip: “to loose” was mapped to “Toulouse”. In the first iteration this problem will be ignored and the mapping will happen word by word. Depending on how good this approach works, I’ll either leave it that way or also compare the single word mapping with group mappings (e.g. always try to map the word alone, but also try to map it with the previous and next word to find the best possible result).

Another challenge will be how to decide for a single match if there are multiple results. To solve this problem a string similarity metric might be a good solution.

Phrase to Route Mapping: First Try

To test Soundex I simply created a small utility class that returns all phonetic matches for the specified words. The result is returned in a simple JSON object. Below the results for the word “truly” are listed:

Well…. Not really the result I was hoping for. It even returns some places where i can’t imagine how they should sound like “truly”. Some of these words are “Thayer Hollow”, “Torrey Hill”, “Torro Well” and so on. Of course there are some good matches as well, but all in all there are too many false positives to continue with that approach in my opinion.

Phrase to Route Mapping: Second First Try

Looking at these results it became clear that Match Rating Approach Encoder and Refined Soundex are definitely not an option. For most of the words they return a ridiculously huge number of results while they seem to have problems with very short words, wherethey sometimes don’t even return a single result (this especially applies for Refined Soundex).

On the other hand Beider-Morse & Nysiis seem quite appealing because of the low number of results. But especially the results for “friend” are interesting. Here Nysiis returns 420 results, while Beider-Morse only returns one. So lets take a look at the detailled results for these algorithms here.

When looking at the results the huge amount of matches for “friendship” with the Nysiis algorithm becomes understandable very fast. It also included all words that start with friendship (e.g. “Friendship Cemetery”) while Beider-Morse only returned “Friendship”. Besides that the results of both algorithms seem to be quite good but with the given test words I couldn’t find the one that is really better than the other. It really seemed to depend on the word which one performs better.

Therefore I decided to match the words against both of the algorithms. This means if a word is passed to the matching service, first the matches for Beider-Morse and Nysiis are computed. Next the result sets need to be combined. The first step to do so, is to check if there are results that are contained in both result sets. If yes, return these matches. If there is no intersection between the two sets, the union of them will be returned. If none of them returns a result Soundex will be used as fallback.

At this moment the service returns a nice result set, but still we need to decide for one of the results. To find the best one the Levenshtein distance seems to be a good approach. It is a string metric that measures the difference between two strings. So the Levenshtein distance of all results is calculated, and the best ones are returned. With this approach we can narrow the results down a lot, but still we can have multiple of them. For now just a random result will be chosen to be returned, but for the future I’m thinking about implementing something that decides which of the matches fits best into the resulting route.

To avoid that we always need to encode all places with the described algorithms I added a database migration which encodes all places with each algorithm and stores the computed codes and the place they belong to to the database.

After all of the above described logic has been implemented, it was time to run a first test. Passing the phrase “Truly sorry to loose a friend this way” returned "Truly","Surry","Taus","Lozes","Ai","Friend","Teas","Way", which is a result, I'm very happy about. One thing I was not so happy about, was performance - it took 8 minutes and 3 seconds (483654 ms) to compute this result, which is not acceptable.

Performance Optimization

CREATE INDEX ON beider_morse_encoded_places (code); 
CREATE INDEX ON nysiis_encoded_places (code);
CREATE INDEX ON soundex_encoded_places (code);
CREATE INDEX ON places (lower(name));

Computing the same result as above now takes only 832 ms — hooray! :D

This is already quite a big improvement but what about some further optimizations? There are some things already mentioned above that would be nice in the future. In the next section you can read up about them.

Possible future improvements

Group matching

Route optimization

Dictionary

TLDR;

Resources

You can try the backend solution by sending a GET request to https://api.map-reactions.ksick.dev/v0-1/route?phrase=[your phrase, separated with blanks]. E.g. https://api.map-reactions.ksick.dev/v0-1/route?phrase=truly%20sorry%20to%20loose%20a%20friend%20this%20way

https://download.geonames.org

Originally published at https://ksick.dev on June 11, 2020.

Creative and detail-oriented software developer. Advanced from mobile to backend development and now getting into full stack solutions.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store