Lucian Marin

Location validation on Subreply

Since 2014 when Subreply was launched as Sublevel, I always wanted for users to specify their location. After years of programming in Python I found a way to validate City, Country and Country formats in a simple and intuitive way.

The code

The first part of the problem is to have all data available so you can match the user input with the expected output. I do this by fetching world cities from Simplemaps as a hashmap or Python dictionary. There are some corrections applied to city and country names. I wanted a Django management command so I can run it every time Simplemaps update their public dataset.

import csv
import io
import json
import zipfile
from collections import defaultdict

import requests
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Fetch world cities."
    url = "https://simplemaps.com/static/data/world-cities/basic/simplemaps_worldcities_basicv1.73.zip"

    def get_csv(self):
        r = requests.get(self.url, stream=True)
        c = io.BytesIO(r.content)
        with zipfile.ZipFile(c) as zip:
            with zip.open('worldcities.csv') as file:
                return io.StringIO(file.read().decode())

    def fix_country(self, value):
        replaces = [
            (", The", ""),
            ("Korea, South", "South Korea"),
            ("Korea, North", "North Korea"),
            ("Curaçao", "Curacao"),
            ("Czechia", "Czech Republic"),
            ("Côte D’Ivoire", "Cote d'Ivoire"),
            ("Congo (Brazzaville)", "Congo-Brazzaville"),
            ("Congo (Kinshasa)", "Congo-Kinshasa"),
            ("Micronesia, Federated States Of", "Micronesia"),
            ("Falkland Islands (Islas Malvinas)", "Falkland Islands"),
            (", And Tristan Da Cunha", " and Tristan da Cunha"),
            (" And ", " and "),
            (" The ", " the "),
            (" Of ", " of ")
        ]
        for before, after in replaces:
            value = value.replace(before, after)
        return value

    def fix_city(self, value):
        replaces = [
            ("Beaubassin East / Beaubassin-est", "Beaubassin East"),
            ("Islamorada, Village of Islands", "Islamorada"),
            ("Dolores Hidalgo Cuna de la Independencia Nacional", "Dolores Hidalgo"),
            ("`", "'")
        ]
        for before, after in replaces:
            value = value.replace(before, after)
        return value.split(" / ")[0]

    def handle(self, *args, **options):
        world = defaultdict(set)
        reader = csv.DictReader(self.get_csv())
        for row in reader:
            country = self.fix_country(row['country'])
            city = self.fix_city(row['city_ascii'])
            name = f"{city}, {country}"
            if name.count(", ") == 1:
                world[country].add(city)
        for country, cities in world.items():
            world[country] = sorted(cities)
        with open('static/worldcities.json', 'w') as file:
            json.dump(world, file, sort_keys=True, indent=4)

The validation

Loading JSON in memory as a Python dictionary is straight forward. Validating user input against an country: [cities] hashmap is simple as well. First you validated the country and then you validated the city is part of that country.

with open('data/worldcities.json') as file:
    WORLD_CITIES = json.load(file)

def valid_location(value, delimiter=", "):
    if value:
        if value.count(delimiter) > 1:
            return "City, Country or just Country"
        elif value.count(delimiter):
            city, country = value.split(delimiter)
            if country not in WORLD_CITIES:
                return "Country is invalid"
            elif city not in WORLD_CITIES[country]:
                return "City is invalid"
        elif value not in WORLD_CITIES:
            return "Country is invalid"

Even if you don’t have an user on Subreply, you can try it out on register page by filling only the location field. Enjoy sharing your city where you work from!