City Labels in OpenStreetMap

In the second part of his critique of OpenStreetMap, Justin O’Beirne discusses various issues surrounding the labelling of cities in OpenStreetMap’s cartography, specifically in our default Mapnik rendering of the US.

The issues he highlights can be broadly divided into two categories: problems with our stylesheets and rendering technology; and problems with our data, and in particular with our US data.

The issue I intend to address here is the one he tackles first: label density, which stems largely from data quality and, more importantly, data consistency. Specifically, although the post talks about cities, the real question is what gets tagged as a city and what gets tagged as some lesser type of place.

I should probably start by explaining that in OpenStreetMap tagging there are four commonly used values for the place tag which designate a populated place. In order, from largest to smallest, those are: city, town, village and hamlet. The question which then arises is, how do we decide which of those values to use for a given settlement?

Like so many tags the specific names used come, because of OpenStreetMap’s origins, from typical British usage. It is therefore generally not a good idea to interpret the names too literally in other jurisdictions — indeed some tag values like highway=trunk aren’t even interpreted literally in England!

To the British the question of which places should be cities is fairly clear — there are a few alternative definitions (places with royal charters vs places with cathedrals) but those only relate to a few edge cases and in general there is little debate and only a relatively small number of large and/or important towns will qualify.

At the other end of the spectrum a hamlet would normally only be used for very small places that amount to little more than a handful of houses.

In between lies the distinction between villages and towns which is much less well defined but in my opinion would generally lie around the few thousand mark in population terms — once you reach 2-3 thousand residents you are probably a town rather than a village.

Interestingly the OpenStreetMap wiki disagrees a little here and suggests hamlet for populations up to one thousand and village up to ten thousand. I would argue that both of those values are too high for normal British usage and certainly larger than I would use when tagging places.
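
To make the disagreement concrete, here is a minimal sketch of a population-based classifier. The hamlet and village cut-offs in the first table are the wiki figures quoted above; the 100,000 cut-off between town and city is an assumption, and every number in the second table is purely illustrative, picked only to sit nearer the British usage I describe. The names `classify_place`, `WIKI_LIKE` and `BRITISH_LEANING` are hypothetical, and treating each bound as exclusive is also a guess.

```python
# Upper population bounds (exclusive) for each place value, smallest first.
# Hamlet and village bounds follow the wiki figures quoted in the post; the
# 100,000 town/city bound is an assumption, not a documented rule.
WIKI_LIKE = [(1_000, "hamlet"), (10_000, "village"), (100_000, "town")]

# Purely illustrative bounds, nearer the British usage described in the post.
BRITISH_LEANING = [(100, "hamlet"), (2_500, "village"), (100_000, "town")]


def classify_place(population, thresholds=WIKI_LIKE):
    """Return a place=* value for a settlement of the given population."""
    for upper_bound, value in thresholds:
        if population < upper_bound:
            return value
    # Anything at or above the last bound is treated as a city.
    return "city"


for population in (300, 3_000, 50_000, 150_000):
    print(population,
          classify_place(population),
          classify_place(population, BRITISH_LEANING))
```

With these tables a settlement of 3,000 people comes out as a village under the wiki-style figures and as a town under the lower ones, which is exactly the sort of place where the two readings diverge.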

All of which brings us back to the variations in density in the US map…

The first thing to understand about the US is that most populated places there appear to have been initially imported from the USGS GNIS data set. I haven’t found any documentation as to how places were categorised, but I suspect it was done based on population, most likely using the values in the OpenStreetMap wiki or something close to them.

Justin’s first example starts with the apparent high density of places in Florida, so I took a look at a randomly selected place in his example which appeared to be fairly small: the town(?) of Frostproof. The OpenStreetMap history for Frostproof reveals that it was originally imported from GNIS as a village (probably because of its population of 2922) but has recently been retagged as a city.

My suspicion is that this is the result of an overly literal interpretation of the place=city tag – as I understand things many relatively small places in the US officially style themselves as cities — certainly Wikipedia describes Frostproof in this way. Nobody in Britain, or indeed probably in Europe as a whole, would consider somewhere that small to be a city however and tagging it as such certainly goes against normal OpenStreetMap tagging practice.

In most of the rest of the US no such retagging of small towns as cities appears to have taken place, making place names there appear much less dense at low zoom levels. The sort of places which Justin’s article suggests should be appearing in those areas mostly appear to be in the 25-100 thousand population range and hence have been tagged as towns during the GNIS import. The solution here, if more place names are considered cartographically desirable, would be either to adjust the threshold at which places are tagged as cities instead of towns, or to alter the stylesheets to render towns at lower zoom levels.
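
The second of those two remedies, rendering towns earlier without touching the tagging, can be sketched as a lookup from place value to the first zoom level at which it is labelled. This is only an illustration: the zoom numbers are hypothetical and are not taken from the real stylesheets, and `labels_at_zoom`, `DEFAULT_MIN_ZOOM` and the sample place names are invented for the example.

```python
# Hypothetical first zoom level at which each place value gets a label.
# These numbers are illustrative and not taken from any real stylesheet.
DEFAULT_MIN_ZOOM = {"city": 6, "town": 9, "village": 12, "hamlet": 14}

# Same style, but with towns labelled two zoom levels earlier.
TOWNS_EARLIER = {**DEFAULT_MIN_ZOOM, "town": 7}


def labels_at_zoom(places, zoom, min_zoom=DEFAULT_MIN_ZOOM):
    """Return the names of places whose place value is labelled at this zoom."""
    return [name for name, value in places if min_zoom[value] <= zoom]


# Invented sample data: (name, place value) pairs.
places = [("Bigville", "city"), ("Midtown", "town"), ("Smallham", "village")]

print(labels_at_zoom(places, 7))                 # default style
print(labels_at_zoom(places, 7, TOWNS_EARLIER))  # towns labelled from zoom 7
```

The trade-off is that a stylesheet change like this affects every town everywhere, whereas moving the city threshold changes the data and so affects every consumer of it, not just one rendering.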

The relatively high density around Los Angeles which the article mentions appears to be the result of a fairly large number of places with populations just over the 100 thousand mark. Despite their large populations, and the fact they are likely independent cities legally, I suspect that many of them would be tagged as suburbs in Britain rather than as cities or towns and hence would be given lower priority when rendering.

The real lesson to be drawn from all this however is that the US OpenStreetMap community probably needs to reach a consensus on how to map populated places to tag values so that a better level of consistency can be achieved with less variation from area to area across the map.


    • That might work in the US where you have the GNIS data set with population data but in the rest of the world it’s a lot harder.

      The place tag provides a simple indication of the rough importance of a place which can be set with the use of nothing more than a small amount of common sense.

      If you want to tag the formal status of the place according to local political/administrative classifications then feel free – just use a different tag for it.

        • You base your rough population estimate on surveying the place in question. If you don’t have a definitive source of data (e.g. government figures) and can’t survey a place in person, you probably shouldn’t be choosing which place=* tag to use, and should leave it to another mapper who can.

    • This would complicate the city vs. suburb case that Tom mentions though. No, I don’t think rendering priority should be based on population at all. Even a Google hit count might be a better indicator of importance, although importance is not a good criterion either. I think you could distinguish two different issues here, the zoom levels at which to show a label and the label size, and use two different criteria.

      • The beauty of OSM is everyone can choose their own criteria for what’s important. Want a map based on population? You can have it. Want one based on land area? You can have that too. Date of incorporation? As long as you have the data. And so on…

  1. So I’m being lazy here and haven’t looked. But most wikipedia entries have the population. Most wikipedia data is in the dbpedia in an accessible format to suck in. So the data is there for the taking for a large amount of the world?

    • I think much of the population data in Wikipedia is copied from official/government sources, which may be copyright in some countries. Wikipedia considers copying “facts” from other websites/sources to be OK in terms of copyright, OSM doesn’t.

      Similarly, most of the coordinates in Wikipedia are probably derived from Google Maps (or other copyright maps), so it’s considered a bad idea to import them into OSM.

      • However, the idea of using dbpedia (or similar) is interesting in that it can be pulled in at rendering-time. This has the benefit of us not needing to keep everything in sync as well as not polluting the OSM database with copyright material. Then, the only question is what effect this copyright status has on the rendered tile – I would say little but that’s for the lawyers to sort out.

  2. The same blog had a post a little while back about the importance of cities, which he argues is more than just population:

    I think the city/town/village/hamlet distinction is too restrictive in terms of tagging a hierarchy of population centers in such a way as to allow for mapping city names in the way he’d like to see. I know there are people thinking about an “importance” tag; I think that could help rendering decisions in a number of areas. (For instance, I’ve seen the concept mentioned in terms of distinguishing major airports like Dulles International from regional or private airstrips.)

    • The airport rendering issue could be solved much more easily by using things like the IATA code, which appears to be tagged on most international airports. No point in adding another subjective tag when objective information exists.

      • Sorry…I was under the impression regional airports did not have IATA codes, but it seems they do. This would at least get rid of the problem for private airports. Perhaps some other attributes could be used to distinguish regional airports?
