Initial draft "proposal" for an ATGeo location record (lexicon)

This is an initial draft proposal for an ATGeo location record (lexicon). In fact, it’s more of a suggestion than a proposal, for the purposes of stimulating discussion and understanding issues and requirements. This document also lives over here:

https://github.com/whosonfirst/go-whosonfirst-spatial-atproto/blob/main/docs/LEXICON.md

The model described here is a simplified version of the Who’s On First “standard places response”:

https://github.com/whosonfirst/go-whosonfirst-spr/blob/main/spr.go

Model

Individual properties are discussed in detail below.

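// Status describes the lifecycle state of a location record.
type Status uint8

const (
	_ Status = iota // skip zero so that an unset Status is distinguishable
	Current
	Retired
	Superseded
	Deprecated
)

// Place is the minimum set of properties shared by a location and its ancestors.
type Place struct {
	ID        string `json:"id"`
	URI       string `json:"uri"`
	Name      string `json:"name"`
	Placetype string `json:"placetype"`
}

// Location is a Place plus the properties specific to a location record.
// The geojson package is not named in this draft; any package exposing a
// Geometry type (for example github.com/paulmach/orb/geojson) is assumed.
type Location struct {
	Place
	Hierarchy    []*Place          `json:"hierarchy"`
	Status       Status            `json:"status"`
	SupersededBy []string          `json:"superseded_by"`
	Supersedes   []string          `json:"supersedes"`
	Geometry     *geojson.Geometry `json:"geometry,omitempty"`
}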

Place

A “place” is a minimum set of properties shared between both a location record and other records that parent or are ancestors of that location. That is to say, an ancestor referenced in a location record has less information than the location itself.

ID

A stable, permanent (canonical) identifier for this place within the context of the location provider (gazetteer).

URI

The HTTP-addressable URI at which this record can be resolved in full. That fully-resolved record may, or may not, be an ATGeo lexicon response.

Notes

In this example, the semantics of the ID “provider” are assumed to be accounted for by the TLD of the domain. That might need to be revisited.

ID and URI could of course be simplified to a single property: a namespace prefix (mapped to a URI) and an identifier, separated by a colon. But then you have to choose between all the hassle and complexity of XML namespaces and the ease, but potential ambiguity, of Flickr-style machine tags.

Who’s On First addresses this issue by endeavouring to map all properties and their sources (namespaces) to machine-readable documents that can be derived from reliable URI templates. For example wof:placetype maps to:

https://github.com/whosonfirst/whosonfirst-properties/blob/main/properties/wof/placetype.json

And wof maps to:

https://github.com/whosonfirst/whosonfirst-sources/blob/main/sources/wof.json
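As a rough sketch of what "reliable URI templates" could look like in practice, the following splits a namespaced property key and expands it into the two kinds of URL shown above. The template strings are inferred from those two example URLs, and the function names are hypothetical; none of this is an official Who’s On First API.

```go
package main

import (
	"fmt"
	"strings"
)

// Templates inferred from the two example URLs above (an assumption,
// not something published as a formal template).
const (
	propertyTemplate = "https://github.com/whosonfirst/whosonfirst-properties/blob/main/properties/%s/%s.json"
	sourceTemplate   = "https://github.com/whosonfirst/whosonfirst-sources/blob/main/sources/%s.json"
)

// SplitProperty splits a "namespace:name" key such as "wof:placetype"
// into its namespace and name parts.
func SplitProperty(key string) (string, string, error) {
	ns, name, ok := strings.Cut(key, ":")
	if !ok || ns == "" || name == "" {
		return "", "", fmt.Errorf("invalid property key %q", key)
	}
	return ns, name, nil
}

// PropertyURL returns the URL of the machine-readable document
// describing a namespaced property.
func PropertyURL(key string) (string, error) {
	ns, name, err := SplitProperty(key)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(propertyTemplate, ns, name), nil
}

// SourceURL returns the URL of the machine-readable document
// describing a source (namespace).
func SourceURL(ns string) string {
	return fmt.Sprintf(sourceTemplate, ns)
}
```

Calling PropertyURL("wof:placetype") and SourceURL("wof") reproduces the two URLs above; a key without a colon is rejected rather than guessed at.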

Name

The principal name for this location record. This is distinct from a more complete label. For example “Montréal” (name) versus “Montréal, QC, Canada”.

Notes

But what language? Exactly. The mechanics of specifying one or more names remain to be worked out. For example, note the accents in the French spelling of “Montreal”.

The approach that the Who’s On First project has taken has been to say that every record has a wof:name property in the “default” language and then zero or more language/dialect specific name: properties. For example: San Francisco has close to a “bazillion” translations:

https://spelunker.whosonfirst.org/id/85922583/geojson

Importantly, as of this writing the “default” language is English, but in the future it may be something else. The point is to enforce a common label, with consistent semantics, across all records.

Assuming that an ATGeo record does not want to enforce a “default” language then it stands to reason that name should not be a string but rather a struct containing label and language details.
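One possible shape for that struct, as a sketch only: the field names label and language are hypothetical, and the language value is assumed to be a BCP 47 tag. Nothing here has been agreed on.

```go
package main

import "encoding/json"

// Name is a hypothetical replacement for the plain string "name" property.
type Name struct {
	Label string `json:"label"`
	// Language is assumed to be a BCP 47 tag such as "fr" or "en".
	Language string `json:"language"`
}

// Place mirrors the struct in the model above, with Name swapped
// from a string to the struct form.
type Place struct {
	ID        string `json:"id"`
	URI       string `json:"uri"`
	Name      Name   `json:"name"`
	Placetype string `json:"placetype"`
}

// EncodePlace serializes a place to its JSON representation.
func EncodePlace(p Place) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

The cost is that every consumer now has to reach into name.label instead of reading a string, which is the ergonomics trade-off to weigh.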

As with the status property, discussed below, what all of this suggests is (possibly) the need for language-specific name/label xRPC lookup methods from which translations can be derived.

As mentioned the name property is not meant to be an application-specific label nor is it meant to encode location-specific metadata, for example the address of a venue.

Placetype

The type, or descriptor, for this location.

Notes

Like names, placetypes are harder than anyone would like. In this example “placetype” is defined as a string, as opposed to a fixed list, which opens it up to being a free-for-all.

Who’s On First takes a different approach. In the WOF model there are three different types of placetypes: common, optional and common optional. Any place can have any one of those placetypes and can have as complex a hierarchy of ancestors (placetypes) as necessary to represent its reality.

The only rule is that every place has a minimum of one common placetype in its hierarchy. This ensures that any two (or more) projects have a shared set of (“common”) placetypes that they can use to match place records regardless of the details or nuance required by any one project.
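A minimal sketch of how that one rule could be checked against the Place struct from the model above. The set of common placetypes is passed in by the caller; the names used in any example set are illustrative only, and the whosonfirst-placetypes repository remains the authoritative list.

```go
package main

// Place mirrors the struct in the model above.
type Place struct {
	ID        string `json:"id"`
	URI       string `json:"uri"`
	Name      string `json:"name"`
	Placetype string `json:"placetype"`
}

// HasCommonPlacetype reports whether a place, or any ancestor in its
// hierarchy, has a placetype found in the supplied set of common
// placetypes. Nil ancestors are skipped.
func HasCommonPlacetype(p Place, hierarchy []*Place, common map[string]bool) bool {
	if common[p.Placetype] {
		return true
	}
	for _, ancestor := range hierarchy {
		if ancestor != nil && common[ancestor.Placetype] {
			return true
		}
	}
	return false
}
```

A project-specific placetype (say, a hypothetical "food-cart") passes the check as soon as one of its ancestors is, for example, a locality.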

Who’s On First placetypes are discussed in detail here:

https://github.com/whosonfirst/whosonfirst-placetypes

Like sources and properties, every place type has a machine-readable representation. For example:

https://github.com/whosonfirst/whosonfirst-placetypes/blob/main/placetypes/campus.json

Although these records do not currently have language-specific name: translations there’s nothing preventing that happening.

Location

These are properties specific to a location associated with an ATProto message/event/whatever.

Hierarchy

An ordered list of ancestors that a location is “parented” by. This is meant to be used to construct application-specific labels for a location. For example “Vancouver, British Columbia CA” rather than “Vancouver”.
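As an illustration of that use, here is a sketch that builds a label from a location and its hierarchy. It assumes the hierarchy is ordered nearest ancestor first and joins names with a comma, which is just one possible application-specific format. The Location struct is trimmed to the fields the example needs.

```go
package main

import "strings"

// Place mirrors the struct in the model above.
type Place struct {
	ID        string `json:"id"`
	URI       string `json:"uri"`
	Name      string `json:"name"`
	Placetype string `json:"placetype"`
}

// Location is a trimmed version of the model above (no status or geometry).
type Location struct {
	Place
	Hierarchy []*Place `json:"hierarchy"`
}

// BuildLabel joins the location's name with the names of its ancestors.
// It assumes Hierarchy is ordered from nearest to furthest ancestor and
// skips nil entries and empty names.
func BuildLabel(loc Location) string {
	parts := make([]string, 0, len(loc.Hierarchy)+1)
	if loc.Name != "" {
		parts = append(parts, loc.Name)
	}
	for _, ancestor := range loc.Hierarchy {
		if ancestor != nil && ancestor.Name != "" {
			parts = append(parts, ancestor.Name)
		}
	}
	return strings.Join(parts, ", ")
}
```

An application that wants abbreviations or a different separator would need more than name alone in each ancestor, which is one argument for keeping label formatting out of the record itself.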

Notes

This property has all the same language-related issues as the name property.

Status

An enumerated list of possible states for a location record: current, retired, superseded, deprecated.

Notes

I am not convinced this is really necessary or practical in an ATGeo record since once embedded in any given ATProto message there’s no way to update it after the fact (for example if a given location record is deprecated or otherwise retired).

This is perhaps better suited to a geo-specific xRPC method associated with location records?

SupersededBy

One or more location records that supersede this location.

Notes

As with the status property it’s not clear to me that this makes much sense in an ATGeo record.

Supersedes

One or more location records that this location supersedes.

Notes

As with the status property it’s not clear to me that this makes much sense in an ATGeo record.

Geometry

A GeoJSON geometry element. Importantly, for privacy reasons, this property is OPTIONAL. This means that ATGeo location records are associated with the “idea” of a place, as defined by its id and uri properties, rather than its geographic representation.

Notes

Allowing for any valid GeoJSON geometry element allows records to encode complex geographic data which better represents their reality. In the case of administrative locations this also allows for a minimum-bounding rectangle to be derived which allows applications to more accurately visualize that location on a map.

At the same time the absence of explicit “centroid” coordinates may prove problematic for applications. Simply deriving the “center” of a complex geometry does not always yield the best, or correct, point at which a human-readable label should be placed on a map. For example the geographic center of the city of San Francisco is 15 miles west of the city, in the Pacific Ocean, since the city, as a legal entity, also encompasses the Farallon Islands located 30 miles off the coast.

Who’s On First addresses this issue by overloading the term “centroid” to mean “area of focus” and then defining one or more of a series of named-centroids to associate with any given record. Those named centroids are described here:
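To make the problem concrete, here is a small sketch that derives a minimum-bounding rectangle from a set of coordinates and takes its center. The types and function names are hypothetical, the code works on plain longitude/latitude pairs rather than a real GeoJSON geometry, and it ignores geometries that cross the antimeridian.

```go
package main

import "math"

// Point is a longitude/latitude pair in decimal degrees.
type Point struct{ Lon, Lat float64 }

// BBox is a minimum-bounding rectangle.
type BBox struct{ MinLon, MinLat, MaxLon, MaxLat float64 }

// BoundingBox returns the smallest rectangle containing every point.
// The second return value is false if no points were supplied.
// Geometries crossing the antimeridian are not handled.
func BoundingBox(pts []Point) (BBox, bool) {
	if len(pts) == 0 {
		return BBox{}, false
	}
	b := BBox{pts[0].Lon, pts[0].Lat, pts[0].Lon, pts[0].Lat}
	for _, p := range pts[1:] {
		b.MinLon = math.Min(b.MinLon, p.Lon)
		b.MinLat = math.Min(b.MinLat, p.Lat)
		b.MaxLon = math.Max(b.MaxLon, p.Lon)
		b.MaxLat = math.Max(b.MaxLat, p.Lat)
	}
	return b, true
}

// Center returns the midpoint of the rectangle, which is not
// guaranteed to fall inside the original geometry.
func (b BBox) Center() Point {
	return Point{(b.MinLon + b.MaxLon) / 2, (b.MinLat + b.MaxLat) / 2}
}
```

Feeding in a couple of rough mainland coordinates (around -122.51 to -122.36 longitude) plus one offshore point near -123.0 pulls the center west of the mainland's western edge, which is the San Francisco problem in miniature. The coordinates here are approximations for illustration, not survey data.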

https://github.com/whosonfirst/whosonfirst-geometries/blob/main/geometries/README.md

@aaronofsfo.bsky.social Could you expand on the motivation for including Status / SupersededBy / Supersedes in the WoF “standard places response”?

My instinct is that this might be more metadata than the median ATProtocol app developer really wants or needs. However, I don’t have any real justification in support of that view, beyond a vague desire to offer up the simplest thing that covers most use cases? I could absolutely be persuaded either way.

The documentation you linked to seems to categorize places by containment hierarchy, rather than function. How would we model a city park, or, say, a hot dog stand?

Is the intention here that the URI is specifically an HTTP URL that returns a complete machine-readable document describing the place?

And is the idea that the ID is durable and globally unique, and, while the URI might or might not embed the ID, it doesn’t matter because the URI serves a different semantic function than the ID?

Say more about your conception of the semantics of “name” versus “label”?

@essentialrandom.bsky.social has been very emphatic in the opinion that having to iterate through a list of name structs to find the “primary” or “canonical” name is just bad developer ergonomics.

While I agree in principle, it means somehow distinguishing:

a) “this is a canonical name in some language and usage” (and also “here is the language or usage of the canonical name represented”) versus

b) “here is a list of structs representing all of the names (labels?) we know about for this place and which language/usage/representation each conforms to”

I don’t think this is a difficult distinction to grasp semantically, but I do think we want to come up with precise terminology (and Lexicon property naming) or people are going to get easily tripped up over which “thing” we are talking about when we say “name” or “label”.
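For the sake of discussion, one way to keep (a) and (b) apart in a record is sketched here. The property names canonical and known are hypothetical placeholders, not a terminology proposal, and the lookup simply falls back to the canonical name when no entry matches the requested language.

```go
package main

// Name is a single name together with the language it is written in.
type Name struct {
	Label    string `json:"label"`
	Language string `json:"language"`
}

// Names separates the one canonical name (a) from the list of all
// known names (b). Both property names are hypothetical.
type Names struct {
	Canonical Name   `json:"canonical"`
	Known     []Name `json:"known,omitempty"`
}

// InLanguage returns the first known name in the requested language,
// falling back to the canonical name if there is none.
func (n Names) InLanguage(lang string) Name {
	for _, k := range n.Known {
		if k.Language == lang {
			return k
		}
	}
	return n.Canonical
}
```

A developer who only wants "the" name reads Canonical and never touches the list; only applications that care about language pay the cost of iterating.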

(witness the confusion I have sown over “gazetteer server” versus “gazetteer” which are two totally different things and I know, I’m sorry)

Could you expand on the motivation for including Status / SupersededBy / Supersedes in the WoF “standard places response”?

Venues, in particular, are superseded all the time. For example the physical address that houses “Rock Bar” in San Francisco has been home to a dozen other venues in the past. Likewise the “Palace Steak House” which is now something else entirely. Taken to extremes, consider the infamous “poop emoji rock” on Bernal Hill which gets repainted every couple of weeks.

Or more seriously, consider the many different countries that have occupied the land mass previously known as “Yugoslavia” in the last 100 years.

The inclusion of the supersedes, superseded_by and status flags is largely dependent on whether or not there is a need to track (search, retrieve, display, etc.) place records in an historical context. If not then I agree they are superfluous. On the other hand I have never seen a geo-related project that, eventually, doesn’t need to address an historical record (see also: OpenHistoricMap and so on).

How would we model a city park, or, say, a hot dog stand?

The Who’s On First approach to place types is to keep the list of core (common) places to as few as possible. This is a design decision to try and mitigate the explosion of purpose-fit labels that other systems produce.

Importantly this doesn’t preclude projects from defining their own place type labels, either through the use of the wof:placetype_alt property or their project-specific/namespaced properties. But the goal is to ensure that there is a minimum commonality, at a global scale, across all the projects adopting WOF records.

To that end, place types are deliberately made as generic as possible. For example: Towns, hamlets, villages and cities are all classified as “localities”. In your example, a hotdog stand would be a “venue” and a park would either be a “venue” or a “campus” (note that airports and universities are also classified as “campus”-es).

And is the idea that the ID is durable and globally unique, and, while the URI might or might not embed the ID, it doesn’t matter because the URI serves a different semantic function than the ID?

The goal is mostly to reduce the burden on developers of needing to know how to untangle a unique identifier from its URI. The other option is simply to say the URI is the identifier which largely tracks with how decentralized projects seem to manage these things in the first place, albeit at the cost of database and index size which may be a relevant concern for some.

Say more about your conception of the semantics of “name” versus “label”?

The “name” of the Californian city of San Francisco is “San Francisco” but the label might be “San Francisco CA, USA” for the purposes of disambiguation since there are at least 53 different localities with that name in the world. Likewise, there are 12 “Brooklyn”-s and so on.

The “name” is the common referent (for example, in conversation) whereas the “label” is used for presentation and disambiguation (for example, when choosing from a list of options in a geocoding context).

A good example of both can be found in the “WOF editor” form which lists not just language-specific inputs for names and labels but also the context in which either might be used (preferred, variant, historical, colloquial, etc.). Basically all of the label-specific variants have been included to address the needs of the Pelias geocoder.

@essentialrandom.bsky.social has been very emphatic in the opinion that having to iterate through a list of name structs to find the “primary” or “canonical” name is just bad developer ergonomics.

I don’t disagree in principle but I don’t know how you get around it in a world where people speak different languages or have differences of opinion.

WOF tries to address this by enforcing common semantics (7-bit ASCII-encoded English) across the wof:name property only so that there is consistency across the entirety of the dataset. Importantly, wof:name might become Spanish or Chinese in the future; the relevant feature of the property is that it is consistent not that it is English. But that doesn’t help someone who needs to display a place name in Khmer or Russian or any other language.

This is why I suggested that the ability to derive a “canonical” place name for a given language might be better suited to an xRPC method (which in turn places the burden of figuring that out on “someone else”) rather than trying to encode it in a simple place record.

I find the distinction between name and label interesting.

My goal for a Lexicon would be that it’s pretty minimal, but usable enough to build applications with just the information that is stored directly on ATProto, while still having a way to run more complex queries/analysis via the gazetteers the data came from. This means that I don’t want to create an on-protocol gazetteer, but leave it to third parties.

From the model that is outlined in this thread I don’t see the point of having Status, SupersededBy and Supersedes, for the reason already mentioned: ATProto on-protocol data is immutable.

I’m unsure if hierarchies are needed, I have to think about that more deeply. The point I’m really after is that I think a “label” should be the central and most meaningful on-protocol information. That should be app/user specific. It could be a well defined name that the user picked from some app-provided drop-down. Or if the app allows free-form input, it could even be some jargon.

If you then need more information, you can get it from the gazetteer via the unique identifier. As I wouldn’t want to clone the whole record of the gazetteer into my on-protocol data this approach makes sense. You might not know what “more information” is, or will be in the future. It could be that you want the official name of the place in the native language where the place is. Or in the native language of the user. Or it could be that you want to get the population count.

Besides having the unique identifier to the source, I think it also makes sense to have a name. I would restrict it to a single string and make it whatever the gazetteer defines as “default”. This way, even if the gazetteer vanishes, you have at least a bit of a chance to make sense of the data. You could possibly retrieve things by that name from somewhere else.

I would certainly also include a geometry on-protocol, whether it’s from the gazetteer or user provided.
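Pulling those points together, a minimal on-protocol record along these lines might look something like the sketch below. The struct and field names are hypothetical, the geometry is held as raw GeoJSON so that no particular GeoJSON package is assumed, and it is left optional here to stay compatible with the privacy note in the original draft.

```go
package main

import "encoding/json"

// MinimalLocation is a hypothetical, pared-down location record:
// an app/user-specific label, a pointer back to the gazetteer,
// the gazetteer's default name, and an optional geometry.
type MinimalLocation struct {
	Label string `json:"label"` // app- or user-specific, possibly free-form
	ID    string `json:"id"`    // stable identifier within the gazetteer
	URI   string `json:"uri"`   // where the full record can be resolved
	Name  string `json:"name"`  // the gazetteer's "default" name
	// Geometry holds raw GeoJSON and is omitted from the output when empty.
	Geometry json.RawMessage `json:"geometry,omitempty"`
}

// Encode serializes the record to JSON.
func Encode(m MinimalLocation) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// Decode parses a JSON record back into a MinimalLocation.
func Decode(s string) (MinimalLocation, error) {
	var m MinimalLocation
	err := json.Unmarshal([]byte(s), &m)
	return m, err
}
```

Even if the gazetteer behind the URI disappears, the label and name strings still say something about the place, which is the fallback argument made above.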