What is the Difference Between a “Federated Location Data Store” and a “Gazetteer”?

The current proposal for how to handle location data, largely shaped by the ATGeo Working Group discussions (e.g., ATGeo WG: Designing for interoperability with a common data model for places and ATgeo WG: Next Steps for Garganorn and Geographic Data in ATProtocol), defines a “gazetteer service” as an API server that provides place data using ATGeo Lexicon interfaces. The goal is to support location queries and enable AT Protocol applications to employ, create, update, reference, and exchange geospatial data, with a strong focus on interoperability.

The idea is that a gazetteer server would translate off-protocol reference place data into Lexicon-specified AT Protocol objects and serve them via XRPC, without requiring the data to be stored in a Personal Data Server (PDS). Garganorn is a demo implementation acting as a data relayer. Even with the most performant and optimized architecture possible today, the fundamental problem with this approach is that it creates a *centralized* service for location amid the decentralized architecture of ATProto. With this model, there is no good solution for sharing location data or derivatives. In the worst case, developers end up with a model very similar to commercial, proprietary APIs, where queries are made at runtime and data is stored and ranked by each individual app developer. In the words of seabass.bsky.social, simply duplicating address records “thousands and thousands of times throughout ATProto” feels wrong.
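As a rough illustration of the translation step such a gazetteer server would perform, here is a minimal Python sketch that maps a GeoJSON-style point feature onto a Lexicon-shaped record. The NSID `community.example.geo.place` and every field name are hypothetical placeholders, not a published ATGeo Lexicon, and this is not how Garganorn itself is implemented. Coordinates are emitted as strings on the assumption that the record must conform to the ATProto data model, which does not permit floats.

```python
# Hypothetical NSID, used only for illustration; not a published Lexicon.
PLACE_NSID = "community.example.geo.place"


def feature_to_place_record(feature: dict, source: str) -> dict:
    """Map a GeoJSON-style Point feature to a Lexicon-shaped place record.

    All output field names are assumptions made for this sketch.
    """
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")

    # GeoJSON orders coordinates as [longitude, latitude].
    lon, lat = geometry["coordinates"][:2]
    props = feature.get("properties", {})

    return {
        "$type": PLACE_NSID,
        "name": props.get("name", ""),
        # Strings rather than floats, assuming ATProto data model constraints.
        "latitude": str(lat),
        "longitude": str(lon),
        # Keep provenance so consumers can trace the record to its origin.
        "source": source,
        "sourceId": str(feature.get("id", "")),
    }


example_feature = {
    "type": "Feature",
    "id": "abc123",
    "geometry": {"type": "Point", "coordinates": [-122.4194, 37.7749]},
    "properties": {"name": "Example Cafe"},
}
record = feature_to_place_record(example_feature, source="example-dataset")
```

In this model the record exists only in the server's response. Every app that wants to keep it has to store its own copy, which is the duplication concern raised above.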

The solution I proposed in my last post, a Federated Location Data Store, is, I believe, the intuitive, missing data structure that resolves this incongruity. It creates a traversable, decentralized, and dedupable catalog of public location data, maintained by app developers and others with a specific ATProto use case in mind. It aligns with ATProto architecture and philosophy, supports common Bluesky app development use cases such as search, geotagging, geofencing, and topology, and respects the terms of data provider licenses. It also treats location as a first-class primitive and as a core element of the decentralized (geospatial) web.

However, the problem of creating a canonical and authoritative index of places goes deeper than redundant querying and storage. Deduplication of location data, particularly across different data sources, is an impossible problem to solve, even setting aside custom or bespoke definitions: records for the same place may carry slightly different names and coordinates, and may come from different data sources. This is arguably the reason why walled gardens of proprietary location data exist: you can’t normalize “place” or “location”, so the best you can do is put a lot of effort into creating your own version of it. Furthermore, contrary to the sometimes stated assumption of the ATGeo Working Group, location data changes *all the time*. Data freshness is a notorious problem among POI data vendors, and public venues like malls, concert and sports arenas, and conference venues move walls and rearrange their spaces regularly, sometimes with patterns, sometimes without.
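To show why cross-source matching resists a clean answer, here is a minimal Python sketch of a common matching heuristic: treat two records as the same place if they are close together and their names are similar. The function names, record fields, and both thresholds are assumptions chosen for illustration, not values used by any provider. Whatever thresholds you pick, some true duplicates will fall outside them and some distinct places will fall inside.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def likely_same_place(a: dict, b: dict,
                      max_distance_m: float = 75.0,
                      min_name_similarity: float = 0.8) -> bool:
    """Heuristic match; both thresholds are arbitrary assumptions."""
    distance = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    similarity = name_similarity(a["name"], b["name"])
    return distance <= max_distance_m and similarity >= min_name_similarity


# Hypothetical records, as if drawn from two different providers.
rec_a = {"name": "Example Cafe", "lat": 37.7749, "lon": -122.4194}
rec_b = {"name": "example cafe", "lat": 37.7749, "lon": -122.4194}
rec_c = {"name": "Example Cafe", "lat": 38.7749, "lon": -122.4194}

same_ab = likely_same_place(rec_a, rec_b)  # same spot, names differ only by case
same_ac = likely_same_place(rec_a, rec_c)  # same name, one degree of latitude apart
```

A venue that relocates within a mall, or two businesses sharing one address, will defeat this kind of rule in opposite directions, which is the point of the argument above.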

One possible way to merge the work being done by the currently active members of the ATGeo Working Group with this proposal would be to create a “gazetteer LDS”. This could bootstrap efforts to federate private LDS instances by hosting collections of places and geofences sourced from publicly available data and making them available across the distributed geospatial web.

While ATGeo Working Group discussions have put much emphasis on remaining data-provider agnostic when building a gazetteer, for a gazetteer LDS I would suggest starting with one provider rather than trying to harmonize across them all. Place is socially constructed, and locations have different representations depending on what you need them to mean. It simply is not possible to make location data that has been curated, created, and maintained perfectly interchangeable, and I would argue that this should not be the goal. In many ways, a collection of location data, or a gazetteer, is the schema more than it is the place or location, so representations by different providers are not interchangeable unless they were designed to be.

OK. So which one, you ask? One obvious choice might be to port Who’s on First, a web-available and downloadable gazetteer with stable IDs and tested schemas, to STAC and host it. However, for myriad reasons this could be a substantial effort, and more than the current group of volunteers and enthusiasts should undertake.

The other option is Overture Maps. Among the providers discussed in this post, Overture Maps is the most comprehensive and permissive. This is likely the best alternative for creating a centralized gazetteer. Overture favors CDLA Permissive 2.0 for places. That means ATProto devs can build shared catalogs, remix, and redistribute, whereas the Apache license Foursquare uses leaves more ambiguity about whether all of these operations are permitted. Furthermore, it is very likely that Foursquare will be folded into Overture Maps at some point. The maintainers of Overture Maps are actively developing and experimenting with STAC (e.g. here is the catalog of Overture releases) and will very likely be open to, and enthusiastic about, working with ATProto folks, so all in all it is a much easier lift.

As for additional reasons (and here I don’t know who needs to hear this at this point): Foursquare primarily focuses on business and municipal listings, with point geometries only, whereas Overture Maps curates a broad variety of places and potential geofences, including administrative boundaries, buildings, and a host of other features. Furthermore, Overture Maps is under active development and has put a lot of thought into building stable IDs and stable infrastructure, with polygons suitable for geofencing rather than just points within municipalities.

Whew! This concludes my post blitz on location data in ATProto. I’ll be off the grid for a week; after that I’m happy to talk more about this proposal. Otherwise, stay tuned for a future post, where I plan to outline in greater depth the schema patterns and implementation details behind Location Data Stores, with working examples.

In the meantime, tell me what you think :)

I’d like to write a more substantive reply, but I just want to reflect back that your post references the “ATGeo Working Group” a couple of times as if it were some kind of formalized activity or a group of people making definitive decisions.

As far as I know, that’s not really the case, at least not yet?

I’m happy to engage with the ideas themselves, so long as we’re agreed that there’s no established, organized effort that needs opposing or defending.

I respectfully disagree. I don’t think there’s anything intrinsically centralized about a “gazetteer” as we’ve been kicking around the notion. I’ve written in other threads here about how reference gazetteer services should be treated as essentially swappable.

When I say “gazetteer”, all I’ve ever meant is “a network-addressable API endpoint that delivers location data from reference datasets in a Lexicon-friendly format”.

Of course I think the implementation must be federated to be sustainable.

Perhaps I’ve just been taking that for granted and it needed to be said out loud?

That’s more or less what I was hoping we could accomplish as a working group in the ATProtocol Lexicon community. Thank you for making it an explicit goal that we can focus on.

That said, I plan to eat this hot dog from both ends. I do still plan to set up an instance of Garganorn that spits out some global datasets in a Lexicon format, so that folks like @essentialrandom.bsky and @tijs.org have a usable sandbox for experimentation… while we, hopefully at the same time, figure out what the “right way” is to do this for the ATProtocol ecosystem at scale.

I think this is broadly true, but when you look at location datasets that contain objects that describe what a layperson would call “a place”, they do have some commonalities.

I think the biggest sticking point will always be feature categorization, because that’s 100% where you fall into the deep end of cultural and commercial bias determining what taxonomy gets used.

But the OSM community’s practical experience over 20 years says to me that harmonizing disparate location taxonomies doesn’t have to be an impossible task, any more than keeping the data itself up to date is strictly impossible. It just takes work, and it also takes having the right structures of participation in place, if you’re relying on volunteers to do that work.

So I would want to engage with some practical counter examples before I was personally ready to conclude that modeling those commonalities for the sake of promoting interoperability is a total waste of time.

Just as a quick first comment, i would like to applaud your highlighting of data freshness. i think you’re right that this needs attention where the fsq data source is concerned. Beyond that, your recommendation of Overture and the establishment of a Gazetteer LDS sound like good things to consider for the community here.

For me personally any endpoint that i can get places from and some guidance on the preferred way to store the address/place records or address/place references would already help me out a lot. So i will indeed happily use Garganorn in whatever form it becomes available for my experiments while more future proof solutions are taking shape.

As I mentioned in the previous post, my next step will be a proof of concept of some kind. Happy to adopt whatever schema or lexicon is emerging from these experiments :).