This is a followup to Lexicon Community TSC Notes: referring to external (authoritative) data in ATproto apps, specifically moving the ATgeo WG/Garganorn discussion beyond using AtURIs to reference records (given the clear “don’t” consensus).
Aside from those notes, there are three other relevant additions to the conversation that I want to use as a jumping-off point:
- places.pub, a service that makes OpenStreetMap geographical data available as ActivityPub objects
- Nick’s post on Off-Protocol Data in ATProtocol Records, which suggests defining location schemas that live off-protocol but maintain strong typing using JSON Structure
- Stephanie pointed to the STAC Specification as a “standard, unified language to talk about geospatial data”.
I’m going to write some thoughts about each, as a starting point for further discussion.
1. Blending places.pub’s approach with Garganorn’s
Speaking of referring to geographic data from within a decentralized protocol, places.pub has recently solved a similar problem for ActivityPub (although only over OpenStreetMap data rather than over multiple authoritative sources).
In short, places.pub serves https://places.pub/{type}/{id} URLs that return ActivityPub-shaped data. From their website:
The URL for a place matches the URL template:
https://places.pub/{type}/{id}
Where {type} is the type of the OpenStreetMap place (node, way, or relation) and {id} is the OpenStreetMap numerical ID of the place.
The id field is not unique across types. For example, the node with ID 123456 and the way with ID 123456 are two different places. The type field in the ActivityPub object URL is used to disambiguate between them.
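To make the template concrete, here is a minimal sketch of building these URLs. The function name and the validation step are my own assumptions for illustration; places.pub only documents the URL template itself.

```python
# The three OpenStreetMap element types named in the places.pub template.
OSM_TYPES = {"node", "way", "relation"}


def places_pub_url(osm_type: str, osm_id: int) -> str:
    """Fill in the https://places.pub/{type}/{id} template.

    Hypothetical helper: rejecting unknown types is an assumption,
    not something places.pub specifies.
    """
    if osm_type not in OSM_TYPES:
        raise ValueError(f"unknown OpenStreetMap type: {osm_type!r}")
    return f"https://places.pub/{osm_type}/{int(osm_id)}"


# The same numerical ID yields two different places, one per type.
print(places_pub_url("node", 123456))
print(places_pub_url("way", 123456))
```

The type segment is what keeps the two URLs distinct even though the ID is identical.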
This {namespace}/{id} structure is similar to the set of Lexicon records that had been proposed for Garganorn, which were the original impetus for “shoving this data into a PDS so apps can reference them through AtURIs”.
To summarize the proposal:
- community.lexicon.location.foursquare/[foursquare_id] would contain the Foursquare data for [foursquare_id]
- community.lexicon.location.openstreetmaps/[osm_id] would contain the OpenStreetMap (OSM) data for [osm_id]
- community.lexicon.location.place/foursquare:[foursquare_id] would contain the extended place data associated with [foursquare_id], which also includes a reference to the corresponding [osm_id]
- community.lexicon.location.place/openstreetmaps:[osm_id] would contain the extended place data associated with [osm_id], which also includes a reference to the corresponding [foursquare_id]
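As a rough illustration of the naming scheme in this proposal, the sketch below splits a collection/key path into its parts, treating the key of a place record as a source-prefixed ID. The PlaceRef type and parse_record_path function are hypothetical names, and the parsing rules are my reading of the proposal rather than anything Garganorn defines.

```python
from typing import NamedTuple, Optional

PLACE_COLLECTION = "community.lexicon.location.place"


class PlaceRef(NamedTuple):
    collection: str
    source: Optional[str]  # namespace prefix; only set for place records
    source_id: str


def parse_record_path(path: str) -> PlaceRef:
    """Split 'collection/key' (hypothetical helper, assumed format)."""
    collection, sep, key = path.partition("/")
    if not sep or not collection or not key:
        raise ValueError(f"expected 'collection/key', got {path!r}")
    if collection == PLACE_COLLECTION:
        # Place keys are namespaced by their source, e.g. 'foursquare:<id>'.
        source, sep, source_id = key.partition(":")
        if not sep or not source or not source_id:
            raise ValueError(f"expected 'source:id' key, got {key!r}")
        return PlaceRef(collection, source, source_id)
    # Source-specific collections use the source's own ID directly.
    return PlaceRef(collection, None, key)


print(parse_record_path("community.lexicon.location.place/foursquare:abc123"))
print(parse_record_path("community.lexicon.location.openstreetmaps/42"))
```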
Garganorn as ATproto’s places.pub
Even without being a proper PDS (and without offering AtURIs to point to), Garganorn can act as a service similar to places.pub, using an XRPC method to return data to ATproto applications in a compatible format. While it looks different (and more ATproto-y), this is effectively doing the same mapping as places.pub does.
Examples:
- https://garganornappview/xrpc/getPlaceData?type=community.lexicon.location.foursquare&id=[foursquare_id]
- https://garganornappview/xrpc/getPlaceData?type=community.lexicon.location.place&id=foursquare:[foursquare_id]
The XRPC method would accept the following parameters:
- type, the Lexicon the app is asking to receive back
- id, the id of the location in the corresponding Lexicon, namespaced if necessary (like for places)
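A minimal sketch of how a client could assemble such a request follows. The host garganornappview and the method name getPlaceData are the placeholders from the examples above; a real XRPC method would presumably be named with a full NSID, and the function name here is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host taken from the examples above, not a real service.
GARGANORN_HOST = "garganornappview"


def get_place_data_url(lexicon_type: str, place_id: str,
                       host: str = GARGANORN_HOST) -> str:
    """Build the query URL for the proposed getPlaceData method.

    urlencode percent-encodes reserved characters, so a namespaced id
    such as 'foursquare:abc123' is safe to pass as a query parameter.
    """
    query = urlencode({"type": lexicon_type, "id": place_id})
    return f"https://{host}/xrpc/getPlaceData?{query}"


print(get_place_data_url("community.lexicon.location.foursquare", "abc123"))
print(get_place_data_url("community.lexicon.location.place", "foursquare:abc123"))
```

Encoding the parameters instead of concatenating them by hand matters mostly for the namespaced place IDs, which contain a colon.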
ATproto Lexicons that want to refer to location data would then store at least:
- The type of the location data they requested
- The ID of the location data
- (optionally) some checksum of the location data to point to potential revisions
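To show what such a stored reference might look like, here is a small sketch. The LocationRef shape, the field names, and the choice of SHA-256 over key-sorted JSON are all assumptions on my part; nothing here is an agreed-upon Lexicon.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Optional


def checksum(data: dict[str, Any]) -> str:
    """Hash a canonical JSON form (sorted keys, no extra whitespace).

    Assumption: SHA-256 over sorted-key JSON; any stable scheme would do.
    """
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class LocationRef:
    type: str                       # Lexicon of the location data requested
    id: str                         # ID of the location in that Lexicon
    checksum: Optional[str] = None  # optional fingerprint of the data seen


def make_ref(lexicon_type: str, place_id: str,
             data: Optional[dict[str, Any]] = None) -> LocationRef:
    digest = checksum(data) if data is not None else None
    return LocationRef(lexicon_type, place_id, digest)


def is_stale(ref: LocationRef, current_data: dict[str, Any]) -> bool:
    """True if the ref carries a checksum that no longer matches the data."""
    return ref.checksum is not None and ref.checksum != checksum(current_data)


ref = make_ref("community.lexicon.location.foursquare", "abc123",
               {"name": "Cafe", "open": True})
print(is_stale(ref, {"open": True, "name": "Cafe"}))   # key order is ignored
print(is_stale(ref, {"name": "Cafe", "open": False}))  # data has changed
```

With a checksum stored, an app can at least tell that the data it fetches now differs from what it originally referenced, although it still has to re-fetch in order to find out.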
It’d be up to applications to choose which service to resolve each Lexicon record with, although a reference to the original provider could be stored.
A few considerations:
- The data saved is very similar to what is needed to reference a record via an AtURI, especially when saving a reference to the original provider. The main difference is that there’s no corresponding ATproto record in a PDS[-like] archive. This is fine, and sidesteps the original issue: creating records that can be referred to via AtURIs comes with a lot of additional requirements. These requirements can be relaxed if we’re referring to them in a less “strong” way, for example via URIs, or by saving other identifiers.
- Creating Lexicons for specific sources of external data in ATproto is not the same as forcing the external data to be “ATproto-shaped”. The issues raised in the previous discussion were specifically about storing the data in PDSes and referring to them through AtURIs. Whether to create Lexicons to refer to them in a strongly-typed way within the ecosystem is a separate issue.
- This wouldn’t allow apps/other AppViews to be notified when location data has changed, in case they want to update their own references (e.g. a place closed).
The last point remains a sticking point for me in these discussions: if I were to build a geo-aware app, I’d like to be able to know when the data of a location I’m referencing has changed. In ATproto, I’d do this by listening to the firehose for changes to records. How would I do it in this case?
2. Nick’s post about Off-Protocol Data in ATProtocol Records
In Nick’s post, he proposes using JSON Structure (json-structure.org) to define location schemas that live off-protocol but maintain strong typing. I’m going to refer the broader discussion to the appropriate topic: Off-Protocol Data in ATProtocol Records.
For the ATgeo portion of the discussion, however, the main challenges I see are:
- We’re not simply referring to data that already has a unique URL online: there’s no address for Foursquare location data, for example, since what Garganorn indexes is just a big dump of location data on S3, not an actively maintained dataset. If Garganorn has to host the Foursquare data, then why not return it in ATproto format directly and let people leverage tooling that already exists for Lexicons?
- Using external records backed by JSON Structure creates a whole new layer of complexity that’s pushed to ATproto applications (or non-geo AppViews), which is what we wanted to avoid in the first place. With that said, Garganorn could help by acting as a translation service for these external references into ATproto data, which goes back to Garganorn returning results with ATproto-specific Lexicons.
- If we need to define a https://lexicon.community/schema/location/address/v0/#schema, we might as well maintain a community.lexicon.location.Address Lexicon (or both?).
My considerations would be different if all Geo data had these schemas already defined or canonical URLs, but I’m eager to hear what people with more experience with geo data think!
3. Stephanie’s STAC Specification suggestion
I’ve spent some time trying to understand STAC, but have hit the limits of my geo knowledge pretty quickly, so take this with a huge grain of salt:
My understanding of STAC is that it’s a way to aggregate geo data from multiple providers, which could inform how Geo Lexicons are defined in ATproto. However, it doesn’t seem like its sample datasets map to our usage, which left me unsure how applicable it is. We definitely need someone more well-versed in Geo data formats to shed light on how one could leverage STAC.