ATgeo WG: Next Steps for Garganorn and Geographic Data in ATProtocol

This is a follow-up to Lexicon Community TSC Notes: referring to external (authoritative) data in ATproto apps, specifically moving the ATgeo WG/Garganorn discussion beyond using AtURIs to reference records (given the clear “don’t” consensus).

Aside from the aforementioned notes, there are 3 other relevant additions to the conversation that I want to use as a jumping-off point for further discussion:

  1. places.pub, a service that makes OpenStreetMap geographical data available as ActivityPub objects
  2. Nick’s post on Off-Protocol Data in ATProtocol Records, which suggests defining location schemas that live off-protocol but maintain strong typing using JSON Structure
  3. Stephanie pointed to the STAC Specification as a “standard, unified language to talk about geospatial data”.

I’m going to write some thoughts about each, as a starting point for further discussion.

1. Blending places.pub’s approach with Garganorn’s

Speaking of referring to geographic data from within a decentralized protocol, places.pub has recently solved a similar problem for ActivityPub (although only over OpenStreetMap data rather than over multiple authoritative sources).

In short, places.pub serves https://places.pub/{type}/{id} URLs that return ActivityPub-shaped data. From their website:

The URL for a place matches the URL template:

https://places.pub/{type}/{id}

Where {type} is the type of the OpenStreetMap place (node, way, or relation) and {id} is the OpenStreetMap numerical ID of the place.

The id field is not unique across types. For example, the node with ID 123456 and the way with ID 123456 are two different places. The type field in the ActivityPub object URL is used to disambiguate between them.
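
To make the template concrete, here is a minimal Python sketch of building and parsing these URLs. The helper names (`place_url`, `parse_place_url`) are my own and not part of places.pub; only the URL template and the three OSM types come from the quote above.

```python
import re

# The three OpenStreetMap element types named in the places.pub template.
OSM_TYPES = ("node", "way", "relation")
_PLACE_URL = re.compile(r"https://places\.pub/(node|way|relation)/(\d+)")


def place_url(osm_type: str, osm_id: int) -> str:
    """Build a places.pub URL from an OSM type and numeric ID."""
    if osm_type not in OSM_TYPES:
        raise ValueError(f"unknown OSM type: {osm_type!r}")
    return f"https://places.pub/{osm_type}/{osm_id}"


def parse_place_url(url: str) -> tuple[str, int]:
    """Split a places.pub URL back into (type, id)."""
    match = _PLACE_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"not a places.pub place URL: {url!r}")
    return match.group(1), int(match.group(2))
```

Because the type is part of the URL, `place_url("node", 123456)` and `place_url("way", 123456)` produce two different identifiers even though the numeric ID is the same.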

This {namespace}/{id} structure is similar to the set of Lexicon records that had been proposed for Garganorn, which were the original impetus for “shoving this data into a PDS so apps can reference them through AtURIs”.

To summarize the proposal:

  • community.lexicon.location.foursquare/[foursquare_id] would contain the foursquare data for [foursquare_id]
  • community.lexicon.location.openstreetmaps/[osm_id] would contain the OpenStreetMap (OSM) data for [osm_id]
  • community.lexicon.location.place/foursquare:[foursquare_id] would contain the extended place data associated with [foursquare_id], which also includes a reference to the corresponding [osm_id]
  • community.lexicon.location.place/openstreetmaps:[osm_id] would contain the extended place data associated with [osm_id], which also includes a reference to the corresponding [foursquare_id]

Garganorn as ATproto’s places.pub

Even without being a proper PDS (and without offering AtURIs to point to), Garganorn can act as a service similar to places.pub, using an XRPC method to return data to ATproto applications in a compatible format. While it looks different (and more ATproto-y), this is effectively doing the same mapping as places.pub does.

Examples:

The XRPC method would accept the following parameters:

  • type, the Lexicon the app is asking to receive back
  • id, the id of the location in the corresponding lexicon, namespaced if necessary (like for places)
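
As a rough sketch of what such a call could look like: XRPC queries are plain HTTP GETs against `/xrpc/<method NSID>`, so the lookup reduces to building a URL. The method NSID (`community.lexicon.location.getLocation`) and the host (`gazetteer.example`) below are placeholders I made up for illustration; nothing like them has been agreed on.

```python
from urllib.parse import urlencode

# Hypothetical method NSID; the WG has not defined one.
GET_LOCATION_NSID = "community.lexicon.location.getLocation"


def build_lookup_url(service: str, type_: str, id_: str) -> str:
    """Build the XRPC query URL for one location lookup.

    `type_` is the Lexicon the app wants back; `id_` is the location ID,
    namespaced where needed (e.g. "foursquare:abc123" for places).
    """
    query = urlencode({"type": type_, "id": id_})
    return f"{service.rstrip('/')}/xrpc/{GET_LOCATION_NSID}?{query}"


# Example with a made-up service and ID.
url = build_lookup_url(
    "https://gazetteer.example",
    "community.lexicon.location.place",
    "foursquare:abc123",
)
```

`urlencode` percent-encodes the colon in the namespaced ID, so the server sees the original `foursquare:abc123` after decoding.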

ATproto Lexicons that want to refer to location data would then store at least:

  1. The type of the location data they requested
  2. The ID of the location data
  3. (optionally) some checksum of the location data to point to potential revisions
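
The three fields above could be sketched like this. The field names and the checksum scheme (SHA-256 over key-sorted JSON) are assumptions for illustration only; a real design might prefer something closer to the content hashes ATproto already uses for records.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationRef:
    """What an app record would keep about an external location."""
    type: str                       # Lexicon of the requested location data
    id: str                         # ID within that Lexicon, namespaced if needed
    checksum: Optional[str] = None  # optional fingerprint of the data as fetched


def checksum_of(data: dict) -> str:
    """Hash a key-order-independent JSON rendering of the data (stand-in scheme)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_ref(type_: str, id_: str, data: dict) -> LocationRef:
    return LocationRef(type=type_, id=id_, checksum=checksum_of(data))


def is_stale(ref: LocationRef, current_data: dict) -> bool:
    """True if the ref carries a checksum that no longer matches the data."""
    return ref.checksum is not None and ref.checksum != checksum_of(current_data)
```

An app could re-fetch a location now and then and call `is_stale` to notice that the provider has revised it. A reference saved without a checksum is never reported as stale.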

It’d be up to applications to choose which service to resolve each Lexicon record with, although a reference to the original provider could be stored.

A few considerations:

  1. The data saved is very similar to the one needed to reference a record via an AtURI, especially when saving a reference to the original provider. The main difference is that there’s no corresponding AtProto record in a PDS[-like] archive. This is fine, and sidesteps the original issue: creating records that can be referred to via AtURIs comes with a lot of additional requirements. These requirements can be relaxed if we’re referring to them in a less “strong” way, for example via URIs, or by saving other identifiers.
  2. Creating Lexicons for specific sources of external data in ATproto is not the same as forcing the external data to be “ATproto-shaped”. The issues raised in the previous discussion were specifically about storing the data in PDSes and referring to them through AtURIs. Whether to create Lexicons to refer to them in a strongly-typed way within the ecosystem is a separate issue.
  3. This wouldn’t allow apps/other AppViews to be notified when location data has changed, in case they want to update their own references (e.g. a place closed).

The last point remains a sticking point for me in these discussions: if I were to build a geo-aware app, I’d like to be able to know when the data of a location I’m referencing has changed. In ATproto, I’d do this by listening to the firehose for changes to records. How would I do it in this case?

2. Nick’s post about Off-Protocol Data in ATProtocol Records

In Nick’s post, he proposes using JSON Structure (json-structure.org) to define location schemas that live off-protocol but maintain strong typing. I’m going to refer the broader discussion to the appropriate topic: Off-Protocol Data in ATProtocol Records.

For the ATgeo portion of the discussion, however, the main challenges I see are:

  1. We’re not simply referring to data that already has a unique URL online: there’s no address for 4square location data, for example, since what Garganorn is indexing from is just a big dump of location data on S3, and not an actively-maintained dataset. If Garganorn has to host the 4square data, then why not return it in ATProto format directly and let people leverage tooling that already exists for Lexicons?
  2. Using external records backed by JSON Structure creates a whole new layer of complexity that’s pushed to ATproto applications (or non-geo AppViews), which is what we wanted to avoid in the first place. With that said, Garganorn could help by acting as a translation service for these external references into ATproto data, which goes back to Garganorn returning results with ATproto-specific Lexicons.
  3. If we need to define a https://lexicon.community/schema/location/address/v0/# schema, we might as well maintain a community.lexicon.location.Address Lexicon (or both?).

My considerations would be different if all Geo data had these schemas already defined or canonical URLs, but I’m eager to hear what people with more experience with geo data think!

3. Stephanie’s STAC Specification suggestion

I’ve spent some time trying to understand STAC, but have hit the limits of my geo knowledge pretty quickly, so take this with a huge grain of salt:

My understanding of STAC is that it’s a way to aggregate geo data from multiple providers, which could inform how Geo Lexicons are defined in ATproto. However, it doesn’t seem like its sample datasets map to our usage, which left me unsure how applicable it is. We definitely need someone more well-versed in Geo data formats to shed light on how one could leverage STAC.

I’m trying to wrap my head around the need for a different lexicon for each of these external location providers. Is the thinking that an atproto app using location can use either of these, now two, available location sources and then in its location reference store just one preferred id?

So if i have a checkin record for my app or an event record i could choose to have a field for a foursquare_id only and then if i need to show the address i ask Garganorn; what is the address for this foursquare place? And then i cache it(?) and show it in my interface.

I am left with many questions, and i’m just going to post them here, since the practical questions sometimes lead to better theoretical designs 🙂

  • While the FSQ dataset is open source, i cannot do an API request to it to turn my address/coordinates into an FSQ id unless i use the commercial API. How can i practically use these IDs?
  • The FSQ data was a nice one-time data dump but location data quickly becomes outdated so without updates it will become stale quickly. Should we be relying on a commercial vendor at all?
  • OSM is already better, but to be able to use osm data i need not just the id but also the type, can i translate osm id + type to fsq id?
  • What is the use case for translating between the two? I generally want the address not another ID?
  • If we are storing places anyway to do these conversions why not just reference the atproto places?

From the other notes posted today i am getting that people think it’s a bad idea to store all location data on atproto. But if the point of atproto is to store your own data on your own pds i don’t really see the problem? Each user of a location aware app is simply storing the places relevant to them on their own pds. None of them need to store ‘all’ the location data. If the record that these users store on their pds contains a reference to a place that i can translate into an address at all times via Garganorn (and i would assume other Garganorn compatible services) that would be perfectly fine. But if i’m caching that to display it in a UI i might as well store the actual address/place data in the record as well.

All of this still does not solve the issue that my phone knows my geo coordinates but not the ‘place’ where i am so i still need a service that can translate geo coordinates into an address or a place id. If Garganorn can do that for me i would be quite happy about that and honestly it would not matter a whole lot what ID would be returned as long as i’m pretty sure i can always depend on being able to resolve it to an actual human readable place.

Thanks @essentialrandom.bsky for this great explication of the current state of affairs. I agree with pretty much everything you’ve written.

For everyone else’s reference, Garganorn is the codename for the demo implementation that I hacked together to start playing with these ideas. I am hoping that it will evolve into a reference implementation for whatever the ATGeo Lexicon working group decides to pursue.

But for now I’ll use the phrase “gazetteer server” to mean “any API server that provides place data using ATGeo Lexicon interfaces”. Which might be the Garganorn implementation or something else someday.

Here are the salient points put forth so far, as I understand them:

  1. The ATmosphere would benefit from searching reference datasets of geographic places, same as the ActivityPub world.
  2. ATProtocol “records” by definition come with specific guarantees about storage and validation. These guarantees don’t make sense to apply to data originating “off-protocol”.
  3. We can still refer to place objects by a global path {dataset}/{id}, which uniquely identifies any place information and its source.
  4. AT-URIs by definition only apply to records. But we can still construct a URI for each place using its path, and treat that as a durable identifier.
  5. A “gazetteer server” could translate reference place data into Lexicon-specified ATProtocol objects, and serve them over XRPC. Then it looks just like any old AppView. There’s no real reason not to do it, and it makes it easier for ATProto applications to integrate gazetteer-provided place data without a whole lot of new tooling.

What’s great about this in total is that we don’t have to put reference place data into a PDS – but if you get a record from a gazetteer server, it will come back in a format that you could put into a PDS! Or embed into a real ATProtocol record and put that in a PDS!

Another great thing about this approach is you’ll always have the ATGeo URI for any place you fetch from an ATGeo gazetteer. If the gazetteer server you fetched it from goes away – no problem! You have the {dataset}/{id} path and you can just plug that into any other ATGeo gazetteer server that hosts that dataset.
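
A small sketch of that fallback, assuming nothing about the eventual API: the `fetch` callable stands in for whatever lookup call gazetteer servers end up exposing, and the server URLs are made up.

```python
from typing import Callable, Iterable, Optional

# (server base URL, "{dataset}/{id}" path) -> place data, or None if unknown.
Fetch = Callable[[str, str], Optional[dict]]


def resolve_place(path: str, servers: Iterable[str], fetch: Fetch) -> dict:
    """Try each gazetteer server in order until one returns the place."""
    for server in servers:
        try:
            place = fetch(server, path)
        except OSError:
            continue  # server unreachable or gone: try the next one
        if place is not None:
            return place
    raise LookupError(f"no gazetteer could resolve {path!r}")


def fake_fetch(server: str, path: str) -> Optional[dict]:
    """Stand-in for a real network call, for demonstration only."""
    if server == "https://gone.example":
        raise ConnectionError("server is gone")
    if server == "https://partial.example":
        return None  # this server does not host the dataset
    return {"path": path, "servedBy": server}


SERVERS = ["https://gone.example", "https://partial.example", "https://live.example"]
place = resolve_place("foursquare/abc123", SERVERS, fake_fetch)
```

The same `{dataset}/{id}` path is passed unchanged to every server, which is what makes the identifier portable.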

Here are some other corollaries from MsBoba’s observations:

  1. Someone will need to define and maintain a list of community NSIDs that uniquely identify the popular public place datasets, so that they can be reused across the network.
  2. We will also want a shared community definition of the XRPC methods that an ATGeo gazetteer should support
  3. We might want a generic place schema defined in Lexicon that provides a “least common denominator” data model for places.
  4. We might want specific Lexicon schemas for each common public dataset.

I submit that this is the exact point of having an ATGeo Lexicon working group – to ideate, ratify, and maintain these consensus definitions for community use.

The WG’s goal should be to address these last 4 points, with the explicit objective of fostering interoperability between apps around place and location data. The data models designed to serve gazetteer data should work equally well for user-generated or app-generated place records. An ATProto user should be able to sign up for an event in one app, and check in to the venue on another app, and it should be the same “place” in the ATmosphere… whatever that might mean.

I think this conversation can be had independently of “which reference place dataset do we use”, because the conclusions should apply to any of them. Ultimately which place data you decide to use as an app dev should be driven by the needs of your app.

As far as feeds of place data updates… Why not have a Lexicon type for those update events and then publish the update events on the firehose? The place object might not have an AT-URI… but it does have a URI!
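
Purely as a sketch of what such an update event might carry: the `$type` value, the field names, and the example place URI below are all invented for illustration, and none of them is a proposed Lexicon.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical NSID for a place-update event type.
UPDATE_EVENT_TYPE = "community.lexicon.location.placeUpdate"


def make_update_event(place_uri: str, change: str,
                      when: Optional[datetime] = None) -> dict:
    """Build an update event that points at a place by its URI."""
    when = when or datetime.now(timezone.utc)
    return {
        "$type": UPDATE_EVENT_TYPE,
        "place": place_uri,   # the place's URI (not an AT-URI)
        "change": change,     # e.g. "closed", "renamed" (made-up values)
        "createdAt": when.isoformat(),
    }


def affects(event: dict, watched_uris: set) -> bool:
    """True if the event is a place update for a URI the app references."""
    return event.get("$type") == UPDATE_EVENT_TYPE and event.get("place") in watched_uris


event = make_update_event(
    "https://gazetteer.example/foursquare/abc123",  # made-up URI
    "closed",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

An app would keep the set of place URIs it references and run `affects` over incoming events to decide which of its own records to refresh.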

Finally, we haven’t talked about what the search interface to an ATGeo gazetteer server should look like, but we will. Find nearest objects, find objects in a bounding box, filter by place name or other fields. I started noodling on this a few months back and arrived at almost the exact same place as places.pub did. I’ll write a separate post about that later.
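
Until that post exists, here is an in-memory illustration of the three query shapes just mentioned (nearest, bounding box, name filter). It says nothing about the eventual XRPC interface; the `Place` fields and the sample data are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    path: str   # "{dataset}/{id}"
    name: str
    lat: float
    lon: float


EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(places, lat, lon, limit=5):
    """The `limit` places closest to (lat, lon), nearest first."""
    return sorted(places, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))[:limit]


def in_bbox(places, south, west, north, east):
    """Places inside the box (does not handle boxes crossing the antimeridian)."""
    return [p for p in places if south <= p.lat <= north and west <= p.lon <= east]


def by_name(places, text):
    """Case-insensitive substring match on the place name."""
    needle = text.casefold()
    return [p for p in places if needle in p.name.casefold()]


# Made-up sample data.
PLACES = [
    Place("demo/1", "Cafe Uno", 52.37, 4.89),
    Place("demo/2", "Museum Twee", 52.36, 4.88),
    Place("demo/3", "Cafe Far", 48.85, 2.35),
]
```

A real gazetteer would use a spatial index rather than a linear scan, but the inputs and outputs of each query would look much the same.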

Thanks @essentialrandom.bsky and @schuyler.info, this actually sounds pretty great already. I was hoping for translation and search, but if we can also have some kind of atproto global reference to places across apps that would be very cool.

I’m reading, maybe between the lines, that an initial version might make available and use the FSQ data to basically bootstrap this setup which would be excellent.

I’m still not quite clear on how, as an app developer, I’d bring my own place data source into this and make it work with these globally addressable places but I’m sure that will clear itself out as you folks start publishing more of these conversations and ideas.

Very excited to start playing with this, thanks so far!

@tijs.org My thought has been that the ATGeo WG should publish a generic place record type, or, at least, a record interface. Gazetteer records, regardless of data source, should be delivered using a Lexicon type that allows them to be used (partially) interchangeably.

If we do it right, your application should be able to provide a Lexicon type that implements the subset of the place data model that you need, and you should be able to mix-and-match place records that your users create alongside reference places from a gazetteer server.
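
To illustrate the mix-and-match idea, here is a toy sketch in which a gazetteer record and an app-created record both reduce to the same common shape. The field names (`name`, `latitude`, `longitude`) are placeholders; the actual common model is exactly what the WG still has to draft.

```python
# Hypothetical "least common denominator" fields.
COMMON_FIELDS = ("name", "latitude", "longitude")


def common_view(record: dict) -> dict:
    """Return only the common-model fields, or fail if any is missing."""
    missing = [field for field in COMMON_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record lacks common fields: {missing}")
    return {field: record[field] for field in COMMON_FIELDS}


# A reference place as a gazetteer might return it (made-up data).
gazetteer_place = {
    "name": "Cafe Uno",
    "latitude": "52.37",
    "longitude": "4.89",
    "dataset": "foursquare",
}

# A place created by a user inside an app, with app-specific extras.
user_place = {
    "name": "My favourite bench",
    "latitude": "52.36",
    "longitude": "4.88",
    "note": "good for sunsets",
}

mixed = [common_view(gazetteer_place), common_view(user_place)]
```

Both records keep their own extra fields at the source, but any app that only understands the common model can treat them identically.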

What’s more, if your app’s place records implement this “common place data model” interface, then you should be able to exchange place records interoperably with other ATProtocol apps that also implement the model. I think that drafting a specification for this ATGeo data model ought to be the primary near-term goal of the working group.

Yeah that is the part that sounds very cool, looking forward to seeing what you come up with!

Thank you for all the questions and discussion!

@tijs.org I believe some of your initial questions were answered, but let me know if that’s not the case. I’m still going to go quickly through them and give thoughts, and hopefully I’m not repeating too much.

As mentioned, this is where an AppView service serving the open FSQ data would come in.

More specifically, as an app developer, you’ll have to choose:

  1. What type of location data does it make sense for your users to find, and what information does your app need to know about it? (e.g. venues and lat/long points are very different, but both valid for specific applications)
  2. Of that data, which one do you store in the user’s records? In what form?
  3. Of that data, which one do you store in your own software/AppView?
  4. Above all, what type of geographic “id” do you save so you can later get more info on this specific place, or you can correlate it with other apps’ geo data?

Some of this goes into @schuyler.info’s places proposal, but I wanted to spell out these considerations on their own.

This is a good concern to have! It goes back to “we’re building a proof of concept of how Geo interoperability might work, so we can pave the way for people/companies to come build Geo-based apps in the ATmosphere”.

That space will have to figure out how to deal with stale Geo data, and especially how to do so in the ATproto world. In the meantime, the vendor your app relies on very much depends on your use case! Your practical considerations/concerns as you choose are great data to inform the work of this WG.

I believe this point got clarified as discussion continued, but to make it explicit:

  1. It’s ok to store location data that’s generated on ATproto applications, or that benefits users/apps when stored in e.g. a PDS in ATproto records. What the discussion warned against was trying to shove everything in ATproto records just to reference them, even for data that originated outside of the ecosystem, and for use cases that did not benefit from being tied to a PDS.
  2. “What type of Geo data should then be copied in a PDS, if it’s copied from an external API?” This is, AFAICT, still an open question. It depends on a bunch of concerns with e.g. space/reliability/data freshness vs data loss. Definitely worth a separate discussion as we move forward.

That’s certainly an option! Something else that came to mind was the concept of “event streams WebSocket endpoints for data distribution” (as seen in labelers). A subscribe-able stream of updates like this would make sense: it’s not “global” like the firehose (which represents all events in the network), but it can be offered by services like Garganorn to help ATproto Apps interested in keeping up with an external data flow. It’s another layer of translation “Garganorn” is offering as a service!
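
One detail worth sketching is how a consumer of such a stream could resume after a disconnect. ATproto event streams number their events and let clients reconnect with a cursor; the snippet below mimics that idea over a plain list, with a made-up event shape and no networking.

```python
def replay_from(events: list, cursor: int) -> tuple:
    """Return the events newer than `cursor`, plus the cursor to store next."""
    fresh = [event for event in events if event["seq"] > cursor]
    new_cursor = max((event["seq"] for event in fresh), default=cursor)
    return fresh, new_cursor


# Made-up backlog a service might replay to a reconnecting client.
BACKLOG = [
    {"seq": 1, "place": "foursquare/abc123", "change": "renamed"},
    {"seq": 2, "place": "foursquare/def456", "change": "closed"},
    {"seq": 3, "place": "foursquare/abc123", "change": "moved"},
]

# The client last saw seq 1, so only events 2 and 3 are delivered.
fresh, cursor = replay_from(BACKLOG, cursor=1)
```

Storing the returned cursor after each batch means an app that was offline for a while can catch up on place changes instead of silently missing them.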

This is a great topic of discussion, and I’d love to hear more about your use case/doubts/worries/expectations. Maybe a separate thread? Or we can crystalize what we have some more first, and get to more specifics as it becomes relevant.

@essentialrandom.bsky this is great! thanks for taking the time to answer all of these. I think this, together with the other threads, paints a pretty clear picture of both where you’re heading and which questions are still open. I think i’m gonna hang back and just read about the updates for now. When more is clear on the format for these ‘shared’ places i’ll try to implement something again for Anchor and i’m sure i’ll end up with some more practical questions by that time. Thanks again, also for taking the time to figure this all out together.
