ATProto Location Data Store: A Source-Aware, Federated Approach to Location Data

mizmay.com · August 22, 2025, 6:58pm

In my previous posts, I’ve outlined the common applications of location data in ATProto apps, compared licensing issues across third party providers, and built out the options and common patterns for advanced implementations like adding maps and custom data visualizations, or maintaining popularity scores. In this post I’ll point out the opportunity for app developers to build federated catalogues of location data, to use privately or share.

Just to ensure we are all on the same page: AT Protocol (ATProto) is a generic federated protocol designed for building open social media applications. It emphasizes self-authenticating data and identity, which allows for seamless account migrations and content redistribution, and is built to scale to billions of accounts. ATProto’s structure relies on stable Decentralized Identifiers (DIDs) for identity, content-addressed and cryptographically verifiable Repositories for public content, and Lexicons that define APIs and record schemas for applications. Bluesky, a microblogging social app, is built on top of ATProto, utilizing these Lexicons. The protocol is designed to balance stability and interoperation with flexibility for third-party application development, allowing for the creation of new Lexicons under independent namespaces. AT Protocol aims to interconnect application backends, enabling them to share state, including user accounts and content, by making internal services public and allowing anyone to create their own instances of these services.

While ATProto provides a structured environment, handling location data presents a significant challenge because external datasets vary widely in structure and scale, and none of them are designed or curated expressly for this purpose. The solution I propose is for app developers incorporating location data to use a SpatioTemporal Asset Catalog (STAC). When customized to support the common applications of location data within the ATMosphere (search, geotagging, geofencing, and topology), STAC lets us create a distributed geospatial web that mirrors and interacts with the ATMosphere.

What is STAC?

STAC is an open standard for describing geospatial datasets using simple, developer-friendly JSON schemas. STAC also works in Python through libraries like PySTAC, which let you create, read, and manipulate Catalogs, Collections, and Items as Python objects, serialize them to JSON, and perform spatial or temporal queries programmatically. It was explicitly designed for interoperability, so that anyone publishing or consuming geospatial data can rely on a consistent structure. It provides a schema for organizing geospatial data as Collections (themes of data, e.g. “cities,” “neighborhoods,” “venues”) and Items, individual GeoJSON features.

It was developed for managing satellite images, but due to parallels in the architecture requirements, it aligns well with ATProto. Its strengths:

Standardized: Supports stable IDs
Portable and interoperable: Every app can use the same schema while tailoring it to its own needs.
Geospatially native: Handles geometry (points, polygons), time, and metadata consistently.
Scalable: Works for a single app’s location data or for federated catalogs built from Overture, Foursquare, or community data.
Federation-ready: Developers can publish and link catalogs while maintaining local autonomy.

The core objects are:

Item → an individual record (e.g. a park or cafe), with geometry + metadata, or an irreducible set of features (e.g. the rooms at a venue)
Collection → a dataset family (e.g. parallel things with the same type and hierarchy, like places for autocompleting your search box)
Catalog → an index that links together items, collections, and sub-catalogs so they can be crawled or browsed

In addition, you can set up an HTTP API on any STAC instance with filtering (via CQL2, an extension) so you can query “points in this polygon”, “items named like X near Y”, etc. Additional extensions add capabilities (e.g., versioning). STAC is intentionally small and extensible, which makes it a great fit for ATProto app development. The spec goes into much greater detail.
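To make the three core objects concrete, here is a minimal sketch of a Catalog, a Collection, and an Item written as plain Python dicts and serialized to JSON. The field names follow my reading of the STAC 1.0.0 core spec; the IDs, the relative `href` layout, the timestamp, and the custom `name` property are assumptions made up for illustration, and a library such as PySTAC would normally build and validate these for you.

```python
import json

STAC_VERSION = "1.0.0"


def make_item(item_id, collection_id, name, lon, lat, observed):
    """Minimal STAC Item: a GeoJSON Feature plus STAC bookkeeping fields."""
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": item_id,
        "collection": collection_id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "bbox": [lon, lat, lon, lat],
        # "datetime" is required by STAC; "name" is a custom property here.
        "properties": {"datetime": observed, "name": name},
        "links": [{"rel": "collection", "href": "./collection.json"}],
        "assets": {},
    }


def make_collection(collection_id, description, license_id, items):
    """Minimal STAC Collection whose spatial extent covers its point Items."""
    lons = [i["geometry"]["coordinates"][0] for i in items]
    lats = [i["geometry"]["coordinates"][1] for i in items]
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": description,
        "license": license_id,
        "extent": {
            "spatial": {"bbox": [[min(lons), min(lats), max(lons), max(lats)]]},
            "temporal": {"interval": [[None, None]]},
        },
        "links": [{"rel": "item", "href": f"./{i['id']}.json"} for i in items],
    }


def make_catalog(catalog_id, description, collections):
    """Minimal STAC Catalog that links to each Collection as a child."""
    return {
        "type": "Catalog",
        "stac_version": STAC_VERSION,
        "id": catalog_id,
        "description": description,
        "links": [
            {"rel": "child", "href": f"./{c['id']}/collection.json"}
            for c in collections
        ],
    }


# Hypothetical IDs and timestamp, for illustration only.
item = make_item("place-9876", "places", "Bill Graham Civic Auditorium",
                 -122.419, 37.7749, "2025-08-22T00:00:00Z")
places = make_collection("places", "Searchable point places", "ODbL-1.0", [item])
catalog = make_catalog("example-lds", "Example Location Data Store", [places])
print(json.dumps(catalog, indent=2))
```

The Catalog only holds links, which is what lets a crawler walk from one store to its Collections and Items without knowing anything else about the app.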

Using STAC as a Data Store

Summarizing from the exploration in the previous section, here are our design goals:

Privacy by default: treating user coordinates/EXIF as ephemeral inputs—never stored or shared
Open, redistributable content: being careful of data licensing terms
Federation-ready: independent data stores link to each other
Interoperable: allowing queries across data stores
Performant: support text autocomplete + spatial queries at scale

We can accomplish these goals by using a common schema within STAC to create a network of federating data stores.

Creating a Location Data Store

The first of our design goals, privacy by default, requires us to talk about what it takes to create a Location Data Store (LDS) using STAC. We can use a private LDS to honor our license agreements and restrict anything PII-adjacent, while using a few key attributes to mark what goes into the federated version, i.e. a Federated Location Data Store (FLDS).

For the sake of discussion, let’s say you want to maintain one or both of these types of collections:

Places - locations represented by a latitude, longitude, and additional metadata, anything you want to index, search on, or represent at a point
Geofences - polygons representing anything from a building to a province, including informal spaces, overlapping spaces, and exclusion zones

As the app developer, you can generate or build these collections from any number of sources, such as OpenStreetMap or public government data, curated and transformed to match your use case, so long as you store additional attributes related to source, provenance, and license restrictions. You can also use your LDS to transform and store the results of API calls to open data sources, such as Nominatim, a geocoder for OpenStreetMap, or to import from Who’s On First, a gazetteer with both an API and per-country downloads that can be converted to STAC.
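As a rough illustration of storing source, provenance, and license alongside each record, the sketch below converts one geocoder result into a place record. It assumes the input dict is shaped like a Nominatim JSON search result (`lat`, `lon`, `display_name`, `osm_type`, `osm_id`, `licence`); the output property names (`scope`, `source`, `license`, `attribution`, `retrieved_at`) and the `osm:` ID prefix are my own convention, not part of STAC or Nominatim.

```python
def nominatim_to_place(result, retrieved_at):
    """Convert one Nominatim-shaped result dict into a place record."""
    lon = float(result["lon"])  # Nominatim returns coordinates as strings
    lat = float(result["lat"])
    return {
        # Reuse the upstream identifier so the record can be matched later.
        "id": f"osm:{result['osm_type']}:{result['osm_id']}",
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": result["display_name"],
            "collection": "places",
            "scope": "shared",
            "source": "OpenStreetMap via Nominatim",
            "license": "ODbL-1.0",
            # Keep the upstream attribution string verbatim for provenance.
            "attribution": result.get("licence", ""),
            "retrieved_at": retrieved_at,
        },
    }


# Hypothetical sample input; the osm_id and coordinates are placeholders.
sample = {
    "osm_type": "way",
    "osm_id": 12345,
    "lat": "37.7784",
    "lon": "-122.4174",
    "display_name": "Bill Graham Civic Auditorium, San Francisco",
    "licence": "Data © OpenStreetMap contributors, ODbL 1.0.",
}
place = nominatim_to_place(sample, "2025-08-22T18:58:00Z")
print(place["id"], place["geometry"]["coordinates"])
```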

Using a STAC API with filtering and versioning extensions, you can build your own endpoints for use within your app, for instance to create a spatially-aware autocomplete text box for adding or modifying geotags on posts, profiles or events, and recreate the level of functionality you might otherwise only get from a proprietary, commercial provider.

Taking this further, you can create a utility that automatically determines whether your user is within a geofence, without storing their location or sharing it outside of your app ecosystem, by querying your geofences collection with something like this:

{  
  "collections": ["geofences"],  
  "filter-lang": "cql2-json",  
  "filter": {  
    "op": "s_contains",  
    "args": [  
      {"property": "geometry"},  
      {"type": "Point", "coordinates": [-122.423, 37.776]}  
    ]  
  }  
}

Or return all the elements of your places collection within a city boundary you have stored in your geofences collection with something like this:

{  
  "collections": ["places"],  
  "filter-lang": "cql2-json",  
  "filter": {  
    "op": "s_intersects",  
    "args": [  
      {"property": "geometry"},  
      {"type": "Polygon", "coordinates": [[[...]]]}  
    ]  
  },  
  "limit": 200  
}
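The two request bodies above can be generated from one small helper, which also keeps the user's coordinates in a local variable rather than in any stored record. This is a sketch only: the function names are hypothetical, and I am assuming the bodies would be sent to the item search endpoint of a STAC API that implements the Filter extension with CQL2 JSON.

```python
import json

# Subset of CQL2 spatial operators this sketch allows.
SPATIAL_OPS = {"s_contains", "s_intersects", "s_within"}


def spatial_search_body(collection, op, geometry, limit=None):
    """Build a CQL2 JSON search body comparing item geometry to `geometry`."""
    if op not in SPATIAL_OPS:
        raise ValueError(f"unsupported spatial operator: {op}")
    body = {
        "collections": [collection],
        "filter-lang": "cql2-json",
        "filter": {"op": op, "args": [{"property": "geometry"}, geometry]},
    }
    if limit is not None:
        body["limit"] = limit
    return body


def geofences_containing(lon, lat):
    """Which stored geofences contain this point? Nothing is persisted here."""
    point = {"type": "Point", "coordinates": [lon, lat]}
    return spatial_search_body("geofences", "s_contains", point)


def places_in_geofence(geofence_feature, limit=200):
    """Which places intersect a geofence already stored in the LDS?"""
    return spatial_search_body(
        "places", "s_intersects", geofence_feature["geometry"], limit
    )


# A small placeholder boundary standing in for a stored city polygon.
boundary = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-122.4235, 37.7765], [-122.4220, 37.7765],
            [-122.4220, 37.7750], [-122.4235, 37.7750],
            [-122.4235, 37.7765],
        ]],
    },
    "properties": {},
}
print(json.dumps(geofences_containing(-122.423, 37.776), indent=2))
print(json.dumps(places_in_geofence(boundary), indent=2))
```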

You can also do things like:

Query a text string like “coffee” within 2 miles, sort by distance and display the results
Maintain custom metrics around places access within your app so that you can rank by popularity
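For the first item in that list, a minimal in-memory sketch might look like the following. It filters place records by a case-insensitive substring match on `name`, keeps those within a radius using the haversine formula, and sorts by distance. The place names and coordinates are made up, and a real deployment would push this work into the STAC API's backing database rather than looping in Python.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def search_nearby(places, text, lon, lat, radius_miles=2.0):
    """Return (distance, place) pairs matching `text` within the radius, nearest first."""
    needle = text.casefold()
    hits = []
    for place in places:
        if needle not in place["properties"]["name"].casefold():
            continue
        plon, plat = place["geometry"]["coordinates"]
        distance = haversine_miles(lon, lat, plon, plat)
        if distance <= radius_miles:
            hits.append((distance, place))
    hits.sort(key=lambda hit: hit[0])
    return hits


def point_place(name, lon, lat):
    """Helper for building sample point records."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


# Hypothetical sample data.
sample_places = [
    point_place("Corner Coffee", -122.43, 37.78),
    point_place("Far Away Coffee", -122.27, 37.80),
    point_place("Tea House", -122.4195, 37.7750),
    point_place("Example Coffee Roasters", -122.420, 37.7750),
]
results = search_nearby(sample_places, "coffee", -122.419, 37.7749)
for distance, place in results:
    print(f"{place['properties']['name']}: {distance:.2f} mi")
```

The user's coordinates are passed in as arguments and never written to a record, which keeps this consistent with the privacy-by-default goal.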

Provided you abide by the terms of each license, you can continue to use commercial APIs for maps. No provider restricts displaying data you did not retrieve from their places APIs. In many cases when you do query those APIs, it is OK to store the stable IDs or conflate them with your own data, and even to aggregate and build metrics on top of them to generate a popularity score (see the table in this post and the specific license agreements for details).

There are still implementation details and data standards to work out in a future post, but this is how your private data store could work using STAC in principle.

Federating Location Data Stores

While having your own personal data store is nifty, often even necessary for storing your custom Collections of places and geofences, as well as protecting proprietary data, the real power of this model comes from federating it. Linking Location Data Stores together is what extends this model to meet larger design goals—enabling interoperability, scaling across datasets, and balancing openness with control.

Catalogs are hierarchical and designed to cross-link, so provided you’ve distinguished between private data (collected or curated for your own app, not to be shared) and shared data (which can be linked into the Federated Location Data Store, or FLDS) in your schema, your STAC is federation-ready.

For example, let’s imagine you are running an event app that lets users geotag where they are attending a show. Your LDS might store two types of records side by side:

A user-generated geofence of the tickets-only area, marked private and visible only within your app.
A geofence (or place) record from a public source (e.g. Overture, Foursquare, or another open source)

Sample JSON objects:

// Private LDS record (geofence collection)

{  
  "id": "urn:uuid:1234",  
  "type": "Feature",  
  "geometry": {  
    "type": "Polygon",  
    "coordinates": [[  
      [-122.4231, 37.7760],  
      [-122.4225, 37.7760],  
      [-122.4225, 37.7755],  
      [-122.4231, 37.7755],  
      [-122.4231, 37.7760]  
    ]]  
  },  
  "properties": {  
    "name": "VIP Show Entrance",  
    "collection": "geofences",  
    "scope": "private",  
    "source": "user-submitted",  
    "license": "internal-use-only"  
  }  
}

// Shared LDS record (geofences collection)

{  
  "id": "overture:venue:5678",  
  "type": "Feature",  
  "geometry": {  
    "type": "Polygon",  
    "coordinates": [[  
      [-122.4235, 37.7765],  
      [-122.4220, 37.7765],  
      [-122.4220, 37.7750],  
      [-122.4235, 37.7750],  
      [-122.4235, 37.7765]  
    ]]  
  },  
  "properties": {  
    "name": "Civic Center Plaza",  
    "collection": "geofences",  
    "scope": "shared",  
    "source": "Overture Maps Foundation",  
    "license": "ODbL",  
    "version": 1.0  
  }  
}

// Shared LDS record (places collection)

{  
  "id": "overture:place:9876",  
  "type": "Feature",  
  "geometry": {"type": "Point", "coordinates": [-122.419, 37.7749]},  
  "properties": {  
    "name": "Bill Graham Civic Auditorium",  
    "collection": "places",  
    "scope": "shared",  
    "source": "Overture Maps Foundation",  
    "license": "ODbL",  
    "version": 1.0  
  }  
}

With this distinction, your LDS publishes only the shared records into the FLDS while keeping private records local. This way, your app benefits from the network effect of federated data without leaking user-generated or license-restricted content. You can share your catalogs and reference public catalogs maintained by others.
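One way to enforce that split is an allowlist on the `scope` property, so that anything unlabeled stays local by default. The sketch below assumes records shaped like the samples above; the set of scope values treated as publishable is my assumption and should be adjusted to whatever your schema settles on.

```python
# Scope values assumed here to mean "may be published"; adjust to your schema.
PUBLISHABLE_SCOPES = {"shared", "federated"}


def split_for_publication(records):
    """Split records into (publish, keep_local) using an allowlist on scope.

    Records with a missing or unknown scope are kept local, so a labeling
    mistake fails closed instead of leaking data.
    """
    publish, keep_local = [], []
    for record in records:
        scope = record.get("properties", {}).get("scope")
        if scope in PUBLISHABLE_SCOPES:
            publish.append(record)
        else:
            keep_local.append(record)
    return publish, keep_local


# Trimmed-down versions of the sample records, plus one unlabeled record.
sample_records = [
    {"id": "urn:uuid:1234", "properties": {"scope": "private"}},
    {"id": "overture:venue:5678", "properties": {"scope": "shared"}},
    {"id": "overture:place:9876", "properties": {"scope": "shared"}},
    {"id": "unlabeled-record", "properties": {}},
]
to_publish, to_keep = split_for_publication(sample_records)
print([r["id"] for r in to_publish])
print([r["id"] for r in to_keep])
```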

The next challenge in federation is handling duplicates, since different LDS catalogs may include overlapping records (e.g. OpenStreetMap and Foursquare both publish the same venue). A federated LDS system can address this by:

Matching records using stable IDs where available (e.g. in data from the same source).
Using geometry and metadata similarity for conflation where sources differ.
Preserving provenance by retaining the source field and version in all records.
Allowing weighting or ranking per source to help reduce the risk of conflicting records (e.g. preferring Foursquare for venues, Overture for administrative boundaries)
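The four steps above can be sketched roughly as follows for point records. Everything tunable here is an assumption: the 75 meter and 0.8 name-similarity thresholds, the per-source weights, and the `fsq:` ID are placeholders, and real conflation would need better name normalization and polygon handling than this.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-source preference; higher wins when records are merged.
SOURCE_WEIGHT = {"Foursquare": 2, "Overture Maps Foundation": 1}


def approx_meters(a, b):
    """Equirectangular distance between two [lon, lat] pairs; fine at venue scale."""
    (lon1, lat1), (lon2, lat2) = a, b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(x, y)


def same_place(r1, r2, max_meters=75.0, min_name_ratio=0.8):
    """Stable ID match within a source, else name and distance similarity."""
    p1, p2 = r1["properties"], r2["properties"]
    if p1["source"] == p2["source"]:
        return r1["id"] == r2["id"]
    if r1["geometry"]["type"] != "Point" or r2["geometry"]["type"] != "Point":
        return False  # polygon conflation is out of scope for this sketch
    ratio = SequenceMatcher(None, p1["name"].casefold(), p2["name"].casefold()).ratio()
    dist = approx_meters(r1["geometry"]["coordinates"], r2["geometry"]["coordinates"])
    return ratio >= min_name_ratio and dist <= max_meters


def conflate(records):
    """Group likely duplicates, pick a preferred record, keep all provenance."""
    groups = []
    for rec in records:
        for group in groups:
            if same_place(group[0], rec):
                group.append(rec)
                break
        else:
            groups.append([rec])
    merged = []
    for group in groups:
        preferred = max(
            group, key=lambda r: SOURCE_WEIGHT.get(r["properties"]["source"], 0)
        )
        provenance = [(r["properties"]["source"], r["id"]) for r in group]
        merged.append({"record": preferred, "provenance": provenance})
    return merged


def rec(rec_id, name, source, lon, lat):
    """Helper for building sample point records."""
    return {
        "id": rec_id,
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "source": source},
    }


merged = conflate([
    rec("overture:place:9876", "Bill Graham Civic Auditorium",
        "Overture Maps Foundation", -122.419, 37.7749),
    rec("fsq:abc123", "Bill Graham Civic Auditorium",
        "Foursquare", -122.4191, 37.7750),
    rec("overture:place:1111", "Civic Center Plaza",
        "Overture Maps Foundation", -122.4178, 37.7793),
])
for entry in merged:
    print(entry["record"]["id"], entry["provenance"])
```

Returning the provenance list alongside the preferred record means a downstream consumer can still see every source and ID that contributed to a merged place.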

As with the private LDS section, there are still implementation details and data standards to work out. Stay tuned.

schuyler.info · August 22, 2025, 11:27pm

There’s a lot of great ideas here! And you’ve cleared up some misunderstandings I had in my cursory understanding of STAC.

What’s the Free Software ecosystem around it like? Are there good starting points for a community LDS that we can try out and maybe build on?

schuyler.info · September 5, 2025, 6:23pm

I found this detailed blog post by Chris Holmes from about a year ago that is informative, but raises some further questions for me about how STAC would be useful to the ATmosphere:

So STAC is ready to use if you want a data schema for data where the geometry is an indicator of the footprint of some other type of data, and there are links to the actual data. And then you can tap into all sorts of STAC extensions that help define additional parts of a flexible data schema. But if your data is like the vast majority of vector data, where the geometry and properties are the data, not metadata about some other data, then you can’t tap into all the great extensions and validation tools.

And this FAQ response from stacspec.org also leaves me curious:

I have vector data, should I use STAC?

Yes! Vector data can in principle be handled with STAC, but it’s not as well defined as for raster data. STAC is closely aligned with OGC API - Features though and you should have a look at that specification, too.

I think I need to be walked through a more detailed example of how STAC would be used in practice, and how we might benefit from the existing software ecosystem, to really understand the benefit of trying to use it in an ATProtocol app.