Licensing Considerations for Location Data

In my previous post, I outlined four key applications of location data for ATProto apps. In this post, I discuss why location data currently remains the domain of individual app developers, and the cases where it should stay that way.

First, location data should remain the domain of individual app developers for these important reasons:

  1. Data licensing restrictions on mapping SDKs and APIs: see the table below. If you are using MapKit or Google Maps, the licenses restrict you from extracting, scraping, or storing the data, and from mixing and matching APIs and services. It is therefore best practice to use these resources only within the specific platforms for which they are designed.
  2. Sensitivity around personally identifiable information: as a best practice, developers are generally discouraged from storing any location information, such as a latitude and longitude obtained from EXIF metadata or mobile phones. This is primarily due to emerging regulatory concerns. That means sharing it within the ATmosphere is out of the question, unless and until there are community-supported tools and processes, in line with privacy regulations (the strongest of which are currently coming out of the EU and California), for storing anonymized user data that is never traceable back to a person. This is a big challenge with significant risks involved, and is not currently on any community roadmaps.
  3. UX concerns around showing locations on a map: even if you are not storing or sharing anything, within the Bluesky ecosystem, developers should be discouraged from displaying personally identifiable information on a map without careful attention to UX and privacy. The current backlash against Instagram is just one recent example of how even major tech companies get this wrong: if you don’t handle the display of user location with an extreme amount of caution and sensitivity, even if nothing bad happens as a result, many or even most users will be spooked and simply turn location services off, eliminating this whole category of capabilities.

I will propose ways to deal with personally identifiable information in a later post. UX concerns around showing locations on a map are a deeper dive for another time.

To underscore why the data licensing restrictions are a significant tradeoff, the table below summarizes the common data providers you might use to build out these applications, contrasting each provider's license restrictions with its data structures and usage patterns. The TL;DR here is that the proprietary data sources are much easier for an individual app or developer to work with, but there are tradeoffs in terms of what you can aggregate, abstract or share.

| Provider | Restrictiveness Ranking | Data Structure / Schema | How to Use / Extract |
|---|---|---|---|
| Overture Maps | Least restrictive – open under CDLA for Places (ODbL for some other layers if they come from OpenStreetMap) | Open schema: Places, Buildings, Transportation, Admin Boundaries. Stable IDs + linkages. | Bulk download, conflate, manage own DB. |
| Foursquare Open Source Places | Permissive – open dataset (Apache 2.0) | Structured POIs with fsq_place_id, categories, hierarchy, geometry, metadata. | Bulk download (Parquet), conflate, manage own DB. |
| Apple Maps (MapKit / Places API) | Restrictive – cannot store, redistribute or mix with external DBs | Apple placeId + categories, opaque schema. | Query via MKLocalSearch. Supports forward and reverse geocoding. IDs must remain internal. |
| Google Places API | Restrictive – cannot store, redistribute or mix with external DBs | Google place_id, categories, metadata. | API (places/textSearch, places/details). Supports forward and reverse geocoding. |
| Mapbox Places / Geocoding API | Restrictive – similar to Google/Apple; cannot store, redistribute or mix with external DBs | id, text, place_type, geometry, properties. | API (/geocode). Supports forward and reverse geocoding. |
| Yelp Fusion API | Most restrictive – cannot store or redistribute POIs, reviews, or ratings; even ephemeral storage is highly constrained | Businesses w/ Yelp IDs, categories, attributes, reviews. | Lookup via API (/businesses/search, /businesses/:id). Business search only, not a geocoder. |
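
One way to act on the table is to encode it as a small policy lookup that an app consults before persisting or publishing a place record. The sketch below is only a coarse reading of the "Restrictiveness" column: the `ProviderPolicy` type, the dictionary keys, and `may_publish` are hypothetical names made up for illustration, and the booleans flatten license terms that are more nuanced in practice (for example, attribution and share-alike conditions on the open datasets).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPolicy:
    """Coarse summary of one table row (hypothetical structure)."""
    name: str
    can_store: bool          # may results be persisted in your own DB?
    can_redistribute: bool   # may results be shared outside your app?
    can_mix: bool            # may results be combined with other sources?


# Values are a simplification of the table above, not a reading of the ToS.
POLICIES = {
    "overture": ProviderPolicy("Overture Maps", True, True, True),
    "foursquare_os": ProviderPolicy("Foursquare Open Source Places", True, True, True),
    "apple": ProviderPolicy("Apple Maps (MapKit)", False, False, False),
    "google": ProviderPolicy("Google Places API", False, False, False),
    "mapbox": ProviderPolicy("Mapbox Geocoding API", False, False, False),
    "yelp": ProviderPolicy("Yelp Fusion API", False, False, False),
}


def may_publish(provider_key: str) -> bool:
    """Return True only if a place from this provider could be stored and shared.

    Unknown providers are treated as restricted by default.
    """
    policy = POLICIES.get(provider_key)
    return bool(policy and policy.can_store and policy.can_redistribute)


def publishable_providers() -> list[str]:
    """List the provider keys whose data could be written to a shared record."""
    return sorted(key for key in POLICIES if may_publish(key))
```

Defaulting unknown providers to "restricted" keeps the failure mode conservative: a new data source has to be reviewed and added explicitly before anything derived from it leaves the app.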

Some further salient observations:

  • Location data providers like Overture Maps and Foursquare Open Source Places expect you to spin up your own server and host their data in a database, but they publish it under open licenses, so you are free to mix and match.
  • Mapping APIs like Apple MapKit, Yelp and Google Maps / Places APIs are intended for use exclusively by individual developers through calls at runtime, and the ToS explicitly restrict you from mixing and matching services (e.g. you can’t show Google Places on a MapKit map and vice versa, or mix them together for purposes of search, geotagging, geofencing, etc.).
  • Mapbox has contributed a lot to the open source mapping world, but their Places API is proprietary, just as restrictive, and roughly comparable to Google or Apple.
  • Similarly, Yelp has great categories and attributes that make it ideal for location search (sans maps) and geotagging, but the licensing is restrictive in ways that are similar to Apple and Google. A notable difference is that it will not resolve addresses to places (as you would expect of a geocoder) and has limited functionality around what it returns given a latitude and longitude (i.e. reverse geocoding).
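
To give a feel for what "conflate, manage own DB" involves with the open datasets, here is a minimal sketch that matches places from two sources by normalized name and distance. The record shape (`id`, `name`, `lat`, `lon`) is an assumption made for illustration and is not the Overture or Foursquare schema, and real conflation pipelines use spatial indexes and fuzzier matching than this nested loop.

```python
import math
import re

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def normalize_name(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def conflate(a_places: list[dict], b_places: list[dict], max_distance_m: float = 50.0) -> list[tuple]:
    """Return (a_id, b_id) pairs that share a normalized name and lie close together.

    Brute-force O(n*m) comparison; fine for a demo, too slow for bulk data.
    """
    matches = []
    for a in a_places:
        for b in b_places:
            same_name = normalize_name(a["name"]) == normalize_name(b["name"])
            close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_distance_m
            if same_name and close:
                matches.append((a["id"], b["id"]))
    return matches
```

Because both inputs are openly licensed in this scenario, the resulting ID pairs can be stored alongside your own records, which is exactly what the proprietary APIs in the table do not allow.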

For you as an individual developer, the convenience of the proprietary APIs may outweigh the value of the less restrictive licenses. This is unlikely to change entirely, but as I’ll describe in the remainder of this post and dive into more deeply in a future post, there are opportunities to build infrastructure around locations as first-class primitives in a decentralized web.

Any reason why OSM and the various web services built around it aren’t on your list?

Explaining how to extract data from OSM and use it in a way that’s comparable to these providers would require a much deeper dive, and is not something I think is especially reasonable.

What other “various web services” do you think make sense to include and evaluate as location data providers?

Agreed that raw OSM data is probably too low-level and abstruse to be very useful on its own to this dev community.

But just off the top of my head, Nominatim and Overpass both provide web APIs that expose OSM data in formats that might cross the threshold of usability for some ATProtocol app devs.
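
For anyone who hasn't used them, here is a rough sketch of what requests to those two services look like. It only builds the request URLs and does not call the network; the helper names are made up for illustration, and the hosts shown are the public community instances, which have their own usage policies (Nominatim, for example, asks for an identifying User-Agent and low request rates).

```python
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"
OVERPASS_INTERPRETER = "https://overpass-api.de/api/interpreter"


def nominatim_reverse_url(lat: float, lon: float) -> str:
    """Build a Nominatim reverse-geocoding URL (coordinates -> nearest address/place)."""
    params = {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}", "format": "jsonv2"}
    return f"{NOMINATIM_REVERSE}?{urlencode(params)}"


def overpass_nearby_query(lat: float, lon: float, radius_m: int = 100) -> str:
    """Build an Overpass QL query for nodes tagged `amenity` within radius_m of a point."""
    return (
        "[out:json][timeout:25];"
        f"node(around:{radius_m},{lat:.6f},{lon:.6f})[amenity];"
        "out body;"
    )


def overpass_url(query: str) -> str:
    """Wrap an Overpass QL query in a GET URL for the interpreter endpoint."""
    return f"{OVERPASS_INTERPRETER}?{urlencode({'data': query})}"
```

The results come back as OSM data under ODbL, so unlike the proprietary APIs above there is no blanket ban on storing them, though attribution and share-alike terms still apply.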

It might also be worth giving a nod to Who’s On First, although I don’t know if Mapzen are actively maintaining it?

Foreshadowing! OSM and Who’s On First make more sense in the context of my next post. Here I’m just acknowledging what’s standard practice (and why that makes sense) so we can get more specific about what could be different.

My (granted, experimental) app Anchor currently uses OSM data via Overpass and Nominatim to do location-based check-ins. So it both uses an unreasonable data source and allows for the unreasonable use case of storing location information for people 🙃 But these are solid posts, thanks for the deep dives! I think you’re mostly right that OSM data is not a good POI source in general, but it works well enough for experiments like mine.

You are a brave soul… so curious what you think of the proposal in my latest post!

Thank you for the deep dive! Question: with these restrictions, would it still be possible/useful/wise to provide a mapping from open data we can use to potential matches for the same place in closed services?

Edit: I believe this post included the answer to my question