Cache normalization in Apollo 3

We are using Apollo 2 in production and are very happy with it. Thanks for developing it! :slight_smile: Now that Apollo 3 is out, we tried to upgrade and ran into some issues with caching.

Apollo 3 seems to assume that it is practical to manually specify merge strategies for every type and/or field (hundreds or thousands of lines of configuration), and that this manual configuration cannot be checked at build time or start time but instead fails eventually at runtime. To me, this sounds like an unacceptable combination. Given this, I set about writing code to inspect our schema and generate type policies. However, I ran into issues with Apollo not exposing sufficient information to build these policies.

I believe that a simple and general normalization strategy is sufficient for our use and for the vast majority of non-pathological schemas that satisfy the following assumptions:

  1. Most objects have ids that allow normalization
  2. Objects that do not have ids are considered to be part of the parent object
  3. Root objects are singletons and therefore do not need ids to be normalized

The algorithm is as follows:

Object normalization

  1. If the object has an id (or, as Apollo calls it, keyFields), merge incoming (new object) into existing (old object)

    • The merge function must treat the same field with different arguments as different fields
    • The merge function must treat the same field with the same arguments but different aliases as the same field
  2. If the object lacks an id field, give it a synthetic id and then go to step 1

    • For root objects, the synthetic id is the same as __typename since they are singletons
    • For non-root objects, the synthetic id is derived from the parent object’s id as well as the field name and arguments that returned the child object
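
To make the id rules above concrete, here is a minimal sketch in TypeScript. It is not an Apollo API: `cacheId`, `fieldKey`, and the `parent` descriptor are hypothetical names, and the `Type:id` and `parent.field(args)` formats are just one possible encoding. Keys are built from the schema field name rather than the alias, so aliased selections of the same field with the same arguments share a key.

```typescript
// Hypothetical helper names for illustration; none of these are Apollo APIs.
type Args = Record<string, unknown>;

interface CachedObject {
  __typename: string;
  id?: string;
}

// Root types are singletons, so their type name doubles as their id.
const ROOT_TYPES = ["Query", "Mutation", "Subscription"];

// Build a key from the schema field name (never the alias) and its arguments.
// Top-level argument names are sorted so argument order does not matter.
function fieldKey(fieldName: string, args: Args = {}): string {
  const names = Object.keys(args).sort();
  if (names.length === 0) return fieldName;
  const body = names.map((n) => `${n}:${JSON.stringify(args[n])}`).join(",");
  return `${fieldName}(${body})`;
}

// `parent` describes how the object was reached; it is only consulted
// when the object has no `id` and is not a root type.
function cacheId(
  obj: CachedObject,
  parent?: { id: string; fieldName: string; args?: Args },
): string {
  if (obj.id !== undefined) return `${obj.__typename}:${obj.id}`;
  if (ROOT_TYPES.indexOf(obj.__typename) !== -1) return obj.__typename;
  if (!parent) throw new Error(`No parent given for ${obj.__typename}`);
  return `${parent.id}.${fieldKey(parent.fieldName, parent.args)}`;
}
```

For instance, `cacheId({ __typename: "Location" }, { id: "Place:123", fieldName: "location" })` returns `"Place:123.location"`.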

Field normalization

  1. Replace existing (old field) with incoming (new field)
    • Except when explicitly configured with a merging strategy to support pagination (Limit and Offset, Relay Connection (Cursor), etc.)
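
As a rough illustration of the field rule, the sketch below shows the default "replace" behaviour next to an opt-in merge for limit/offset pagination. The function names are hypothetical and the signatures are simplified; Apollo Client 3 ships its own `offsetLimitPagination` helper in `@apollo/client/utilities`, which this does not try to reproduce exactly.

```typescript
// Hypothetical names for illustration; these are not Apollo APIs.

// Default field behaviour: incoming replaces existing.
function replaceField<T>(_existing: T | undefined, incoming: T): T {
  return incoming;
}

// Opt-in behaviour for limit/offset pagination: write each incoming item at
// its absolute position so pages can arrive in any order.
function offsetLimitMerge<T>(
  existing: T[] | undefined,
  incoming: T[],
  args: { offset?: number },
): T[] {
  const merged = existing ? existing.slice() : [];
  const offset = args.offset ?? 0;
  for (let i = 0; i < incoming.length; i++) {
    merged[offset + i] = incoming[i];
  }
  return merged;
}
```

With this sketch, merging a second page `["c", "d"]` at `offset: 2` into `["a", "b"]` gives `["a", "b", "c", "d"]`, while refetching the first page overwrites only the first positions.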

As an example, let’s examine the following schema and query:

Schema

type Query {
  # The current user
  viewer: Viewer!
  # Given a search string, find the associated `Location`
  geocode(search: String!): Location
  # A paginated list of all `Place`s in the system
  places(limit: Int!, offset: Int!): [Place!]!
}
type Viewer {
  name: String!
  favoritePlace: Place
}
type Place {
  id: ID!
  name: String!
  location: Location!
}
type Location {
  latitude: Float!
  longitude: Float!
}

Query

# Query is a root type, so it is given a synthetic `id` of `"Query"`.
query ViewerAndPlaces {
  # `Viewer` lacks `id`, so it is given a synthetic `id` based on the path from
  # the nearest parent with an `id`. In this case the `id` is `"Query.viewer"`.
  viewer {
    name
    # `Place` has an `id` field, so it doesn't need a synthetic `id`.
    # Assuming `Place.id` is `"123"`, the global `id` could be `"Place:123"`
    favoritePlace {
      id
      name
      # `Location` lacks an `id` field, so it is given a synthetic `id`.
      # Assuming `Place.id` is `"123"`, the synthetic `id` for this `Location`
      # could be `"Place:123.location"`.
      location {
        latitude
        longitude
      }
    }
  }
  # `geocode` returns a `Location` and `Location` lacks `id`, so the synthetic
  # `id` for this `Location` could be `"Query.geocode(search:'NYC')"`.
  geocode(search: "NYC") {
    latitude
    longitude
  }
  # Paginated fields require special configuration but that can be done
  # automatically by introspecting the schema and finding fields that
  # take `limit` and `offset` arguments or return `Connection` types.
  places(limit: 10, offset: 0) {
    id
    name
    location {
      latitude
      longitude
    }
  }
}
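
The detection step mentioned in the query comments could look something like the sketch below. It assumes a hypothetical, simplified `SchemaInfo` shape (type name to field name to argument names) that a real generator would have to derive from an introspection result; `planPolicies` is likewise a made-up name. It only finds which types lack `id` and which fields take both `limit` and `offset`; it does not detect `Connection` types or build the actual type policies.

```typescript
// Hypothetical, simplified schema description:
// type name -> field name -> argument names.
type SchemaInfo = Record<string, Record<string, string[]>>;

interface PolicyPlan {
  typesWithoutId: string[];    // non-root types that need a merge policy
  offsetLimitFields: string[]; // "Type.field" entries that need pagination handling
}

function planPolicies(
  schema: SchemaInfo,
  rootTypes: string[] = ["Query", "Mutation", "Subscription"],
): PolicyPlan {
  const plan: PolicyPlan = { typesWithoutId: [], offsetLimitFields: [] };
  for (const typeName of Object.keys(schema)) {
    const fields = schema[typeName];
    const isRoot = rootTypes.indexOf(typeName) !== -1;
    // Root types are singletons, so they are skipped here.
    if (!isRoot && !("id" in fields)) plan.typesWithoutId.push(typeName);
    for (const fieldName of Object.keys(fields)) {
      const args = fields[fieldName];
      if (args.indexOf("limit") !== -1 && args.indexOf("offset") !== -1) {
        plan.offsetLimitFields.push(`${typeName}.${fieldName}`);
      }
    }
  }
  return plan;
}

// The example schema from this post, written in the simplified shape.
const schema: SchemaInfo = {
  Query: { viewer: [], geocode: ["search"], places: ["limit", "offset"] },
  Viewer: { name: [], favoritePlace: [] },
  Place: { id: [], name: [], location: [] },
  Location: { latitude: [], longitude: [] },
};

const plan = planPolicies(schema);
// plan.typesWithoutId    -> ["Viewer", "Location"]
// plan.offsetLimitFields -> ["Query.places"]
```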

Can Apollo 3 support this general normalization strategy? Apollo 2 seems to behave roughly like the algorithm outlined above. Any suggestions for making Apollo 3 behave more like Apollo 2 without manual and error-prone configuration of hundreds or thousands of types and fields?

Hi @domkm

The above makes a lot of sense to me.

I find it surprising to see the AC3 docs suggest objects without a unique ID are “rare”; maybe it depends on how your API persists data, but using a document store like MongoDB means you often store 1-1 data as nested objects (without the need for an ID), as it can be more efficient + it’s generally a more logical format, e.g.

# This is clumsy as the schema doesn't describe the fact that "lng" and "lat" either both exist or don't exist at all.
type User {
   id: ID!
   lng: Float
   lat: Float
}

# Whereas here, like in your example, it's clear
type User {
  id: ID!
  location: Location
}

type Location {
  lat: Float!
  lng: Float!
}

Anyway, I wondered whether you found a sane solution for generating / maintaining the merge policies for types like Location which are owned by their parent?
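
For context, the kind of per-type policy in question might look like the sketch below. It assumes Apollo Client 3.3 or later, where a type policy can set `merge: true`; the object is shown as a plain literal rather than wired into a cache, so treat it as an illustration of the shape, not a tested configuration.

```typescript
// Plain-object sketch of the `typePolicies` option for `InMemoryCache`.
// Assumption: Apollo Client 3.3+, where `merge: true` is accepted on a type.
const typePolicies = {
  Location: {
    // Merge incoming `Location` fields into the existing object
    // instead of replacing it wholesale.
    merge: true,
  },
};

// Intended use (not executed here): new InMemoryCache({ typePolicies })
```

Maintaining an entry like this by hand for every owned type is exactly the part that ideally would be generated from the schema.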

I wrote this script (see "[AC3] hard to migrate", Issue #9033 in apollographql/apollo-client on GitHub), which seems to work, but I'd like to hear whether there are any gotchas that I might run into (at runtime :grimacing:).