Apollo Client 3 pagination + sorting

Hey,

I’ve been trying to figure this out for a few days now, and the further I get with understanding caching, pagination, and type policies in Apollo Client, the deeper the hole I find myself in. The package is great; I’m sure the problem lies in my lack of understanding. Help educating me on the best implementation or practice here is greatly appreciated. Apologies in advance if this is not the right place or format.

Basically we have backend pagination on our server, currently using offset, limit, and order (by, desc). On initial load I fetch with limit 10 and offset 0, ordered by the name field, and the result also returns a totalItems value, let’s say 20.

To improve performance I try to handle pagination, as well as deleting and adding items, entirely through AC3 caching with various update functions, cache.modify calls, and the type policy handling detailed below:

// type-policy
NamespaceDocumentQuery: {
  fields: {
    get: {
      ...offsetLimitPagination({
        field: 'documents',
      }),
    },
  },
},

Below is my custom offsetLimitPagination

// A basic field policy that uses options.args.{offset,limit} to splice
// the incoming data into the existing array. If your arguments are called
// something different (like args.{start,count}), feel free to copy/paste
// this implementation and make the appropriate changes.
export const offsetLimitPagination = (props?: IPagination): FieldPolicy => {
  const { keyArgs = false, field } = props || {}

  return {
    keyArgs,
    read(existing, options) {
      console.log('options?.args', options?.args)
      console.log('existing', existing)
      const { offset, limit, order } = parseArgs(options?.args)
      const { canRead, readField } = options
      if (offset < 0 && limit < 0) {
        return { ...existing, args: options?.args }
      }

      const items =
        existing?.[field]?.length && offset >= 0 && limit >= 0
          ? existing[field].filter(canRead)
          : existing?.[field]

      // sort items (guard against a missing array before slicing)
      const sortedItems = (items ?? []).slice(0).sort((a, b) => {
        return sortByStringField(order?.[0]?.by, order?.[0]?.desc)(
          readField(order?.[0]?.by, a),
          readField(order?.[0]?.by, b),
        )
      })

      const page = sortedItems.length ? sortedItems.slice(offset, offset + limit) : sortedItems

      const itemsPerPage = limit || 0
      const totalItems =
        existing?.paginationInfo?.totalItems - (existing?.[field]?.length - sortedItems?.length) ||
        0
      const totalPages = Math.ceil(totalItems / limit)

      // A read function should always return undefined if existing is
      // undefined. Returning undefined signals that the field is
      // missing from the cache, which instructs Apollo Client to
      // fetch its value from your GraphQL server.
      if (page?.length < itemsPerPage && sortedItems?.length < totalItems) {
        return undefined
      }

      return {
        ...existing,
        documents: page,
        paginationInfo: {
          ...existing?.paginationInfo,
          offset,
          limit,
          totalItems,
          totalPages,
        },
      }
    },
    merge(existing, incoming, options) {
      // set merged object with empty array
      const merged = {
        [field]: [],
        // for non related array just return incoming
        ...incoming,
      }

      // set existing data if it exists
      // Slicing is necessary because the existing data is
      // immutable, and frozen in development.
      merged[field] = existing ? existing[field].slice(0) : []

      // perform operations
      const { offset = 0 } = parseArgs(options?.args)

      // offset
      offsetData(merged, incoming, field, offset)

      return {
        ...merged,
        order: options?.args?.order,
      }
    },
  }
}

I have a merge to handle the offset, and a read to handle limiting and sorting (I will look to handle filtering as well). I am using useLazyQuery to fetch data when paginating.
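The offsetData helper referenced in the merge above isn’t shown. A minimal sketch of what such a helper might do (writing the incoming page into the merged array at its absolute offset) could look like the following; the names and shapes are assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of an offsetData-style merge helper.
// `merged` holds the accumulated list under `field`; `incoming`
// carries the freshly fetched page. Each incoming item is written
// into the merged array at its absolute position (offset + i), so
// pages fetched out of order still land in the right slots.
function offsetData<T>(
  merged: Record<string, T[]>,
  incoming: Record<string, T[]>,
  field: string,
  offset: number,
): void {
  const page = incoming?.[field] ?? []
  for (let i = 0; i < page.length; i++) {
    merged[field][offset + i] = page[i]
  }
}
```

For example, merging page two (offset 10) leaves the first ten entries untouched and appends the new items starting at index 10.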

  const [getData, documentRes]: any = useDocumentGetLazyQuery({
    notifyOnNetworkStatusChange: true,
  })

When I go to the next page and getData is called, the result is cached. Our query asks for paginationInfo, which maintains the limit and offset; as this changes, the cache is refreshed and the UI is updated. So at this point limit and offset are working well.

My first issue is the initial load plus sorting. Because the sort is not reflected in the query result, there is no update to the cache, so when I sort, having only received the first 10 records out of 20, it would only sort my first page of data and not the entire data set. Thus I need to let the Apollo cache know that this order field is a keyArg (right?). I add this to my type policy, and when sorting, the cache is updated by adding two separate entries based on the new keyArg. I can live with that; the problem comes when I try to delete or add items to this list.
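For reference, the keyArgs change described above might look like this. This is a sketch and assumes the nested key-specifier syntax (which Apollo documents for keyFields) also applies to keyArgs here:

```typescript
// Sketch: treat the order argument as part of the cache key, so each
// sort order gets its own cache entry, while offset/limit stay excluded
// so that successive pages still merge into the same entry.
const typePolicies = {
  NamespaceDocumentQuery: {
    fields: {
      get: {
        // one cache entry per distinct input.order value
        keyArgs: ['input', ['order']],
      },
    },
  },
}
```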

I use update and cache.modify which looks something like this:

update: (cache, { data }) => {
  // evict before modifying
  cache.evict({
    id: `Document:${id}`,
  })
  cache.gc()

  cache.modify({
    fields: {
      document: (existingDocumentsRef, { readField }) => {
        const getRef = readField<DocumentGetPayload>({
          fieldName: 'get',
          args: {
            input: {
              ...createGetRefArgs({
                companyId,
                paginationInfo,
              }),
            },
          },
          variables: {
            input: {
              // filter: {
              //   companyIds: [companyId],
              // },
              order: {
                by: sorting?.sortState?.backendSortKey,
                desc: sorting?.sortState?.sortDirection,
              },
            },
          },
          from: existingDocumentsRef,
        })

        console.log('getRef ***', getRef)

        const result = {
          ...existingDocumentsRef,
          get: {
            ...getRef,
            documents: getRef?.documents?.filter((ref) => readField('id', ref) !== id),
            paginationInfo: {
              ...getRef?.paginationInfo,
              totalItems: getRef?.paginationInfo?.totalItems - 1,
            },
          },
        }

        console.log('on delete result', result)

        return result
      },
    },
    optimistic: true,
  })
},

The main problem here is that now my ref key is not just a string but has my args attached to it, in the form of get({"input":{"order":...}}), and in the return from my cache.modify I do not know how to return or update that same ref with the args attached. So in this case it creates a new cache entry without the args instead of updating the correct reference.

  • For my initial issue with sorting when keyArgs: false, what’s the best approach to handling sorting? Can I refetch the data from the server at some point? How can I trigger this? Should I use the new refetchQueries method when clicking on sort?
  • When keyArgs: ['order'], how can I return the correct cache reference with the args attached in my cache.modify, to ensure the correct cache reference is being updated?
  • What is recommended overall when sorting comes into play with pagination?
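On the second bullet: the modifier function passed to cache.modify also receives storeFieldName, the full storage key including the serialized args. One approach, sketched below with an assumed key format, is to parse the args back out of storeFieldName and leave entries untouched when they don’t match the sort being targeted:

```typescript
// Sketch: extract the serialized args from a storeFieldName such as
// 'get({"input":{"order":{"by":"name","desc":false}}})' and decide
// whether a given cache entry matches a particular sort order. The
// exact key format is an assumption about how args are serialized.
function argsFromStoreFieldName(storeFieldName: string): any | null {
  const start = storeFieldName.indexOf('(')
  if (start === -1) return null // field stored without args
  try {
    return JSON.parse(storeFieldName.slice(start + 1, -1))
  } catch {
    return null
  }
}

function matchesOrder(storeFieldName: string, by: string, desc: boolean): boolean {
  const args = argsFromStoreFieldName(storeFieldName)
  const order = args?.input?.order
  return order?.by === by && order?.desc === desc
}
```

Inside the modifier you would return the existing value unchanged when matchesOrder is false, and only rewrite the one entry whose args match, so the correct keyed reference gets updated instead of a new unkeyed one being created.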

I will try to provide a reproducible repo or sandbox. Thanks ahead of time for your support.

Without doing a bunch of cache gymnastics, one approach could be to simply get only the id from the resources you’d like, effectively giving you a non-cached list, but cached results for each entity you’d like to see.

From there you can fetch each result “individually”, but bundle them all together in one HTTP request, at which point the cache should assign the results to that specific type:id combination, allowing you to reuse the cached item across different queries.

Hey,

Thanks for your response.

Are you suggesting that in my query I only request the id within my list, like below:

query getItems {
  namespace {
    getItems(input: {}) {
      items {
        id
      }
    }
  }
}

and then make another request to get all of the item details in a single HTTP request? I feel like I would run into the same issue; instead, now I’ve just run two queries to get the same data I could get with one, whereas making a single request is one of the benefits of using GraphQL. I may be missing something, but how does that solve my issue? I will give this a try and check whether my assumption above is correct. Thanks again for your reply.

The cache by default works off of IDs, last I checked, following the Relay-style spec, which uses graph theory as a baseline.

Most GraphQL these days uses Relay pagination and conventions, which are based on graph theory. In graph theory you have the concept of a Node, which is a distinct thing. Nodes have IDs. The cache says that any Node with a particular ID should be unique, and any data fetched can be merged into it.

Relay creates the concept of “Edges”, which are the lines between Nodes, which is again graph theory. Relay accomplishes this using a standardized set of types called “Connections”, which can dynamically create “Pages” (although you won’t see a Page type in the schema, because a page is just a list of edges).

A Connection is a link between two types which contains multiple Edges, with different Nodes on either end. A Connection doesn’t represent Node A to Node B, it represents Node A-Z to Node A-Z, in other words, a many-to-many relationship, rather than one-to-many. When using Relay-style pagination, though, you will always be using it as one-to-many, because you always start at a specific starting Node. If an Edge is a road, a Connection is a highway (many more lanes), and a Page is a list of roads to get from A to B-Z.

A Page is a dynamic list of edges, each edge containing 1 node (the node on the other end). A page can have any size, and start at any point in the list, and end at any point in the list. The list can be ordered in any way, but it’s expected to be exactly the same order on successive queries (making your pages ordered by a timestamp by default is good for this).

Relay uses cursor-based pagination, rather than index + offset-based pagination (among others), because cursor-based pagination is generally very performant and flexible. I have made many Connection implementations, and when I’m dealing with a system that doesn’t use cursor-based pagination (which is most of the time), it’s pretty easy to implement on top of their existing pagination method. I would define a Cursor as “a point in a set of pages”, meaning for 1 set of inputs in order to generate a set of pages, a Cursor is an unchanging point. In Relay-based pagination, this means that an Edge contains only two things, a node and its cursor. You can ask for the cursor of any edge, but you are also given handy utilities for the first cursor and last cursor of a page; in other words, you don’t need to actually ask for the cursor for edges.

This allows you to do forward pagination, which is starting at the beginning of a set of pages and moving from next page to next page, as well as reverse pagination, which is starting at the end of a set of pages and working from previous page to previous page. This allows you to move arbitrarily through a set of pages and change the sizes of pages at will, always knowing where you are because cursors don’t change (unless you change the inputs of a search). Cursors don’t act as a substitute for Node IDs, although a common implementation is to simply base-64 encode the node’s ID.
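As a concrete example of the base-64 convention mentioned above, here is a sketch of an opaque cursor derived from a node ID; the prefix and helper names are made up for illustration:

```typescript
// Sketch: an opaque cursor derived from a node ID. Base-64 encoding
// the ID is a common convention, not a requirement; a cursor only
// needs to be a stable, opaque pointer into the ordered list.
function toCursor(nodeId: string): string {
  return Buffer.from(`cursor:${nodeId}`).toString('base64')
}

function fromCursor(cursor: string): string {
  return Buffer.from(cursor, 'base64').toString('utf8').replace(/^cursor:/, '')
}
```

Because the encoding is reversible, the server can recover the node ID (or any underlying offset) from a cursor the client sends back in an `after` or `before` argument.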

So to reiterate, the above is one of the biggest conventions in GraphQL right now, and Apollo’s tools are typically made to work with that model. As a result, any Node’s ID should be what is used to create an item in the cache, and any additional data should be able to be merged into that node’s cached entry.

As a result, you can expect to be able to ask for as much or little as you want in a query, and everything will be cached based on its Type and Node ID. You don’t need to only ask for IDs if you don’t want to, but I find it easier to separate out my React components into components for lists that only handle the Relay pagination, and components for items in those lists that only deal with fetching off a specific Type of Node.

As for batching, I can’t recall exactly, but I think Apollo’s clients automatically batch when possible; I would check the docs on that. In GraphQL you can perform many queries in parallel and everything is returned as you asked. If you make many queries that are the same, you simply need to alias each one, which just gives it a name so that you know which query each result came from. In that way you can “batch” things in GraphQL. Apollo may have an additional batching method that doesn’t work exactly like that, but regardless, there should be plenty of ways to fetch lots of data in one HTTP request.
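The aliasing mentioned above might look like this; the item field and its shape are assumed for illustration:

```graphql
query getSeveralItems {
  first: item(id: "1") { id name }
  second: item(id: "2") { id name }
  third: item(id: "3") { id name }
}
```

All three selections travel in one HTTP request, and the aliases (first, second, third) keep the otherwise identical fields distinguishable in the response.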

Hey,

Thanks for this detailed explanation really appreciate you taking the time to reply.

It did provide a lot of insight, but to make sure I understood: essentially you are saying, unrelated to cursor-based pagination, that a Node ID is what is used to create an item in the cache in GraphQL and/or Apollo Client. This I understand, as it is part of cache normalization. However, in my case I have a root query which gets my list of items, and currently this query is not cached with an ID (maybe I need to change this and customize it via type policies). For example, my query is:

query getItems {
  namespace {
    getItems(input: {}) {
      items {
        id
      }
    }
  }
}

So each item under items will be cached as Item:${id}, and my getItems(input: {}) is cached in the ROOT_QUERY, with the args saved along with the field name. This is currently where my difficulty lies. Manipulating the normalized cache is straightforward, but I do not have a single cache entry for my list of items, just the field with its args, and my issue is trying to use cache.modify to modify the data and/or add items to this specific entry, as I currently do not know how to identify the entry in the format getItems({"input":{"filter":{"id":""},"sorting":{},"limit":10,"offset":0}}). Modifying a single item is easy, as my cache is normalized with _id or id and __typename, but a list of items is what I’m having problems with.

Last I checked, the cache was a simple map where the key was <type>:<id>; you should be able to use that to create your own cache entries for this. I don’t think the built-in cache normalizes non-Node types by default, so your list of items can’t be cached on its own because it doesn’t have an ID.

I would probably cache a list of IDs based on a hashed stringify of the variables. object-hash is a decent lib for this.

For example, you could create a cache entry for something like getItems:<objectHash({ args })>, where <objectHash({ args })> is used to generate an ID for the query.
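A minimal sketch of that idea, substituting a stable stringify plus a hash for the object-hash library (the helper names here are made up):

```typescript
import { createHash } from 'crypto'

// Sketch: derive a cache key for a list query from its variables,
// standing in for the object-hash approach described above. Object
// keys are sorted so that equivalent variable objects hash
// identically regardless of property order.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`
  }
  if (value && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

function listCacheKey(queryName: string, args: object): string {
  const hash = createHash('sha256').update(stableStringify(args)).digest('hex')
  return `${queryName}:${hash}`
}
```

Two calls with the same variables (even in a different key order) produce the same key, so the list of IDs stored under that key can be looked up deterministically.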