How to structure a graph and queries with lots of repeated data

Hi!

I have an application where the same data appears in many places in the graph, and I need to optimize the queries to avoid processing and sending the same data too often.

As an example, consider the following pseudo schema:

type Group {
  name: String
  members: [Person]
}
type Person {
  name: String
  follows: [Person]
  followedBy: [Person]
  contacts: [Person]
  groups: [Group]
  bookmarks: [Bookmark]
  sentMessages: [Message]
  receivedMessages: [Message]
}
type Message {
  text: String
  author: Person
  recipients: [Person]
}
type Bookmark {
  message: Message
}
type Query {
  user(auth: String!): Person
}

Querying a user's data can easily return hundreds, if not thousands, of Person objects, even though the small circle of friends/contacts/follows only contains tens of distinct users.

In my real implementation about 80% of each GraphQL query response (in bytes) is redundant. Considering that the client runs many different queries over the same part of the graph, more than 90% of all data transferred and processed is redundant.

I have tried replacing the Person objects in the queries with only their ids and fetching the full Person using a query colocated in the components, like so:
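To make the kind of redundancy described above concrete, here is a minimal TypeScript sketch that counts how many Person objects a response contains versus how many distinct people they represent. The `id` field, the `PersonNode` type, and the sample data are assumptions made for illustration; they are not part of the pseudo schema or the real application.

```typescript
// Hypothetical response shape. The `id` field is an assumption for this sketch.
interface PersonNode {
  id: string;
  name: string;
  follows?: PersonNode[];
  followedBy?: PersonNode[];
  contacts?: PersonNode[];
}

// Walk a query result and count every Person occurrence plus the distinct ids.
// Assumes the result is a tree (as a JSON response is), so there are no cycles.
function countPersons(root: PersonNode): { total: number; distinct: number } {
  const seen = new Set<string>();
  let total = 0;
  const stack: PersonNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    total += 1;
    seen.add(node.id);
    for (const list of [node.follows, node.followedBy, node.contacts]) {
      if (list) stack.push(...list);
    }
  }
  return { total, distinct: seen.size };
}

// Made-up sample: the same two people show up in three different lists.
const alice: PersonNode = { id: "1", name: "Alice" };
const bob: PersonNode = { id: "2", name: "Bob" };
const user: PersonNode = {
  id: "0",
  name: "Me",
  follows: [alice, bob],
  followedBy: [alice, bob],
  contacts: [alice, bob],
};

const stats = countPersons(user);
console.log(stats); // { total: 7, distinct: 3 }
```

Even in this tiny sample, 7 Person objects are sent for 3 distinct people, and the ratio gets worse as more lists and nesting levels are queried.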

type Group {
  name: String
  members: [ID]
}
type Person {
  name: String
  follows: [ID]
  followedBy: [ID]
  contacts: [Person]
  groups: [Group]
  bookmarks: [Bookmark]
  sentMessages: [Message]
  receivedMessages: [Message]
}
type Message {
  text: String
  author: ID
  recipients: [ID]
}
type Bookmark {
  message: Message
}
type Query {
  person(id: ID!): Person
  user(auth: String!): Person
}

The Person.contacts field is left as [Person] as an optimization, because it is very likely that these are the Person objects referenced in all the other places.

However, this feels like working against GraphQL and Apollo: I would need the query in tons of components, and the Apollo cache doesn’t help me resolve Query.person(id) to the Person objects already cached from Person.contacts.
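The behaviour being asked for can be sketched without any library: full Person objects are stored once under a key derived from their id, and bare ids from fields such as Group.members or Message.recipients are resolved against that store before anything is fetched. The `PersonStore` class, the `Person:<id>` key format, and the `id` field are hypothetical names chosen for this sketch; it is not how the Apollo cache is implemented.

```typescript
// Hypothetical minimal type. The `id` field is an assumption for this sketch.
interface Person {
  id: string;
  name: string;
}

// A tiny normalized store keyed by "Person:<id>" (key format is an assumption).
class PersonStore {
  private byKey = new Map<string, Person>();

  // Store full Person objects once, e.g. those received via Person.contacts.
  writeAll(people: Person[]): void {
    for (const p of people) {
      this.byKey.set(`Person:${p.id}`, p);
    }
  }

  // Resolve bare ids to stored objects and report the ids that are not stored,
  // i.e. the only ones that would still need a network request.
  resolve(ids: string[]): { found: Person[]; missing: string[] } {
    const found: Person[] = [];
    const missing: string[] = [];
    for (const id of ids) {
      const hit = this.byKey.get(`Person:${id}`);
      if (hit) {
        found.push(hit);
      } else {
        missing.push(id);
      }
    }
    return { found, missing };
  }
}

// Usage with made-up data: contacts are loaded once, then ids are resolved.
const store = new PersonStore();
store.writeAll([
  { id: "1", name: "Alice" },
  { id: "2", name: "Bob" },
]);
const result = store.resolve(["1", "2", "9"]);
console.log(result.found.map((p) => p.name), result.missing);
```

In this sketch only id "9" would require a request, while the two contacts are served from the store. The open question is how to get the equivalent from the client cache without writing this lookup by hand in every component.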

How could I accomplish this so that I don’t have to load the same data multiple times and don’t have to complicate the client with explicit cache querying?

I’m using Apollo for both the GraphQL client and the server, with BatchHttpLink (as well as HTTP/2) to optimize the querying.