gRPC microservices -> Federated Apollo

Looking for folks with relevant experience migrating (or alternatively reconciling) a gRPC microservice ecosystem to federation. We’re a SaaS company with an entity set large and complex enough to see the value, especially long-term, in transitioning to a federated structure (for the known benefits of clear ownership boundaries, no monoliths, and general scalability).

I have done some preliminary research, and so far the most promising solution I have found for keeping the gRPC services (mostly) as-is while still moving to federation is GraphQL Mesh. If anyone has specific comments on that tool as a whole, that would be helpful.

Also open to hearing arguments for why holding onto gRPC is not worth the trouble if anyone thought about trying to do a similar type of proxying/bridging and ultimately chose to go full GraphQL layers for all their microservices when moving to federation.

Cheers and thanks in advance for your time!

Hi James! I’m a relatively new Solutions Architect at Apollo and I have some experience with GraphQL/gRPC/federation from my time at Square. Here’s my current take on the subject (which is still evolving and not an official Apollo position!)

gRPC is solid tech and I see no reason why you’d need to entirely abandon it. It’s especially great for efficiently and synchronously shipping bytes between services in the same domain (or bounded context, in DDD terms).

Once you leave that single domain and start supporting multiple end-user apps (that require data from multiple domains), gRPC becomes difficult to optimize and difficult to evolve.

  • The RPCs are all-or-nothing: a client gets the whole response message even if it needs only a few fields (unless you invent your own query DSL in protobuf), and
  • You can’t compose multiple RPCs together, so clients have to manage a sequence of calls to multiple services to fetch everything they need.
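
To make the second bullet concrete, here is a minimal TypeScript sketch of the orchestration a client ends up owning. Everything in it is an assumption for illustration: `UserClient`, `OrderClient`, `loadProfilePage`, the message shapes, and the GraphQL query are hypothetical, and in-memory stubs stand in for real gRPC clients.

```typescript
// Hypothetical message and client shapes; real ones would come from your .proto files.
interface User { id: string; name: string; email: string; address: string }
interface Order { id: string; userId: string; totalCents: number }

interface UserClient { getUser(id: string): Promise<User> }
interface OrderClient { listOrders(userId: string): Promise<Order[]> }

// In-memory stand-ins for two gRPC services.
const userClient: UserClient = {
  getUser: async (id) => ({ id, name: "Ada", email: "ada@example.com", address: "1 Main St" }),
};
const orderClient: OrderClient = {
  listOrders: async (userId) => [{ id: "o1", userId, totalCents: 1250 }],
};

// The client sequences the calls itself and receives every field of each
// message, even though the page only needs a name and the order totals.
async function loadProfilePage(userId: string, users: UserClient, orders: OrderClient) {
  const user = await users.getUser(userId);             // first round trip, full User message
  const userOrders = await orders.listOrders(user.id);  // second round trip
  return { name: user.name, orderTotals: userOrders.map((o) => o.totalCents) };
}

// With a composed GraphQL API, the same page could be a single request that
// names only the fields it needs (illustrative schema).
const profilePageQuery = `
  query ProfilePage($id: ID!) {
    user(id: $id) { name orders { totalCents } }
  }
`;
```

Every new screen that needs data from another service adds another hand-written call sequence like `loadProfilePage`, which is the evolution cost described above.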

The ideal architecture, to me, is that a team/org within your company owns several microservices for different aspects of their domain and uses gRPC to communicate between them. They also own a GraphQL service that exposes the “public API” (not public to the world, just public to the rest of the company) as a GraphQL schema. The schema acts as an abstraction layer over your domain’s microservices, making it easier to refactor and optimize your underlying services while not changing the API that other teams and end-user apps rely on.
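
As a rough illustration of the schema acting as an abstraction layer, the sketch below maps a gRPC response message onto the shape a GraphQL type would expose. The names `GetCustomerResponse`, `Customer`, and `toGraphQLCustomer`, the status codes, and the default currency are all hypothetical, not taken from any real service.

```typescript
// Hypothetical wire shape, roughly what a protoc-generated type might look like.
interface GetCustomerResponse {
  customer_id: string;
  display_name: string;
  status: number;        // assumed enum: 0 = UNKNOWN, 1 = ACTIVE, 2 = SUSPENDED
  balance_cents: number;
}

// Hypothetical shape of the GraphQL type that other teams query.
interface Customer {
  id: string;
  name: string;
  isActive: boolean;
  balance: { amount: number; currency: string };
}

// The mapping function is where the two APIs are decoupled: field names,
// enum encodings, and units can change on the gRPC side while the GraphQL
// type stays stable, as long as this function is updated.
function toGraphQLCustomer(res: GetCustomerResponse, currency = "USD"): Customer {
  return {
    id: res.customer_id,
    name: res.display_name,
    isActive: res.status === 1,
    balance: { amount: res.balance_cents / 100, currency },
  };
}
```

If the downstream service later renames `display_name` or splits the balance into its own RPC, only this mapping changes; consumers of the `Customer` type would not notice.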

The notion that the GraphQL schema is an abstraction layer makes tools like GraphQL Mesh less attractive to me. It’s great if you just want to get things started quickly (like Airbnb did) and have your GraphQL schema map directly to your RPC interfaces. But:

  1. Your GraphQL schema will have to change at the same rate as your RPC interfaces, making it harder to carefully evolve either.
  2. Sticking to GraphQL schema design best practices and idioms will be harder.
  3. You’ll need extra configuration or logic to take advantage of GraphQL’s compositionality.

IMHO, there’s no silver bullet for making a GraphQL interface on top of another API. The tools that purport to allow this will restrict you from making the best API possible today, and from being able to gradually evolve it in the future.

I recommend looking at graphql-kotlin or DGS for hand-writing GraphQL services for your domains. They can be thin, efficient, and relatively easy to write and test, and also set you up for success in the future as your microservice landscape changes over time.

And of course, I doubly-recommend using Apollo Federation to compose all your GraphQL services into a single GraphQL API for end-user apps. :grin:

Hope this helps! Looking forward to your feedback as I continue to hone this argument.

Wow @lennyburdette - Thank you so much for taking the time to give such an informed, thorough response!

So to oversimplify what you’ve stated - basically toss the idea of a 1:1 gRPC service/GraphQL subgraph relationship in favor of a meaningful group of services that then bubbles up to a federated gateway via a unified subgraph. Have teams own those clusters/subgraphs and, should one cluster of services need to reach another separate cluster, this would be done via that cluster’s exposed API (and if the services are within the same cluster, they would use gRPC comms between them instead).

If I have it right, then just a couple of points of clarification from me:

While we do have pre-existing gRPC services to consider, it sounds like you are suggesting that this approach is not, in your opinion, the optimal structure overall, so would you recommend that any new services we build omit the gRPC aspect and instead be pure GraphQL layers?

I know that Netflix also started with Kotlin and went on to expand their work to accommodate Java (the DGS bit) as well, but the majority of our services are currently Node/TS, so apollo-server was presumably going to be our default choice. Any shortcomings you care to mention with that, or were you advocating for those other libraries purely based on your own research/experience?

Possibly! It depends on the new service’s responsibilities and dependents. If the point of the service is to provide data for end-user apps, it probably makes sense for its only API to be a GraphQL schema that you incorporate into a federated graph.

Ya know, I just assumed that you weren’t on NodeJS because of gRPC. I don’t have much experience working with gRPC-node so I jumped to suggesting JVM options because they’re so mature. Apollo Server is an excellent option for your subgraph services, especially if you’re using TypeScript so that your resolvers can take advantage of the protoc-generated client code used for talking to downstream data services.
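
For a sense of what resolvers backed by generated client code might look like, here is a minimal sketch. It assumes a callback-style unary method, which is the shape Node gRPC clients typically expose; `ProductServiceClient`, `ProductMessage`, `unary`, `fetchProduct`, and the schema are hypothetical. The SDL uses Federation 1-style `@key` syntax, and nothing here imports Apollo Server, so wiring `typeDefs` and `resolvers` into a server (for example via `buildSubgraphSchema` from `@apollo/subgraph`) is left out.

```typescript
// Hypothetical generated types; real ones come from protoc and your .proto files.
interface GetProductRequest { id: string }
interface ProductMessage { id: string; title: string; price_cents: number }

// Callback-style unary call (assumed shape of the generated client method).
type UnaryCall<Req, Res> = (req: Req, cb: (err: Error | null, res?: Res) => void) => void;
interface ProductServiceClient { getProduct: UnaryCall<GetProductRequest, ProductMessage> }

// Wrap a callback-style call in a Promise so resolvers can await it.
function unary<Req, Res>(call: UnaryCall<Req, Res>, req: Req): Promise<Res> {
  return new Promise<Res>((resolve, reject) => {
    call(req, (err, res) => {
      if (err) return reject(err);
      if (res === undefined) return reject(new Error("empty response"));
      resolve(res);
    });
  });
}

// Per-request context carrying the gRPC client (hypothetical shape).
interface Context { products: ProductServiceClient }

const typeDefs = `
  type Product @key(fields: "id") {
    id: ID!
    title: String!
    priceCents: Int!
  }
  type Query {
    product(id: ID!): Product
  }
`;

// Shared fetch-and-map step used by both resolvers below.
async function fetchProduct(ctx: Context, id: string) {
  const msg = await unary<GetProductRequest, ProductMessage>(
    (req, cb) => ctx.products.getProduct(req, cb), // arrow keeps the client's `this`
    { id },
  );
  return { id: msg.id, title: msg.title, priceCents: msg.price_cents };
}

const resolvers = {
  Query: {
    product: (_parent: unknown, args: { id: string }, ctx: Context) => fetchProduct(ctx, args.id),
  },
  Product: {
    // Federation entity resolver: lets the gateway fetch a Product by its key.
    __resolveReference: (ref: { id: string }, ctx: Context) => fetchProduct(ctx, ref.id),
  },
};
```

Because the request and response types flow from the generated code into the resolver, a breaking change in the `.proto` file would surface as a TypeScript compile error rather than a runtime surprise.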

@jamesjenkinsjr Just wanted to chime in here as well: gRPC is really your data source, and it’s going to be exposed through a GraphQL server. In this case, your first GraphQL server will have gRPC data sources and could be a subgraph of a federated graph.

I’ve seen lots of teams use gRPC in Apollo Server. Most of them do it in some custom fashion, but I recommend implementing something on top of the DataSource so you can take advantage of caching in the future if you want. There are some community examples of doing this and I would recommend a similar pattern since you’re thinking gRPC-node. I drew up a simple image of what that could look like architecturally:

[Image: architecture diagram]
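
A data-source-style wrapper around a gRPC client, with caching, could look roughly like the sketch below. It is written as a plain class rather than extending Apollo's `DataSource` so that it stays self-contained. The `KeyValueCache` interface here is only loosely modeled on Apollo's cache contract, and `InventoryClient`, `InventoryDataSource`, the cache key format, and the 30-second TTL are all hypothetical.

```typescript
// Minimal cache contract, loosely modeled on a key-value cache (assumption).
interface KeyValueCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, options?: { ttl?: number }): Promise<void>;
}

// Simple in-memory cache for the sketch; it ignores ttl entirely.
class InMemoryCache implements KeyValueCache {
  private store = new Map<string, string>();
  async get(key: string) { return this.store.get(key); }
  async set(key: string, value: string) { this.store.set(key, value); }
}

// Hypothetical promise-based gRPC client for an inventory service.
interface Stock { sku: string; quantity: number }
interface InventoryClient { getStock(sku: string): Promise<Stock> }

// Resolvers talk to this class instead of the gRPC client directly, so
// caching (and later batching or retries) lives in one place.
class InventoryDataSource {
  private client: InventoryClient;
  private cache: KeyValueCache;
  private ttlSeconds: number;

  constructor(client: InventoryClient, cache: KeyValueCache, ttlSeconds = 30) {
    this.client = client;
    this.cache = cache;
    this.ttlSeconds = ttlSeconds;
  }

  async getStock(sku: string): Promise<Stock> {
    const key = `inventory:stock:${sku}`;
    const cached = await this.cache.get(key);
    if (cached !== undefined) return JSON.parse(cached) as Stock; // cache hit, no RPC

    const fresh = await this.client.getStock(sku);                // cache miss, call gRPC
    await this.cache.set(key, JSON.stringify(fresh), { ttl: this.ttlSeconds });
    return fresh;
  }
}
```

A resolver would then call something like `dataSources.inventory.getStock(sku)`, and swapping the in-memory cache for a shared one would not touch resolver code.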
