Hi Leo! I’ve done a lot of work with Federation in a serverless world, so I’m going to outline some of the details behind our recommendations. I think understanding them will help you make the right decision:
Running an Apollo Gateway in serverless (AWS Lambda)
- We recommend hosting the Apollo Gateway instance on a dedicated piece of infrastructure because of some of the performance enhancements built into it. Specifically, the Apollo Gateway keeps a query plan cache, which ensures query plan generation only happens once per unique operation hash. In my experience, generating a query plan takes anywhere from ~1-50ms depending on the overall size of the operation (the number of subgraphs, how the operation queries across them, etc.).
- You won’t be able to avoid the cold start aspect with serverless, but there are some things you can do to speed it up. Most cloud providers have best practices for Node applications (AWS docs), and I’ve found that setting keepAlive: true re-uses connections and drastically improves performance in this architecture. Think about a user refreshing a page: AWS will try to route them to the same Lambda if it is still up and available, and then the query plan cache could get a hit. For the most part, though, the query plan cache won’t be used in serverless because new instances keep being started up. There technically could be a way to pre-populate the query plan cache, but this probably wouldn’t be that performant either, as you would have to generate query plans for all your client queries ahead of time (and you might not know all the shapes that will come in if it’s a public API).
- Composition - In the latest versions of the Apollo Gateway, you’ll notice a supergraphSdl constructor option that lets you provide the composed schema instead of having the gateway compose the schema at runtime. Managed Federation from Apollo works in a similar way. The big thing here is taking composition out of the runtime and putting it into a build step. Ideally you’re using the rover CLI to push into the Apollo Schema Registry, so that composition happens in your CI process. I see this as a must if you are planning on trying any Apollo Gateway in serverless.
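To make the query plan cache point above a bit more concrete, here is a minimal TypeScript sketch of the idea. It is not the gateway's actual implementation: `QueryPlanCache`, `fakePlanner`, and the `QueryPlan` shape are hypothetical names made up for illustration, and the cache key is simply the whitespace-normalized operation text rather than a real operation hash.

```typescript
// Hypothetical stand-in for a query plan; the real gateway's plan is far richer.
type QueryPlan = { operation: string; steps: string[] };

class QueryPlanCache {
  private plans = new Map<string, QueryPlan>();
  private planner: (operation: string) => QueryPlan;
  hits = 0;
  misses = 0;

  constructor(planner: (operation: string) => QueryPlan) {
    this.planner = planner;
  }

  get(operation: string): QueryPlan {
    // Assumption: normalized text as the key; the gateway keys on an operation hash.
    const key = operation.replace(/\s+/g, " ").trim();
    const cached = this.plans.get(key);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const fresh = this.planner(key); // the expensive step (~1-50ms per the text above)
    this.plans.set(key, fresh);
    return fresh;
  }
}

// Placeholder planner: returns a fixed plan instead of doing real planning.
function fakePlanner(operation: string): QueryPlan {
  return { operation, steps: ["fetch:subgraph-a", "fetch:subgraph-b"] };
}

// Warm instance: one cache lives for the whole process, so a repeat is a hit.
const warm = new QueryPlanCache(fakePlanner);
warm.get("query { me { id } }");
warm.get("query {  me { id }  }");

// Cold starts: every new instance begins with an empty cache, so both miss.
const coldA = new QueryPlanCache(fakePlanner);
const coldB = new QueryPlanCache(fakePlanner);
coldA.get("query { me { id } }");
coldB.get("query { me { id } }");
```

In this sketch the warm instance plans the operation once and serves the second request from memory, while the two cold instances each pay the planning cost again, which is the behavior to expect when Lambda keeps starting fresh instances.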
Running subgraphs in AWS Lambda
There are probably some extra details I’m forgetting about, but this should cover the bulk of it. If performance is top of mind, running the Apollo Gateway instance in Lambda wouldn’t be my first choice, mainly because of the cold start issues, but the query plan cache story is important to understand as well.
I’m more familiar with Azure Functions than AWS Lambda, although I’ve hosted Federation and the Apollo Platform on both, and I’ve found some differences that make Azure Functions more attractive for this. Azure Functions let you change the timeout, meaning the point at which the function block is spun down. This can be a 10 minute interval, so I’ve commonly had a cron function keep that Azure function block alive to avoid cold starts. It’s a workaround, but it actually works very nicely. I think Lambda spins up a new function instance whenever a request arrives while another one is in flight or still spinning up, and I couldn’t find the same configuration options. There may be something I don’t know about, but these are some additional thoughts from when I was working through the different platforms.
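The keep-alive workaround described above can be sketched roughly like this in TypeScript. The `IncomingEvent` shape, the `warmup` flag, and the `handle` function are all hypothetical, and the timer or cron trigger that sends the ping is assumed to be configured separately on whichever platform you use.

```typescript
// Hypothetical event shape: the scheduled ping sets `warmup`, real traffic sets `body`.
type IncomingEvent = { warmup?: boolean; body?: string };
type HandlerResult = { statusCode: number; body: string };

// Module scope survives between invocations while the instance stays warm.
let initializedAt: number | null = null;
let initCount = 0;

// Stand-in for expensive startup work (loading the schema, building the gateway).
function ensureInitialized(): void {
  if (initializedAt === null) {
    initCount += 1;
    initializedAt = Date.now();
  }
}

function handle(event: IncomingEvent): HandlerResult {
  ensureInitialized();
  if (event.warmup === true) {
    // Scheduled ping: keep the instance alive and skip the real work.
    return { statusCode: 204, body: "" };
  }
  return { statusCode: 200, body: JSON.stringify({ handled: true }) };
}

// A ping followed by a real request reuses the same initialized instance.
const pingResult = handle({ warmup: true });
const requestResult = handle({ body: "{}" });
```

The design choice here is to answer the ping as cheaply as possible while still running the one-time initialization, so that the next real request lands on an instance that has already done its startup work.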
I hope this helps, let me know if you have any other questions or want me to expand on any areas!