Ephemeral deploys for E2E testing in staging/pre-prod environments

johnmarkham · April 8, 2025, 6:56pm

We want to leverage ephemeral schema variants and ephemeral router deployments to support a stable E2E testing flow in our staging environment.

Internally, our deployment sister team is building out our ephemeral deployment architecture. Each PR gets its own isolated staging environment, and requests are routed to it via headers and the service mesh.

We want to hook into this architecture so that subgraph developers have the same experience even when they are making potentially conflicting changes to their schema. We have some questions about the implementation details.

Questions

How do developers deploy an ephemeral router?

The lowest-friction approach is deploying the router as a container within the subgraph’s K8s pod. It is unclear how the router would know which schema variant to run.
Another option could be to have subgraph operators manually spin up a router deployment via CLI and pass in a “variant” flag.
Any other ideas?

How does the ephemeral router know which ephemeral schema variant to run at deploy time?
When and how does the schema variant get created and published to Apollo Studio?

Guessing this happens at CI time via rover supergraph compose on the supergraph, specifying an override of the subgraph schema from the current branch?
Ideally we would only create a variant if the subgraph operator actually intends to deploy the change to staging.

What mechanism do subgraph operators use to route to the ephemeral router and how do they discover it?

The most straightforward approach I’m guessing would just be to include the schema variant hash directly as a header, e.g. X-GraphQL-Testing-Schema: staging-382ab1c

How does the routing layer dynamically update to make new ephemeral routers discoverable? Or what information do ephemeral routers expose to make dynamic routing possible?

Presumably ephemeral routers use the same DNS CNAME
Request parameters allow some component (LB, nginx, service mesh) to route to the specific router, but how do these components know about new ephemeral routers?

How do we make the ephemeral schema variants available to FE who also want to test out the ephemeral subgraph change?
How do ephemeral schema variants get garbage collected?

Our setup

We have a federated supergraph composed of many subgraphs.
Our “GraphQL Router” service wraps the Apollo Router binary in a Dockerfile along with some other containers, all spun up into one K8s pod.
We are using Apollo Studio and Apollo’s Schema Registry.
We listen internally for subgraph deployment events in a separate service, do some processing on the schema, and publish the final result to Apollo.

greg-apollo · April 9, 2025, 5:28pm

@Serey_Morm did a talk on something similar to this at GraphQL Summit in 2023: https://www.youtube.com/watch?v=qy9r2FTj3yk

Serey_Morm · April 9, 2025, 7:21pm

There’s a ton of context here, so I’ll try to answer each of these at a high level. I highly suggest watching my talk that Greg shared above, and I’m happy to chat more.

How do developers deploy an ephemeral router?

We created a workflow for developers to deploy their own dedicated Router for their branch to an ephemeral deployment environment. We use Garden.io, which was an existing offering at our organization.

How does the ephemeral router know which ephemeral schema variant to run at deploy time?

This is part of the triggering flow for the first question: developers provide our platform with the “branch” or schema name they’re working on, and we start the router using the schema file as opposed to using Uplink.
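To illustrate starting the router from a file rather than Uplink, here is a minimal sketch that builds the launch command for a branch. The `--supergraph` and `--config` flags are the router's options for local files; the `router_command` helper, the directory layout, and the file naming are assumptions for illustration, not the tooling described above.

```python
from pathlib import Path


def router_command(branch: str, schema_dir: Path, config: Path) -> list[str]:
    """Build the argv for a router that serves one branch's supergraph file."""
    # Hypothetical layout: one composed supergraph per branch,
    # for example schemas/my-branch.graphql.
    supergraph = schema_dir / f"{branch}.graphql"
    return [
        "router",
        "--config", str(config),
        # A local supergraph file is used here in place of fetching from Uplink.
        "--supergraph", str(supergraph),
    ]
```

A deploy job could pass the result to `subprocess.run` after downloading the branch's supergraph file into `schema_dir`.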

When and how does the schema variant get created and published to Apollo studio?

We have a global integration service/pipeline that every subgraph interacts with. We compose the supergraph with all subgraphs that have a matching branch name, so that all the delta schemas are composed together. This step runs every time because it’s inexpensive. More importantly, we don’t publish these branches to Studio; instead we store the composed supergraphs in a static file bucket.
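A rough sketch of the branch-matching step, assuming that subgraphs without a schema on the branch fall back to a baseline such as `main` (the thread does not spell out the fallback). It renders a Rover supergraph config with one schema file per subgraph. The function names, the `schemas/<subgraph>/<branch>.graphql` layout, and the routing URLs are all hypothetical.

```python
import json


def pick_branches(published: dict[str, set[str]], branch: str, baseline: str = "main") -> dict[str, str]:
    """For each subgraph, use the matching branch's schema if one exists, else the baseline."""
    return {
        name: (branch if branch in branches else baseline)
        for name, branches in sorted(published.items())
    }


def render_supergraph_config(picks: dict[str, str], routing_urls: dict[str, str]) -> str:
    """Render a Rover supergraph config (YAML) pointing at one schema file per subgraph."""
    lines = ["subgraphs:"]
    for name, chosen in picks.items():
        # Hypothetical file layout: schemas/<subgraph>/<branch>.graphql
        schema_path = f"schemas/{name}/{chosen}.graphql"
        lines += [
            f"  {name}:",
            # json.dumps yields a double-quoted string, which is also valid YAML.
            f"    routing_url: {json.dumps(routing_urls[name])}",
            "    schema:",
            f"      file: {json.dumps(schema_path)}",
        ]
    return "\n".join(lines) + "\n"
```

The pipeline could write the result to `supergraph.yaml`, run `rover supergraph compose --config supergraph.yaml`, and upload the composed output to the bucket under the branch name.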

What mechanism do subgraph operators use to route to the ephemeral router and how do they discover it?

We have a dedicated service to handle this via a custom header. For client developers, all they need to do is tack on the header to select the variant they want.
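From the client side, selecting a variant could look roughly like the sketch below. The header name is borrowed from the original question, since the real header is not given in this thread, and `build_graphql_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

# Header name taken from the question above; the actual header is an assumption.
VARIANT_HEADER = "X-GraphQL-Testing-Schema"


def build_graphql_request(url: str, query: str, variant: Optional[str] = None) -> urllib.request.Request:
    """Build a GraphQL POST request, optionally pinned to an ephemeral variant."""
    headers = {"Content-Type": "application/json"}
    if variant:
        # Without this header the request goes to the regular staging graph.
        headers[VARIANT_HEADER] = variant
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned request to `urllib.request.urlopen` sends it. In a browser client, the equivalent would be one extra entry in the headers the GraphQL client already sends.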

How does the routing layer dynamically update to make new ephemeral routers discoverable? Or what information do ephemeral routers expose to make dynamic routing possible?

We have a dedicated service to handle this. What makes this discoverable is the branch name that I’ve described above.
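One way such a routing service might keep track of ephemeral routers is a registry keyed by branch name. This is only a sketch under assumptions: the class, the header name, and the fallback to a default router for unknown branches are illustrative choices, not details from the setup described above.

```python
VARIANT_HEADER = "x-graphql-testing-schema"  # hypothetical header name, lowercased


class RouterRegistry:
    """Maps branch names to the address of the ephemeral router serving that branch."""

    def __init__(self, default_upstream: str) -> None:
        self._default = default_upstream
        self._upstreams: dict[str, str] = {}

    def register(self, branch: str, upstream: str) -> None:
        # Called when an ephemeral router for the branch comes up.
        self._upstreams[branch] = upstream

    def deregister(self, branch: str) -> None:
        # Called when the ephemeral environment is torn down.
        self._upstreams.pop(branch, None)

    def resolve(self, headers: dict[str, str]) -> str:
        # HTTP header names are case-insensitive, so normalize before the lookup.
        normalized = {k.lower(): v for k, v in headers.items()}
        branch = normalized.get(VARIANT_HEADER)
        if not branch:
            return self._default
        return self._upstreams.get(branch, self._default)
```

Falling back to the default router keeps requests working when a branch has been torn down. A stricter service might return an error instead, so that testers notice the variant is gone.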

How do we make the ephemeral schema variants available to FE who also want to test out the ephemeral subgraph change?

For us, the subgraph owners would share the branch/variant name with the FE team. All the FE team needs to do is add the special header.

How do ephemeral schema variants get garbage collected?

As mentioned, we store the composed schemas in a static file hosting service such as S3.

waynezhang · April 9, 2025, 11:29pm

Hey Serey, thank you for replying. I just have a small follow-up regarding:

How do we make the ephemeral schema variants available to FE who also want to test out the ephemeral subgraph change?

How can the FE engs pull down the schema if we never push the ephemeral schema to Uplink? Do they retrieve it directly from static storage, or do they pull it by introspecting the ephemeral router?

Serey_Morm · April 10, 2025, 12:03am

Naively, we first exposed it through introspection, but we learned that building an API endpoint to retrieve it directly from static file storage removed a dependency on the router, reduced latency, and improved availability.
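The core of such an endpoint could be as small as the sketch below, which maps a branch name to a stored object and returns an HTTP-style status with the schema text. The key layout, the branch-name allow-list, and the `read_object` callable (a stand-in for a bucket client's read call) are assumptions for illustration.

```python
import re
from typing import Callable, Optional

# Conservative allow-list for branch names used in object keys (an assumption).
_BRANCH_RE = re.compile(r"^[A-Za-z0-9._-]{1,100}$")


def get_supergraph(branch: str, read_object: Callable[[str], Optional[bytes]]) -> tuple[int, bytes]:
    """Return (HTTP status, body) for a schema lookup by branch name."""
    if not _BRANCH_RE.match(branch):
        return 400, b"invalid branch name"
    # Hypothetical key layout inside the bucket.
    sdl = read_object(f"supergraphs/{branch}.graphql")
    if sdl is None:
        return 404, b"no supergraph stored for this branch"
    return 200, sdl
```

Because the lookup only touches storage, FE tooling can fetch a schema even when the ephemeral router for that branch is not running.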
