Feedback thread: Future deprecation of serviceList in ApolloGateway

Using serviceList with a poll interval setting for dynamically created environments. We can’t move to the managed schema with the CI workflow described. Details are here: Create/Delete graph variants using Rover CLI · Issue #722 · apollographql/rover · GitHub
In general: we are blocked because graph variants can’t be managed using Rover. In my opinion, each staging environment should have its own variant so it can provide its own implementing service address.

2 Likes

Thanks for the feedback. We agree and we’re going to look into adding this functionality in an upcoming version of Rover.

1 Like

Thanks for sharing your use-case. We appreciate the feedback and we certainly want to help you traverse any obstacles you’re still finding.

Using the rover supergraph compose command with a corresponding configuration file (which, in addition to pointing to a graph reference in the Apollo Studio Registry, can point to a subgraph’s local SDL file or its introspection endpoint), your schema never leaves your environment. In this mode, you don’t need to use Apollo Studio or managed federation.
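
For illustration, here is a minimal sketch of such a configuration (the subgraph names, files, and URLs are placeholders; registry graph references are also supported):

supergraph.yaml

subgraphs:
  accounts:
    routing_url: https://accounts.example.com/graphql
    schema:
      file: ./accounts.graphql                      # a local SDL file
  products:
    routing_url: https://products.example.com/graphql
    schema:
      subgraph_url: http://localhost:4002/graphql   # or an introspection endpoint

$ rover supergraph compose --config ./supergraph.yaml > supergraph.graphql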

We have a number of customers that work in spaces that need to be protective of their schemas (e.g., field names that reveal secret business functions). In practice, we’ve found that many customers in this space can still use the Apollo Studio Registry after getting it approved since they also usually have other demanding operational requirements and since having a source of truth offers visibility and accountability into how the graph is evolving. We’d encourage you to get in touch with us directly to discuss these organizational obstacles, though certainly understand if that’s an obstacle that you’re very familiar with already! :grinning_face_with_smiling_eyes:

In a similar spirit to particular operational constraints, users (both large and small) have found that serviceList was a bit brittle for their liking: real-time composition happened within the Gateway at startup (which can itself be slow on large graphs, since composition is CPU-intensive) against subgraphs that might be in various states of evolution. This was fraught with runtime validation errors (and thus failures to compose, and thus inoperable gateways). Most problematically, there was no static artifact that could be analyzed and validated in pre-flight, and that stuck around for later analysis, for example, after an outage. Since pre-compiled supergraph files resolve these concerns, we believe they are a stable direction. I should note that existing managed federation still did composition within the gateway, but the registry acted as the source of truth, so Gateways were never handed schemas that didn’t validate.

That’s all to say, I’d still suspect you would benefit from the use of a supergraph file! It’s possible we haven’t made the benefits of supergraphs over serviceList clear enough so far, so I hope this helps a bit.

To dig into one of your struggles a bit, you mentioned that the process was “super difficult”, has “huge overhead”, and “adds layers of complexity” — can you elaborate on that? To build the supergraph you should merely need to run a rover supergraph compose or, if using managed federation, rover subgraph publish command. We’ve looked at a number of workflows in designing this, and all of them have tried to be considerate of CI/CD environments. Can you help me understand?

We hope you can get up to date too! Being stuck is exactly what we don’t want! It’s worth noting that the serviceList functionality is merely deprecated right now, and it should still work as it has since managed federation was first introduced. The load() function, on the other hand, has been more of an implementation detail since we introduced native support for the gateway in ApolloServer itself (thus de-necessitating load), so perhaps what you’re finding difficult here is related to load? If that’s the case, there are probably some other experimental hooks that can help you solve it. I’d encourage you to open an issue or Discussion on the Apollo Server GitHub repository if you’re finding it problematic.

I agree! The polling is particularly less than ideal from our side of things too, since users can have massive fleets of Cloud Run containers polling for updates! Transmitting a signal carrying the new schema poses a similar challenge (e.g., if containers were to receive webhooks, we would need to know where they are). My hunch is that supergraphs let us move in a better direction here, but there are still some implementation details worth chatting about.

I’m curious if you’ve experimented with Google Cloud Run’s ConfigMap service? I believe this may function in a similar way to Kubernetes ConfigMaps where they can be mounted as volumes and the files on those volumes can be watched.

In this case, a configuration that might be worth entertaining is having your Gateway “watch” the config-map volume which has a supergraph file on disk. With your Gateway fleet watching that volume and supergraph file, you could update it using rover — writing the updated supergraph when the new subgraphs have been deployed and having the gateways roll over. The Gateway doesn’t currently support “watching” supergraphs in this way, but it’s something we’re considering adding. (For those that do want their Gateways to update reactively, we know this pattern works reasonably well in Kubernetes, so it’s really a matter of whether it works for you on Google Cloud Run.)
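
Since that watching functionality doesn’t exist yet, here is a minimal sketch of how it might be approximated today with the experimental hooks that come up later in this thread (the mount path and interval are assumptions, and these hooks may change):

import { ApolloGateway } from '@apollo/gateway';
import { createHash } from 'crypto';
import { readFile } from 'fs/promises';

// Hypothetical path where the ConfigMap volume is mounted.
const SUPERGRAPH_PATH = '/etc/supergraph/supergraph.graphql';

const gateway = new ApolloGateway({
  // Re-read the mounted file every 10 seconds.
  experimental_pollInterval: 10000,
  experimental_updateSupergraphSdl: async () => {
    const supergraphSdl = await readFile(SUPERGRAPH_PATH, 'utf-8');
    return {
      // Hash the contents so an unchanged file yields the same id;
      // we're assuming the gateway treats an unchanged id as a no-op.
      id: createHash('sha256').update(supergraphSdl).digest('hex'),
      supergraphSdl,
    };
  },
});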

There’s also a whole subset of users who are using tools like Argo to manage their deployments who really prefer to avoid the fully-reactive updating and opt instead for blue-green deployments that gradually roll over (and back), and we think that Supergraphs also help there, but that’s a longer discussion probably outside of the scope of this thread!

Do you have any CI workflows running on your subgraphs? If your subgraphs simply registered themselves when they deploy — using rover subgraph publish — the supergraph would be updated automatically, plus it’d let you know if you’re breaking client operations or if the supergraph didn’t compose successfully.

Introspection is an action taken against a running service, but you don’t need to use rover subgraph introspect: you can compose the supergraph directly, either by publishing the subgraph’s SDL file with rover subgraph publish (to generate the supergraph in Studio) or locally using rover supergraph compose with a configuration file. If you have multiple .gql files for a subgraph, you can often just concatenate them, e.g.:

$ cat schemas/*.gql |
  rover subgraph publish my-supergraph@my-variant \
    --schema - \
    --name accounts \
    --routing-url https://my-running-subgraph.com/api

We currently don’t natively detect gql tags and extract tagged template literals in Rover — this can get tricky since you can interpolate dynamic values within them, which we discourage — but if you can use .graphql files you should be good. Also, since Rover accepts schemas piped over STDIN — and provided you take care not to interpolate values — you could also use other tools (from npm) that extract gql tagged template literal contents and either write them to files or pipe them directly into Rover’s rover subgraph publish command.
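
For example, here is a naive extraction sketch (assuming your gql`...` literals contain no interpolated values and no stray backticks; a parser-based npm tool is preferable in practice, and the file name here is hypothetical):

extract-gql.ts

import { readFileSync } from 'fs';

// Read a source file given on the command line.
const source = readFileSync(process.argv[2], 'utf-8');

// Naively match gql`...` blocks and print their contents to stdout,
// ready to be piped into `rover subgraph publish --schema -`.
for (const match of source.matchAll(/gql`([^`]*)`/g)) {
  process.stdout.write(match[1] + '\n');
}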


Addressing some of your bullet-points in your follow-up response (Thanks for those thoughts!):

You can build this, but we think we can provide excellent tooling that helps enable it. At the very least, the supergraph file is intended to be your artifact (that’s definitely been one of its design principles!).

You would need to be considerate of whether the subgraphs themselves are in an unexecutable state, though, which is something we’re designing workflows to facilitate and orchestrate. We think Studio and Rover can both help here, and Kubernetes is definitely a primary workflow we’re considering.

There are also other non-Apollo open-source DevOps tools that can help coordinate these things, too. We do think that our managed federation can help avoid needing to roll a lot of this on your own though, and roughly what you’re describing is part of our free offering (and backed by cloud storage, just as reliably).

That said, I do think Rover can help you build many parts of this on your own using well-defined and well-tested interfaces that have been purpose-built to be defensive!

In a similar spirit to what I wrote above, I think Rover can help you if you want to build this. I’ll also note that Kubernetes ConfigMaps are a great way to hot-reload configuration on Pods that are deployed. A Gateway could conceivably watch a file and you could have a separate process merely update the supergraph (via Rover) and have all the Gateways reload. This watching functionality doesn’t exist yet, but you could build it yourself, and we’re riffing on some workflows that might take us there.

We have mechanisms for the first two of these bullet-points already that are utilized by our Registry, and we’re considering more durable hand-shaking between Gateways and services once we work out some runtime environment details where that can be tricky. Good idea though!

I think we probably need to take some more time to document and write about these workflows, so I’m glad we’re having this conversation. I do think there’s a blend here, one we’ve been refining both internally and with large customers, that’s becoming more crisp over time.

Sure, you can do everything without Studio, Rover, the Registry, or any DevOps tooling, but I think it’s worth being cautious about how much you roll on your own. Rover offers a lot of free functionality that you’d just have to rebuild yourself. We’re purpose-building Rover to be part of exactly these workflows, so if you’re not finding it at all useful, that’s surprising to me. (We’re putting a lot of time into evolving all of these tools to solve pretty much all of the challenges you’re describing!)

Runtime is not where we want most of this to happen either, since that’s more difficult to analyze and be confident in up front, and more challenging to audit later! CI and CD, however, are where most of our users seem to want to instrument this stuff, since it allows them to be really certain prior to roll-outs that they have a defensive, well-tested, and reproducible build going into production.

Typically, we haven’t found introducing a command here or there in CI or CD workflows to be a particular challenge, but I’d be curious to understand your challenges with introducing preflight and static build commands to your existing CI/CD workflows.

Ok, I think I hear you. We’ve touched on a few subjects in this post, which I hope were helpful and enlightening and this is great feedback, so thanks for sharing. I think we’ll get to the right blend eventually, but it will hopefully be all the right amounts of flexibility, build steps, tooling, webhooks, etc. :smiley:

Thank you for the reply, it is much appreciated

Yes, our difficulty is related to [the removal of] load (who moved my cheese?) and finding a new way to update our gateway on sub-graph updates. I did ask the question on this forum and it went unanswered, so perhaps the question belongs on GitHub.

What are the “hooks” you refer to? Is there documentation you can point me to?

The idea of having a supergraph monitored by one or more gateway functions appears to be a good approach. I am still thinking through how and when we rebuild supergraphs in our CI/CD process, which currently runs on subgraph merges.

This is not something we have experimented with. We are using the fully managed version of Google Cloud Run, which doesn’t appear to have this option. We have been toying with the idea of keeping a supergraph file in a bucket and having the gateway instances monitor that, or, since monitoring the file isn’t an option currently, redeploying the gateway when the supergraph changes using that central supergraph file.

This is an interesting approach though perhaps overkill for us at this point.

1 Like

Do you have any CI workflows running on your subgraphs?

Not right now; we just do a normal deployment for kube, specifically using a Helm chart. For validating that the schema composes without issue before it reaches prod, right now we use tooling such as Tilt, which allows you to easily spin up workloads in Kubernetes locally, with images either pulled from a registry or built locally. For our gateway, we have references to our child services, deploy everything (and their dependencies) recursively (not as crazy as it sounds), and then spin up the gateway when everything comes online. Takes about 5 minutes and requires one command: tilt up.

In production, we log to DataDog for issues with the gateway, but we’ve never had a schema fail to compose in prod. We have about 15 services attached to our gateway.

Introspection is an action taken against a running service, but you don’t need to use rover subgraph introspect

Sure, but I don’t currently give my CI jobs access to a running server to play with, and I don’t expose ingress for the child services, only the gateway. I would have to spin up the service in CI and perform introspection there, or I would have to give our CI job access to said service.

Being able to use rover against static files would probably be ideal for me. If the gateway is going to be given a static schema at runtime, I don’t really see why it would be weird to do the same for the child services; instead of loading a directory of .gql files, I would just load one pre-built schema from rover.

That said, spinning up a service locally that suddenly requires an artifact is probably a non-trivial change to the local workflow. If that’s what you need for the gateway with rover, then yeah, that’s another thing on the list to migrate.

You can build this, but we think we can provide excellent tooling that helps enable it.

There are also other non-Apollo open-source DevOps tools that can help coordinate these things, too.

The reason that I personally went with the API of a bunch of hooks on the server/gateway is that it would allow pretty simple plugins for certain use cases. Want things to use S3/GCP/etc.? Use a utility pack for that use case which provides a few ready-made hooks, similar to things like graphql-scalars.

That way, you can let the ecosystem do all of that work for you and let the packages duke it out until maybe they get adopted by The Guild or Apollo or the like; the only thing that would need to be really consistent is the individual hooks’ inputs and outputs and the core federation workflow.
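
As a purely hypothetical sketch of that idea (none of these names are real @apollo/gateway or Apollo Server APIs; this is just the shape being described, with an S3-backed “utility pack” as the example):

import { S3 } from 'aws-sdk';

// The only thing that would need to stay consistent: a hook's input/output shape.
type SupergraphHook = () => Promise<{ id: string; supergraphSdl: string }>;

// A ready-made hook that a storage-specific utility pack might export.
export function supergraphFromS3(bucket: string, key: string): SupergraphHook {
  const s3 = new S3();
  return async () => {
    const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    return {
      // The object's ETag changes whenever its contents change.
      id: obj.ETag ?? new Date().toISOString(),
      supergraphSdl: obj.Body!.toString('utf-8'),
    };
  };
}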

I’ll also note that Kubernetes ConfigMaps are a great way to hot-reload configuration on Pods that are deployed.

I’m not sure if I’d want to use this, as I think there might be certain cases, such as a rollout in progress with a schema change, where I wouldn’t want to globally trigger an update. ConfigMap is a solid option, though, I’d just need to be able to specify to the gateway instances what I’d like them to do, such as reload from the ConfigMap in a rolling fashion, so as to not have all the gateways update at exactly the same time.

I think it’s worth being cautious about how much you roll on your own. Rover offers a lot of free functionality that you’d just have to rebuild yourself.

I agree that deviation is obviously something you can go too far with, but in general APIs exist to accommodate easy deviation, because “deviation” there is often the actual work being done.

Right now it just looks like rover would require us to do things quite differently from the way we already do them, so to us it doesn’t really matter whether we do it in CI or “roll our own” via hooks; both approaches would require work to make sure they play well with our existing setup (such as networking rules in CI). The hook approach, to me, has a lot more power and a lot less restriction, and it would be really easy to have one dev sit down and noodle on it, rather than having to get DevOps and dev to figure out this process together, with the ongoing communication that probably follows forever after.

Runtime is not where we want most of this to happen either, since that’s more difficult to analyze and be confident in up front, and more challenging to audit later! CI and CD, however, are where most of our users seem to want to instrument this stuff, since it allows them to be really certain prior to roll-outs that they have a defensive, well-tested, and reproducible build going into production.

If your service needs to be running in order to use rover, how is this particularly different from doing it at runtime? Seems like the same process with extra steps, and things being “farther away” due to being in CI. Sure, it’s safer because you can stop the CI job beforehand, but you could do the same thing at runtime and create actionable alerts just the same. Seems like the difference between rover and runtime right now is that rover forces these steps to be made transparent, whereas at runtime these individual steps are not exposed; it’s all one step right now to my knowledge, so of course you can’t stop it early.

My concern is that once you implement in CI, it’s a lot harder to remove and change than at runtime. Depending on your organization, any future workflow changes, regardless of how much they improve things, are just harder to make because of the communication involved. That, and removing such a workflow is way harder too; since it would now cross-cut teams, diffusion of responsibility would be much more likely to kick in, causing teams to take forever to upgrade in the event of a breaking API/workflow change.

1 Like

Thanks for your feedback. Just wanted to jump back in with one note:

I feel like one of the things I noted above may have been missed, which is that this is not the case — that’s what I meant by this suggestion:

There is no runtime subgraph here. This is coming from a file. (Note that --routing-url is the location where the subgraph can eventually be accessed at runtime, not where it is running right now.)

1 Like

There is no runtime subgraph here.

Ah, ok. Sorry, my mistake.

1 Like

Thanks for your follow-ups!

Documentation is lacking, but take a look at this code. I can’t promise it will always be around, and we’re working on covering these use cases through a more principled API.

On the point of going too far with “rolling your own”, it seems to me like a simple S3 bucket for schemas is not unlike the basic concept of a schema registry, so I totally see your point.

However, it seems like some people in this thread would be unable to use the schema registry for security or legal reasons, at which point such a custom schema registry might be in scope for them.

Obviously that would increase scope on an operational level, but I imagine there’s overlap between the setup you’d need for a custom schema registry and the setup you’d need for rover.

Not saying that it’s a great idea, per se, but an internal schema registry might be in scope for some organizations. We don’t use the schema registry right now, so I’m not really sure what it would need to work. Assuming you can’t use Apollo Studio for whatever reason, and therefore don’t need the advanced features, it seems at a glance that it wouldn’t be that much work to stand up a simple registry.

Hi,
I’m using managed federation here; Apollo Studio works for us, and in our company’s case this change wouldn’t necessarily impact anyone at all.

But I would like to point out that managed federation doesn’t work for everyone. While it works for many customers, there are enterprises who won’t want a dependency on a different platform, there are security requirements in some companies, etc.

I do, however, suggest there be an in-house hosted version of Apollo Studio which enterprises can deploy by themselves, similar to how GitHub Enterprise works. (Unless the enterprise plan is exactly that.)

Without that, if I were to do managed schemas, I’d have no option whatsoever.

One option always exists: the GraphQL schema of Apollo’s own platform is quite open, and I’m sure people looking at that schema could build a schema registry by themselves. But I think the Apollo team providing a self-hostable solution is definitely one of the things you should look into :slightly_smiling_face:

We use managed schemas for our integration and production environments, but I’m not sure how this is supposed to work for local development. Currently, we pull the repositories for each of the sub-graph services, start them up, and then start the graph gateway to pull the schema from each of the local instances running on our laptops. (We’ve added custom code to the gateway to keep polling the subgraph services until they’re up.) Is the idea here that instead of the gateway pulling the sub-graphs automatically, we’d have to add a manual step to do the same thing, so we could provide that unified schema to the gateway?

In other words, how do you see graph federation development working for developers running on their individual machines?

Hi @StephenBarlow @abernix

We are looking at Apollo Federation at the moment and investigating with our internal teams whether we can use managed federation, for security/legal reasons. I also looked at alternatives in case we cannot use managed federation, including generating the supergraph file using Rover, and got that working in a POC. We could have that as part of our CI/CD process and push the supergraph file to AWS S3 or similar. For the gateway to pick up any changes to the supergraph file, I found some “experimental” hooks on the GatewayConfig object. What is happening with these “experimental” properties, and are they going to stay or go? See the example below.

gatewayconfig.ts

import { GatewayConfig } from '@apollo/gateway'
import fs from 'fs';

export const gatewayConfig: GatewayConfig = {
  // Re-run the update function below on this interval (ms).
  experimental_pollInterval: 10000,
  experimental_updateSupergraphSdl: async (config) => {
    console.log('reading supergraphSdl file');
    // This could instead be pulled from, say, AWS S3 or similar.
    const supergraphSdl = fs.readFileSync('prod-schema.graphql', { encoding: 'utf-8' });

    return {
      // A new id signals the gateway that the supergraph may have changed.
      id: new Date().toISOString(),
      supergraphSdl,
    };
  },
};

index.ts

import 'reflect-metadata';
import { ApolloServer } from 'apollo-server';
import { ApolloGateway } from '@apollo/gateway';
import { listen as userListen } from './user-subgraph/index'
import { listen as transactionsListen } from './transactions-subgraph/index'
import { listen as paymentsListen } from './payments-subgraph/index'
import { gatewayConfig } from './gatewayConfig'


async function bootstrap() {

  const gateway = new ApolloGateway(gatewayConfig);  

  const server = new ApolloServer({
    gateway,
    tracing: false,
    playground: true,
    subscriptions: false
  });

  await Promise.all([
    userListen(3001),
    transactionsListen(3002),
    paymentsListen(3003)
  ]);

  server.listen({ port: 3000 }).then(({ url }) => {
    console.log(`Apollo Gateway ready at ${url}`);
  });
}

bootstrap().catch(console.error);

Would be great to get a response on this and see if this solution would be viable going forward.
Many thanks,
John Gobl

So, going to share a bit of a hot take here. :stuck_out_tongue: I want to start by being VERY CLEAR that this is my attempt to be honest and not to be hostile. I love Apollo and the products this team pumps out!

I’ve been a massive fan of federation and my team was a very early adopter of it. I understand the advantages/disadvantages of managed versus unmanaged but thus far we have gone with unmanaged for three reasons:

1. We like the dynamic nature of being able to deploy a single microservice and have the schema changes picked up immediately… especially when doing local development. It’s nice to not have to restart services.
2. We don’t have any plans to utilize Apollo Studio, for a variety of reasons.
3. There isn’t a stable, well-documented way to do home-grown managed federation.

What worries me about this deprecation, and the direction I’ve seen Apollo going, is that there seems to be a push toward “use our paid services or else”. This is a major step back from the typical monetized open source model that I usually see, where it is more “Here’s an awesome open source product. Here are additional features/functionality/support that you CAN use to add additional value.”

This change to remove serviceList, with no stable and documented way to NOT use Studio, is anti-open-source, in my opinion, because it leaves zero options for consumers who are not in the Studio ecosystem.

For this to be a “fair” open source product that doesn’t back consumers into a corner and doesn’t leave them stranded, consider either…

1. Leave serviceList as is, or
2. Provide VERY CLEAR alternatives for consumers not using the Studio ecosystem.

Again, trying to provide this outside perspective as a consumer who loves the library. :slight_smile:

6 Likes

Hey all, firstly thank you for everyone’s input. I know this is a hot topic and I appreciate everyone’s willingness to contribute to the conversation.

I don’t think what we’re proposing is a contentious change, so I’d like to make at least this bit perfectly clear in case there was any misunderstanding. We have every intention of supporting past functionality with the replacement we’re proposing. Use of the word “deprecating” may have sent the wrong message, since our plan is to replace this bit of API with something entirely more capable than just polling your services. The new API should be inclusive of all existing use cases while supporting a reactive model that we’d like to recommend going forward.

I’ve opened a GH issue and expect to start work on this change as soon as I reasonably can, which realistically should be some time this month. There’s something of a proposal written in there which I’m happy to refine/concretize if anything is unclear, but I think there’s enough there for others to chime in and provide feedback. I’m very interested in hearing if there are any gaps for existing use cases, so I encourage anyone to look and participate.

https://github.com/apollographql/federation/issues/1180

4 Likes

This change has landed on the v0.x branch and is available as an alpha for testing. I encourage anyone interested in these changes to try it out and provide feedback! I expect to release this on both v0.x and v2 late this week / early next week.

npm install @apollo/gateway@0.46.0-alpha.0

The PR: https://github.com/apollographql/federation/pull/1246

Thanks again!

4 Likes

This is now officially released for both latest branches of the @apollo/gateway package :tada:

v0.46.0 (latest tag)
v2.0.0-alpha.4 (latest-2 tag)
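
For anyone migrating, here is a minimal sketch of the replacement, assuming the shape shipped in these releases (the subgraph name, URL, and file path are placeholders; see the PR above for the authoritative API). serviceList plus polling becomes IntrospectAndCompose, and fully custom loading becomes a supergraphSdl function that can push updates reactively:

import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';
import { readFile } from 'fs/promises';

// Drop-in style replacement for serviceList + polling:
const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: 'accounts', url: 'http://localhost:4001/graphql' },
    ],
    pollIntervalInMs: 10000,
  }),
});

// Or fully custom loading (e.g. from a file or a bucket), pushing new
// schemas via the provided update() callback:
const customGateway = new ApolloGateway({
  async supergraphSdl({ update }) {
    const supergraphSdl = await readFile('./supergraph.graphql', 'utf-8');
    // Later: call update(newSdl) whenever a new supergraph is available.
    return { supergraphSdl };
  },
});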

1 Like

Hey!
Didn’t read through the entire thread, but I assume this is related: we are currently using serviceMap to identify our subgraphs at runtime and fetch data from our database according to the subgraph the request is made to. I elaborated more in this Stack Overflow topic →

https://stackoverflow.com/questions/69988248/apollo-graphql-dynamic-authenticated-subgraphs

There, I want to create a dynamic flow according to the required subgraph, and I see that internally serviceMap uses serviceList in its implementation. Is it also on the way to deprecation?

Hey @Avivon, I’m not quite sure I follow. I don’t see where you’re making use of serviceMap in your example.

I think I see where you’re referring to serviceMap depending on serviceList, however I think that particular serviceList is not the same as the one that’s public API and deprecated.