Do you have any CI workflows running on your subgraphs?
Not right now; we just do a normal deployment to Kubernetes, specifically using a Helm chart. For validating the schema without issues in prod, right now we use tooling such as Tilt, which lets you easily spin up workloads in Kubernetes locally, with images either pulled from a registry or built locally. For our gateway, we have references to our child services, deploy everything (and their dependencies) recursively (not as crazy as it sounds), and then spin up the gateway once everything comes online. Takes about 5 minutes, requires one command: tilt up.
In production, we log to DataDog for issues with the gateway, but we’ve never had a schema fail to compose in prod. We have about 15 services attached to our gateway.
Introspection is an action taken against a running service, but you don’t need to use rover subgraph introspect
Sure, but I don’t currently give my CI jobs access to a running server to play with, and I don’t expose ingress for the child services, only the gateway. I would have to spin up the service in CI and perform introspection there, or I would have to give our CI job access to said service.
Being able to use rover against static files would probably be ideal for me. If the gateway is going to be given a static schema at runtime, I don’t really see why it would be weird to do the same for the child services; instead of loading a directory of .gql files, I would just load one pre-built schema from rover.
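To make the idea concrete, here is a minimal sketch of loading one pre-built supergraph schema from disk instead of a directory of .gql files. The helper name and the sanity check are my own; the only real API assumed is that the gateway accepts a supergraph SDL string (e.g. Apollo Gateway’s supergraphSdl option), and the file would be produced ahead of time by something like rover supergraph compose.

```typescript
// Sketch: load a single pre-built supergraph schema file at startup,
// rather than globbing a directory of per-service .gql files.
// `loadSupergraphSdl` and the demo file are hypothetical.
import { readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

function loadSupergraphSdl(path: string): string {
  const sdl = readFileSync(path, "utf-8");
  // Cheap sanity check: a composed supergraph carries join directives.
  if (!sdl.includes("@join__")) {
    throw new Error(`${path} does not look like a composed supergraph`);
  }
  return sdl;
}

// Demo with a stand-in file; in a real deployment you would hand the
// string to the gateway, e.g. `new ApolloGateway({ supergraphSdl })`.
const demoPath = join(tmpdir(), "supergraph.graphql");
writeFileSync(demoPath, "directive @join__graph on ENUM_VALUE\n");
const supergraphSdl = loadSupergraphSdl(demoPath);
```

The nice property is that the child services and the gateway then share one artifact format, so local dev and prod only differ in where the file comes from.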
That said, spinning up a service locally that suddenly requires an artifact is probably a non-trivial change to the local workflow. If that’s what you need for the gateway with rover, then yeah, that’s another thing on the list to migrate.
You can build this, but we think we can provide excellent tooling that helps enable it.
There are also other non-Apollo open-source DevOps tools that can help coordinate these things, too.
The reason that I personally went with the API of a bunch of hooks on the server/gateway is that it would allow pretty simple plugins for certain use cases. Want things to use S3/GCP/etc.? Use a utility pack for that use case which provides a few ready-made hooks, similar to things like graphql-scalars.
That way, you can let the ecosystem do all of that work for you and let the packages duke it out until maybe they get adopted by The Guild or Apollo or the like; the only thing that would need to be really consistent is the individual hooks’ inputs and outputs and the core federation workflow.
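To sketch what that hooks contract might look like: all names here are hypothetical (this API doesn’t exist), but the point is that only the hook inputs and outputs need to stay consistent, and a “utility pack” is just a module that exports a ready-made hook set.

```typescript
// Hypothetical hooks contract for the gateway; only the shape matters.
type SchemaHooks = {
  // Called whenever the gateway needs the current supergraph SDL.
  fetchSupergraph: () => string;
  // Optional: called after a successful (re)load, e.g. for metrics/alerts.
  onSchemaLoaded?: (sdl: string) => void;
};

// A "utility pack" (S3, GCS, in-memory, ...) just exports a hook set.
// This in-memory one stands in for the real thing.
function inMemoryHooks(initialSdl: string): SchemaHooks {
  const sdl = initialSdl;
  return {
    fetchSupergraph: () => sdl,
    onSchemaLoaded: (loaded) => console.log(`loaded ${loaded.length} chars`),
  };
}

// The core federation workflow stays the same regardless of which pack
// supplied the hooks; a real gateway would compose and serve here.
function startGateway(hooks: SchemaHooks): string {
  const sdl = hooks.fetchSupergraph();
  hooks.onSchemaLoaded?.(sdl);
  return sdl;
}

const demoSdl = startGateway(inMemoryHooks("type Query { ok: Boolean }"));
```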
I’ll also note that Kubernetes ConfigMaps are a great way to hot-reload configuration on Pods that are deployed.
I’m not sure if I’d want to use this as-is, since there might be certain cases, such as a rollout in progress alongside a schema change, where I wouldn’t want to globally trigger an update. ConfigMap is a solid option, though; I’d just need to be able to tell the gateway instances what to do, such as reload from the ConfigMap in a rolling fashion, so that all the gateways don’t update at exactly the same time.
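One cheap way to get that rolling behavior, sketched below under the assumption that gateway replicas have ordinal hostnames (StatefulSet-style, gateway-0, gateway-1, …): each replica delays its reload by its ordinal times a step, so a ConfigMap change still propagates everywhere but never hits all gateways at the same instant. The function name and the hostname convention are assumptions, not an existing API.

```typescript
// Stagger schema reloads across gateway replicas after a ConfigMap
// change. Assumes hostnames end in an ordinal (gateway-0, gateway-1, ...).
function reloadDelayMs(hostname: string, stepMs: number): number {
  const match = hostname.match(/-(\d+)$/);
  const ordinal = match ? parseInt(match[1], 10) : 0;
  return ordinal * stepMs;
}

// gateway-0 reloads immediately, gateway-1 after 30s, gateway-2 after 60s.
console.log(reloadDelayMs("gateway-2", 30_000)); // → 60000
```

A replica would call this once when it notices the mounted ConfigMap file change, then sleep for the returned delay before swapping in the new schema.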
I think it’s worth being cautious about how much you roll on your own. Rover offers a lot of free functionality that you’d just have to rebuild yourself.
I agree that deviation is obviously something you can go too far with, but in general APIs exist to accommodate easy deviation, because “deviation” there is often the actual work being done.
Right now it just looks like rover would require us to do things quite differently from the way we already do them, so to us it doesn’t really matter whether we do it in CI or “roll our own” via hooks; both approaches would require work to make sure they play well with our existing setup (such as networking rules in CI). The hook approach, to me, has a lot more power and a lot less restriction, and it would be really easy to have one dev sit down and noodle on it, rather than having to get devops and dev to figure out this process together, with the ongoing communication forever after.
Runtime is not where we want most of this to happen either, since that’s more difficult to analyze and be confident in up front, and harder to audit later! CI and CD, however, is where most of our users seem to want to instrument this stuff, since it allows them to be really certain prior to roll-outs that they have a defensive, well-tested, and reproducible build going into production.
If your service needs to be running in order to use rover, how is this particularly different from doing it at runtime? Seems like the same process with extra steps, and things being “farther away” due to being in CI. Sure, it’s safer because you can stop the CI job beforehand, but you could do the same thing at runtime and create actionable alerts just the same. Seems like the difference between rover and runtime right now is that rover forces these steps to be made transparent, whereas at runtime these individual steps are not exposed; it’s all one step right now to my knowledge, so of course you can’t stop it early.
My concern is that once you implement this in CI, it’s a lot harder to remove or change than at runtime. Depending on your organization, any future workflow change, regardless of how much it improves things, is just harder to do because of the communication involved. Removing such a workflow is way harder too, and since it would now cross-cut teams, diffusion of responsibility would be much more likely to kick in, causing teams to take forever to upgrade in the event of a breaking API/workflow change.