I appreciate this feedback thread, thank you. The shift away from serviceList has cost us many hours of investigation time, and our production graph is still running v 2.22.2 to retain certain functionality. We are looking to the future and hope to find a new solution, as we are very interested in getting our bits up to the latest versions. I may break out in hives being so far behind.
We run a multi-function GraphQL endpoint in GCP’s Google Cloud Run, which contains 6 child sub-graph endpoints and a Gateway endpoint. All of our code is open source on GitHub. CI/CD is managed in GitHub Actions, where approved PRs to main are deployed to Google Cloud Run and verified, and finally the gateway is instructed to reload the schema. For the most part this has been a flawless approach for the last two years.
With all of our graph endpoints running in GCP’s Google Cloud Run, the paradigm shift to a polling system has been a hard pill to swallow. For a containerized function, having the process poll every 10 seconds seems like a waste (and may cause other issues if it runs as a background process). Our current configuration uses the function apolloGateway.load() to instruct the gateway to refetch its schema. In most cases this works fine, and we prefer the push/trigger approach to updating the gateway over constant polling. We would prefer that managed federation trigger the gateway when there is a schema change and let it grab the latest supergraph, rather than having the gateway continuously poll to see if there are changes. apolloGateway.load() was removed in federation v2.23.
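To make the push/trigger idea concrete, here is a minimal sketch of the pattern, not of any Apollo API. Every name in it (`PushReloader`, `SchemaLoader`, `reload`) is hypothetical, and the loader function stands in for whatever actually fetches the schema. The point is only that a reload happens when something asks for it, and that overlapping requests share one fetch.

```typescript
// Hypothetical sketch: none of these names come from @apollo/gateway.
// SchemaLoader stands in for whatever actually fetches the schema text.
type SchemaLoader = () => Promise<string>;

class PushReloader {
  private current: string | null = null;
  private inFlight: Promise<string> | null = null;

  constructor(private readonly loadSchema: SchemaLoader) {}

  // Called by the deploy pipeline, for example from an authenticated HTTP route.
  // Triggers that arrive while a load is running share that load.
  reload(): Promise<string> {
    if (this.inFlight === null) {
      const pending = this.run();
      const clear = () => {
        this.inFlight = null;
      };
      // Clear the marker whether the load succeeded or failed.
      pending.then(clear, clear);
      this.inFlight = pending;
    }
    return this.inFlight;
  }

  // The last successfully loaded schema, or null before the first load.
  schema(): string | null {
    return this.current;
  }

  private async run(): Promise<string> {
    const sdl = await this.loadSchema();
    this.current = sdl; // swap only after a successful load
    return sdl;
  }
}
```

With something like this there is no timer in the container at all: the last CI/CD step calls the trigger once after the sub-graphs are verified, and a failed load leaves the previous schema in place.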
I have read the limitations list for serviceList and can’t disagree with them. Though in my experience with a federated gateway, if an endpoint doesn’t respond when the graph fetches (and rebuilds) the schema, the API will probably be broken as well, since there are many extended types between the sub-graphs.
So what we are seeing is that there are only two paths for a federated gateway going forward: managed federation and composing a supergraph. I have already explained one hesitation we have with the managed federation approach. I am guessing we can add scripts to our CI/CD process to build a supergraph and redeploy or restart the gateway, though honestly that does feel wrong and error-prone.
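As a rough illustration of what such a CI/CD script might involve, the sketch below generates a composition config from a list of sub-graphs. The `Subgraph` type, the `supergraphConfig` function, and the example URLs are all hypothetical, and the YAML layout is my understanding of what the Rover CLI expects, so treat it as an assumption to check against the Rover documentation.

```typescript
// Hypothetical helper for a CI step; names and URLs are illustrative only.
interface Subgraph {
  name: string;
  routingUrl: string;
}

// Builds the YAML text for a supergraph composition config.
// Assumption: each sub-graph is introspected at its routing URL.
function supergraphConfig(subgraphs: Subgraph[]): string {
  const lines: string[] = ["subgraphs:"];
  for (const { name, routingUrl } of subgraphs) {
    lines.push(
      `  ${name}:`,
      `    routing_url: ${routingUrl}`,
      "    schema:",
      `      subgraph_url: ${routingUrl}`,
    );
  }
  return lines.join("\n") + "\n";
}

const exampleConfig = supergraphConfig([
  { name: "accounts", routingUrl: "https://accounts.example.com/graphql" },
  { name: "orders", routingUrl: "https://orders.example.com/graphql" },
]);
console.log(exampleConfig);
```

The pipeline would write this output to a file and then run something like `rover supergraph compose --config supergraph.yaml`, after which the gateway still has to be redeployed or restarted to pick up the result, which is the part that feels error-prone.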
We considered restarting the gateway when there is a sub-graph update, though this is actually not a straightforward thing to do in Google Cloud Run. And now that we know serviceList will be going away, this will be a sticking point for trying to use a supergraph as well.
In summary, after getting all that out of my head, I am not strongly against using managed federation, but with the environment we run in, polling is a really poor approach. In addition, I find it frustrating that a third-party service must be used to properly utilize a software library like Apollo Federation.
Again, thank you for providing this space for feedback.