I have multiple fields each of which computes some kind of statistic / aggregate based on a large dataset I’d like to stream into the server, e.g. coming from a database.
Requesting the whole dataset separately in each resolver would definitely be wasteful; making a DataSource with some caching support would be one way to avoid requesting it more than once. However, the dataset is quite large, so buffering the whole thing in memory isn’t an option.
Instead, I want to stream the data into the server. I’ve represented the dataset as an AsyncIterable, and I’ve written a generic wrapper for AsyncIterable that can “tee” it into multiple consumers such that every consumer must see data item N before any consumer is able to see data item N+1.
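To make the lock-step behaviour concrete, here is a minimal TypeScript sketch of the kind of tee I mean. It is not my exact wrapper: `lockstepTee`, `Waiter`, and `demo` are names made up for illustration, and this version assumes the number of consumers is known up front and that each consumer awaits one `next()` call before issuing another.

```typescript
// Hypothetical sketch: split one AsyncIterable into `count` lock-step branches.
// Item N+1 is only pulled from the source once every branch has asked for it,
// which (for well-behaved consumers) means every branch has already seen item N.
type Waiter<T> = {
  resolve: (r: IteratorResult<T>) => void;
  reject: (e: unknown) => void;
};

function lockstepTee<T>(
  source: AsyncIterable<T>,
  count: number,
): AsyncIterableIterator<T>[] {
  const it = source[Symbol.asyncIterator]();
  let waiting: Waiter<T>[] = [];

  const makeBranch = (): AsyncIterableIterator<T> => {
    const branch: AsyncIterableIterator<T> = {
      next: () =>
        new Promise<IteratorResult<T>>((resolve, reject) => {
          waiting.push({ resolve, reject });
          // Barrier: do nothing until all branches have requested the next item.
          if (waiting.length < count) return;
          const batch = waiting;
          waiting = [];
          // Pull exactly one item and hand the same result to every branch.
          it.next().then(
            (r) => batch.forEach((w) => w.resolve(r)),
            (e) => batch.forEach((w) => w.reject(e)),
          );
        }),
      [Symbol.asyncIterator]: () => branch,
    };
    return branch;
  };

  return Array.from({ length: count }, makeBranch);
}

// Two "statistics" consuming the same stream without buffering it.
async function* numbers(): AsyncGenerator<number> {
  yield 1;
  yield 2;
  yield 3;
}

async function sum(xs: AsyncIterable<number>): Promise<number> {
  let total = 0;
  for await (const x of xs) total += x;
  return total;
}

async function max(xs: AsyncIterable<number>): Promise<number> {
  let best = -Infinity;
  for await (const x of xs) best = Math.max(best, x);
  return best;
}

async function demo(): Promise<[number, number]> {
  const [a, b] = lockstepTee(numbers(), 2);
  return Promise.all([sum(a), max(b)]);
}
```

In this sketch a consumer that stops iterating early would stall the others, since the barrier waits for all `count` branches.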
The problem now is that I somehow need to synchronize the different resolvers, each of which calculates a different statistic on this data. Without any form of synchronization, it’s possible that one resolver begins consuming data before a second resolver has hooked into the AsyncIterable tee; only from that point on would the two operate in lock-step, so the late resolver would miss the items that had already been consumed. This is undesirable because it will throw off the calculation of the statistics.
Is there some kind of “resolver instantiation” I can hook into on each query that would be guaranteed to finish for each resolver before any resolver’s main body would start to run? This way each resolver can hook into the tee before they begin to loop over the data stream.
Some extra details: each of these resolvers is a direct child of a “Statistics” resolver. Is there some way this parent resolver could detect how many child resolvers will run? This way it could prepare the right number of synchronized stream iterators ahead of time, and the children could retrieve them.
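One idea for the counting part, sketched below in TypeScript: the `info` argument of the parent resolver carries the AST of the requested sub-fields (`info.fieldNodes[i].selectionSet.selections` in graphql-js), so the parent could count them. The interfaces here are a hand-written structural subset of the graphql-js node shapes so the snippet is self-contained, and `countChildFields` is a hypothetical helper. It is only a rough sketch: it ignores fields that arrive through fragments and does not evaluate `@skip` / `@include`, so I’m not sure it is reliable enough.

```typescript
// Assumed minimal shapes mirroring graphql-js AST nodes (the real types
// are exported by the `graphql` package; these are for illustration only).
interface NameLike {
  value: string;
}
interface FieldLike {
  kind: "Field";
  name: NameLike;
  alias?: NameLike;
  selectionSet?: { selections: readonly SelectionLike[] };
}
type SelectionLike = FieldLike | { kind: "FragmentSpread" | "InlineFragment" };
interface ResolveInfoLike {
  fieldNodes: readonly FieldLike[];
}

// Hypothetical helper: count the distinct child fields requested directly
// under the field currently being resolved.
function countChildFields(info: ResolveInfoLike): number {
  const responseKeys = new Set<string>();
  for (const node of info.fieldNodes) {
    for (const sel of node.selectionSet?.selections ?? []) {
      // Limitation: fields inside fragments are not followed in this sketch.
      if (sel.kind !== "Field") continue;
      // __typename would not consume the data stream, so skip it.
      if (sel.name.value === "__typename") continue;
      // Aliased fields resolve separately, so key on alias when present.
      responseKeys.add(sel.alias?.value ?? sel.name.value);
    }
  }
  return responseKeys.size;
}
```

The parent “Statistics” resolver could then create that many synchronized iterators before returning, and the children could pick them up from the parent value or the context.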
I was thinking this might be possible by writing a custom Apollo Server plugin and hooking into the executionDidStart event for the Statistics operation. In this hook, the plugin would prepare the main stream, which would then be in scope in a subsequent willResolveField event that would obtain a synchronized replica of the main stream. However, this approach would hinge on every willResolveField hook completing before any of the resolvers actually start running. Is this the case?