DataSource Initialization in AS4

sailesh · February 3, 2025, 6:02am

I’m facing memory leaks, and they seem to be related to the way we initialize our data sources.

As per issue #6047, we create a new data sources object on each request, which we were doing in AS3 as well. The difference was that in AS3 we instantiated the data sources ourselves and passed in objects of the RESTDataSource subclass.

Now in AS4, we create a new instance for each request, which hurts if the constructor of the RESTDataSource subclass does heavy work, as ours does. We follow the circuit breaker pattern (opossum npm library). As a result, we observed a significant hit to our CPU and memory usage: for each GraphQL request, the DataSource subclass is instantiated, which in turn creates a new CircuitBreaker instance as well.

Although this can be avoided by storing the CircuitBreaker instance as a static property of the DataSource subclass, I wanted to understand: what is the cache object that we pass to the data sources in expressMiddleware?
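A minimal sketch of the static-property idea, assuming a stand-in `ExpensiveBreaker` class in place of the real circuit breaker so the example runs without dependencies. `ExpensiveBreaker`, `MoviesAPI`, and `getMovie` are illustrative names here, not the API of Apollo Server or opossum.

```typescript
// Hypothetical stand-in for a costly object such as a circuit breaker.
class ExpensiveBreaker {
  static created = 0; // counts constructions, for illustration only

  constructor() {
    ExpensiveBreaker.created += 1;
  }

  // Runs the action; a real breaker would also track failures and open the circuit.
  async fire<T>(action: () => Promise<T>): Promise<T> {
    return action();
  }
}

class MoviesAPI {
  // Built once, when the class is defined, and shared by every instance.
  private static readonly breaker = new ExpensiveBreaker();
  private readonly token: string;

  constructor(token: string) {
    this.token = token; // only cheap per-request state lives on the instance
  }

  getMovie(id: string): Promise<string> {
    return MoviesAPI.breaker.fire(async () => `movie ${id} for ${this.token}`);
  }
}

// Simulate three requests, each constructing its own data source.
const perRequest = ["a", "b", "c"].map((token) => new MoviesAPI(token));
```

With this shape, constructing a data source per request stays cheap, because the expensive object is no longer created in the constructor.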

To check the behaviour when we don’t pass cache to the data sources, I moved the instantiation of the data sources outside the expressMiddleware callback and passed the already-instantiated objects in the callback. I didn’t notice any change in caching behaviour. We’re using responseCachePlugin, and I validated this by logging cache.valList in node_modules/@apollo/server-plugin-response-cache/dist/cjs/ApolloServerPluginResponseCache.js in requestDidStart.

Please let me know if I’m missing anything, if I’m checking the cache in the wrong place, or if the cache object passed to the data sources is used for a different purpose.

sailesh · February 3, 2025, 6:17am

I found this post, which partially answers my question: Why do we need a *per request* based data source instance?

I’m still not clear on the cache object passed to the data sources in expressMiddleware…

shanemyrick · February 3, 2025, 5:45pm

Hi @sailesh, in Apollo Server v4 we no longer create data sources on a per-request basis. See our migration guide to learn more, but the short answer is that we leave it up to you to decide what goes into the context per request, and you can provide singleton references if you want. If you want to replicate the ASv3 behavior you can, but it is not required.

David answered this in the linked question about why there are two different caches: Why do we need a *per request* based data source instance? - #3 by glasser

The cache in the expressMiddleware function is the top-level shared HTTP cache.

sailesh · February 3, 2025, 6:54pm

Hi @shanemyrick

Thanks for the clarification. But I’m still not so clear on

Because I see the constructor of my RESTDataSource subclass running every time I hit a query belonging to that data source, which wasn’t happening in AS3.

shanemyrick · February 3, 2025, 8:04pm

@sailesh If you are seeing your constructor invoked on every request in ASv4, that is because you must be calling it from the context function when creating your server.

For those who rely on the ASv3 behavior, you can still do this if needed, but it is not required:

const { url } = await startStandaloneServer(server, {
  context: async ({ req }) => {
    const token = getTokenFromRequest(req);
    const { cache } = server;
    return {
      token,
      dataSources: {
        moviesAPI: new MoviesAPI({ cache, token }),
      },
    };
  },
});

sailesh · February 3, 2025, 8:54pm

Understood, but I’ll lose the top-level shared HTTP cache this way.

As explained in #6047:

The AS3 behaviour combines the object returned from dataSources with the object returned from context, and calls dataSources.moviesAPI.initialize({ cache, context }) with both the shared cache and the context.
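To make that concrete, here is a rough sketch of what the AS3 behaviour amounts to, written with stand-in types rather than Apollo's real classes. `initializeDataSources`, `KeyValueStore`, `RequestContext`, and `LegacyMoviesAPI` are names assumed for illustration only.

```typescript
// Stand-in types; the real AS3 classes live in Apollo's packages.
type KeyValueStore = Map<string, string>;

interface RequestContext {
  token: string;
}

interface InitializableDataSource {
  initialize(config: { cache: KeyValueStore; context: RequestContext }): void;
}

class LegacyMoviesAPI implements InitializableDataSource {
  cache?: KeyValueStore;
  context?: RequestContext;

  // Receives the shared cache and the current request's context.
  initialize(config: { cache: KeyValueStore; context: RequestContext }): void {
    this.cache = config.cache;
    this.context = config.context;
  }
}

// Approximation of the per-request step: build the data sources, then hand
// each one the shared cache and that request's context.
function initializeDataSources<T extends Record<string, InitializableDataSource>>(
  create: () => T,
  cache: KeyValueStore,
  context: RequestContext,
): T {
  const dataSources = create();
  for (const dataSource of Object.values(dataSources)) {
    dataSource.initialize({ cache, context });
  }
  return dataSources;
}

// Two simulated requests share one cache but get separate instances.
const sharedCache: KeyValueStore = new Map();
const first = initializeDataSources(
  () => ({ moviesAPI: new LegacyMoviesAPI() }),
  sharedCache,
  { token: "t1" },
);
const second = initializeDataSources(
  () => ({ moviesAPI: new LegacyMoviesAPI() }),
  sharedCache,
  { token: "t2" },
);
```

Note that calling `initialize` on a single shared instance instead would overwrite its `context` on every request, which is unsafe when requests overlap.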

In AS4, if I want to use the singleton pattern, how do I pass the cache to my data sources?

Are you referring to the plugins approach mentioned in the migration guide?

If yes, I would like to know whether the plugins approach is a long-term solution and whether it will remain available in upcoming major versions of AS.

@shanemyrick

shanemyrick · February 3, 2025, 9:09pm

The plugins approach you referenced is another way to call initialize on every request. You can do that if you want to invoke some function every time, and our plugins API is fairly stable, but I cannot guarantee that a future major release would not have breaking changes, as the primary reason for a major version is to make a breaking change.

However, if I were building my server, I would instead just reuse the context function and save a reference there to any top-level objects you need, so that in the data sources you can read all the values in the context (like a reference to the cache) per request.

sailesh · February 4, 2025, 4:41pm

Can you please share a snippet explaining the above, @shanemyrick?

As mentioned in my first question, I tried instantiating the data sources once and passing the reference in expressMiddleware, like below:

const dataSources = {
  moviesAPI: new MoviesAPI()
};

app.use(
  '/graphql',
  expressMiddleware(server, {
    context: async ({ req }) => {
      const { cache } = server;
      return {
        headers: req.headers,
        dataSources: dataSources // how to pass cache here?
      }
    }
  })
);

But the above snippet deviates from the AS3 way of initialization. The initialize() method in AS3 gave us the flexibility to instantiate once while making sure the context and cache were passed to the data sources per request.

Can we achieve that in AS4?

I want to see if we can get the AS3 behaviour in AS4 without going through the plugins approach, as it might be an overhead for us.

shanemyrick · February 4, 2025, 6:54pm

You can create the cache and reference it when creating the context per request:

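import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import { buildSubgraphSchema } from '@apollo/subgraph';
import { InMemoryLRUCache } from '@apollo/utils.keyvaluecache';

// JSONPlaceholderAPI, typeDefs, and resolvers are assumed to be defined elsewhere in your project.
// Both objects are created once and reused for every request.
const jsonApi = new JSONPlaceholderAPI();
const topLevelCache = new InMemoryLRUCache();

const server = new ApolloServer({
  schema: buildSubgraphSchema({ typeDefs, resolvers }),
  cache: topLevelCache
});

const { url } = await startStandaloneServer(server, {
  listen: { port: 4001 },
  context: () => {
    return {
      dataSources: {
        jsonApi
      },
      cache: topLevelCache
    };
  },
});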

Or you could update your example to this:

app.use(
  '/graphql',
  expressMiddleware(server, {
    context: async ({ req }) => {
      const { cache } = server;
      return {
        headers: req.headers,
        dataSources: dataSources,
        cache // This is how you set the cache in the context, then read it from context in dataSources/resolvers
      }
    }
  })
);

Feel free to start here as an example: GitHub - apollosolutions/example-subgraph-rest-datasource: Example ApolloServer with RESTDatasource
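For the read side, here is a minimal sketch of a singleton data source that takes the cache from the context on each call. It assumes a Map-backed `SimpleCache` that only imitates the async get/set shape of the server cache; `SimpleCache`, `SingletonMoviesAPI`, `buildContext`, and `demo` are hypothetical names, and the real HTTP call is replaced by a counter.

```typescript
// Stand-in for the shared key-value cache (assumption: async get/set only).
class SimpleCache {
  private readonly store = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

interface RequestContext {
  headers: Record<string, string>;
  cache: SimpleCache;
  dataSources: { moviesAPI: SingletonMoviesAPI };
}

class SingletonMoviesAPI {
  fetchCount = 0; // stands in for real upstream calls

  // Nothing request-specific is stored on the instance; it arrives as an argument.
  async getMovie(id: string, context: RequestContext): Promise<string> {
    // A real key would also need to cover anything user-specific.
    const key = `movie:${id}`;
    const cached = await context.cache.get(key);
    if (cached !== undefined) return cached;

    this.fetchCount += 1;
    const movie = `movie ${id}`;
    await context.cache.set(key, movie);
    return movie;
  }
}

// Created once at startup and reused for every request.
const topLevelCache = new SimpleCache();
const dataSources = { moviesAPI: new SingletonMoviesAPI() };

// Mirrors the shape of the context function in the snippet above.
function buildContext(headers: Record<string, string>): RequestContext {
  return { headers, dataSources, cache: topLevelCache };
}

// Two simulated requests for the same movie; returns the upstream call count.
async function demo(): Promise<number> {
  const ctxA = buildContext({ authorization: "a" });
  const ctxB = buildContext({ authorization: "b" });
  await ctxA.dataSources.moviesAPI.getMovie("1", ctxA);
  await ctxB.dataSources.moviesAPI.getMovie("1", ctxB);
  return dataSources.moviesAPI.fetchCount;
}
```

Because the instance holds no per-request state, overlapping requests cannot overwrite each other's context, which is the main risk when a singleton is re-initialized on every request.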
