What is the N + 1 problem in GraphQL and how to solve it?

What is the N + 1 problem in GraphQL and how to solve it?

GraphQL is an awesome tool to build data communication between parts of your application. It is flexible, easy to use and the front-end developers love the simplicity of load different properties and sub-entities of a particular entity. But of course, as always, there is a downside. GraphQL has huge benefits, but if you start to build a little bit more complex application than a blog, you can easily meet the N + 1 problem. The great news is, this is not a problem because there are solutions out there to handle this. But to able to build fast GraphQL communication, we need to know our enemies, so let’s see what exactly is the N + 1 problem.

The N + 1 problem

Basically, the N + 1 problem is not a problem with the GprahQL but the way how GraphQL resolves the complex queries. Let’s see a basic video platform app example.

We have users in our database, the users have videos and the users can comment other videos. We have a front-end query, that loads all the videos and the comments for a particular user. The query looks something like this:

query {
  user(name: "Influencer John") {
    id
    videos {
      id
      title
      comments {
         id
         text 
      }
    }
  }
}

It is awesome! We can load all this information from one query. But there is a problem with this. The GraphQl resolves the pieces of the query line by line and if we construct proper field resolvers to get the specific child entities our query will be quite unoptimized. It is because each resolver function knows only about it is own parent, so by default, the comments field resolver doesn’t know that we want to load all comments of all videos of the particular user but it knows only that it needs to load all comments for a particular video. And the query does this as many times as many videos the user has. If John has 3 videos, the raw SQL queries look something like this:

SELECT *
FROM users 
WHERE name="Influencer John"; 

SELECT *
FROM videos 
WHERE user_id in (1);

SELECT *
FROM comments 
WHERE video_id in (1); 

SELECT * 
FROM comments 
WHERE video_id in (2); 

SELECT *
FROM comments 
WHERE video_id in (3);

As you can see we have a lot of unnecessary SQL queries here. And just imagine what if John has 100 videos, then we need to fire 100 + 1 queries to get all comments for these videos. This can cause some trouble on the backend side and make the complex queries slow. In the SQL world the ideal way would be something like this:

SELECT *
FROM users
WHERE name="Influencer John";

SELECT *
FROM videos
WHERE user_id in (1)

SELECT *
FROM comments
WHERE video_id in (1, 2, 3);

We have only 3 queries here. If John has 100 videos, the all number of the queries is still 3. Can we do this, or should we leave GraphQL because of the slow queries? Spoiler: of course not. GraphQL is still great, and I am going to show you two possible solutions for this problem.

DataLoader

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.

So basically, DataLoader is a tool helping to organize the queries into batches that makes the data loading more efficient, as we saw above. And top of that it, provides a caching solution that makes it possible to save more time on the queries.

Let’s see how the batch loading works and how we can use this, to solve the N + 1 problem. As the documentation says

A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values or Error instances.

async function batchFunction(keys) {
  const results = await db.fetchAllKeys(keys)
  return keys.map(key => results[key] || new Error(`No result for ${key}`))
}

const loader = new DataLoader(batchFunction)

And it looks like this in our video platform example:

const batchGetCommentsByVideo = (ids) => {
  return ids.map((videoId) => {
    return databseORM.get('comments').where('comments.video_id = :videoId', { videoId }).getMany()
  });
};

const commentLoader = new DataLoader(batchGetCommentsByVideo);

I am using here a fake ORM because the exact syntax is not so important. You can pass any ORM or database connection inside the map, and it will work as we expect.

After this, if we use our data loader in our comment field resolver, the result will be incredible. Our backend won’t go to the database on every video to load the comments, but it will run the query once with all of the video ids to get all comments.

const resolvers = {
  Query: {
    comments(parent) {
      return commentLoader.load(parent.id);
    }
  }
}

It is pretty great, DataLoader gives a nice tool to make the GraphQL operations efficient. I just wish the ORM itself could give us a solution to this problem. We would not need any third-party library and would not need to worry about the integration. Fortunately, there is an ORM that does exactly what I want.

Prisma

Prisma helps app developers build faster and make fewer errors with an open source ORM for PostgreSQL, MySQL and SQLite.

Prisma is a great ORM that makes it super easy to connect our backend application to the database. It provides a typesafe solution to load data with Typescript, and it has a built-in solution for the N + 1 problem too. We don’t need to define our data loaders, our only job is to use the particular query methods, and in the background, Prisma handles all the batching stuff.

The Prisma Client dataloader automatically batches findUnique queries that ✔ occur in the same tick and ✔ have the same where and include parameters.

Let’s get back to our video platform example. Our comment field resolver looks like this with a simple Prisma query:

const Video = objectType({
  name: 'Video',
  definition(t) {
    t.nonNull.int('id')
    t.string('title')
       t.nonNull.list.nonNull.field('comments', {
      type: 'Comment',
      resolve: (parent, _, context) => {
        return context.prisma.comment.findMany({
             where: { videoId: parent.id || undefined }
         })
      },
    })
  },
})

This solution is not completely cool, it fires as much, as many videos the particular user has. It has the same problem, like any other general ORM. So instead of this solution, we need to combine the findUnique method with Prisma's fluent API to load the results in batches.

const Video = objectType({
  name: 'Video',
  definition(t) {
    t.nonNull.int('id')
    t.string('title')
    t.nonNull.list.nonNull.field('comments', {
      type: 'Comment',
      resolve: (parent, _, context) => {
        return context.prisma.video
          .findUnique({
            where: { id: parent.id || undefined },
          })
          .comments()
      },
    })
  },
})

And that’s it. In this way, Prisma automatically sends our comments query in batches to the database and reduces the data loading time, especially when the user has a lot of videos on our platform.

The N + 1 problem not only appears in field resolvers but can cause problems in the background code anywhere if we use ORM blindly. Prisma has great tools to help optimize these queries and make our backend code much more efficient. If you’re interested in how to make your database queries more efficient, you can read more about it here.

Summary

Data loading problems can cause a lot of slow operation on the back-end side, and if you want to build high-quality applications, you need to take care of this problem. Fortunately, nowadays great ORM solutions help us to make the database connection and database querying much easier. We don't need to know raw SQL queries, but this comfort makes us lazy, and if we don’t think, our backend application easily becomes a slow mess. The N + 1 problem is one of the huge red warnings that need to be solved before we deploy our super shiny application. As we saw in this article, some solutions help to write efficient batch queries, which end of the day, help to build great backend applications. So we don’t need to switch from GraphQL because at the top of all benefits we have handy solutions for the N + 1 downside as well.


If you found this article interesting I would love to hear from you in the comment section or feel free to reach out on Twitter! 👨‍💻