
Migrate from next-sitemap to the Next.js App Directory's sitemap

This post walks through the process of migrating from the next-sitemap library to the Next.js App Directory's sitemap.

Transitioning from next-sitemap to Next.js's Built-in Sitemap.xml and Robots.txt APIs

In this post, I'll walk you through the process of migrating from the next-sitemap npm package to using Next.js's built-in API for generating sitemap.xml and robots.txt files. This transition not only simplifies our codebase but also leverages the capabilities of Next.js to enhance our SEO strategy.

Why Migrate?

The next-sitemap npm package was a great tool for generating sitemaps, but as Next.js has evolved, it has introduced built-in support for sitemap and robots file generation. This change allows for a more streamlined approach, reducing dependencies and improving maintainability.

Admittedly, I have also been a bit of a laggard in updating my site to the new app directory - this is actually the very first use of App Router on this site!

What is a sitemap?

A sitemap is a file that lists all the URLs on your site, helping search engines discover and index your content. It provides a roadmap for search engine crawlers to navigate your site and understand its structure. Sitemaps are also used by services like Google Search Console - if you haven't tried Google Search Console, it's an invaluable resource for measuring your site's search engine performance.
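Under the hood, a sitemap is just an XML file. For reference, a minimal one following the sitemaps.org spec looks roughly like this (the URL is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- ...one <url> entry for each page on your site -->
</urlset>
```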

I've written at length about the value of SEO, including the SEO tools I used to grow my sites to 20k+ visitors/month.

What is a robots.txt file?

The robots.txt file is a configuration file used to instruct web crawlers (like Googlebot) on how to crawl your site. It allows you to specify which paths crawlers may visit and which they should skip, helping control the visibility of your content to search engines. In this case, we're using robots.txt to stop search engines from indexing URLs that we don't want to show up in search results, like Next.js API routes and other pages that are generated dynamically.
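As a tiny illustration of the format, a robots.txt that tells every crawler to skip your API routes (everything else is allowed by default) looks like this:

```txt
User-Agent: *
Disallow: /api/
```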

Removing the next-sitemap dependency

The first step is to remove the next-sitemap dependency from your project:

  1. Remove next-sitemap from your package.json file - make sure to check both "scripts" and "devDependencies", and remove it from both places.
  2. Run npm install (or yarn install or pnpm install) to update your lockfile.
  3. Delete your next-sitemap.config.js file if it exists.
  4. Important: If you have sitemap.xml, sitemap-0.xml, etc. files in the /public directory of your repo, delete any that exist. Next won't generate a sitemap file if these are already there - and it can be tricky to debug!

Side note: next-sitemap was a fantastic tool, and served me for quite a long time. I'm super thankful to the maintainers for their work on the package - it filled an important need for a very long time!

Generating a sitemap.xml file with Next.js

Next.js has a great built-in API for generating sitemaps and robots files, which we'll use to generate our sitemap.xml and robots.txt. It's simple to use, and lets you customize how both files are generated.

There are two options for creating a sitemap with Next.js:

  1. Write and maintain the XML for your sitemap yourself - this may be okay for really small sites, but you will need to add an entry to this file for every page you want search engines to know about. This quickly becomes cumbersome for sites with dynamic content, or even just a lot of pages.
  2. Use Next.js's built-in API to generate the sitemap.xml file - I am going to opt for the built-in API because my site has hundreds of pages published, and I don't want to have to maintain a sitemap file with an entry for every single page.

Configuring the sitemap.xml file

Create a new file in the App directory called sitemap.ts:

```ts
import type { MetadataRoute } from 'next';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  return [
    // TODO: one entry for every page you want search engines to know about
  ];
}
```

This file is where you will add an entry for every page you want search engines to know about. The goal is to return an array of objects of the following type:

```ts
type SitemapFile = Array<{
  url: string;
  lastModified?: string | Date;
  changeFrequency?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  priority?: number;
  alternates?: {
    languages?: Languages<string>;
  };
}>;
```

A few things to note here, from the sitemap spec:

  1. The url property is required, and should be the full URL of the page you want to include in the sitemap.
  2. The changeFrequency property is optional, and can be one of: always, hourly, daily, weekly, monthly, yearly, or never. It's a hint to search engines about how often the page is likely to change - Google and other search engines may ignore it if they observe your page changing at a different rate.
  3. The priority property is "The priority of this URL relative to other URLs on your site." It is optional, can be any number between 0 and 1, and defaults to 0.5 for every page. I recommend setting it to 1 for your homepage and cascading down to 0.5 for other pages, depending on their relative importance to your site. It's no guarantee that a page will rank higher just because it has a higher priority, but it is a useful hint for search engines.
  4. alternates.languages is a feature of Next.js that allows you to specify alternate versions of your page in different languages. This is useful if you have a blog post that is published in multiple languages, and you want to include all of the versions in your sitemap. I only publish in English (en este momento!), so I haven't tested this yet, and it isn't included in this tutorial.
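To make those fields concrete, here's a hypothetical entry for a single page (the URL and values are placeholders):

```ts
import type { MetadataRoute } from 'next';

// one entry in the array returned by sitemap()
const aboutPage: MetadataRoute.Sitemap[number] = {
  url: 'https://example.com/about', // required: full URL, including protocol
  lastModified: new Date('2024-06-01'), // when the page last changed
  changeFrequency: 'monthly', // hint for how often crawlers should revisit
  priority: 0.8, // hint for importance relative to other pages on the site
};
```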

Populating your sitemap.ts file

Static pages

For static pages on my site, I opted for a semi-manual approach to adding them to the sitemap. I created an array of strings that represent the routes I want to include in the sitemap, and then mapped over that array to create the sitemap entries.

```ts
// Define your static routes
const routes: string[] = [
  '', // home page
  '/about',
  '/integrity',
  '/newsletter',
  '/podcast',
  '/posts',
  '/tags',
  '/work',
  '/shop',
];

// Create sitemap entries for static routes
const staticRoutesSitemap = routes.map((route) => ({
  url: `${baseUrl}${route}`,
  lastModified: new Date(),
  changeFrequency: 'weekly' as const,
  priority: route === '' ? 1 : 0.8,
}));
```

When run, this will generate one sitemap entry for each route in the routes array - note that the homepage has a priority of 1, and all other pages have a priority of 0.8.
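To illustrate, the result will look something like this (assuming baseUrl is https://example.com - the lastModified dates will reflect when the sitemap was generated):

```ts
// illustrative output, not real data
[
  { url: 'https://example.com', lastModified: new Date(), changeFrequency: 'weekly', priority: 1 },
  { url: 'https://example.com/about', lastModified: new Date(), changeFrequency: 'weekly', priority: 0.8 },
  // ...one entry for each remaining route, each with priority 0.8
];
```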

Dynamic pages

For pages that are generated dynamically, or hosted by your CMS, the process isn't too different. You will need to fetch the URL of each page you want to include in the sitemap, and then create a sitemap entry for each one. In my case, I have 3 different sources of dynamic content:

  1. Newsletters: these are dispatches of my newsletter 💌 Tiny Improvements, which I send out every week.
  2. Blog posts: these are the blog posts I publish on this site (you're reading one right now!).
  3. Tags: These are metadata tags I use to categorize my blog posts and newsletter issues. I generate a list of all tags on the site, and then map over that list to create a sitemap entry for each one.

I write all of my site content in MDX, saved as files in my site's repo under the src/data/ directory.

I used a utility function to grab all of the newsletter issues, blog posts, and tags from the src/data/ directory, and then mapped over each one to create a sitemap entry. For newsletters, that looks like this:

```ts
// this is a helper function used across my site.
// replace with whatever is needed to fetch from the CMS you use
const newsletters = await getAllNewsletters();

// create a sitemap entry for each newsletter
const newslettersSitemap = newsletters.map((newsletter) => ({
  url: `${baseUrl}/newsletter/${newsletter.slug}`,
  lastModified: new Date(newsletter.frontmatter.date),
  changeFrequency: 'weekly' as const,
  priority: 0.8,
}));
```

The same pattern is used for blog posts and tags.

Sitemap.ts: The complete file

```ts
import type { MetadataRoute } from 'next';

import { getAllPosts } from '@lib/blog';
import { getAllNewsletters } from '@lib/newsletters';
import { getAllTags } from '@lib/tags';

import { BASE_SITE_URL } from '@/config';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = BASE_SITE_URL;

  const newsletters = await getAllNewsletters();
  const newslettersSitemap = newsletters.map((newsletter) => ({
    url: `${baseUrl}/newsletter/${newsletter.slug}`,
    lastModified: new Date(newsletter.frontmatter.date),
    changeFrequency: 'weekly' as const,
    priority: 0.8,
  }));

  const posts = await getAllPosts();
  const postsSitemap = posts.map((post) => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(post.frontmatter.date),
    changeFrequency: 'weekly' as const,
    priority: 0.8,
  }));

  const { allTags } = await getAllTags();
  const tagsSitemap = allTags.map((tag) => ({
    url: `${baseUrl}/tags/${tag}`,
    lastModified: new Date(),
    changeFrequency: 'weekly' as const,
    priority: 0.5,
  }));

  // Define your static routes
  const routes: string[] = [
    '', // home page
    '/about',
    '/integrity',
    '/newsletter',
    '/podcast',
    '/posts',
    '/tags',
    '/work',
    '/shop',
  ];

  // Create sitemap entries for static routes
  const staticRoutesSitemap = routes.map((route) => ({
    url: `${baseUrl}${route}`,
    lastModified: new Date(),
    changeFrequency: 'weekly' as const,
    priority: route === '' ? 1 : 0.8,
  }));

  // combine all the sitemap entries into a single array
  return [
    ...staticRoutesSitemap,
    ...newslettersSitemap,
    ...postsSitemap,
    ...tagsSitemap,
  ];
}
```

Testing your sitemap

Fire up next dev and navigate to /sitemap.xml - you should see your sitemap! Make double and triple sure that it has an entry for every single page you want search engines to know about. You may also want to verify that it updates correctly by adding a new page and confirming that it shows up in the sitemap.
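If you'd like a programmatic spot-check, here's a minimal sketch - it assumes your dev server is running at localhost:3000, and the sanity-check.ts filename is just a suggestion:

```ts
// sanity-check.ts - run with something like `npx tsx sanity-check.ts`
// while `next dev` is serving your site at localhost:3000
const res = await fetch('http://localhost:3000/sitemap.xml');
const xml = await res.text();

// pull every <loc> entry out of the generated XML
const urls = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((match) => match[1]);

console.log(`Found ${urls.length} URLs in the sitemap:`);
urls.forEach((url) => console.log(`  ${url}`));
```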

Creating a robots.txt file

Next up, robots.txt. This is a very similar process - we will follow the docs for robots.txt from Next.js.

This is a deceptively important file for your website's SEO. It is a really good idea to familiarize yourself with the robots.txt file format, and to understand the meaning of each rule you set.

Disallow everything for non-production environments

**🚨 Important:** If you are running a non-production environment, you will need to add a check to your robots.ts file to disallow all requests. This is because non-production environments often have URLs that you don't want search engines to index - it would be bad news if Google decided that a deploy preview of your site should rank higher than your production site!

Robots.ts: The complete file

Much like static pages for the sitemap, you will need to add an entry for every path you don't want search engines to index. In my case, I have a reverse proxy in front of my site that handles analytics and logging with PostHog, and I don't want search engines to index those URLs. You'll see that handled below with noIndexPaths.

```ts
import type { MetadataRoute } from 'next';

import { BASE_SITE_URL } from '@/config';

const noIndexPaths = [
  '/ingest', // posthog's reverse proxy
  '/ingest/*', // posthog's reverse proxy
];

export default function robots(): MetadataRoute.Robots {
  // 🚨 IMPORTANT: if this is not a production environment, disallow all requests
  if (
    // Vercel-specific environment variable. Please check the docs for your hosting provider!
    process.env.VERCEL_ENV !== 'production' ||
    // for a generic node environment
    process.env.NODE_ENV !== 'production'
  ) {
    return {
      rules: [
        {
          userAgent: '*',
          disallow: '*',
        },
      ],
    };
  }

  return {
    rules: [
      {
        userAgent: '*',
        disallow: '/api/', // Next.js API routes
      },
      {
        userAgent: '*',
        disallow: '/_next/', // Next.js build output
      },
      {
        userAgent: '*',
        disallow: '/public/', // static files like css, images, fonts. This one's up to you!
      },
      ...noIndexPaths.map((path) => ({
        userAgent: '*',
        disallow: path,
      })),
    ],
    sitemap: `${BASE_SITE_URL}/sitemap.xml`,
  };
}
```

One important callout here: BASE_SITE_URL is a variable I set in my site's config - it's the full URL to the root of my site (https://mikebifulco.com). You will need to replace it with the actual URL of your site.
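For reference, with this config in place, the production robots.txt generated by Next.js should come out looking roughly like this (exact formatting may vary between Next.js versions):

```txt
User-Agent: *
Disallow: /api/

User-Agent: *
Disallow: /_next/

User-Agent: *
Disallow: /public/

User-Agent: *
Disallow: /ingest

User-Agent: *
Disallow: /ingest/*

Sitemap: https://mikebifulco.com/sitemap.xml
```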

Deployment checklist

Now that everything's done, it's time to make sure it's all configured correctly. Here's a quick checklist I used to verify that my new sitemap and robots.txt were set up correctly:

**Before going live**, verify that:
- robots.txt
  - [ ] visit `/robots.txt` and verify that it exists
  - [ ] check that there is an entry for your sitemap that contains a full URL, including the protocol (`https://`)
  - [ ] read through the disallow rules to make sure they make sense for your site. If you're using Next.js, this will include `/_next/` at a minimum, and possibly `/api/` and `/public/` if you don't want those directories indexed.
- sitemap.xml
  - [ ] visit `/sitemap.xml` and verify that it exists
  - check that it contains the correct entries for all:
    - [ ] static pages
    - [ ] dynamic pages (newsletters, blog posts, tags)

**While testing deploy previews of your site**, verify that:
- [ ] your sitemap is being generated correctly
- [ ] your robots.txt is being generated so that it **disallows all requests** in deploy previews

**After going live**, check that:
- [ ] your sitemap is being picked up and read by [Google Search Console](https://search.google.com/search-console)
- [ ] use something like the [Ahrefs Webmaster Tools Site Audit](https://ahrefs.com/signup?plan=awt&return=website-checker) to scan your site and verify that it is indexed correctly and that there aren't any issues

And that's it! You've successfully migrated from the next-sitemap library to Next.js's built-in sitemap and robots.txt APIs.

See the code changes I made to my site

If you're curious to see the code changes I made to my site to implement this, you can view the diff on GitHub.
