Background
A robots.txt file is an important tool in Search Engine Optimization (SEO). The file helps you control crawler and bot access, guides how search engines treat duplicate content, and improves crawl efficiency so you get the most out of your crawl budget. Any public-facing website should have one, and a non-public website probably wants one too, to keep crawlers from indexing access points that lead to private pages.
In this article, we're going to look at a couple of ways to implement a robots.txt file in a Django application. The obvious solution is a template-based approach. However, there are even simpler methods for basic robots.txt files that avoid rendering templates and keep all the logic in a view. We'll explore both methods and weigh the benefits of each.
Alternatives
Before jumping in and creating your own solution, be aware that alternatives exist which might deliver what you're looking for out of the box. One such package is django-robots by Jazzband. This package lets you get many common items into a robots.txt file using settings defined in settings.py. Since the values are defined in your settings module, no additional files or code are needed. Simply install the package, add the URL pattern, and define your settings.
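If you go this route, the wiring looks roughly like the sketch below. This is a hedged outline based on the package's documentation at the time of writing, not a definitive setup; the app label, URL include, and available settings should be confirmed against the current django-robots docs.
# settings.py (sketch -- confirm against the django-robots documentation)
INSTALLED_APPS = [
    # ...
    'django.contrib.sites',  # django-robots relies on the sites framework
    'robots',
]
SITE_ID = 1

# project urls.py (sketch)
from django.urls import include, path

urlpatterns = [
    path('robots.txt', include('robots.urls')),
]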
If this package gets you up and running quickly, then great. No need to reinvent the wheel.
That said, managing a robots.txt file internally is a simple process, and it might be more efficient to handle the job yourself. Let's take a look at how to do that.
Set Up
Assume all logic will be nested in a "demo" app. URL patterns for each solution are relative to the project's root URL. The source code, if you'd like to follow along on your local machine, is available at this repo.
Template Approach
The first approach we'll look at is serving the robots.txt file from a template. First, we'll create a template file within our "demo" app that contains the text content we want rendered at the URL /robots.txt. Note that the file itself won't be served directly, since it isn't a static file. Instead, the contents of the template will be rendered and served dynamically by a Django TemplateView class-based view.
Because the implementation of the TemplateView is simple, we don't need to subclass it in a views.py file. Instead, we can configure the view entirely within the URL pattern. Since TemplateView defaults the Content-Type header to "text/html", we explicitly set content_type to "text/plain" so that crawlers are able to process our file correctly.
# demo.urls.py
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    path(
        'robots.txt',
        TemplateView.as_view(
            template_name='demo/robots.txt',
            content_type='text/plain'
        )
    ),
]
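Note that template_name='demo/robots.txt' is resolved by Django's template loaders, not the static files system. Assuming the default APP_DIRS template configuration (an assumption about the project setup, not something the snippet above enforces), the template would typically live at:
demo/
    templates/
        demo/
            robots.txt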
In the template file, we'll add the rules we'd like rendered. For demonstration purposes, I'm using the example rules from Google's guide.
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
That's it. Spool up a development server and you'll see your template file contents rendered at the desired location. Next, let's see how we can avoid the use of templates with a view-only approach.
View Only Approach
Considering the simple nature of a robots.txt file, rendering a template to accomplish this might be overkill. The file is only a few lines of plain text. As an alternative, I like to use a view function to construct and deliver the response.
# demo.views.py
from django.http import HttpResponse


# noinspection PyUnusedLocal
def robots_txt_view(request):
    txt_lines = [
        'User-agent: Googlebot',
        'Disallow: /nogooglebot/',
        '',
        'User-agent: *',
        'Allow: /',
        '',
        'Sitemap: https://www.example.com/sitemap.xml',
    ]
    content = '\n'.join(txt_lines)
    return HttpResponse(content, content_type='text/plain')
The handful of lines I need are stored in a simple list of strings. When the request is made, the strings are joined together and delivered using HttpResponse. Again, we're setting content_type to "text/plain" because the value defaults to "text/html" if left alone. We want to set this explicitly so that crawlers don't have trouble parsing the file.
Lastly, I need to import the view into my urls.py file and create the URL pattern.
# demo.urls.py
from django.urls import path
# from django.views.generic import TemplateView

from .views import robots_txt_view

# template-based solution
# urlpatterns = [
#     path(
#         'robots.txt',
#         TemplateView.as_view(
#             template_name='demo/robots.txt',
#             content_type='text/plain'
#         )
#     )
# ]

# view-only solution
urlpatterns = [
    path('robots.txt', robots_txt_view),
]
I left the first approach in the file, commented out, to show that both solutions live in the same place.
And with our URL pattern routing requests for /robots.txt to our view function, we have a working solution.
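If you want a quick sanity check that either wiring behaves as expected, Django's test client can request the route directly. This is a minimal sketch that assumes the demo app's URLs are included at the project root, as described in the set up above:
# demo.tests.py
from django.test import TestCase


class RobotsTxtTests(TestCase):
    def test_robots_txt(self):
        response = self.client.get('/robots.txt')
        self.assertEqual(response.status_code, 200)
        # Both approaches set the content type to text/plain.
        self.assertIn('text/plain', response['Content-Type'])
        self.assertIn('User-agent: *', response.content.decode())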
Final Thoughts
It's important to provide a robots.txt file so crawlers and bots engage with your site as intended. There are multiple approaches you can take to serve a robots.txt file. Here we looked at how to do so using a TemplateView class-based view as well as a simple view function that handles all the logic.
Source code available at GitHub.