Technology

Next Article Suggestion in Django

Considerations in developing a content suggestion engine in Django.

August 27, 2023

#django #python

Background

You have a content platform (built with Django) and you want to increase engagement by suggesting what the reader should read next once they've finished a given article. How do we decide which article to promote? I ran into this problem while developing this project. The solution seemed obvious at first: build upon logical grouping and move chronologically through the material. But after digging deeper, I realized there was a bit more nuance to the issue.

What's discussed here is my first pass at the problem. As your project evolves, I would imagine your recommendation engine will follow suit, and I expect that to be the case here as well. But to begin, I want to keep the method as simple as possible. Furthermore, I'm not going to be too concerned about efficiency in my operations since this project is new and heavy traffic isn't anticipated. I just want something simple and reliable to iterate on.

For purposes of this discussion, assume we have the following Django model:


from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=100)
    category = models.ForeignKey('article.Category', on_delete=models.PROTECT)
    tags = models.ManyToManyField('article.Tag')
    content = models.TextField(blank=True, null=True)

Here we have a model for an article post comprised of a title, content, and two model relationships: one to the model Category and a ManyToMany relationship to the model Tag. Realistically, I would use a third-party package for managing tags rather than build out the logic internally. Also, a blog/article would contain several other fields such as published by, published date, meta information, etc.

Determine Relevance

Above all else, I want the recommended content to be as relevant to the reader as possible. However, what markers determine relevance and in what priority? Is recency a greater determiner than topic, or the other way around? Are topics established at just one level, like category, or do you have multiple dimensions? For example, articles could be grouped by a category and then further refined by one or more tags, as is the case with this project. But that introduces an additional problem: tags may span categories. Which should have a higher precedence, tags or categories? Or better, should categories and tags be used as dependent conditions where articles with a tag must belong to a given category?

The above are some of the questions I considered when starting to develop an approach. More simply, these two questions may aid in the development of your approach: why are my visitors here and how did they get here? In a sense, that's to say know why your audience is visiting and build accordingly. I want this to stand in contrast to a perhaps more prevalent concept: know who your audience is. In my opinion that's a problem for advertisers, where the user is the product. If you're producing content, the content is the product. I place greater importance on the reason for visiting than on the who.

For my project, a wide variety of topics is touched upon, so I want to be careful to suggest an article within the same category as the referring article. My categories are simply too different. Furthermore, I expect most traffic to be organic at first. Most visitors, perhaps nearly all, won't know me from Adam. I expect the topic to be the reason for their visit, not a belief that I, specifically, have anything useful to say. Since I believe the topic is their reason for visiting, I want to stay within these bounds in recommendations. Of course, you could recommend across categories elsewhere, like a trending articles sidebar, or something of the like. But as for a 'next' recommendation, I'm going to keep it narrow at this time.

Consider the following method:


class Article(models.Model):
    ...

    def get_next_suggestion(self):
        return Article.objects.filter(category=self.category).last()

This method will give us the most recent article of a given category. It would be a good idea to add explicit ordering (publication date, for example, might be a useful choice) so that we're retrieving objects in a predictable manner. I'll touch on ordering in the next section.

For my project, I'm going to exclude tags from consideration when grouping. I don't have a ton of content right now so I don't want to limit the pool of choices. When this project expands, filtering on tags may be more appropriate.
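If tags were brought in later, one option is to keep the category as the hard boundary and use shared tags only to rank candidates inside it. The sketch below illustrates that idea in plain Python on in-memory records rather than querysets. `ArticleStub` and `rank_by_shared_tags` are hypothetical names for illustration, not part of the project.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleStub:
    """Hypothetical in-memory stand-in for the Article model."""
    pk: int
    category: str
    tags: set = field(default_factory=set)


def rank_by_shared_tags(current, candidates):
    """Return same-category candidates, most shared tags first."""
    same_category = [
        a for a in candidates
        if a.category == current.category and a.pk != current.pk
    ]
    # sorted() is stable, so candidates with equal overlap keep their incoming order
    return sorted(
        same_category,
        key=lambda a: len(a.tags & current.tags),
        reverse=True,
    )


current = ArticleStub(1, "django", {"orm", "models"})
pool = [
    current,
    ArticleStub(2, "django", {"templates"}),
    ArticleStub(3, "django", {"orm", "models", "queries"}),
    ArticleStub(4, "finance", {"orm"}),  # shares a tag but sits in another category
]
ranked = rank_by_shared_tags(current, pool)
```

Here the article in the "finance" category is dropped even though it shares a tag, which is the "tags depend on category" arrangement raised earlier.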

Something worth mentioning: this method will always return an instance. That's because 'self' isn't excluded from the queryset. Since the originating/referring article of a given category will also be present in the queryset, you run the risk of suggesting the same article. If the article is the most recently produced article, this is guaranteed. That may not be desirable, so let's update the method.


# we know what class this belongs to already so I'm just showing the method

def get_next_suggestion(self):
    return Article.objects.filter(category=self.category).exclude(pk=self.pk).last()

Now, we'll return the most recent article other than self. If the referring article is the most recent publication, then you'll see the next most recent. Note, since ordering hasn't been specified, ordering by default will be on the primary key.

Also, this method can return None. You may want to consider handling this at the method level, but I'm not going to. I may place logic in the template to handle None, but this would mean that you only have one article for a given category, which isn't a likely situation. If you're launching a new category, it would be prudent to have a handful of articles ready to go for that category. Just one article may not be a desirable user experience. So, I'm going to assume that there will never be just one article in a category and not handle that situation.

Ordering

Once we've determined the most logical grouping of content to prioritize relevance, we then need to determine which item in the group to select for the suggestion. I found this to be the most difficult issue to solve gracefully. You could simply suggest articles chronologically. Doing so may be the least difficult approach to implement. However, you probably want to prioritize your most recently published content over older content. In that case, we'd want to return sequentially newer material within the pool.

Let’s add the field "publication_date" to the model and update the method to provide suggestions for anything newer than the referring article.


class Article(models.Model):
    ...
    publication_date = models.DateTimeField(blank=True, null=True)

    def get_next_suggestion(self):
        return Article.objects.filter(category=self.category,
                                      publication_date__gt=self.publication_date).exclude(pk=self.pk).first()

First, we've added a filter to exclude everything of equal age or older using Django's "greater than" Field Lookup. Next, we switched out the last() method for the first() method because we want to show the very next article rather than the most recently published. This way, we work through the articles by order of recency.

Now that we're considering the publication_date in our selection rather than using the primary key as the determiner of recency, it would be appropriate to apply ordering. I'm also going to break apart the filters into two steps. I'll build on my reasoning for that shortly.


def get_next_suggestion(self): 
    # first, let's retrieve the 'eligible' articles that belong to the category but doesn't include self 
    # add ordering by publication date 
    qset = Article.objects.filter(category=self.category).exclude(pk=self.pk).order_by("publication_date") 
    if qset: 
        # at least one article was found 
        return qset.filter(publication_date__gt=self.publication_date).first() 
    return None

This is a good start: suggestions stay within a category and move toward progressively more recent content. However, it doesn't handle the situation where you begin with the most recent article and need to suggest backwards. In this case, I might recommend based on popularity if that data is available to me. I might track how often an article is viewed and then select one at random from the top handful of most-viewed articles in the pool. However, I want to be careful not to create a "winner take all" model with my own content where, once an article becomes popular, its popularity becomes self-reinforcing.
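To illustrate the popularity idea, the sketch below picks at random from the N most-viewed candidates rather than always returning the single top article, which softens the winner-take-all effect. It is plain Python over (pk, views) pairs. The view counts, the function name, and the `top_n` default are all assumptions, since the models in this article don't track views.

```python
import random


def pick_from_most_viewed(candidates, top_n=5, rng=random):
    """Pick one (pk, views) pair at random from the top_n most viewed.

    Returns None when there are no candidates.
    """
    most_viewed = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    if not most_viewed:
        return None
    return rng.choice(most_viewed)


# hypothetical view counts keyed by article pk
candidates = [(1, 900), (2, 15), (3, 420), (4, 7), (5, 310)]
suggestion = pick_from_most_viewed(candidates, top_n=3)
```

A larger `top_n` spreads attention more evenly across the pool, while a smaller one leans harder on what is already popular.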

That said, I'm going to assume I don't have the ability to prioritize based on views and I'm working with the previously defined models alone. Instead, I'm going to take a random choice using Python's random module. Furthermore, I want to limit the pool for the random selection to the 25 most recent articles, just in case the number of options is quite large. But remember, negative indexing isn't available on Django querysets, and our ascending ordering puts the most recent articles at the end. We'll need to reverse the ordering before slicing.


import random

def get_next_suggestion(self): 
    # first, let's retrieve the 'eligible' articles that belong to the category but doesn't include self 
    # add ordering by publication date 
    qset = Article.objects.filter(category=self.category).exclude(pk=self.pk).order_by("publication_date") 
    if qset: 
        # at least one article was found 
        next_article = qset.filter(publication_date__gt=self.publication_date).first()
        if not next_article:
            # there isn't anything newer, so lets take a random selection from recent articles
            return random.choices(qset.order_by("-publication_date")[:25])[0]  # choices returns a list, so take the first item (indexed at 0)
        return next_article
    return None

We can eliminate the re-ordering in the above method by ordering newest-first in the upstream statement, qset = Article.objects.filter(category=self.category).exclude(pk=self.pk).order_by("-publication_date"), and changing the first() method to the last() method when evaluating next_article. That will accomplish the same thing and save a step. The re-ordering probably wasn't too costly to begin with, but the cleaner the better.


import random

def get_next_suggestion(self): 
    # first, let's retrieve the 'eligible' articles that belong to the category but doesn't include self 
    # add ordering by publication date 
    qset = Article.objects.filter(category=self.category).exclude(pk=self.pk).order_by("-publication_date") 
    if qset: 
        # at least one article was found 
        next_article = qset.filter(publication_date__gt=self.publication_date).last()
        if not next_article:
            # there isn't anything newer, so lets take a random selection from recent articles
            return random.choices(qset[:25])[0]  # choices returns a list, so take the first item (indexed at 0)
        return next_article
    return None

For a simple recommendation engine, I think we're in a good place. You could make this infinitely complex, continuing to add bias or filters on other dimensions. But for my purposes, this is good enough.
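To make the behaviour concrete, here is a plain-Python mirror of the final method that runs against an in-memory list instead of the database. `Post` and `next_suggestion` are hypothetical stand-ins for illustration only. The real method uses the ORM as shown above.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    """Hypothetical stand-in for Article."""
    pk: int
    category: str
    publication_date: date


def next_suggestion(current, posts, limit=25):
    # same category, excluding the referring post, newest first
    pool = sorted(
        (p for p in posts if p.category == current.category and p.pk != current.pk),
        key=lambda p: p.publication_date,
        reverse=True,
    )
    if not pool:
        return None
    newer = [p for p in pool if p.publication_date > current.publication_date]
    if newer:
        return newer[-1]  # the oldest of the newer posts, i.e. the very next one
    # nothing newer: fall back to a random pick from the most recent `limit` posts
    return random.choice(pool[:limit])


posts = [
    Post(1, "django", date(2023, 1, 10)),
    Post(2, "django", date(2023, 2, 10)),
    Post(3, "django", date(2023, 3, 10)),
    Post(4, "finance", date(2023, 3, 1)),
]
```

Starting from post 1 this yields post 2, and from post 2 it yields post 3. From post 3 (the newest) it falls back to a random choice between posts 1 and 2. Post 4 has no siblings in its category, so the result is None.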

Caching

I'm not concerned about the performance of the above method for the scale of my project. Traffic is low and posts aren't large in volume. But if this were to change, I might consider retaining the result of the suggestion method as a model field. For example:


class Article(models.Model):
    ...
    # 'self' indicating the relationship is an article to an article; on_delete is a required argument
    next_suggestion = models.ForeignKey("self", blank=True, null=True, on_delete=models.SET_NULL)

The method, as defined above, should return the same result every time it's evaluated if there aren’t any changes to the data stored in your database (the random fallback for the newest article being the exception). But let's assume that changes are being regularly committed. I would consider how often changes are being made and refresh the cached value based on this turnover.


from datetime import timedelta

from django.db import models
from django.utils import timezone

class Article(models.Model):
    ...
    next_suggestion = models.ForeignKey("self", blank=True, null=True, on_delete=models.SET_NULL)
    last_called = models.DateTimeField(blank=True, null=True)

    def get_next_suggestion(self):
        if self.next_suggestion and self.last_called:
            if self.last_called > timezone.now() - timedelta(days=1):
                # the method was called less than one day ago so let's use the stored value
                return self.next_suggestion
            else:
                # the suggestion is stale, re-evaluate
                ...

If the suggestion is older than one day, the method will make a fresh calculation. Don't forget to save the result and timestamp to the article.
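The freshness test at the heart of this is easy to get backwards, so it can help to isolate it. The helper below is a minimal sketch. `is_fresh` and `max_age` are hypothetical names, and `now` is passed in as a parameter so the function can be exercised without Django (inside the model you would pass `timezone.now()`).

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_called, now, max_age=timedelta(days=1)):
    """True when a cached suggestion was stored within max_age of now."""
    if last_called is None:
        # nothing has been cached yet
        return False
    return now - last_called < max_age


now = datetime(2023, 8, 27, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=3)
old = now - timedelta(days=2)
```

With these values, `is_fresh(recent, now)` is true and `is_fresh(old, now)` is false, so only the second case would trigger a re-evaluation.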

Other Considerations

There may be situations where an evaluated suggestion isn't desirable and a more fixed solution would better fit your needs. For example, you have a series of articles that are related, and you want "Article Part 2" to be the suggestion following "Article Part 1." In that case you could organize a group of articles with a foreign key relationship to a hypothetical model "Series." Then, provide a suggestion based on the referring article's relative position in the series. The suggestion method would first check if the article belongs to a series and provide the suggestions accordingly.
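A minimal sketch of the position lookup, assuming each article in a series carries a hypothetical integer position. The names are illustrative, and the data is a plain list of (position, title) pairs rather than a queryset.

```python
def next_in_series(current, series):
    """Return the entry that follows `current` in the series, or None.

    `series` is a list of (position, title) pairs; positions need not be
    contiguous or pre-sorted.
    """
    later = [entry for entry in series if entry[0] > current[0]]
    return min(later, key=lambda entry: entry[0]) if later else None


series = [(1, "Article Part 1"), (3, "Article Part 3"), (2, "Article Part 2")]
following = next_in_series((1, "Article Part 1"), series)
```

When the referring article is the last part of its series the function returns None, at which point the suggestion method could fall through to its usual evaluation.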

Alternatively, you may want to simply fix the suggestion. You could add an additional field "static_suggestion" where the suggestion method will always return this article, if present. If there isn't a static_suggestion, the method would continue along in its evaluation.

You could use both methods if your project calls for it. I'm going to use neither at this point. I don't want to overengineer the solution and for my purposes this may be overkill. But I plan on implementing series in the future and will likely build something similar when that time comes.
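If both were combined, the order of precedence would need to be explicit. One way to picture it is a short chain of checks where the first one to produce a result wins. This is only a sketch on plain dictionaries; the resolver functions and dictionary keys are hypothetical.

```python
def first_suggestion(article, resolvers):
    """Return the first non-None result from an ordered list of resolver callables."""
    for resolve in resolvers:
        suggestion = resolve(article)
        if suggestion is not None:
            return suggestion
    return None


# hypothetical resolvers working on plain dicts instead of model instances
def static_suggestion(article):
    return article.get("static_suggestion")


def series_suggestion(article):
    return article.get("next_in_series")


def computed_suggestion(article):
    return article.get("computed")


# a fixed suggestion wins, then the series, then the evaluated suggestion
resolvers = [static_suggestion, series_suggestion, computed_suggestion]
pinned = {"static_suggestion": "Pinned", "computed": "Computed"}
plain = {"computed": "Computed"}
```

Here `first_suggestion(pinned, resolvers)` returns "Pinned", while `first_suggestion(plain, resolvers)` falls through to "Computed".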

Final Thoughts

Providing a suggestion as to what the reader should move on to next is a great way to increase engagement with your content and project. But we want to make sure that the material is as relevant as possible to the reader to increase the likelihood they'll follow the link. I've provided examples above of my approach to determining relevance. If any of the above makes sense for your project, take what you will, but be sure to customize the method for your particular needs.