IAN WALDRON IAN WALDRON

File Naming Convention Ideas

A handful of concepts I use to efficiently store and retrieve resources on our file servers.
August 11, 2023

Introduction

Businesses big or small need to have a file naming convention that's effective, easy to use and remember. Most importantly businesses need one that's used consistently throughout. Saving files organically may seem harmless when volume is low. But as the quantity of resources on your file system expands, you'll begin to feel the pain from clutter and inefficiency in identifying and retrieving resources. Developing a convention early can save you a lot of headaches down the road.

So much in business can be built along the way. While having a business plan is critical to having vision and direction, the reality is we rarely follow the path we first plot. But one thing that should be established from the onset is an approach to file storage. Imagine retroactively updating files to conform to a new naming convention. This would be quite labor intensive if you're able to complete such a project at all. I've also found that files named inconsistently with a new standard are likely to be passed over entirely when you're retrieving a resource. If information isn't being portrayed in a consistent, and sufficiently (but not overly) detailed way, your eyes when scanning many file names will simply pass over the unrecognizable.

For the reasons above, and others, establish a file convention early and stick to it. I'll share with you here a few concepts that I use in my businesses to stay organized and allow for our resources to be stored and retrieved efficiently. If you're interested in learning about my comprehensive approach to file storage, including top-level folders, subfolders, among other considerations, check out this article.

In this discussion, we'll be only focused on naming files.

Components

Your file names should contain the least amount of information necessary to properly identity a resource. The components you incorporate may be specific to your use case, but I've found a few elements to be universal across applications: ID/Timestamp, Core Label, Versioning. These are the only three elements my businesses' procedures utilize.

ID

Working from left to right across the file name, the component you're likely to encounter first with my convention is an identifier. I use identifiers primarily when there is a unique entity involved where I'm likely to have many discrete data. Rather than repeat the entity's full label (vendor name, etc.), I'll assign an ID to use instead. For example, if I'm using the Vendor "Joe the Plumber" on a regular basis, repeating this name every time I save a document, such as an invoice or proposal, to my file server would be heavily redundant and take up precious space within the file name.

Our objective is to keep file names as short as possible to make the most pertinent information stand out. Instead of labeling something like "Joe the Plumber-Invoice for Work on August 11," I can use an ID, let’s say 55, so the name would be shortened to "55-Invoice for Work on August 11."

Suppose now there are additional entities I'd like to represent with IDs for a given file. Perhaps I'd like to show this invoice not only relates to "Joe the Plumber" but also a specific property with an ID of 27 and an address of 1234 Main Street. In this case, I chain the IDs beginning with the higher-order relationship and working downwards. From a database perspective, if the direct relationship is Vendor (Joe the Plumber) to property with ID 27, then I'll traverse these relationships in the inverse order. The above example would then be 27.55-Invoice for Work on August 11. You can see, my chosen separator for chaining IDs is the period. More on this in the below section "Separators."

Timestamp

When providing a date timestamp, I use the format YYMMDD. For example, August 11, 2023 would be 230811. Combining the invoice example presented for the ID component, we would express the ID-Timestamp combination as 27.55.230811-Invoice. Note, I'm not using any separators with the date. The reason is two-fold. First, I reserve separators for use with the discrete components of the file name. This allows for conveniently extracting a given component programmatically. The second is that the character most would use with a date value is the slash. And, in Unix based systems as well as Microsoft OS, this is a reserved character. See the following resources for more information on reserved characters:

My files won't always have an ID but I always use a timestamp. Your file system will often track a couple date values in its meta data, date created, date modified, etc. However I don't rely on these values and elect to use an explicit value in the filename. One reason, many but not all file systems track the creation date, which is the date I'm most interested in. Some Linux file systems only track the date accessed, modified, and changed (the difference between 'modified' and 'changed' for those curious is the former refers to the contents of a file whereas the latter refers to attributes like permission, name, etc., ~ furthermore I believe the present tense 'modify' and 'change' are used by Linux in place of the past participles I used), excluding the creation date.

Another reason is you may affect these dates in ways that aren't expected with certain file operations. For example, let's create a file to demonstrate how creation times can be affected. I'm using macOS/Terminal, which uses commands like Linux. I'll create a file named 'example_file.txt' on my Desktop and then display the creation date.


cd ~/Desktop
touch example_file.txt
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:17:57

Now, I'll copy the file over to my Documents folder and check the timestamp there.


cp example_file.txt ~/Documents
cd ~/Documents
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:18:45

You can see that I now have a new creation date for the file after copying it over. Please note, there are ways to preserve this date. For example, performing the same operation with the "-p" flag will preserve a file's meta data.


cp -p example_file.txt ~/Documents && cd ~/Documents
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:17:57

Our creation timestamp remains the same. Additionally, if I'm using the command "mv" instead of "cp" to move rather than copy, this will also preserve meta data. But the point I'm trying to show here is that if you're not careful, you can end up with information inconsistent with the user's intent. Including the timestamp as an element to the filename means I'll always have the information I want unless the information is deliberately altered.

Core File Name

The words you include in a file name indicate to someone searching or retrieving a resource the purpose of the file. The objective is to include the least amount of information possible that conveys sufficient context to make retrieval possible. Whereas the prior components ID and Timestamp answer the questions "who" and "when," the core name answers the "what" question.

Something to consider, do you want to allow spaces? The above example included spaces in the core file name. However, some approaches call for using a separator in place of the space, commonly the underscore.

Probably the most common reason against spaces that's still relevant, when you're working with files in terminal, spaces separate your arguments. If your file names include spaces, the shell will interpret words in your file name as arguments and the operation will fail.


mv example file.txt ~/Documents
# not allowed

Not a catastrophic limitation as the fix is simply to wrap the file name in quotes. For example:


mv "example_file.txt" ~/Documents
# allowed

So basically, you've added a couple keystrokes in adding quotes when you want to perform this, and similar operations. Probably not the end of the world. But depending on your volume, one approach versus the other may be less cumbersome.

Another situation may be where you're using a browser-based intranet to work with your files. A browsers url doesn't allow for spaces. For the url to accept this character, it must be encoded as "%20" when passed to the browser, and then decoded on the other side of the operation. That said, a well-developed system will likely expect the existence spaces and handle the encoding/decoding natively. But that doesn't have to be the case, especially with legacy systems, so check your docs.

And speaking of legacy systems, another common reason I see underscores used in place of spaces is that whoever is setting the standard is older. There was a time when spaces weren't allowed in file names. Old habits die hard.

Personally, I use spaces. It's my opinion that the core file name component is most readable by human eyes when the words are spaced, rather than separated by underscores or some other separator. Ultimately, our systems are to aid the human user. So, it makes greater sense to me in accommodating the human, rather than the system, in establishing the standard.

Versioning

A files 'version' is the final component I'll include in the name and it's an optional component at that. I use the format "v#" when versioning files.  Since versioning is only something that's done when you have two or more version instances, I begin with "v2" when a second version is created. I don't start off a file with a "v1" postfix. If another version never comes along, I wasted space in the file name. Furthermore, having a "v1" alone may communicate to the user that other versions exist but aren't accounted for when there’s just the one file. We want to communicate the most meaningful information in the file name. Adding a "v1" right out of the gate is anticipatory and not relevant to the first instance.

That said, I do retroactively add a "v1" once a second version is created. The reasoning for this is I want the final version to sit on top. Whether I use a "final" postfix or some other moniker, the draft versions beginning with "v" will be subordinate to the final and appear in the order most appropriate so long as the leading character is of lower precedence than "v." For example, see how the following items are ordered with this approach:

  • 230811-Office Lease-Final
  • 230811-Office Lease-v1
  • 230811-Office Lease-v2

In practice, I'll probably create a "_working" subfolder to contain draft versions and other supporting materials to keep the parent folder neater. Whether you choose to do so or not, using the above standard will ensure that there is chronology to your versioning with the final version sitting on top.

Separators

Separators are used to distribute a file name's discrete components making these data more easily identifiable. By combining this standard with the one's mentioned above, I know what component I'm working with positionally. Let's take the invoice example used above: 27.55.230811-Invoice. I know everything contained in the first element pertains to identifiers whereas the second element is the core name. If I found there are three elements instead of two, then I know the file has been versioned. For example, I could easily access the date information using python:


# PYTHON

# represent the file name as a simple string for demonstration purposes
file_name = "27.55.230811-Invoice"
fn_components = file_name.split("-")
len(fn_components)  # 'len' will tell me how many elements have been split apart
# output
2
# only two elements so we know this hasn't been versioned

# access the id/date component
fn_components[0]  # remember, python uses a 'zero' index
# output
27.55.230811

# the date is what we want, so lets refine this a bit further
fn_components[0].split(".")[-1]  # the date is always the last element
# output
230811

# this will work even if the date is the only element
"230811".split(".")[-1]
# output 
230811

# last, lets clean up this string representation as a date value
from datetime import datetime
datetime.strptime(fn_components[0].split(".")[-1], "%y%m%d")  # remember to pass the format of the date
# output
datetime.date(2023, 8, 11)
# now we have a date value that can be operated on: comparisons, etc.

Many aren't going to be manipulating file names programmatically. But placing specific types of data into discrete elements separated by an allowed character like a dash will make it considerably easier for the user to retrieve files. This way, you can easily scan for data in predictable locations. You aren't forced to visually process each component to extract a specific piece. When you're searching through large numbers of files, using separators consistently will make your life easier.

Last, any allowed character could be used as a separator. For Unix systems, the only reserved character is the forward slash. Microsoft also excludes the slash along with a few more. That said, most separators I come across are in the form of dashes or underscores. Personally, I use the dash to separate the components of the file name. I then further break apart the ID/Date section with decimals. Others will use the underscore. Choose anything you'd like so long as its not a reserved character and apply your standard consistently.

Ordering

The last main consideration deals with ordering. I've chosen the elements to be in the position they are due to how file systems natively handle ordering. I want my standard to work with a file system rather than fight it. Some will use symbols or characters to force certain files to float to various positions. For example, "_important file.txt" will float to the top due to its leading underscore.

Use this sparingly and avoid it altogether if possible. When you apply exceptions to the standard ordering, you're expecting any other user accessing those resources to expect the same presentation. This may not always be the case. But most users will be able to intuitively work with the native numerical ordering. Since ascending/descending is what the user will expect, this approach has the least training cost as well as maintenance cost. You won't necessarily have to educate your users on your approach.

Keep in mind how resources are ordered. The ordering operation is taking place not on the visual representation of the character, but the computer representation. Generally, ASCII format. So, it's important to know where the computer representations lie in magnitude. Generally, you can expect order to work like this: Symbols < Numbers < Upper Case Letters < Lower Case Letters. That is, when sorted ascending, you can expect files beginning with symbols to appear at the top, followed by numbers, then uppercase letters, then lower case letters. If sorted descending, reverse this.

Here's an example of how files appear when sorted ascending:

  • _example file.txt
  • 00 example file.txt
  • Example file.txt
  • example file.txt

Now let's break down the original example and examine how ordering will be affected: 27.55.230811-Invoice.

First, the two ID components, one representing the building and the other a vendor, will enforce a grouping effect. Resources will first be grouped by the building (27), and then grouped again by vendor (55) as the sorting mechanism traverses the file name left to right. Next, the date will be encountered and resources are then sorted chronologically. Next the core name is sorted alphabetically. And finally, if there is versioning, versions will be displayed in chronological orders: v1, v2, so on and so forth. 

Final Thoughts

When establishing a naming convention consider your use, what types of data you're likely to encounter, etc. The objective is to represent a file with the least amount of information needed to properly inform a user retrieving resources. Replace redundant pieces of information, like names/entities with ID numbers to further shorten your file name and version your files as needed. Consider how the pieces of information you choose to include in your file names will affect the display ordering of the files and position the data within the file name accordingly.

Above all else, don't overcomplicate it. Create a standard that's easy to communicate, implement, and enforce. Something scalable but not so complex it's not relevant at your beginnings. Something you'll stick to.