Introduction
Businesses big and small benefit from having a file naming convention that's effective, easy to use and remember. Saving files organically may seem harmless when volume is low. But as the quantity of resources on your file system grows, you'll begin to feel the pain from clutter and from the difficulty in retrieving records.
While much in business can be built along the way, having some form of procedure for naming and storing files will save headaches down the road. Imagine retroactively updating files to conform to a new naming convention: it would be quite labor-intensive, if you're able to complete such a project at all.
For efficient retrieval and maintenance, I recommend adopting a file naming convention early and sticking to it. Here, I'll share a few approaches that I use in my businesses to stay organized.
If you're interested in my comprehensive approach to file storage, including top-level folders, subfolders, and other considerations, check out File and Folder Concepts for Business. In the discussion here, we'll focus on file names.
Components
Your file names should contain the least information necessary to identify a resource. The particular components you incorporate will be specific to your use case, but I've found a few elements to be universal across applications: ID/Timestamp, Core Label, and Versioning, generally in that order.
ID
Working from left to right across the file name, the first component you encounter in my convention is an ID. I use IDs primarily when I have many resources related to a particular entity. Rather than repeat the entity's full label (vendor name, etc.), I assign an ID to use in place of its name. For example, if I'm using the vendor "Joe the Plumber" on a regular basis, repeating this name every time I save a related document, such as an invoice or proposal, would be redundant and take up space within the file name.
Our objective is to keep file names as short as possible so that pertinent information stands out. Instead of labeling something like "Joe the Plumber-Invoice for Work on August 11," I can use an ID, let's say "55", so the name would be shortened to "55-Invoice for Work on August 11."
Suppose now there are additional entities I'd like to represent with IDs for a given file. Perhaps this invoice relates not only to "Joe the Plumber" but also to a specific property with an ID of "27" and an address of "1234 Main Street." In this case, I chain the IDs, beginning with the higher-order relationship and working downward. From a database perspective, the direct relationship runs from vendor "Joe the Plumber" to property "1234 Main Street" (ID "27"), so I traverse it in the inverse order, listing the property before the vendor. The example above would then become "27.55-Invoice for Work on August 11." My chosen separator for chaining IDs is the period; more on this in the "Separators" section below.
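As a minimal sketch of the chaining rule (the helper name `build_name` is hypothetical, not part of any tool), the IDs are joined with periods from the higher-order entity down, and the core name follows a dash:

```python
def build_name(ids, core_name):
    """Chain IDs with periods (highest-order entity first), then add the core name after a dash."""
    # ids are strings, already ordered from the higher-order entity down
    prefix = ".".join(ids)
    return f"{prefix}-{core_name}"


# Property 27 ("1234 Main Street") is the higher-order entity; vendor 55 ("Joe the Plumber") follows.
print(build_name(["27", "55"], "Invoice for Work on August 11"))
# 27.55-Invoice for Work on August 11
```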
Timestamp
When providing a timestamp, I use the format YYMMDD. For example, "August 11, 2023" would be "230811." Combining this with the invoice example from the ID component, we would express the ID-Timestamp combination as "27.55.230811-Invoice." Because the date now lives in the timestamp, the core name no longer needs to repeat it. Note that I'm not using any separators within the date. The reason is twofold. First, I reserve separators for the discrete components of the file name, which makes it convenient to extract a given component programmatically. Second, the character most often used with dates is the slash, and the forward slash is a reserved character on Unix-based systems as well as Windows. See the following resources for more information on reserved characters:
I use timestamps often when naming resources. While your file system will likely track a couple of date values in its metadata (date created, date modified, etc.), I don't rely on these values and instead use an explicit value.
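Generating that explicit value is easy to automate. The sketch below uses Python's `strftime` with `%y%m%d` to produce the YYMMDD stamp; the helper name `yymmdd` is hypothetical. It also illustrates a side benefit of the format: because the fields run from year down to day and are zero-padded, sorting the stamps as plain text also sorts them chronologically, assuming all dates fall in the same century.

```python
from datetime import date


def yymmdd(d: date) -> str:
    """Format a date as YYMMDD with no internal separators."""
    return d.strftime("%y%m%d")


stamp = yymmdd(date(2023, 8, 11))
print(stamp)                     # 230811
print(f"27.55.{stamp}-Invoice")  # 27.55.230811-Invoice

# Zero-padded, largest-unit-first stamps sort chronologically as plain strings
# (valid only while every date shares the same century, since the year has two digits).
stamps = [yymmdd(date(2023, 8, 11)), yymmdd(date(2022, 12, 1)), yymmdd(date(2023, 1, 5))]
print(sorted(stamps))  # ['221201', '230105', '230811']
```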
The first reason I don't rely on these metadata dates is that not all file systems track the creation date, which is the date I'm most interested in. Some Linux file systems track only the dates accessed, modified, and changed, excluding the creation date. (For those curious, "modified" refers to a file's contents, whereas "changed" refers to its attributes, such as permissions or name.)
Another reason is that certain file operations can change these dates in unexpected ways. To demonstrate, let's create a file and see how its creation time can be affected. I'm using Terminal on macOS, whose commands are similar to Linux's; note that the stat options below belong to the BSD stat that ships with macOS, while GNU stat on Linux uses different flags. I'll create a file named 'example_file.txt' on my Desktop and then display its creation date.
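If you want to see which of these dates your own platform exposes, Python's `os.stat` is a quick way to check. In this sketch, `st_birthtime` (the creation time) is read with `getattr` because it only exists on some platforms, such as macOS and FreeBSD; on most Linux systems it is absent. The temporary file is just a stand-in for any file you care about.

```python
import os
import tempfile

# Create a throwaway file to inspect (any existing path would do).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

st = os.stat(path)
print("modified:", st.st_mtime)  # contents last modified
print("changed: ", st.st_ctime)  # on Unix: metadata (inode) last changed
# st_birthtime is only defined on some platforms; fall back to None where it is missing.
birth = getattr(st, "st_birthtime", None)
print("created: ", birth if birth is not None else "not available on this platform")

os.remove(path)
```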
cd ~/Desktop
touch example_file.txt
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:17:57
Now, I'll copy the file over to my Documents folder and check the timestamp there.
cp example_file.txt ~/Documents
cd ~/Documents
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:18:45
You can see that the file now has a new creation date after being copied over. Please note, there are ways to preserve this date. For example, performing the same operation with the "-p" flag will preserve a file's metadata.
cd ~/Desktop
rm ~/Documents/example_file.txt  # remove the earlier copy so the new one starts fresh
cp -p example_file.txt ~/Documents && cd ~/Documents
stat -f "%SB" -t"%Y-%m-%d %H:%M:%S" example_file.txt
# output
2023-08-11 09:17:57
Our creation timestamp remains the same. Additionally, using "mv" instead of "cp" (moving rather than copying) will also preserve metadata, at least within the same volume, where a move is simply a rename. But the point I'm trying to make is that if you're not careful, you can end up with inconsistent data and lose the original timestamp. By explicitly including the timestamp in the file name, I'll always have the information I want unless someone manually alters it.
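Python's standard library draws the same distinction as `cp` versus `cp -p`: `shutil.copy` copies a file's contents and permission bits, while `shutil.copy2` also copies its timestamps. The sketch below backdates a throwaway file and compares modification times after each kind of copy. It checks modification time only, since that is portable; Python's documentation warns that even `copy2` cannot copy all metadata, which is one more reason to keep the date in the name.

```python
import os
import shutil
import tempfile
import time

# Work in a throwaway directory so nothing on your real Desktop is touched.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example_file.txt")
with open(src, "w") as f:
    f.write("hello")

# Backdate the source file's access and modification times by one day.
day_ago = time.time() - 86400
os.utime(src, (day_ago, day_ago))

plain = os.path.join(workdir, "copy_plain.txt")
kept = os.path.join(workdir, "copy_preserved.txt")
shutil.copy(src, plain)  # contents and permission bits only, much like plain cp
shutil.copy2(src, kept)  # also copies timestamps via shutil.copystat, much like cp -p

src_mtime = os.stat(src).st_mtime
plain_reset = abs(os.stat(plain).st_mtime - src_mtime) > 60  # plain copy got a fresh timestamp
kept_same = abs(os.stat(kept).st_mtime - src_mtime) < 1      # copy2 carried the timestamp over
print(plain_reset, kept_same)  # True True

shutil.rmtree(workdir)
```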
Core File Name
The words you include in a file name tell someone searching for or retrieving a resource what the file is for. The objective is to include the least information that still conveys enough context for retrieval. Whereas the ID and Timestamp components answer the questions "who" and "when," the core name answers "what."
Something to consider: do you want to allow spaces? The above example included spaces in the core file name. However, some approaches call for using a separator instead of a space (commonly an underscore).
A common argument against spaces that's still relevant: when you're working with files in the terminal, spaces separate your arguments. If your file name includes spaces, the shell will treat each word as a separate argument, and the operation will fail.
mv example file.txt ~/Documents
# fails: the shell sees two files, "example" and "file.txt"
This isn't a catastrophic problem, since you simply need to wrap the file name in quotes.
mv "example file.txt" ~/Documents
# works: the quotes keep the name together as one argument
So you've added a couple of keystrokes for this and similar operations. Probably not the end of the world, but depending on your volume, one approach may be less cumbersome than the other.
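To see exactly what the shell receives, Python's `shlex` module tokenizes a command line using POSIX shell rules. This is only a sketch of the word splitting (no tilde expansion or execution happens), and `shlex.quote` shows how a script can add the quotes automatically:

```python
import shlex

unquoted = "mv example file.txt ~/Documents"
quoted = 'mv "example file.txt" ~/Documents'

# shlex.split follows POSIX shell word-splitting rules
print(shlex.split(unquoted))  # ['mv', 'example', 'file.txt', '~/Documents']
print(shlex.split(quoted))    # ['mv', 'example file.txt', '~/Documents']

# When generating commands from code, shlex.quote adds the quoting for you
print("mv " + shlex.quote("example file.txt") + " ~/Documents")
# mv 'example file.txt' ~/Documents
```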
Another situation is when you're using a browser-based intranet to work with your files. URLs don't allow literal spaces: a space must be encoded as "%20" in the URL passed to the browser and then decoded on the other side. That said, a well-developed system will likely expect spaces and handle the encoding and decoding gracefully. But that isn't guaranteed, especially with legacy systems, so check your docs.
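The encoding step can be reproduced with Python's `urllib.parse.quote` and `unquote`. The file name here is hypothetical:

```python
from urllib.parse import quote, unquote

name = "27.55.230811-Invoice for Work.pdf"
encoded = quote(name)    # spaces become %20; letters, digits, ".", "-" are left alone
print(encoded)           # 27.55.230811-Invoice%20for%20Work.pdf
print(unquote(encoded))  # 27.55.230811-Invoice for Work.pdf
```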
And speaking of legacy systems, another common reason I see underscores used in place of spaces is that whoever is setting the standard is older. There was a time when spaces weren't allowed in file names. Old habits die hard.
Personally, I use spaces. In my opinion, the core file name component should be structured to be as readable as possible to human eyes, and for me, spaces are easier to read than underscores or other characters.
Versioning
A file's "version" is the final component I include in the name, and it's optional. I use the format "v#" when versioning files. Since versioning only matters once you have two or more instances of the same resource, I begin versioning when a second instance is created.
I don't start a file off with a "v1" suffix. If another version never comes along, I've wasted space in the file name. Furthermore, a lone "v1" may suggest to the user that other versions exist but aren't accounted for, when there's just the one file. We want to communicate the most meaningful information in the file name, and adding a "v1" right out of the gate is anticipatory rather than relevant to the first instance.
That said, I do retroactively add a "v1" once a second version is created. The reasoning is that I want the final version to sit on top. When I use "Final" as the version, the draft versions beginning with "v" sort below it, because the leading character "F" comes before "v" (alphabetically, and also in ASCII, where uppercase letters precede lowercase ones). For example, see how the following items are ordered with this approach:
- 230811-Office Lease-Final
- 230811-Office Lease-v1
- 230811-Office Lease-v2
In practice, I'll probably create a "_working" subfolder to contain draft versions and other supporting materials to keep the parent folder neat and organized. Whether you choose to do so or not, using the above standard will ensure that your versioning appears chronologically with the final version sitting on top.
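A quick check of that ordering, plus a minimal sketch of the "retroactive v1" rule. `next_version` is a hypothetical helper, and it assumes names without file extensions in the `<name>-v#` form used above:

```python
drafts = ["230811-Office Lease-v2", "230811-Office Lease-Final", "230811-Office Lease-v1"]
print(sorted(drafts))
# ['230811-Office Lease-Final', '230811-Office Lease-v1', '230811-Office Lease-v2']


def next_version(base, existing):
    """Return (renames, new_name) for adding another draft of `base`.

    Hypothetical helper: no "v1" until a second version exists, at which point
    the original is renamed to "v1". Names are assumed to have no extension.
    """
    prefix = base + "-v"
    numbers = [int(name[len(prefix):]) for name in existing if name.startswith(prefix)]
    if not numbers:
        # Only the unversioned original exists: rename it to v1; the new draft is v2.
        return {base: prefix + "1"}, prefix + "2"
    # Compare version numbers as integers so that v10 correctly follows v9.
    return {}, f"{prefix}{max(numbers) + 1}"


print(next_version("230811-Office Lease", ["230811-Office Lease"]))
# ({'230811-Office Lease': '230811-Office Lease-v1'}, '230811-Office Lease-v2')
print(next_version("230811-Office Lease", ["230811-Office Lease-v1", "230811-Office Lease-v2"]))
# ({}, '230811-Office Lease-v3')
```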
Separators
Separators define the boundaries between a file name's discrete components, making each piece of data easier to identify. Combined with the component order described above, they let me know which component I'm working with based on its position.
Let's take the invoice example used above: "27.55.230811-Invoice." I know everything in the first element pertains to identifiers and the date, whereas the second element is the core name. If three elements are found instead of two, then I know the file has been versioned. For example, I could easily access the date information using Python:
# represent the file name as a simple string for demonstration purposes
file_name = "27.55.230811-Invoice"
fn_components = file_name.split("-")
len(fn_components)  # 'len' tells me how many elements were split apart
# output
2
# only two elements, so we know this hasn't been versioned
# access the ID/date component
fn_components[0]  # remember, Python uses zero-based indexing
# output
'27.55.230811'
# the date is what we want, so let's refine this a bit further
fn_components[0].split(".")[-1]  # the date is always the last element
# output
'230811'
# this will work even if the date is the only element
"230811".split(".")[-1]
# output
'230811'
# last, let's convert this string into a date value
from datetime import datetime
datetime.strptime(fn_components[0].split(".")[-1], "%y%m%d").date()  # pass the date's format; .date() drops the time part
# output
datetime.date(2023, 8, 11)
# now we have a date value that can be operated on: comparisons, etc.
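Putting those steps together, a small parser can return every component at once. This is a sketch under the assumptions of this convention: IDs and the YYMMDD date are joined by periods, components are separated by dashes, the core name contains no dashes, and any file extension has already been removed. The function name is hypothetical:

```python
from datetime import datetime


def parse_file_name(file_name):
    """Split a name like '27.55.230811-Invoice-v2' into IDs, date, core name, and version."""
    parts = file_name.split("-")
    # The first element holds zero or more IDs followed by the YYMMDD date.
    *ids, stamp = parts[0].split(".")
    return {
        "ids": ids,
        "date": datetime.strptime(stamp, "%y%m%d").date(),
        "core": parts[1],
        "version": parts[2] if len(parts) > 2 else None,  # a third element means it's versioned
    }


print(parse_file_name("27.55.230811-Invoice"))
# {'ids': ['27', '55'], 'date': datetime.date(2023, 8, 11), 'core': 'Invoice', 'version': None}
print(parse_file_name("230811-Office Lease-v2"))
# {'ids': [], 'date': datetime.date(2023, 8, 11), 'core': 'Office Lease', 'version': 'v2'}
```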
Many people won't manipulate file names programmatically. But placing specific types of data into discrete elements, separated by an appropriate character such as a dash, still makes it considerably easier to retrieve files: you can scan for data in predictable locations instead of visually processing every component to extract a specific piece. When you're scanning through large numbers of files, consistently applied separators will make your life easier.
Last, any allowed character could be used as a separator. On Unix-based systems, the only reserved characters are the forward slash and the null character. Windows also excludes the slash, along with several others: < > : " \ | ? *. That said, most separators I come across are dashes or underscores. Personally, I use the dash to separate the components of the file name, and I further break apart the ID/Date section with periods. Choose anything you'd like, so long as it's not a reserved character, and apply your standard consistently.
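A minimal sketch of checking a candidate file name (or separator) against the reserved characters mentioned above. The set covers the Windows list plus the POSIX exclusions; Windows also rejects control characters and reserved device names such as CON, which this sketch does not check, and the helper name is hypothetical:

```python
# Windows-reserved characters; "/" and the null character also cover the POSIX rules.
RESERVED = set('<>:"/\\|?*') | {"\0"}


def invalid_characters(name):
    """Return the reserved characters found in `name`, sorted for readability."""
    return sorted(set(name) & RESERVED)


print(invalid_characters("27.55.230811-Invoice"))    # []
print(invalid_characters("27.55.08/11/23-Invoice"))  # ['/']
print(invalid_characters("Invoice: August?"))        # [':', '?']
```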
Ordering
The last main consideration deals with ordering. I've placed the elements where they are because of how file systems natively handle ordering: I want my standard to work with the file system rather than fight it. Some people use symbols or characters to force certain files to float to particular positions. For example, in many file browsers "_important file.txt" will float to the top because of its leading underscore.
Use this sparingly, and avoid it altogether if possible. When you apply exceptions to the standard ordering, you're assuming that every other user accessing those resources will see the same presentation. This may not always be the case.
On the other hand, most users can work intuitively with the native alphanumeric ordering. Since ascending/descending order is what users expect, this approach has the lowest training and maintenance costs; you won't necessarily have to educate your users on your approach.
Keep in mind how resources are ordered. Sorting operates not on the visual appearance of a character but on its underlying computer representation (generally ASCII, or a collation built on it). As a rough rule, the system orders resources like this: Symbols < Numbers < Upper Case Letters < Lower Case Letters. That is, when sorted ascending, you can expect files beginning with symbols to appear at the top, followed by numbers, then uppercase letters, then lowercase letters. If sorted descending, reverse this. Treat this as a rule of thumb: in a strict ASCII sort, some symbols (such as the underscore) fall between uppercase and lowercase letters, and many file browsers ignore case entirely.
Here's an example of how files appear when sorted ascending:
- _example file.txt
- 00 example file.txt
- Example file.txt
- example file.txt
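Sorting the same four names by raw code point, as Python's `sorted()` does (and as `ls` does in the C locale), shows where the rule of thumb bends: the underscore's ASCII value (95) falls between uppercase and lowercase letters. Graphical file browsers apply their own collation rules, so your results there may differ:

```python
names = ["example file.txt", "Example file.txt", "00 example file.txt", "_example file.txt"]

# Plain code-point (ASCII) order: digits (48-57) < uppercase (65-90) < "_" (95) < lowercase (97-122)
print(sorted(names))
# ['00 example file.txt', 'Example file.txt', '_example file.txt', 'example file.txt']
```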
Now let's break down the original example and examine how ordering will be affected: "27.55.230811-Invoice."
First, the two ID components, one representing the property and the other a vendor, create a grouping effect. Resources are first grouped by property ("27") and then by vendor ("55") as the sort traverses the file name from left to right. Next, the date is encountered, so resources are sorted chronologically. Then the core name is sorted alphabetically. Finally, if there is versioning, versions are displayed in order: v1, v2, and so on.
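The grouping effect is easy to check with a plain sort. The IDs, dates, and names below are hypothetical (property 31, vendor 12, and the Estimate files are invented for illustration). The last lines show a caveat with plain text sorting: version numbers are compared character by character, so "v10" lands before "v2". Many graphical file browsers compare embedded numbers numerically, but a plain sort does not.

```python
files = [
    "27.55.230915-Invoice",
    "27.12.230811-Proposal",
    "27.55.230811-Invoice",
    "31.55.230701-Invoice",
    "27.55.230811-Estimate-v2",
    "27.55.230811-Estimate-v1",
]

# Groups by property, then vendor, then date, then core name, then version.
for name in sorted(files):
    print(name)
# 27.12.230811-Proposal
# 27.55.230811-Estimate-v1
# 27.55.230811-Estimate-v2
# 27.55.230811-Invoice
# 27.55.230915-Invoice
# 31.55.230701-Invoice

# Caveat: plain text sorting compares character by character.
print(sorted(["Lease-v1", "Lease-v2", "Lease-v10"]))  # ['Lease-v1', 'Lease-v10', 'Lease-v2']
```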
Final Thoughts
When establishing a naming convention, consider your specific needs, the types of data you're likely to encounter, and so on. The objective is to represent a file with the least information needed to properly inform a user. Replace redundant information, such as entity names, with IDs to further shorten your file names, and version your files as needed. Consider how the pieces of information you include will affect the ordering of your files, and position the data within the file name accordingly.
Above all else, don't overcomplicate it. Create a simple standard that's easy to communicate, implement, and enforce. Choose something scalable, but not so complex that it's impractical when you're just starting out. Choose something you'll stick to.