Background
Managing files and folders well is a critical part of operating a successful business. We encounter, create, and retrieve these objects daily. The quantities may be small at first, but the volume of files and data can grow quickly as your business picks up speed. The file servers for our business, as small as it is, contain hundreds of thousands of resources, and that number is growing fast.
When I started out in business, I was surprised to learn that good organizational practices weren't as widely adopted as I might have guessed. While it took years for my own system to evolve into what it is now, I assumed others were up and running with efficient, well-thought-out systems. On many occasions I have been shocked to see how primitive many organizations' conventions were, if they had an established procedure at all.
Over the years, I've noticed a few recurring themes. First, businesses tend to operate with either a very flat architecture, where everything is stored within a small number of top-level folders, or, at the other extreme, a mess of nested folders that makes navigation difficult and inefficient. Second, the names of the files themselves are often too long, too short, or inconsistent, making it difficult to locate or identify resources within a given directory. Last, there's often a degree of cheating within the file names: rather than work with the file system, users rely on characters like dashes and underscores, or on numbers, to control the order in which resources are displayed.
Poor file and folder schemas aren't just a problem for small, inexperienced businesses, either. One of the biggest revelations of my career was learning that very large businesses, some with tens of thousands of employees around the world, operate with mediocre practices. That said, while I've seen large organizations operating with unsophisticated structures, I haven't seen a small business with an especially impressive file and folder schema. When someone was doing a good job, it was a large entity.
Objectives
The first step in establishing an adequate file and folder convention is identifying your objectives. You need an idea of what you want your system to accomplish before you can create or extend it. Your specific use will have unique constraints, but some elements are common regardless.
First, consider how your organization is structured. Your highest-level folders should reflect your organization's layout: if there are distinct departments, the department names are good candidates for top-level folders. This is also the level at which I usually consider permissions, since top-level folders are a great way to control who has access to which resources. Below the top-level folders, subfolders can break out functions, roles, products, and so on.
Next, think about what types of data you'll need to handle and in what volume. Low-volume items may fit perfectly well within a subfolder. For example, a small business could have an "Employee" subfolder within its "Human Resources" top-level folder, with one folder per employee inside it. The relative path could look something like "Human Resources > Employee > Smith, John." If you're a larger entity, you might further group employees by region to make the information more manageable.
Note: be careful with commas and other special characters, as they can complicate command line/terminal operations. This is probably not a big deal for a small business. However, a large institution with millions of files may need to perform more complex lookups than a file explorer or other graphical user interface offers. In that case, I'd go with "Smith_John" instead of "Smith, John."
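If you do standardize on names like "Smith_John," the conversion is easy to automate. Below is a minimal Python sketch; the function name `to_safe_folder_name` is hypothetical, and it assumes plain ASCII names (accented characters would be replaced too, so adjust the pattern if your names include them).

```python
import re


def to_safe_folder_name(display_name: str) -> str:
    """Turn a display name such as "Smith, John" into a shell-friendly folder name.

    Any run of characters other than ASCII letters, digits, dashes, and
    underscores (commas, spaces, apostrophes, ...) becomes one underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", display_name)
    # Drop underscores left over at either end, e.g. from stray spaces.
    return cleaned.strip("_")


print(to_safe_folder_name("Smith, John"))    # Smith_John
print(to_safe_folder_name("O'Brien, Mary"))  # O_Brien_Mary
```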
Last, think about how you want to structure file names. Consider what information is relevant to identification and retrieval. Can some information be excluded to keep names shorter while keeping them informative and avoiding name collisions? Is there redundant information that could be compressed into an identifier, such as an ID?
Ultimately, you want to arrive at a structure where someone other than the person who created a file or folder can intuitively navigate to and retrieve the resources they need. How often have you been asked where a resource is on your file server? How much time is wasted by one person explaining to another where they stored something? A well-structured system should make it clear to any relevant party where resources are. If you walk away with a single objective to implement, make it this one.
Components
Top-Level Folders
Since everything else will live under these folders, this is probably where you'll start. If you're a new business, I wouldn't overthink it; keep your hierarchy as simple as possible to keep maintenance low. Generally, my goal is to have a top-level folder for each department or function within the business. I do this for three reasons:
- Specific Needs
- Permissions Management
- Portability
The first is obvious: compartmentalize your organization so you can respond to each department's unique constraints and needs.
Once that's considered, my next focus is permissions. If my folders are divided by department, permissions are much simpler to maintain. Assuming all employees of a given department have equal permissions, I can create a permissions group for each top-level folder and assign users to these groups accordingly.
If you need more granular control, consider which subfolders would accommodate your needs. Try to keep this as flat as possible, meaning limit nested subfolders. The deeper into your folder tree permissions are assigned, the harder they are to maintain and the more likely mistakes become.
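To see why per-department top-level folders keep permissions simple, here is a hypothetical Python sketch that models the lookup. The group names are made up, and real enforcement happens in your file server's access controls, not in code like this; the point is that one rule per top-level folder covers every file beneath it, however deep.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical mapping: one permissions group per top-level folder.
FOLDER_GROUPS = {
    "Administration": "grp-administration",
    "Deal Flow": "grp-deal-flow",
    "Investors": "grp-investors",
    "Marketing": "grp-marketing",
    "Portfolio": "grp-portfolio",
}


def required_group(path: str) -> str | None:
    """Return the group governing a path, judged only by its top-level folder."""
    parts = PurePosixPath(path).parts
    return FOLDER_GROUPS.get(parts[0]) if parts else None


def can_access(user_groups: set[str], path: str) -> bool:
    group = required_group(path)
    return group is not None and group in user_groups


print(can_access({"grp-marketing"}, "Marketing/2023/08 August/Flyer.pdf"))  # True
print(can_access({"grp-marketing"}, "Investors/Smith_John/Statement.pdf"))  # False
```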
Last is portability: the ability to move a top-level folder to a different solution if necessary. This matters especially in regulated contexts, where you may have different standards to uphold (HIPAA, FTC rules, etc.). Certain standards carry additional cost, so if your resources are numerous, you may wish to hold different parts of your data to different standards. For example, you may not want to store low-value data in a high-cost, secure solution. If your data is compartmentalized, it will be much simpler to move your high-value, sensitive data to a different solution as regulations change, without incurring the cost of holding less sensitive data to the same standard.
Here's a handful of top-level folders I tend to use for my real estate businesses:
- Administration
- Deal Flow
- Investors
- Marketing
- Portfolio
For my real estate businesses, most activity relates to a property. With that in mind, we're doing most of our work within a location such as: "Portfolio > Property Name."
Subfolders
I've seen a lot of mistakes made with subfolders. Often, organizations run a schema that is either too shallow or far too deep. Either way, inefficient retrieval of resources adds cost.
When folder structures are too flat, you spend too much time scrolling and scanning for the right item. When folder structures are too deep, you waste time clicking in and out of folders searching. It's a hard balance to get right.
As a rule of thumb, I won't add a nested folder until I have at least ten child resources. Conversely, if my file explorer fills the full height of my viewport and I still have to scroll to find a resource, my structure may be too flat. When I'm navigating, I don't want to have to scroll to determine whether I need to go deeper into the folder structure.
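The rule of thumb above can be turned into a quick audit. This Python sketch is an illustration under assumptions: `MAX_VISIBLE_ROWS` stands in for however many rows your file explorer shows without scrolling, the function name is hypothetical, and it only flags folders for a human to review rather than deciding anything.

```python
from __future__ import annotations

from pathlib import Path

MIN_CHILDREN = 10       # rule of thumb: about ten items before nesting a folder
MAX_VISIBLE_ROWS = 30   # assumption: rows your file explorer shows without scrolling


def review_folder_sizes(root: Path) -> list[tuple[str, str]]:
    """Flag folders that look too flat, or that may have been nested too early."""
    findings = []
    folders = [root, *(p for p in root.rglob("*") if p.is_dir())]
    for folder in folders:
        count = sum(1 for _ in folder.iterdir())  # direct children only
        label = str(folder.relative_to(root))     # "." for the root itself
        if count > MAX_VISIBLE_ROWS:
            findings.append((label, "too flat: consider subfolders"))
        elif folder != root and count < MIN_CHILDREN:
            findings.append((label, "sparse: may not need its own folder"))
    return sorted(findings)
```

Point it at a copy of your share, e.g. `review_folder_sizes(Path("/path/to/share"))`, and treat the findings as prompts rather than rules: a year folder that is only a few months old, for example, will legitimately look sparse.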
When I add subfolders purely to reduce the number of resources in a parent folder, I often use Year and Month folders. For the month folders, I add a two-digit numerical prefix to maintain chronological ordering, which means a leading "0" for months 1-9; without it, any tool that sorts names as plain text would place "10 October" ahead of "2 February." For example, a subfolder structure for grouping large data could be "2023 > 08 August."
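A minimal Python sketch of the Year and Month scheme, assuming English month names and forward-slash relative paths; the helper name `month_folder` is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

# English month names spelled out so the result doesn't depend on the system locale.
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_folder(d: date) -> PurePosixPath:
    """Return the relative "YYYY/MM Month" folder for a date."""
    # Zero-padding the month keeps "10 October" after "09 September" in a plain text sort.
    return PurePosixPath(f"{d.year:04d}", f"{d.month:02d} {MONTH_NAMES[d.month - 1]}")


print(month_folder(date(2023, 8, 4)))  # 2023/08 August
```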
In many of the situations I encounter, I use date-related subfolders, but that's due to the types of data I deal with. For your circumstances, consider location-based subfolders (states, etc.) or even letters. If I had thousands of employees, I could see the HR department storing each employee's folder within a folder named for the first letter of the employee's name.
Another objective with subfolders is standardization. If I have a group of folders that are being used in a substantially similar manner, I may want to establish a standard for their subfolder names.
I've seen recommendations to build out empty subfolders whenever you create one of these template parent folders. I don't follow this advice. I only add a subfolder when I have information to commit to it, and I don't allow empty folders in our system. The reason is that I don't want employees wasting time clicking into empty folders. Your folder structure should communicate something about its contents.
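If you also want to keep empty folders out of your system, a periodic check can find them. A minimal Python sketch with a hypothetical function name: a folder holding only a hidden system file (such as `.DS_Store` or `desktop.ini`) counts as non-empty here, and a folder containing only empty folders is reported only after those are removed.

```python
from __future__ import annotations

from pathlib import Path


def find_empty_folders(root: Path) -> list[Path]:
    """List every folder under root that contains nothing at all."""
    return sorted(p for p in root.rglob("*") if p.is_dir() and not any(p.iterdir()))
```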
Instead of building out empty subfolders, consider making your desired standard part of your employee handbook or another procedure manual. Alternatively, you could keep an example folder set in a parent folder for reference. In my experience, employees tend to copy whatever is in the most recent folder, which may not be a bad approach, as it allows your structure to evolve over time. If possible, let it evolve, and avoid retroactively standardizing your subdirectories.
The last subfolder issue I want to mention is the need to make a particular folder float to the top. Try to use the file system's ordering wherever possible; chronology is intuitive. Just because floating a folder makes sense to you doesn't mean it will make sense to someone else.
That said, there are a couple of circumstances where I'll float a folder to the top. Most frequently, it's a folder like "_archive." For example, I tend to keep properties we currently own in a folder like "Portfolio" or "Properties." When a property is sold, it's simply moved to "_archive" so it doesn't clutter active resources that need to be accessed more frequently.
Another folder I use, though less frequently, is "_general." It's the equivalent of "Miscellaneous" or the like, but I consider it bad practice and a bit lazy: if a resource ends up in a folder like this, it communicates that you don't really know what to do with it. Try to find an appropriate location for the resource, or reconsider whether it needs to be retained at all.
Just because you receive a file doesn't mean you need to keep it. Especially early on, while building a business, I tended to save everything. I feel less compelled to do so now that the volume of material I encounter has increased by an order of magnitude. That said, storage is cheap, so keep it if you need to.
When we need to make a folder float, we generally do so with a leading character such as an underscore. Be careful not to use a character your file system restricts. Even among characters that aren't restricted, many are best avoided, so consult your file system's documentation. Better yet, stick to dashes, underscores, numbers, and letters. I've also seen items made to float with "00" or "AA," though I find that much less elegant than a leading underscore.
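A short Python sketch of both points, assuming the conservative character set above plus the space and period that appear in the file naming convention later on; the names `SAFE_NAME` and `is_safe_name` are hypothetical. It also shows why ordering is worth checking in your own tools: the underscore floats to the top in a case-insensitive sort, which is closer to how graphical file explorers tend to order names, but in a strict character-code sort (for example, `ls` under the C locale) it lands after uppercase letters.

```python
import re

# Conservative whitelist: letters, digits, dash, underscore, plus the space
# and period that the file naming convention also relies on.
SAFE_NAME = re.compile(r"[A-Za-z0-9 ._-]+")


def is_safe_name(name: str) -> bool:
    # A leading period would hide the item on macOS and Linux, so reject it too.
    return SAFE_NAME.fullmatch(name) is not None and not name.startswith(".")


folders = ["Portfolio", "_archive", "Deal Flow", "Administration"]
print(sorted(folders, key=str.casefold))
# ['_archive', 'Administration', 'Deal Flow', 'Portfolio']
print(sorted(folders))
# ['Administration', 'Deal Flow', 'Portfolio', '_archive']
```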
ID Numbers
Depending on your industry and specific use, numbers in folder and file names can help reduce name lengths while still communicating necessary information. The legal and professional services fields rely heavily on this. Law offices often use a leading client number followed by a matter number, and I've seen consultants use the same approach. For example, for a client with an ID of 123 and a matter/project of 456, files may be labeled "123.456-Example File.pdf." This way, the client and project names can be left out of the file name while that information remains available, if needed, without traversing up the folder structure.
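Because the prefix is structured, it can be expanded back into names when needed. A minimal Python sketch; the lookup tables and the `describe` function are hypothetical stand-ins for wherever your client and matter records actually live.

```python
from __future__ import annotations

import re

# "<client>.<matter>-<rest of name>", e.g. "123.456-Example File.pdf"
CLIENT_MATTER = re.compile(r"(?P<client>\d+)\.(?P<matter>\d+)-(?P<rest>.+)")

# Hypothetical lookup tables; in practice these might live in a practice
# management system or a spreadsheet.
CLIENTS = {"123": "Example Client"}
MATTERS = {("123", "456"): "Example Matter"}


def describe(filename: str) -> str | None:
    """Expand the numeric prefix back into client and matter names."""
    m = CLIENT_MATTER.fullmatch(filename)
    if m is None:
        return None  # no client.matter prefix
    client, matter = m["client"], m["matter"]
    return f"{CLIENTS.get(client, '?')} / {MATTERS.get((client, matter), '?')}: {m['rest']}"


print(describe("123.456-Example File.pdf"))
# Example Client / Example Matter: Example File.pdf
```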
In my real estate businesses, I identify the following entities with IDs:
- Properties
- Tenants
- Vendors
It's important to note that relationships exist between these entities. A tenant could occupy multiple properties, and a property will likely have multiple tenants. With vendors this is almost certainly the case, since we reuse vendors across our portfolio. Following this convention, for a property with an ID of 123 and a tenant with an ID of 456, I might have a lease named "123.456-Office Lease.pdf." There would also be date and versioning information in the name, but let's set that aside until we get to file names.
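IDs in file names also make cross-cutting lookups cheap. This hypothetical Python sketch finds every file for one tenant across several properties by reading the "property.tenant" prefix. The property names and the "Tenants" folder are invented for illustration, and it assumes the second ID segment is a tenant ID in the files searched; vendor files using the same two-part pattern would need to be kept separate, for instance by searching only tenant folders.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def files_for_tenant(paths: list[str], tenant_id: str) -> list[str]:
    """Return files whose "property.tenant" prefix names the given tenant."""
    matches = []
    for path in paths:
        prefix = PurePosixPath(path).name.split("-", 1)[0]  # e.g. "123.456"
        ids = prefix.split(".")
        if len(ids) >= 2 and ids[1] == tenant_id:
            matches.append(path)
    return matches


paths = [
    "Portfolio/Maple Court/Tenants/123.456-Office Lease.pdf",
    "Portfolio/Elm Plaza/Tenants/789.456-Office Lease.pdf",
    "Portfolio/Elm Plaza/Tenants/789.111-Retail Lease.pdf",
]
print(files_for_tenant(paths, "456"))  # both leases for tenant 456
```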
File Names
Just like with subfolders, file names require balancing too much information against too little. We want file names to be informative but also easily readable. I don't want to scroll horizontally to read a long file name if that can be avoided.
1. Avoid Redundancy
The first thing I look for when managing file name lengths is repetitive information. If I'm repeating information from the parent folders, a leading ID may be worth considering. I don't need to see the property name followed by the tenant's name at the start of every file if I already know I'm in that tenant's folder for a given property. That is highly redundant, and it may push the information I'm most interested in out of view in the file explorer, forcing me to scroll right.
2. Date Your Resource
I always include a date before the core file name. If there's a leading ID, I place the ID first, immediately followed by the date. Importantly, the date must be numeric and run year, month, day, since it determines ordering. I also don't use separators within dates, because those characters are reserved for other purposes within the file name. The format I use most often is YYMMDD; for example, August 4, 2023 becomes 230804. This is still very readable to me, and files still sort in chronological order.
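In Python, YYMMDD is a one-line `strftime` call (the `yymmdd` helper name is hypothetical), and because the digits run year, month, day with zero padding, a plain text sort is also a chronological sort. One caveat: two-digit years only sort correctly within a single century (a 1999 date, 991231, would sort after 230804), so `%Y%m%d` is the safer choice if that matters to you.

```python
from datetime import date


def yymmdd(d: date) -> str:
    """Format a date as YYMMDD with no separators."""
    return d.strftime("%y%m%d")


print(yymmdd(date(2023, 8, 4)))  # 230804

# Year-first, zero-padded numbers sort chronologically as plain text.
dates = [date(2023, 8, 4), date(2022, 12, 31), date(2023, 1, 15)]
print(sorted(yymmdd(d) for d in dates))  # ['221231', '230115', '230804']
```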
3. Use Appropriate Separators
When I write a file name, it is always in the format "ID.Date-File Name-Version (if any).extension." Our standard uses the dash as the separator; other organizations prefer the underscore, and I believe this comes down to preference. I wouldn't, however, use any separator other than a dash or an underscore. Because the dash is used only to separate components, I can tell programmatically whether a document is versioned: if I split the file name at each dash, versioned files yield three elements, whereas non-versioned files yield only two.
Keep to a standard like this and you'll be able to use programmatic approaches, command line utilities, and so on to work with your resources. Once the volume of resources reaches a sufficiently large threshold, this ability is vital for managing files efficiently.
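As an illustration of the split-at-the-dash test, here is a minimal Python sketch with a hypothetical `split_name` helper. The example name `123.456.230804-Office Lease-vFinal.pdf` is a hypothetical combination of the property/tenant IDs and the date format described earlier, and the sketch assumes every file has an extension and that the core file name never contains a dash.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def split_name(filename: str) -> dict[str, str | None]:
    """Split "[ID.]Date-File Name[-Version].ext" into its components."""
    path = PurePosixPath(filename)
    parts = path.stem.split("-")  # the dash is reserved as the separator
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 dash-separated parts: {filename!r}")
    ids, _, date_part = parts[0].rpartition(".")  # the date is the last dotted segment
    return {
        "ids": ids or None,
        "date": date_part,
        "name": parts[1],
        "version": parts[2] if len(parts) == 3 else None,  # three parts means versioned
        "ext": path.suffix,
    }


print(split_name("123.456.230804-Office Lease-vFinal.pdf"))
# {'ids': '123.456', 'date': '230804', 'name': 'Office Lease', 'version': 'vFinal', 'ext': '.pdf'}
```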
4. Version Files
I use the standard "v#" and "Final" approach to versioning files. As mentioned above, the version follows a dash after the core file name. A first draft is marked with the suffix "-v1," whereas the final version is marked "-vFinal." A nice side effect of this approach is that, with the folder sorted by name in descending order, your drafts stay in order beneath the file marked final, because digits sort before letters. If you have a heavily versioned document, it's nice to have the executed/final copy sitting on top to avoid scrolling.
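A small Python sketch of how this convention behaves in practice, with a hypothetical helper `preferred_version`. In a plain ascending sort, "-v1" and "-v2" come before "-vFinal" because digits sort before letters, so the final copy lands on top when the list is sorted in descending order. The helper compares draft numbers numerically, since a text sort would put "v10" ahead of "v2".

```python
from __future__ import annotations

import re

# Matches the version tag right before the extension: "-v1", "-v12", "-vFinal".
VERSION = re.compile(r"-v(\d+|Final)\.[^.]+$")


def preferred_version(filenames: list[str]) -> str | None:
    """Return the -vFinal file if there is one, otherwise the highest-numbered draft."""
    best, best_rank = None, -1.0
    for name in filenames:
        m = VERSION.search(name)
        if m is None:
            continue  # not a versioned file
        rank = float("inf") if m.group(1) == "Final" else float(m.group(1))
        if rank > best_rank:
            best, best_rank = name, rank
    return best


names = [
    "230804-Office Lease-v1.pdf",
    "230804-Office Lease-v2.pdf",
    "230804-Office Lease-vFinal.pdf",
]
print(sorted(names, reverse=True)[0])                # 230804-Office Lease-vFinal.pdf
print(preferred_version(["a-v2.pdf", "a-v10.pdf"]))  # a-v10.pdf
```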
5. Consistency
Whatever components you include in the file name, be consistent, so that an element's position tells you what type of information it holds. My preference is "[ID.]Date-File Name[-Version]." Find something that works for you and stick to it.
But keep in mind that working with the file system's native chronological ordering is key to making retrieval intuitive for someone unfamiliar with the resource. Minimize the training cost of your standard: the closer to intuitive you get, the lower that cost.
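Consistency is also what makes a convention checkable. This Python sketch encodes "[ID.]Date-File Name[-Version]" as a regular expression and lists files that don't follow it; the exact pattern (six-digit date, numeric IDs, "v#" or "vFinal" tags) and the `nonconforming` name are assumptions based on the convention described above, so adjust them to your own.

```python
import re

# "[ID.]Date-File Name[-Version].ext"
NAME_PATTERN = re.compile(
    r"(?:\d+(?:\.\d+)*\.)?"   # optional dotted numeric IDs, e.g. "123.456."
    r"\d{6}"                  # YYMMDD date
    r"-[^-]+"                 # core file name, which may not contain a dash
    r"(?:-v(?:\d+|Final))?"   # optional version tag
    r"\.[A-Za-z0-9]+"         # extension
)


def nonconforming(filenames):
    """Return the names that don't follow the convention."""
    return [name for name in filenames if NAME_PATTERN.fullmatch(name) is None]


print(nonconforming([
    "123.456.230804-Office Lease-vFinal.pdf",
    "230804-Site Photos.zip",
    "Final Lease Draft.docx",
    "230804-Lease-draft.pdf",
]))
# ['Final Lease Draft.docx', '230804-Lease-draft.pdf']
```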
6. The Name Itself
I have the least to say about this topic since it's so subjective. Any professional should know how to create meaningful names that describe the relevant contents without being repetitive or too verbose. What I will say has to do with spaces.
Depending on your file system and your company's practices, you may or may not want to use spaces in file names. In the convention I've presented, this is the only place a space could appear. If these files may be accessed through a browser rather than a native file explorer, it may be best practice to use an underscore or a dash instead, because that avoids the complications of encoding and decoding.
Spaces aren't allowed in URLs, so they have to be percent-encoded (a space becomes "%20"). Even so, if you access resources through a browser-based solution, such as a browser-based intranet, you may still be able to use spaces, as long as the application handles encoding and decoding. Consult the documentation for the system you're using to confirm. Personally, I use spaces.
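Python's standard `urllib.parse` module shows what that encoding looks like. In this sketch (the file name is a hypothetical example), each space becomes "%20" and decodes back cleanly; a comma, by contrast, becomes "%2C".

```python
from urllib.parse import quote, unquote

name = "123.456.230804-Office Lease-vFinal.pdf"
encoded = quote(name)
print(encoded)                   # 123.456.230804-Office%20Lease-vFinal.pdf
print(unquote(encoded) == name)  # True

# Commas are encoded as well, another reason to prefer "Smith_John" over "Smith, John".
print(quote("Smith, John"))      # Smith%2C%20John
```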
Wrapping Up
Looking back at what I've shared here, I'm surprised by how much I had to say on the topic, and I've only scratched the surface. One thing not covered here is that folder structures can also be used to manage workflows.
Rather than grouping resources by their nature, you can group them by process, building a degree of guidance into the folder structure itself. I prefer anything procedural to live in a company manual or handbook, so I don't use this approach myself. But the concept is out there, so if you find it intriguing, it may be worth researching further.
I've shared a close representation of the structure I use across my businesses. It isn't a perfect copy, since I've generalized as much as I could to keep the information broadly applicable, but it's close.
Finally, almost nothing here is original thought. The components mentioned have likely all been used by other organizations in other ways; I just put them together while searching for a better way. Take what you like and carry that process forward.
Further Reading
If you found this discussion interesting, check out my article that dives into file naming logic: File Naming Convention Ideas