IAN WALDRON

File and Folder Concepts for Business

An approach to developing file and folder conventions allowing your business to stay organized.
August 4, 2023

Background 

Managing files, folders, and other data well is a critical component of operating a successful business. We encounter, create, and retrieve these objects daily. Perhaps in small quantities at first, but the volume of files and data can grow quickly as your business picks up speed. I was surprised to learn that the file servers for our business, as small as it is, contained hundreds of thousands of resources and were growing fast.

In the beginning, I was certain the topic of file and folder organization was well understood and applied. While it took years for my system to evolve to what it is now, I thought everyone else was meanwhile operating with sophisticated, proprietary structures. I suspected that having not worked my way up the corporate ladder, instead opting to start the entrepreneurial journey at a younger age, I was on the outside of this proprietary knowledge and best practice. Because of this insecurity, I've made it a habit to look at the file servers of businesses I've partnered with, or who were otherwise willing to share this information with me, whenever I had the chance. I have been shocked on many occasions to learn how primitive the conventions of many organizations were, if a convention was applied at all.

Through my informal research, I've found several recurrent themes. Many businesses operate with either a very flat architecture, where everything is stored within a small number of top-level folders, or, at the other extreme, within a mess of nested folders that makes navigation difficult and inefficient. Furthermore, I've found that the names of the files themselves were often too long, too short, or contained inconsistent data, making it difficult to locate or identify resources within a given directory. Last, I found that within the filenames, there's often a degree of cheating taking place. Rather than work with the file system, users tend to use items like dashes, underscores, or numbers to control the order in which resources are displayed. While I use these in limited quantities (for example, "_archive" is a common folder I like to float to the top), let's instead assume that the engineers who designed file systems had something in mind when they established how resources are displayed. Work with the file system and leverage its existing capabilities rather than fight it.

Poor file and folder schema isn't just a condition of small, inexperienced businesses either. One of the biggest revelations in my career was in learning that there are very large businesses, those with tens of thousands of employees around the world, operating with mediocre, if not primitive, practices. As a side note, coming to this realization was instrumental in dealing with a degree of imposter syndrome that many young entrepreneurs experience. The world is far, far less sophisticated than the established attempts to communicate. Experience does not necessarily translate to ability. That said, while I've witnessed large organizations operating with unsophisticated structure, I haven't seen a small business operate with an especially impressive file and folder schema. If someone was doing a good job, it was a large entity.

A quick note before jumping in: This article deals with file and folder conventions as they relate to the user experience, not how file systems themselves work. File systems deal with how resources are stored and retrieved on a technical level, whereas we're interested in what the user sees on their screen and how they engage with the displayed resources. For that reason, I'm using the term "folder" rather than the term "directory" to describe a container of resources such as files and other folders. These are similar concepts but not perfect substitutes.

Objectives

The first step to establishing an adequate file and folder convention is identifying your objectives. You need to have an idea of what you're looking to get out of your system before you can create or add to your system. Your specific use will have unique constraints, but there are common elements regardless. This article delves into what I look for in a file/folder convention as well as how I approach my specific industry, real estate private equity.

First, consider how your organization is structured. Your highest-level folders should reflect your organization’s layout. If there are distinct departments, these department names would be good candidates for top-level folders. From your top-level folders, subfolders can be utilized to break apart functions, roles, products, etc. It's at this level I often consider permissions. Top-level folders are great ways of controlling who has access to what resources within an organization.

Next, think about what types of data and what volume you'll need to handle. Low volume items may fit perfectly well within a subfolder. For example, a small business could have an "Employee" subfolder within their "Human Resources" top-level folder, with the subfolder containing a folder for each employee. The relative path could look something like: Human Resources > Employee > Smith, John. If you're a larger entity, perhaps you further group your employees by region to make the information more manageable.

Note: be careful using commas or other special characters, as these can affect command line/terminal operations. This is probably not a big deal for a small business. However, a large institution with hundreds of thousands of files may need to perform more complex lookups than what a file explorer/graphical user interface offers. In that case, I'd go with "Smith_John" instead of the above "Smith, John."
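As a rough sketch of that substitution, the Python snippet below collapses commas, spaces, and other special characters into underscores. The function name and the allowed character set (letters, numbers, dashes, and underscores) are assumptions made for illustration, not a standard.

```python
import re


def safe_folder_name(name: str) -> str:
    """Replace each run of characters other than letters, digits,
    dashes, and underscores with a single underscore."""
    # Hypothetical helper: the allowed character set is an assumption.
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())


print(safe_folder_name("Smith, John"))  # Smith_John
print(safe_folder_name("_archive"))     # _archive
```

Names that already follow the convention pass through unchanged, so something like this could be run over existing folders without disturbing the ones that are fine.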

Last, I think about how I want to structure file names. Consider what type of information is relevant to identification and retrieval. Is there information we can exclude to keep names shorter while maintaining enough informative value and avoiding name collision? Is there redundant information that can be compressed with some other identifier like an id? 

Ultimately, you want to arrive at a structure where someone other than the party that created the data/file/folder can intuitively navigate to and retrieve the resources they need. How often have you been asked where a resource was on your file server? How much time is being wasted by one party educating another on where they stored something? A well-structured system should make it clear what resources are where to a sufficiently relevant party. If you had to walk away with a single objective to implement, it would be this.

Components

Top-Level Folders

Given everything else will live under these folders, this will probably be where you'll start. If you're a new business, I wouldn't overthink this. Rather, keep your hierarchy as simple as possible to keep maintenance low. Generally, my goal is to have a top-level folder for each department/function within the business. I do this for three specific reasons:

  • Specific Needs
  • Permissions Management
  • Portability

The first is obvious. Compartmentalize your organization so that you can individually respond to a department's unique constraints and needs.

Once this has been considered, my next focus is on permissions. If my folders are broken up by departments, then permissions are much simpler to maintain. Assuming employees of a given department would all have equal permissions, I can create permissions groups for each of the top-level folders and assign users to these groups accordingly. If you need more granular control, consider what subfolders would accommodate your needs. Try to keep this as flat as possible, meaning limit nested subfolders. The deeper into your folder tree permissions are assigned, the greater the difficulty in maintenance and with it an increased likelihood of mistakes.

Last is portability. That is, the ability to move a top-level folder to a different solution if necessary. Especially within the context of regulation, you may have different standards to uphold (HIPAA, FTC rules, etc.). Certain standards may carry additional cost. You may wish to maintain different standards across your data if your resources are numerous. For example, you may not want to store low-value data with a high-cost, secure solution. If your data is compartmentalized, it will be much simpler to move your high-value, sensitive data to a different solution as regulations change without having to incur the cost of maintaining the same standard for data that isn't as sensitive.

Here's a handful of top-level folders I tend to use for my real estate businesses:

  • Administration
  • Deal Flow
  • Investors
  • Marketing
  • Portfolio

For my real estate businesses, most activity relates to a property. With that in mind, we're doing most of our work within a location such as: Portfolio > Property ID.Property Name > Subfolder Name.

Subfolders

I've seen a lot of mistakes made with subfolders. Often, organizations run a schema that is either too shallow or far too deep. Either way, you're adding cost from inefficient retrieval of resources. When folder structures are too flat, you're spending too much time scrolling and scanning for the right item. When folder structures are too deep, you're wasting time clicking in and out of folders in search. It's a hard balance to get right. As a rule of thumb, I won’t look to add a nested folder until I have at least ten constituent resources. Conversely, if my file explorer is the full height of my viewport and I still have to scroll to find a resource, then my structure is too flat. When I'm navigating, I don't want to have to scroll to determine if I need to continue deeper in the folder structure.

Often when I'm adding subfolders purely to reduce the number of resources in a parent folder, I use Year & Month folders. With the month folders, I add the month's number as a leading number for purposes of maintaining chronological ordering. Some people like to use a leading "0" for months 1-9. I tend not to, but that's personal preference. For example, I'll use a subfolder structure of "~ > 2023 > 8 August" where others might use "~ > 2023 > 08 August."
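Here is a minimal sketch of building those Year & Month paths in Python, with the leading zero as an option. The function name is hypothetical. The second half shows one consequence worth knowing: a plain character-by-character sort, which is what many command line tools and scripts use, does not keep unpadded months in calendar order, while some graphical file explorers sort embedded numbers numerically and display them as expected.

```python
import calendar
from datetime import date


def month_folder(d: date, pad: bool = False) -> str:
    """Return a relative path such as '2023/8 August' or '2023/08 August'."""
    number = f"{d.month:02d}" if pad else str(d.month)
    # calendar.month_name gives English names under Python's default locale.
    return f"{d.year}/{number} {calendar.month_name[d.month]}"


print(month_folder(date(2023, 8, 4)))            # 2023/8 August
print(month_folder(date(2023, 8, 4), pad=True))  # 2023/08 August

# A plain text sort compares "1" with "2" first, so "10" lands before "2".
print(sorted(["2 February", "8 August", "10 October"]))
# ['10 October', '2 February', '8 August']
print(sorted(["02 February", "08 August", "10 October"]))
# ['02 February', '08 August', '10 October']
```

If your folders will only ever be browsed in an explorer that sorts numbers naturally, either form works. If they may be processed by scripts, the padded form is the safer choice.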

In many circumstances I encounter, I use date-related subfolders. But that’s due to the types of data I deal with. For your circumstances, consider location-based subfolders (states, etc.), or even letters. If I had thousands of employees, I could see the HR department storing an employee's folder within a folder labeled with the single starting letter of the employee's name. File systems maintain alphabetical order, so a folder containing single-letter subfolders will present nicely. Consider the best, most logical way to group your data when breaking down larger sets into subgroups.

Another objective with subfolders is standardization. If I have a group of folders that are being used in a substantially similar manner, I may want to establish a standard for what the names of subfolders should be. I've seen recommendations of building out empty subfolders whenever you create one of these template parent folders. I don't follow this advice. I only add a subfolder when I have information to commit to the subfolder. I don't allow for empty folders in our system. The reason for this is that I don't want employees wasting time clicking on empty folders. Your folder structure should communicate information about its content. If a subfolder exists, it communicates that subordinate information exists, and vice versa. The exception to this rule is when I'm relying on folders to be created programmatically and there's a high likelihood that contents will be added in the near term.
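If you adopt a no-empty-folders rule, a short script can report violations. The sketch below is one way to do that in Python. The function name and the example folder names are hypothetical, and the demo builds a throwaway tree in a temporary directory rather than touching a real file server.

```python
import tempfile
from pathlib import Path


def find_empty_folders(root: Path) -> list[Path]:
    """Return every folder under root that contains no files or subfolders."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_dir() and not any(p.iterdir())
    )


# Demo on a temporary tree: "Leases" holds a file, "Photos" is empty.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    prop = root / "Portfolio" / "123.Example Property"
    (prop / "Leases").mkdir(parents=True)
    (prop / "Photos").mkdir()
    (prop / "Leases" / "230804-Office Lease.pdf").touch()

    for folder in find_empty_folders(root):
        print(folder.relative_to(root).as_posix())
        # Portfolio/123.Example Property/Photos
```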

Instead of building out empty subfolders, consider making your desired standard a component of your employee handbook or other procedure manual. Alternatively, have an example folder set in a parent folder for reference. In my experience, employees tend to copy what's in the most recent folder anyway, which may not be a bad approach as it allows your structure to evolve with time. If possible, let it evolve, and avoid retroactively standardizing your subdirectories.

The last subfolder I want to mention is one where we really want the folder to float to the top. Try to use the file system's ordering wherever possible. Chronology is intuitive. Just because floating a folder to a given position makes sense to you doesn't mean it will make sense to another party. That said, there are a couple of circumstances where I'll float a folder to the top. Most frequently, it's a folder like "_archive." For example, I tend to keep properties that are currently owned in a folder like "Portfolio" or "Properties." When a property is sold, the property is simply moved to "_archive."

Another folder I'll use, but substantially less frequently, is "_general." This folder is equivalent to "Miscellaneous" or the like. I consider this bad practice and a bit lazy. It communicates that you really don't know what to do with a resource if it ends up in a folder like this. Try to find an appropriate location for a resource or perhaps reconsider whether it needs to be retained at all. Just because you receive a file doesn't mean you need to keep it. Especially early on while building a business, I tended to save everything. I feel less compelled to do so now that the volume of materials I encounter has increased by an order of magnitude. That said, storage is cheap, so keep it if you need to.

If we need to make a folder float, we generally do so with a character like a dash or an underscore. Be careful not to use a character that your file system restricts. Even if a character isn't restricted, there are many characters that you're advised to avoid. Consult the docs of your file system. Better yet, stick to dashes, underscores, numbers, and letters. I've also seen files be made to float with "00" or "AA," though I think this is aesthetically much less elegant than a leading underscore. But I suppose that's preference.

ID Numbers

Depending on your industry and specific use, numbers in folder and file naming could be handy in reducing name lengths while not decreasing informative value. Industries that rely heavily on this are the legal and professional services fields. Law offices often use a leading client number followed by a matter number. I've seen consultants use this same approach. For example, a client with an ID of 123 and a matter/project of 456 may have files labeled as "123.456-Example File.pdf." This way, we can exclude the client and project name from the file name but still reference that information, if need be, without needing to traverse up the folder structure to learn this.

In my real estate businesses, I identify the following entities with IDs:

  • Properties
  • Tenants
  • Vendors

It’s important to note here that relationships exist between these entities. Specifically, in the SQL world we'd call this a Many-to-Many relationship. A tenant could exist on multiple properties and a property likely will have multiple tenants. With Vendors this is almost certainly the case as we re-use vendors across our portfolio. Following this convention, with a property with an ID of 123 and a tenant with an ID of 456, I may have a lease named 123.456-Office Lease.pdf. There would also be versioning information in there but let's ignore that until file names.
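One practical payoff of leading IDs is that a many-to-many lookup becomes a simple filter on the file name. The Python sketch below illustrates the idea using the "123.456-Office Lease.pdf" example. The function name and the other two file names are hypothetical, and the date and version components are left out here, as they are in the example above.

```python
def leading_ids(filename: str) -> tuple[str, str]:
    """Split a name like '123.456-Office Lease.pdf' into ('123', '456')."""
    id_part = filename.split("-", 1)[0]          # '123.456'
    property_id, tenant_id = id_part.split(".")  # assumes exactly two IDs
    return property_id, tenant_id


files = [
    "123.456-Office Lease.pdf",
    "123.789-Retail Lease.pdf",   # hypothetical second tenant
    "200.456-Storage Lease.pdf",  # hypothetical second property
]

# Every file for tenant 456, regardless of property.
print([f for f in files if leading_ids(f)[1] == "456"])
# ['123.456-Office Lease.pdf', '200.456-Storage Lease.pdf']

# Every file for property 123, regardless of tenant.
print([f for f in files if leading_ids(f)[0] == "123"])
# ['123.456-Office Lease.pdf', '123.789-Retail Lease.pdf']
```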

File Names

Just like subfolders, there's a balance to be struck between providing too much information in file names and not enough. We want file names to be informative but also easily readable. I don't want to have to scroll horizontally to read a file name if that can be avoided.

1. Avoid Redundancy

The first thing I look for in managing file name lengths is repetitive information. If I'm repeating the information of parent folders, a leading ID may be something to consider. I don't need to see the property name followed by the tenant’s name as leading information if I already know I'm in a tenant's folder for a given property. This is highly redundant and makes it quite likely the data I'm most interested in isn't visible in the file explorer.

2. Date Your Resource

I always include a date before the core file name. If there’s a leading ID, I place the ID first, immediately followed by the date. Importantly, the date must be numeric since this will affect ordering. I also don't use separators within dates because those are often reserved for other purposes within the file name. The most common setup I use is YYMMDD. For example, August 4, 2023 would be 230804. This is still very readable for me, and files will still sort in chronological order as expected.
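For anyone generating these prefixes in a script, here is a small Python sketch of the YYMMDD format. The function name and the "Rent Roll" file names are made up for illustration.

```python
from datetime import date, datetime


def date_prefix(d: date) -> str:
    """Format a date as YYMMDD, e.g. August 4, 2023 -> '230804'."""
    return d.strftime("%y%m%d")


print(date_prefix(date(2023, 8, 4)))  # 230804

# Fixed-width numeric prefixes sort chronologically as plain text.
names = ["231115-Rent Roll.pdf", "230804-Rent Roll.pdf", "230102-Rent Roll.pdf"]
print(sorted(names))
# ['230102-Rent Roll.pdf', '230804-Rent Roll.pdf', '231115-Rent Roll.pdf']

# The prefix can be parsed back into a date when needed.
print(datetime.strptime("230804", "%y%m%d").date())  # 2023-08-04
```

One caveat with a two-digit year: ordering only holds within a single century, since a file dated 991231 would sort after one dated 000101.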

3. Use Appropriate Separators

When I write a file name, it is always in the format of "ID.Date-File Name-Version (if any).extension." Our standard is to use the dash. Other organizations prefer the underscore. I believe this falls to preference as well. I wouldn't, however, use a separator other than a dash or an underscore. By using a dash for the sole purpose of separating components, I can tell programmatically whether I have a versioned document or not. If I split the file name at the dash, versioned files will have three elements belonging to the file name whereas non-versioned files will only have two.
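The dash-splitting check described above can be sketched in a few lines of Python. The function names and the sample file names are assumptions for illustration, and the approach only works if the core file name itself never contains a dash.

```python
from pathlib import PurePath


def split_components(filename: str) -> list[str]:
    """Drop the extension, then split what remains at each dash."""
    return PurePath(filename).stem.split("-")


def is_versioned(filename: str) -> bool:
    """Three dash-separated components means a version is present."""
    return len(split_components(filename)) == 3


print(split_components("123.230804-Office Lease-v2.pdf"))
# ['123.230804', 'Office Lease', 'v2']
print(is_versioned("123.230804-Office Lease-v2.pdf"))  # True
print(is_versioned("123.230804-Office Lease.pdf"))     # False
```

A core name like "Sub-Lease" would add a fourth component and break the count, which is a good reason to reserve the dash strictly for separating components.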

Keep to a standard like the above and you're able to utilize programmatic approaches, command line utilities, etc., to work with your resources. Once the volume of resources reaches a sufficiently large threshold, having this ability is vital for the efficient management of files.

4. Version Files

I use the standard "v#" and Final approach to versioning files. As mentioned above, this will follow a dash after the core file name. A draft item will begin with the postfix "-v1" whereas the final version will be marked "-Final." A nice characteristic of this approach is that your drafts, while still in chronological order, will appear beneath your file marked final. If you have a heavily versioned document, it's nice to have the execution/final copy sitting on top to avoid scrolling.
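To see why the final copy lands on top, here is a quick Python illustration. It assumes all versions share the same date prefix and that names are compared case-insensitively, which is how many file explorers behave. The file names are made up.

```python
files = [
    "230804-Office Lease-v1.pdf",
    "230804-Office Lease-v2.pdf",
    "230804-Office Lease-Final.pdf",
]

# "F" comes before "v" alphabetically, so "Final" sorts ahead of the drafts.
for name in sorted(files, key=str.lower):
    print(name)
# 230804-Office Lease-Final.pdf
# 230804-Office Lease-v1.pdf
# 230804-Office Lease-v2.pdf
```

Be aware that in a plain text sort "v10" comes before "v2," so a document with ten or more drafts may need zero-padded versions such as "v02" if it will be sorted by a script rather than a file explorer that orders numbers naturally.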

5. Consistency

Regardless of what components of the file name you include, be consistent so the position of an element is indicative of what type of information you're retrieving. That said, it's my opinion that the ordering I mention here, "[ID.]Date-File Name[-Version]," is the way to go due to its effect on ordering. Engaging with the file system's native chronological flow is key to making resource retrieval intuitive for someone who's not yet acquainted with the resource. Minimize the training cost of using your standard. The closer to intuitive you can get, the lower the cost of employee training.
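Consistency is easier to maintain when it can be checked. As one possible sketch, the Python snippet below tests whether a name follows the "[ID.]Date-File Name[-Version]" ordering using a regular expression. The pattern encodes several assumptions of mine that you would adjust to your own standard: IDs are dot-separated digits, the date is six digits (YYMMDD), the core name contains no dashes, and versions are either "v" plus a number or "Final."

```python
import re

# [ID.]Date-File Name[-Version].extension (assumed shape, see lead-in)
PATTERN = re.compile(
    r"^(?:(?P<id>\d+(?:\.\d+)*)\.)?"  # optional leading ID(s), e.g. "123.456."
    r"(?P<date>\d{6})"                # date as YYMMDD
    r"-(?P<name>[^-]+)"               # core file name, no dashes
    r"(?:-(?P<version>v\d+|Final))?"  # optional version
    r"\.(?P<ext>[A-Za-z0-9]+)$"       # extension
)


def follows_convention(filename: str) -> bool:
    """Return True when the file name matches the assumed convention."""
    return PATTERN.match(filename) is not None


print(follows_convention("123.456.230804-Office Lease-v2.pdf"))  # True
print(follows_convention("230804-Office Lease.pdf"))             # True
print(follows_convention("Office Lease final.pdf"))              # False
```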

6. The Name Itself

I have the least to say about this topic since it's so subjective. A professional should know how to create meaningful names that contain relevant information without being repetitive or too long. What I will say on the topic has to do with spaces. Depending on your file system and your company's practices, you may or may not want to use spaces in file names. So far, the only area in the convention I've presented where a space could even exist is in the core name. If these files may be accessed using a browser rather than a native file explorer, it may be best practice to use an underscore or a dash because it reduces the challenges associated with encoding/decoding.

You'll notice in your URL bar that there aren't any spaces. That's because they're not allowed. Even if you are using a browser-based solution, like a browser intranet, to access resources, it still may be possible to use spaces as long as encoding/decoding is handled by the given application. You'll need to consult the docs for the system you're using to determine that. Personally, I use spaces.
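To make the encoding/decoding step concrete, here is what it looks like with Python's standard library. This is only an illustration of percent-encoding in general, not a description of how any particular intranet product handles it.

```python
from urllib.parse import quote, unquote

name = "123.456-Office Lease.pdf"

# A space is not valid in a URL, so it is encoded as "%20".
encoded = quote(name)
print(encoded)           # 123.456-Office%20Lease.pdf

# Decoding restores the original file name.
print(unquote(encoded))  # 123.456-Office Lease.pdf
```

Dots, dashes, and underscores pass through untouched, which is part of why they are safe choices for separators.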

Wrapping Up

Looking back at the content I've shared here, I'm surprised by how much I had to share on the topic. And I've only scratched the surface. One thing not mentioned here is that folder structures can also be used to manage workflows. Rather than group by nature, you can group by process. The idea is that you can build a degree of guidance into the folder structure. I prefer anything procedural to exist in a manual or handbook, so I don't personally utilize this approach. But it is a concept that's out there in the world, so if you find it intriguing, perhaps research that topic further.

I've shared with you a close representation of the structure I use across my businesses. Not a perfect copy, as I've tried to generalize as much as I could to keep the information more broadly applicable. But it's close.

Finally, almost nothing here is original thought. The components mentioned here have almost all been used by some other organization in some other way. I just put it all together in my search for a better way. Take what you like and continue that process.