Keep your Enterprise Data Tidy (Part 3)

Those who are familiar with Marie Kondo know that she is a ruthless disposer. If you’ve read parts one and two of this series, you know that the process is more nuanced than just “throw it all away,” but we’ve come to the point where it’s important to focus on discarding. If you haven’t read parts one and two, please do so; they provide context for this post. Once you are armed with categories that work for your organization and a solid set of values that the data you keep must uphold to be useful to your business, this part of the process is primarily a matter of dedicating time to pruning your files, records, and documentation.

Data Lifecycle Policies

“The fact that you possess a surplus of things that you can’t bring yourself to discard doesn’t mean you are taking good care of them. In fact, it is quite the opposite.” It’s interesting to note that, while many book collectors lament Kondo’s popularity and cry, “You can pry my books out of my cold, dead hands,” few librarians share this sentiment. Professionals know that collections must be pruned and managed. In fact, your organization may have one or more policies about managing data and documents. At a minimum, data lifecycle policies cover three points of a document’s existence within an organization: creation or acquisition, use and storage, and disposition. These policies may be driven by the systems used to manage your documents (Microsoft SharePoint comes to mind) or by government mandates. They should be your guide to what you discard and when. If your organization has outlined these policies clearly, the hard work is already done, and you can begin using parts one and two as your guide to systematically deleting unneeded data and documentation. Some of this lifecycle management functionality may also be encoded in your systems, but it’s important to understand the policies if you’re the one making decisions about data disposition. If your organization does not have a data lifecycle policy, you can explore creating one while you work on becoming data-centric.
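To make the three lifecycle points concrete, here is a minimal sketch of how a disposition rule might look in Python. Everything in it is an assumption for illustration: the category names, the retention periods, the Document fields, and the choice to measure retention from the date of last use are hypothetical, and a real policy would take its values from your organization's rules or applicable regulations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, in days, per document category.
# Real values come from your organization's policy or from regulations.
RETENTION_DAYS = {
    "contract": 7 * 365,
    "meeting_notes": 2 * 365,
    "draft": 180,
}


@dataclass
class Document:
    name: str
    category: str
    created: date    # creation or acquisition
    last_used: date  # use and storage


def disposition(doc: Document, today: date) -> str:
    """Return 'keep', 'discard', or 'review' for a document."""
    days = RETENTION_DAYS.get(doc.category)
    if days is None:
        # No policy covers this category, so a person has to decide.
        return "review"
    if today - doc.last_used > timedelta(days=days):
        return "discard"
    return "keep"
```

Under these assumed rules, a draft that has not been touched for a year is flagged for discarding, while a contract of the same age is kept. Anything in a category the policy does not mention is sent to a person for review rather than deleted automatically.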

Data Configuration Management

Outside of an overarching strategy or policy for managing your organization’s data and information, your organization may have various configuration management tools in place (e.g., Git or Subversion) to manage drafts and backups. Many large organizations use file sharing systems to govern who has privileges to directories and files. If you’re attempting to KonMari your files when such systems are in place, it will be necessary to work collaboratively to get access to the files in your control.

When Do You Actually Discard?

One of the key ideas in Marie Kondo’s method is that when you discard, you only discard your own belongings. If you are the owner and CTO of a company, then you have the freedom to discard what no longer sparks joy. In a large company, that question of ownership is far more complex and possibly beyond the reader’s pay grade. It might be beyond the CEO’s pay grade. It is certainly beyond the pay grade of the writer, except with a select few files on a laptop and in a removable storage device used for backups. But the question of ownership can often be established by completing the work recommended in this series of blog posts. And once you’ve established ownership, even complex ownership, you can use metadata to describe ownership and provenance, making it easier to manage that data’s future state, discarded or otherwise.
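As a rough illustration of ownership and provenance metadata, the Python sketch below attaches a small record to each item and uses it to answer the question "is this mine to discard?" The OwnershipRecord class, its field names, and the may_discard rule are all hypothetical; they are not taken from any particular metadata standard or product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OwnershipRecord:
    path: str
    owners: list[str]   # one or more accountable people or teams
    source: str         # provenance: where the data came from
    acquired: date
    derived_from: list[str] = field(default_factory=list)  # upstream items


def may_discard(record: OwnershipRecord, requester: str) -> bool:
    """Only a listed owner may discard the item."""
    return requester in record.owners


report = OwnershipRecord(
    path="reports/q3-summary.xlsx",
    owners=["finance-team"],
    source="exported from the billing system",
    acquired=date(2023, 10, 2),
    derived_from=["billing/invoices-2023.csv"],
)
```

With a record like this in place, a request from "finance-team" to discard the report would be allowed, and one from any other team would not. Allowing a list of owners is one simple way to represent the complex ownership mentioned above.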

Futureproofing your Data

Now that we’ve considered the end of the data lifecycle management picture, take a look at the start: data acquisition and creation. If you’ve done the work so far of identifying your business processes, assessing how well your data supports your goals, and aligning your data to your data lifecycle management policy (formal or otherwise), you know how important it is to also consider the introduction of new data. We touched on this in the first two parts, but there’s a subtle difference between considering how data came to be in your collection and considering data that you will include in your collection from this point forward.

This is something you can specify with policy, and it’s something you can anticipate with a robust ontology. However, it’s not as simple as building robust metadata. An ontology that is carefully anchored to your organization’s processes, has sufficient input from the right subject matter experts, and is developed within a hospitable IT infrastructure, is far more likely to be a sound gatekeeper for your incoming data.

In the IT industry, this is referred to as futureproofing: doing today’s work in a way that minimizes the need for downstream development to correct it later. It’s often a judgment call as to whether an application or system is introducing too much technical debt, but there is no argument that being able to understand each piece of data that goes into your system is critical to avoiding such debt. The way to ensure your data will be understandable downstream is to have adequate metadata. If you want your data to be sophisticated and able to support complex information needs, you need to use semantics.
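One simple way to picture a gatekeeper for incoming data is a check that refuses anything arriving without a minimum set of metadata. The Python sketch below assumes a hypothetical list of required fields; it only checks that each field is present and non-empty, which is far less than a full ontology would provide, since it says nothing about what the values mean or how they relate.

```python
# Hypothetical minimum metadata for any new item entering the collection.
REQUIRED_FIELDS = ("owner", "source", "category", "description")


def missing_metadata(metadata: dict) -> list[str]:
    """List the required fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(metadata.get(name) or "").strip()
    ]


def admit(metadata: dict) -> bool:
    """Admit an incoming item only if no required field is missing."""
    return not missing_metadata(metadata)
```

An item described only by its owner and source would be turned away here, with category and description reported as missing, so the gap can be fixed at the door rather than discovered downstream.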

“The secret to maintaining an uncluttered room is to pursue ultimate simplicity in storage so that you can tell at a glance how much you have.” -Marie Kondo

Read Part 1: Does your Data Spark Joy?

Read Part 2: Setting the Stage for Success

Written by Meika Ungricht