Unstructured data management has risen to the top of the heap of enterprise content concerns. Corporate information management arenas such as e-discovery, enterprise search, records management and information governance all require searchable, accessible content.
Yet just when companies get a handle on these issues, new challenges surface, thanks to mobile devices. To stay on the right side of regulators and to stay competitive, experts advise integrating structured and unstructured content -- and knowing why that content is collected in the first place.
Enterprises often collect content without a justification and have poor retention strategies, opening themselves up to legal exposure, according to Steve Weissman, minister of process and information betterment at Boston-based consultancy Holly Group. Whether the reason is to keep records for litigation purposes, create audit trails or build a prospect list, enterprises need to ask themselves why they are saving this content before investing in its retention, he said.
Once the business case for the content is in place, companies can then start creating proverbial handles on the unstructured content, such as tags to include for automated business processes, Weissman said. "You have to think about applying or gluing on those handles yourself. . . . I often think about unstructured content as being part of the same equation that big data fits into, because it suggests a more holistic mindset toward managing content," he added.
Metadata adds structure
Even unstructured data has the beginnings of structure that enterprises can exploit, according to Weissman. For example, cell phone pictures often include location data if the geolocation feature is enabled on the phone. An insurance adjuster's photos uploaded to the system not only include the metadata surrounding the picture, such as the time, date and location, but also the user information during the upload, he said.
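As a rough illustration of the adjuster example, the sketch below combines metadata embedded in a photo with the user information captured at upload into a single searchable record. It assumes the embedded values have already been extracted into a dictionary; the `PhotoRecord` fields, the dictionary keys and the `build_photo_record` helper are hypothetical names chosen for illustration, not part of any product Weissman described.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PhotoRecord:
    """Hypothetical record linking a photo to its metadata 'handles'."""
    filename: str
    taken_at: Optional[str]      # from the photo itself, if present
    latitude: Optional[float]    # present only if geolocation was enabled
    longitude: Optional[float]
    uploaded_by: str             # captured by the system at upload time
    uploaded_at: str


def build_photo_record(filename, embedded, uploaded_by, uploaded_at):
    """Merge embedded photo metadata with upload context.

    `embedded` is assumed to be a dict already extracted from the image;
    missing keys simply become None rather than raising an error.
    """
    return PhotoRecord(
        filename=filename,
        taken_at=embedded.get("taken_at"),
        latitude=embedded.get("latitude"),
        longitude=embedded.get("longitude"),
        uploaded_by=uploaded_by,
        uploaded_at=uploaded_at.isoformat(),
    )


record = build_photo_record(
    "claim-photo.jpg",
    {"taken_at": "2015-03-01T10:00:00", "latitude": 42.36, "longitude": -71.06},
    uploaded_by="adjuster1",
    uploaded_at=datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc),
)
```

A photo taken with geolocation switched off would still produce a record; it would just carry empty location fields alongside the upload details.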
"It's all about metadata. . . . That's what it boils down to, to really get the most value out of all this content," Weissman said. "That's why I kind of roll big data into it, even though big data databases are unstructured. We tend to think of them being outside the bounds of content management." ERP, CRM and HR (enterprise resource planning, customer relationship management and human resources) databases, for example, involve a lot of documents, but there needs to be a linkage for the documents to be retrievable. Email, which is unstructured particularly when stored on a local hard drive, also needs that linkage, and it already has one in the form of metadata, he added.
"When you really sit down to think about it, part of me wants to say there is no such thing as unstructurable data, but the trick is to sit down and think about it," Weissman said.
Technology adds a handle to grab unstructured content
Broadly, classification engines offer a way to add structure to unstructured data, according to Weissman. These can be used to catch hashtags and timestamps on tweets, for example, by monitoring Twitter feeds, setting up searches, capturing the content and classifying it according to the business's taxonomy, he said.
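A minimal sketch of that classification step might look like the following. It assumes the posts have already been captured as text with a timestamp, and it does not call any Twitter API; the `TAXONOMY` mapping and the `classify_post` function are hypothetical stand-ins for a business's own taxonomy and engine.

```python
import re
from datetime import datetime, timezone

# Hypothetical business taxonomy: category name -> hashtags that map to it.
TAXONOMY = {
    "claims": {"#claim", "#accident"},
    "support": {"#help", "#outage"},
}

HASHTAG_RE = re.compile(r"#\w+")


def classify_post(text, posted_at):
    """Attach hashtags, a timestamp and taxonomy categories to one post."""
    # Lowercase the tags so "#Claim" and "#claim" match the same entry.
    hashtags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
    categories = sorted(
        category
        for category, tags in TAXONOMY.items()
        if tags & set(hashtags)
    )
    return {
        "text": text,
        "posted_at": posted_at.isoformat(),
        "hashtags": hashtags,
        "categories": categories or ["unclassified"],
    }


tagged = classify_post(
    "Filed my #Claim after the #accident",
    datetime(2015, 1, 1, tzinfo=timezone.utc),
)
```

Posts that match nothing in the taxonomy are labeled "unclassified" rather than dropped, so they remain available for later review.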
Another way to get a handle on unstructured data is to run analyses on hard drives and mailboxes to discover what the average file size is, according to Leigh Isaacs, director of records and information governance at Washington, D.C.-based law firm Orrick, Herrington & Sutcliffe LLP and member of the advisory board for think tank Information Governance Initiative. "We set limits to reduce the size drastically," she said. This incentivizes users to use hard drives only as a staging area and to move files to the appropriate directories.
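The kind of analysis Isaacs describes can be approximated with a short script. The sketch below walks a directory tree and reports the file count, total size, average size and largest file; the `file_size_stats` name and the returned keys are assumptions made for illustration, and it covers file shares only, not mailboxes.

```python
import os


def file_size_stats(root):
    """Walk `root` and summarize the sizes of the files beneath it."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append(os.path.getsize(path))
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
    if not sizes:
        return {"count": 0, "total_bytes": 0, "average_bytes": 0.0, "largest_bytes": 0}
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "average_bytes": sum(sizes) / len(sizes),
        "largest_bytes": max(sizes),
    }
```

Calling `file_size_stats` on a user's home directory or a shared drive gives a baseline that a records team could compare against whatever size limits it decides to set.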
Setting policy to strike a balance
Yet technology is useless without a solid reason behind it, and policy helps set a baseline for user expectations. "To the extent that you can take that policy and automate that policy, the more [user] compliance you're going to get," Isaacs said.
Orrick, Herrington & Sutcliffe sets a policy for users to assist with records management. Instead of a Wild West file share repository, for example, the firm requires users to request top-level folders in the file share. This allows users the flexibility to manage their own files but brings a measure of control for the records management team, Isaacs said.
"One of the challenges is striking that balance between governing information … and the ability to allow people to work the way they want to work," Isaacs said. "We're seeing that when you try to push people into structured repositories, they're going to find another way to bypass the system -- then you've totally lost control."
That delicate balance is what will allow companies to avoid breaches and tell stakeholders and clients that their data is safe. Clients are demanding secure information and auditing IT environments, and managed data is how to reassure them that their data is protected, according to Isaacs.
"It becomes a business development thing. If Orrick can't assure financial clients we have these measures in place, but [the competition] can, it becomes a point of whether we keep and retain work," she said.