Before you can discuss how to implement the technology behind email archiving, it's important to establish an email retention policy. Without it, it's impossible to make most of the technological decisions. For instance, the storage media you choose will depend largely on whether the archiving project is meant solely to reduce strain on existing storage units, or whether you also want to have the archived email on hand for lawsuits or regulatory compliance audits.
Regulatory compliance and e-discovery are two of the biggest drivers for email archiving. In both cases, the archives need to be able to prove that the email in question wasn't altered -- including its metadata, such as send date. If you are looking at email archiving for those kinds of legal purposes, make sure you speak with your lawyers to understand your company's specific requirements and policies.
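One common way to make alteration detectable is to record a cryptographic digest of each message when it is captured and to recompute it whenever the message is produced later. The sketch below is only an illustration: it assumes the archive keeps the raw message bytes with headers included (so metadata such as the Date header is covered), and the function names and sample message are hypothetical rather than taken from any archiving product.

```python
import hashlib
import hmac


def fingerprint(raw_message: bytes) -> str:
    """Return a SHA-256 digest of the full raw message, headers included."""
    return hashlib.sha256(raw_message).hexdigest()


def is_unaltered(raw_message: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at capture."""
    return hmac.compare_digest(fingerprint(raw_message), recorded_digest)


# Hypothetical message, captured as raw bytes at archive time.
original = (
    b"Date: Mon, 01 Jan 2007 09:00:00 +0000\r\n"
    b"From: sender@example.com\r\n"
    b"Subject: Quarterly numbers\r\n"
    b"\r\n"
    b"Figures attached.\r\n"
)
digest_at_capture = fingerprint(original)

# Changing only the send date is enough to change the digest.
tampered = original.replace(b"01 Jan", b"02 Jan")

print(is_unaltered(original, digest_at_capture))
print(is_unaltered(tampered, digest_at_capture))
```

A digest only helps if it is stored somewhere the message's custodians cannot quietly rewrite it; whether that satisfies a given regulation is a question for the lawyers, not the code.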
This first step isn't always easy, said Dick Benton, principal consultant at GlassHouse Technologies. Lawyers tend to look for shades of gray and keep decisions as broad as possible in order to cover all their bases, he said, but an email retention policy needs to specify exactly what gets stored and for how long.
The easiest way to ensure that is to keep everything forever. That's actually not a bad starting point, but it's important to make sure your IT and legal staff understand that it's only a temporary solution, said George Crump, founder and senior analyst at consulting
A good rule of thumb is to apply the same retention policies for electronic communication that already exist for paper documents, Crump said. Those policies differ from company to company, of course, and often depend on factors like the regulatory laws that govern each company's industry.
You should also make sure your other data lifecycle management policies don't short-circuit email retention policy, Benton said. For instance, if you decide to keep email archives for three years but your general backups go back seven years, you could end up liable for producing seven years of emails. This is especially important because more and more emails contain attachments, such as spreadsheets and Word documents, which users often save to their desktops. Those files can then end up, untracked, on backups; your decision to keep emails for only three years would be counteracted.
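The mismatch Benton describes can be caught with a simple comparison of retention periods. The following is a minimal sketch under assumed inputs: the three-year and seven-year figures come from the example above, while the other policy names and the function itself are hypothetical.

```python
def find_retention_conflicts(email_retention_years, other_policies):
    """Return the names of policies that keep data longer than email is kept.

    other_policies maps a policy name to its retention period in years.
    """
    return sorted(
        name
        for name, years in other_policies.items()
        if years > email_retention_years
    )


# Hypothetical inventory of data lifecycle policies.
policies = {
    "general backups": 7,
    "desktop file backups": 3,
    "file server snapshots": 1,
}

conflicts = find_retention_conflicts(3, policies)
print(conflicts)
```

Any policy flagged this way is a place where email, or attachments saved out of email, may outlive the retention period the company thinks it has set.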
It's important to try to find those documents at the start of the archiving deployment, since documents created after your client starts archiving can be indexed and tracked as they're created. That's not necessarily easy, Benton said; copies can exist on the Exchange server, individual desktops and several backup disks or tapes. "Email's very much like cockroaches," Benton said. It's ubiquitous, pervasive and hard to track down, but to be fully covered, you have to try to find every corner it's hiding in.
Another question is whether the email retention policy will mandate capturing emails before or after they get to a user. That's also a question for the company's legal department; some regulations require companies to keep a record of every message that's sent to or from the organization. If you don't have that requirement and are primarily trying to save on storage space, it may be enough to archive whatever messages haven't been deleted after 30 days.
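The storage-driven rule, archiving whatever has survived in the mailbox for 30 days, amounts to a simple filter. The sketch below assumes a hypothetical Message record with a received timestamp and a deleted flag; a real mail server would expose this information through its own interfaces.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    message_id: str
    received: datetime
    deleted: bool = False


def due_for_archive(messages, now, min_age_days=30):
    """Select messages that are at least min_age_days old and not deleted."""
    cutoff = now - timedelta(days=min_age_days)
    return [m for m in messages if not m.deleted and m.received <= cutoff]


# Hypothetical mailbox contents.
mailbox = [
    Message("old-kept", datetime(2024, 1, 1)),
    Message("old-deleted", datetime(2024, 1, 1), deleted=True),
    Message("recent", datetime(2024, 2, 20)),
]

selected = due_for_archive(mailbox, now=datetime(2024, 3, 1))
print([m.message_id for m in selected])
```

Note that this approach never sees messages a user deletes within the 30 days, which is exactly why it is unsuitable when regulations require a record of every message sent or received.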
Because of the sheer scale of a full archiving project, it's not a bad idea to start with a specific subset of your client's data, Crump said. One CIO he spoke to had an enormous amount of directory information that was becoming unmanageable, while another company, a major chip manufacturer, had years' worth of old designs. They wanted to get the old and mostly unneeded data off their primary storage, but wanted it to be accessible in case they ever had to refer back to it. The same can be true of email; you could start the archiving deployment only in a certain department, for instance.
Taking into account the volume of data to be moved and the complexities of an email retention policy, it's a good idea to start simply and refine on the fly, Crump said. If you just start off by moving a roughly defined block of data to archives and revisit the policies regularly, the details tend to work themselves out, he said.
Beyond email archiving
Although email is one of the most common targets for archiving projects, practically any kind of data could benefit from archiving in principle. Email archiving is still the low-hanging fruit if a company hasn't implemented it yet or wants to revisit its email retention policy, but the next step is to capture documents. Companies often keep years' worth of Office documents -- Word documents, spreadsheets and PowerPoint presentations -- that can take up valuable storage space.
Companies are also starting to look at archiving other forms of communication. As employees use instant messaging and text messaging on handheld devices more, the risk of an IM or text containing legally significant information is rising. This puts companies under more pressure to record those messages for the same legal and regulatory reasons that email records are so important, but one problem with IMs and texts is that they're practically written in a foreign language, Benton said. So-called "l33t speak" (or "leet speak") can be difficult to analyze and index, he said.
But the Holy Grail of archiving will be databases. Companies typically have huge volumes of data in databases like those powering ERP systems, and much of it is outdated, Benton said. Keeping all of that data in the database requires maintaining increasingly beefy hardware to keep up, but taking it out isn't trivial.
Database vendors have come up with a partial solution in partitioning and federated databases, which distribute a database across computers. But the real goal should be to remove the data altogether, yet have it readily available if it's needed. There is some software that tries to do that kind of archiving, but on the whole, database archiving is still a thing of the future for most companies, Benton said.
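To make the idea concrete, the basic move behind database archiving is to copy outdated rows somewhere else and delete them from the live table in one transaction. The sketch below uses Python's built-in sqlite3 module with hypothetical orders and orders_archive tables; it keeps both in the same in-memory database for simplicity, whereas a real deployment would put the archive on cheaper storage and would have to deal with the cross-table dependencies that make ERP data hard to extract.

```python
import sqlite3


def archive_old_orders(conn, cutoff_date):
    """Move rows older than cutoff_date into the archive table.

    Dates are ISO-8601 strings, so plain string comparison orders them.
    Returns the number of rows removed from the live table.
    """
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        )
    return cur.rowcount


# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, total REAL);
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2001-05-10", 120.0), (2, "2003-11-02", 75.5), (3, "2008-02-14", 310.0)],
)
conn.commit()

moved = archive_old_orders(conn, "2005-01-01")
live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
print(moved, live, archived)
```

The archived rows stay queryable, which is the "readily available if it's needed" half of the goal; the hard part in practice is deciding which rows are safe to move.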