Over-retained data

This has definitely gone over the retention period…


“Pretty Rotten Apples” by cogdogblog is licensed under CC BY 2.0

“Unstructured data over-retained from specific file kinds”

The expected outcome from this recipe is a list of folders in priority order, ranked by the amount of problematic data each one contains.

The last few steps need more scripting than the platform can provide. This recipe presumes you have an expert on hand who can undertake that part of the process.

  1. Crawl your data sources and assign them to datasets appropriately.

  2. Think about what identifies the kind of file you are looking at. Are there particular headings, or particular phrases which appear in a template for that file?

    If there are no key identifiers, is there some kind of proxy measure for problems in folders? For example, you could use CreditCardCount to look at the number of occurrences of credit card numbers.

  3. Create an advanced search based on your criteria.

  4. Export the result set to a CSV file.

    Use a full export, not a summary.

    Remember to include any fields you want summary statistics for in your final report. For example, you might want to include the credit card count.

  5. The CSV export will contain an entry for each file. You’ll need to do some extra scripting work to group the files by directory to your satisfaction.
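The grouping in step 5 could be sketched in Python along the following lines. This is a minimal sketch, not the platform's own tooling: the column names `Path` and `CreditCardCount` are assumptions about what the export contains, and `rank_folders` is a hypothetical helper name, so adjust them to match the headers in your CSV file.

```python
import csv
import io
from collections import defaultdict
from pathlib import PureWindowsPath


def rank_folders(csv_file, path_column="Path", count_column="CreditCardCount"):
    """Group exported file rows by parent folder and rank folders by total count.

    The column names are assumptions; change them to match your export.
    """
    totals = defaultdict(lambda: {"files": 0, "count": 0})
    for row in csv.DictReader(csv_file):
        # PureWindowsPath accepts both backslash and forward-slash separators;
        # as_posix() normalises the folder name to forward slashes.
        folder = PureWindowsPath(row[path_column]).parent.as_posix()
        totals[folder]["files"] += 1
        # Treat a blank cell as zero occurrences.
        totals[folder]["count"] += int(row[count_column] or 0)
    # Highest total first; ties are broken by folder name.
    return sorted(
        ((folder, t["files"], t["count"]) for folder, t in totals.items()),
        key=lambda item: (-item[2], item[0]),
    )


# Made-up sample rows standing in for a real export.
sample = io.StringIO(
    "Path,CreditCardCount\n"
    "C:/finance/a.xlsx,12\n"
    "C:/finance/b.xlsx,3\n"
    "C:\\hr\\c.docx,4\n"
    "C:/hr/d.docx,\n"
)
for folder, files, count in rank_folders(sample):
    print(f"{folder}\t{files} files\t{count} card numbers")
```

For a real export, pass an open file instead of the sample, for example `open("export.csv", newline="", encoding="utf-8-sig")`, where the file name is a placeholder. This version groups by each file's immediate parent folder only; rolling totals up to higher-level folders would need extra work.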

See also: For more variations on this recipe, take a look at The juicy bits.