PII out of place
Knowledge is knowing a tomato is a fruit. Wisdom is knowing it has no place in a fruit salad.
The expected outcome of this recipe is a list of files in unsafe locations that contain PII (Personally Identifiable Information) or SPI (Sensitive Personal Information).
The NOW Privacy platform already picks out various kinds of PII and SPI by default. If you want to detect another kind of PII whose format you can express as a regular expression, you can do so with a custom rule.
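Before adding a custom rule, it can help to try the regular expression against a few sample strings. The sketch below is only an illustration: the "EMP-" employee ID format and the function name are invented for the example, and it does not show how the pattern is entered into NOW Privacy.

```python
import re

# Hypothetical format (an assumption for illustration): "EMP-" followed by
# exactly six digits. Replace this with the format you actually need.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def find_employee_ids(text: str) -> list[str]:
    """Return every substring of text that matches the hypothetical ID format."""
    return EMPLOYEE_ID.findall(text)


if __name__ == "__main__":
    sample = "Contact EMP-004217 about ticket EMP-12 and invoice 004217."
    # Only the full six-digit ID should be reported.
    print(find_employee_ids(sample))
```

The word boundaries (\b) keep the pattern from matching inside longer strings, which is one simple way to reduce false positives before the rule goes live.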
Crawl your data sources.
You can then set up an advanced search. The PII search terms built into NOW Privacy, and any defined through custom rules, are accompanied by counts. No method of detecting PII is completely bulletproof, so you may need to adjust the People Name Count threshold periodically until the balance between sensitivity and false-positive risk is one you are comfortable with. That is to say, add salt and pepper to taste!
You may want to detect various forms of PII or SPI.
Some points of interest
Depending on the context, different forms of PII are prone to over-detection or under-detection to different degrees.
Names come with obvious complications, such as telling a person apart from a popular brand that carries its founder's name.
Credit card numbers are not simply 16-digit strings; they must pass a mathematical check (the Luhn checksum) and meet other constraints. Even so, certain German phone numbers match them by chance, and the IMEI serial number for mobile phones uses the same check-digit calculation, so it will often satisfy at least the mathematical check.
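For payment cards, the mathematical check is the Luhn checksum, and IMEIs use the same calculation for their final digit. The sketch below is a plain Python illustration, not NOW Privacy's implementation; the function name is invented for the example, and the two numbers are widely published test values rather than real data.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # a common 16-digit test card number
    print(luhn_valid("490154203237518"))      # a commonly cited 15-digit example IMEI
    print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

Both the test card number and the example IMEI pass, which is why the checksum alone cannot tell the two apart; length and surrounding context have to do the rest.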
UK National Insurance numbers have a relatively unusual format, but there's nothing to stop another identifier, such as a product or reference code, from using the same pattern.
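As a rough illustration of that risk, the sketch below matches the commonly documented National Insurance number layout: two prefix letters, six digits, and a suffix letter A to D, with certain letters and prefixes excluded. The pattern is an assumption written for this example, not the rule NOW Privacy uses, and the sample text is invented.

```python
import re

# Assumed layout: two prefix letters (some letters and prefixes are not
# issued), six digits optionally grouped in pairs, then a suffix A-D.
NINO = re.compile(
    r"\b(?!BG|GB|KN|NK|NT|TN|ZZ)"          # prefixes that are not allocated
    r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]"  # allowed first and second letters
    r" ?(?:\d{2} ?){3}"                     # six digits, with optional spaces
    r"[A-D]\b"                              # suffix letter
)


def find_ninos(text: str) -> list[str]:
    """Return substrings of text that look like National Insurance numbers."""
    return NINO.findall(text)


if __name__ == "__main__":
    print(find_ninos("NI number: AB 12 34 56 C"))
    # An invented part number with the same shape is reported too.
    print(find_ninos("Reorder part AB123456C before Friday"))
```

The second call shows the false positive: the pattern cannot know that the string is a part number, so surrounding keywords or the file's location have to supply that context.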
You may need to include additional ingredients to avoid hot water!