This recent news item about Target using analytics to target (pun intended) promotions to newly pregnant mothers, and the controversy surrounding it, illustrates a profound dilemma. How should data be used for predictive purposes? The privacy issues, the loss of control over how personal information is shared, and the risk of unexpected consequences raise serious questions for those of us in the industry who develop the models that answer business questions of apparent importance to an organization.
Target's business question seems innocent enough: determine as quickly as possible which customers are likely to be pregnant and interested in certain products and promotions, and capture their purchases and loyalty before losing them to the competition. But at what cost? In the case of the father who learned his teenage daughter might be pregnant from coupons sent to the home before she had shared the news with her family, Target is facing more than a public relations challenge. A false positive might have caused an awkward moment for this family. Here, the correct prediction did far more than cause a firestorm: it changed their lives and took choice and control away from Target's own customer. Is that really the desired result?
What should we be asking? What is the appropriate use of source data? What are the possible implications of accurate predictions and of false positives? A false positive in a predictive model that flags fraudulent tax refunds might only embarrass the taxpayer or delay processing while the return is scrutinized in a deeper review. There may be no lasting damage beyond a frustrated taxpayer. Furthermore, correct predictions may have no negative consequences for the tax agency, and appropriately negative consequences for the perpetrator of refund fraud. Determining which students are likely to drop out of university, accept an offer of admission to a program, or become delinquent in tuition payments also seems relatively innocuous.
What do we ever really know about what organizations might be doing with the information collected about us? Very little. Should such uses have to be disclosed in privacy notices? Should that depend on the type of use? Recently, when my mortgage was sold to another servicer, I received a privacy disclosure that made it very clear I had no right to choose how the bank used my personal and loan data for internal purposes. The notice pointed out that this was legal under federal law. I could state my preferences only for how my data was used with affiliates and how it was shared outside the bank. That still leaves open how the bank might use my personal data internally for its own predictive modeling, in ways I might find inappropriate.
As BI professionals, we should consider more than just the technical accuracy of a predictive model and the choice of target variable. It would also seem appropriate to consider privacy, potential consequences, and whether the end customer has any say in whether and how that data is used for decision-making purposes. Perhaps the Golden Rule would be a prudent test.