When does predictive analytics go too far?

March 20, 2013

This recent news item about Target using analytics to target (pun intended) promotions to newly pregnant mothers, and the controversy surrounding it, illustrates a profound dilemma. How should data be used for predictive purposes? The privacy issues and the loss of control over the sharing of personal information, not to mention the risk of unexpected consequences, raise serious questions for those of us in the industry who develop the models to answer business questions of apparent importance to an organization.

Target’s business question seems innocent enough: determine as quickly as possible which customers are likely to be pregnant and interested in certain products and promotions, in order to capture their purchases and loyalty before losing them to the competition. But at what cost? In the case of the father who found out his teen daughter might be pregnant because coupons were sent to the home before she had shared the news with the family, Target is facing more than just a public relations challenge. A false positive for this family might have created an awkward firestorm at home. In this case, the correct prediction did more than create a firestorm; it changed their lives and took choice and control away from the customer. Is that really the desired result?

What should we be asking? What is the appropriate use of source data? What are the possible implications of accurate predictions and of false positives? A false positive in a predictive model that identifies fraudulent tax refunds might only embarrass the taxpayer or delay processing while the return is scrutinized in a deeper review. There may be no lasting damage other than a frustrated taxpayer. Furthermore, correct predictions may have no negative consequences for the tax agency, but appropriately negative consequences for the perpetrator of refund fraud. Determining which students may be likely to drop out of university, accept an offer of admission to a program, or be delinquent in tuition payments also seems relatively innocuous.

What do we ever really know about what organizations might be doing with information collected about us? Very little. Should this level of use be disclosed in privacy notices, and should that disclosure be required? Should it depend on the type of use? Recently, when my mortgage was sold to another servicer, I received a privacy disclosure that made it very clear that I had no rights or choice in how the bank used my personal and loan data for internal purposes. The notice pointed out this was legal under federal law. I could only indicate my preferences for how data was used with affiliates and how it was shared outside the bank. That still leaves the problem of how they might use personal data internally for their own predictive modeling in ways that I may find inappropriate.

As BI professionals we should consider more than just the technical accuracy of a predictive model and the selected target variable. It would also seem appropriate to consider privacy, potential consequences, and whether the end customer has a say in whether and how that data is used for decision making. Perhaps the Golden Rule would be a prudent test.

BI, Uncertainty, and the Two Watch paradox – Part 2

January 29, 2013

In part 1 of this post, I described what I called the “two watch” phase of BI system adoption, when discrepancies between existing reports and the results of a newly implemented BI system cause angst on the part of the client and can bring the process of system acceptance and adoption to a screeching halt. In this second part of the post, we move from describing the problem to asking whether there are constructive steps to help everyone through it. In other words, what can be done to get clients to the point where they feel comfortable putting that old watch back in their pocket for good, and accepting the numbers coming out of the BI system as the new single watch that lets them know the time without any lingering doubts?

I think there actually are a couple of things that can help. First, you can discuss up front the questions that are going to get asked anyway when there are discrepancies between old and new reports. What level of accuracy is needed from the new BI system? Does the analysis to be done require correct tallying of every single system transaction? Are the existing reports known to REALLY be accurate? Are they validated on an ongoing basis in any way? What will make the client comfortable about pulling the plug on the old reports for good? What are the essential acceptance criteria for the new system? Discussing these issues ahead of time can instill confidence, whereas raising them only after the results of the new and old systems diverge will sound defensive and instill doubt.

However, the most critical aspect of the old and new comparison game is the ability to pinpoint differences in terms of the specific transactions causing the deltas in total numbers. The conceptual cul-de-sac to be avoided at all costs is the deadly standoff where the new system is simply trying to match an aggregate number: “Our old report says 45,039 and the new one says 44,823. That can’t be right!” In this case, you are simply shooting in the dark, not knowing if the old report is any good, or if there really is a bad business rule or programming error in there somewhere. It is critical that both the old and new reports be traceable to the individual transactions they represent, so that the transactions causing any discrepancy can be individually evaluated. This can actually turn a negative (the reporting discrepancy) into a positive (increased customer confidence) when individual transactions can be isolated and their inclusion or exclusion explained in terms of the organization’s business rules.
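To make the idea of tracing a delta back to individual transactions concrete, here is a minimal sketch in Python. It assumes each report can export the list of transaction IDs behind its total; the `reconcile` function and the sample IDs are hypothetical and are not taken from any particular BI tool.

```python
def reconcile(old_ids, new_ids):
    """Compare the transaction IDs behind two report totals.

    Returns the IDs counted by only one of the reports, so each one can be
    checked against the business rules, plus the count both reports share.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "only_in_old": sorted(old_set - new_set),
        "only_in_new": sorted(new_set - old_set),
        "in_both": len(old_set & new_set),
    }


# Hypothetical transaction IDs feeding each report's total.
old_report = ["T001", "T002", "T003", "T004", "T005"]
new_report = ["T001", "T002", "T004", "T006"]

result = reconcile(old_report, new_report)
print("Only in old report:", result["only_in_old"])
print("Only in new report:", result["only_in_new"])
print("Counted by both:", result["in_both"])
```

Instead of arguing about why one total is 5 and the other is 4, the conversation becomes a review of three specific transactions, each of which can be explained or corrected.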

And finally, at the right time, the client needs to agree to turn off the old reports, to cut the cord to the past. It is always tempting to keep the old ones around “just in case”, but this perpetuates a lingering organizational dependence that really needs to be nipped in the bud. If the new reports are validated, it should be in with the new and definitely out with the old.

In the end, there comes a moment when the existing reports can be safely relegated to the vast scrap heap of obsolete software and everyone can settle down with the one shiny new watch that delivers the single accepted version of reality. And that day, rather than any other milestone in the BI project, is the real finish line we should keep our eyes on from day one.

ASR Analytics Announces new Partnership with Achieving the Dream

December 18, 2012

Potomac, MD  December 18, 2012 — ASR Analytics, a company specializing in data warehousing, business intelligence, and analytics solutions for Higher Education and Government, is pleased to partner with Achieving the Dream by sponsoring a scholarship that will help offset the cost of attendance to DREAM2013, the Annual Meeting on Student Success, for those who otherwise may not be able to attend.  ASR works with numerous Achieving the Dream colleges to help them better manage their data and reporting infrastructure in support of their student success initiatives.

ASR will also be participating in the “Emerging Ideas Exchange” on Wednesday February 6th along with our client Davidson County Community College (DCCC), a recently announced Leader College, to demonstrate and share our Student Success and Completion Analytics solution.  To keep abreast of all the activities at the conference follow @HigherEdBI and hashtag #DREAM2013 on Twitter or like “Higher Education Business Intelligence” on Facebook. For more information about Achieving the Dream and the DREAM2013 meeting, visit www.acheivingthedream.org/DREAM2013. For information about ASR Analytics visit our solutions page.

 

ASR Analytics, LLC

Decide with Intelligence.  Act with Confidence.

BI, Uncertainty, and the Two Watch Paradox – Part 1

December 14, 2012

There is a well-known quote about certainty that goes something like “A person with one watch always knows what time it is; a person with two watches is never sure”. I was struck the other day by the pertinence of this quote to the process of working with clients who are implementing a new BI solution to replace existing reporting tools.

Many organizations are like the man with one old watch – they may not have much in the way of reporting, and the reports may be labor-intensive and involve combining the results of multiple queries on different source systems, and may be of indeterminate accuracy, but the end results tend to take on a kind of hallowed authority and become the organization’s defining vision of reality.  These organizations “know” what time it is, and accurately enough for everyday purposes, it seems. 

Whatever the promise of a new BI system, whatever savings in staff effort, broader reporting scope, expanded data visualization capabilities and increased accuracy it delivers through state-of-the-art ETL, OLAP databases, and automated, scheduled report generation, it will still inevitably need to deal with the fact that, at least initially, it is the proverbial second watch, the one that shatters the certainty of “really” knowing what time it is. Because of the complexity of the organization’s business rules, and the effect of applying these complex rules to arcane data configurations, the values of accepted metrics will inevitably differ from the results of previously accepted reports, and this variance will cause angst.

And getting through this “two watch” phase is critical, and is made more difficult because the watch that is invariably believed is the one which has been depended upon for all the preceding years.  In fact, projects can get permanently bogged down trying to exactly match these pre-existing reports as de facto acceptance criteria for the new system.

In the next post, I’ll discuss some ideas on getting through this problematic phase.

Moonlight and Other Correlated Factors

November 5, 2012

Today’s Financial Times “Weekly Review of the Fund Management Industry” has an interesting front page article that really struck me. The article describes how a firm which specializes in longevity research for pension funds recently discovered a spike in death rates when more than half of the moon is visible in the night sky.

My immediate reaction, and one that I think is relevant for any of us in the field of research and predictive modeling, is “Who even thought of data on the moon phases as an input variable to this research??”  Some think predictive modeling is an automatic magical black box exercise. But it really is just math and depends on the capacity to throw the net wide, so to speak, across a range of seemingly completely unrelated data to see if patterns emerge.

Now, does knowing that there is a higher death rate at certain points in the lunar cycle help with predicting longevity? The article suggests not, but it does help with predicting payout patterns, which is of concern to pensions as well.

Now, let’s see… what kinds of things might cause students to drop out? Donors to increase giving? Students to default on Financial Aid payments?? Maybe that crazy full moon has something to do with it! And now I have the King Harvest “Dancing in the Moonlight” song stuck in my head…

Longer-term View of Student Success

October 11, 2012

Traditionally, most institutions have measured graduation and completion rates by looking only at those students who have been awarded a degree or certificate from that same institution. However, administrators have long complained that such a view does not take into account the dynamics of student “swirl”, where a student may take courses at one school and transfer to another, possibly multiple times in their educational career. Students do so for a host of reasons including convenience, cost, and academic reputation. Furthermore, recent studies such as those by “Complete College America”, reported in the NY Times, show dramatic gaps in both completion rates and time to completion for students. The Chronicle of Higher Education has also tackled the issue of graduation rate measurement.

This issue is of particular concern at community colleges where to some extent the intention of a student may be to transfer to a four year institution from the outset. As accreditation and funding bodies demand more evidence of student success and completion, a more comprehensive picture is required to better measure outcomes. Many community colleges report graduation rates in the low teens on the IPEDS survey based on the federal cohort definitions. But is that apparently low performance really the case?

In a project with ASR Analytics, Howard Community College in Columbia, MD decided to tackle just this issue. After implementing an enrollment and retention analytics solution that provided a wealth of information to the President’s Team and Retention and Graduation Steering Team, Zoe Irvin,  Executive Director of Planning and Research, decided the next critical step in enhancing service to students and their understanding of completion was to get a broader view of where students were going and what they were doing after leaving Howard. For some time they had been asking students upon entry what their educational goals were. But what really happens to them over the next 7 years?

The first challenge was to determine how to gather data about their students after they left. A small scale pilot program to track students who transferred to University of Maryland had been started, but something much broader, more sustainable, and less labor intensive was required to be of any value.

Howard, like many institutions, participates in the National Student Clearinghouse (NSC) EnrollVerify and DegreeVerify services, which entitles them to request subsequent data for all the students they have ever submitted. This was a natural starting point for integration with the enrollment and degree information generated by their own internal administrative system. Together, these sources would give a more comprehensive picture of what the student is doing over time. Granted, there are some limitations. For example, not all institutions participate in the NSC service, leaving potential gaps in student completion data. Also, reporting of actual degree codes and programs of study is not standardized but is instead freeform text, making it impossible to know much more than that a student graduated from a 2 year or 4 year institution. From this one can assume a 2 or 4 year degree, but not much more.

In any event, a significant step forward has been achieved by linking past and present cohorts of students to the NSC data to see a more comprehensive and longer term view of continued study and degree completion at other schools. This has provided the Howard team with a way to see patterns by demographic characteristics available in the existing enrollment and retention analytics extended to the completion data. It can be used to improve delivery of services and provide interventions as well as ways to reach out and recapture students who may not be continuing elsewhere. In the words of one Vice President: “this combination of data provide us with a wealth of information to analyze that has never been available before.”

This solution is now being evaluated by and implemented at several other institutions. For more information, please contact ASR Analytics.

What was the question?

September 24, 2012

The answer is 42. Those who are fans of “The Hitchhiker’s Guide to the Galaxy” know the story of how interstellar traveller Arthur Dent, while visiting another planet, learns that another race of beings had created a supercomputer to answer the ultimate question of the meaning of Life, the Universe and Everything. It took millions of years to compute the answer, but by that time nobody really remembered the question.

This often happens when working with BI solutions that have been established for a while. It can also be an issue while trying to gather requirements. People often become so focused on the answer (and particularly on whether it is “right” or not) that they forget the question. What is the real purpose of the data? What do you do with the analysis? How will it change your behavior and decision making? What will you do differently in the interaction with the student or constituent knowing the answer you have?

Every so often in any Business Intelligence (BI) program it is essential to step back and consider: “What was the question?” If you cannot honestly come up with a purpose for a report, measure, or Key Performance Indicator (KPI), then maybe it is time to retire it and review what is really needed to answer the new questions at hand. Don’t wait until everyone forgets and none of the information is meaningful.

Davidson County Community College – Data Informed Decisions

August 27, 2012

Last week an ASR client was featured in Community College Week with a comprehensive review of the use of technology tools, such as Business Intelligence (BI), to help manage and improve student success.

The article describes how Stacy Holliday, Director, Campus Innovations and Student Success, sees systematic change within the college as a necessary prerequisite to using technology effectively, and how Davidson County Community College is moving forward with its plans.

The Last Mile Problem and Business Intelligence

July 18, 2012

The most expensive piece in a telecommunications grid is not the huge data pipes that make up national and international data networks, or even the incredibly specialized switching equipment that controls the staggering amount of data which moves across these networks, or even the back-office billing systems that somehow tally the charges for all this data. In each of these cases, economies of scale bring the per-user price of these components into a manageable cost structure. No, the most expensive and problematic aspect of the whole system is what is called the ‘Last Mile’, that critical connection from the telecom switch to the consumer’s home or place of business. This is pure infrastructure, and involves digging up streets, running wires, and working with customers one at a time. There are no economies of scale, and every possible avenue to more efficiently make this connection has been explored. My favorite examples of the innovative solutions developed to solve the ‘Last Mile’ problem when I worked in the telecom field were, first, a company in the 90’s that built robots which would crawl through sewer lines to run telecom cable to access points within a customer premise, and second, a company that used free-space optical beams to send information to receivers on customer rooftops, a solution with obvious challenges when the environment does not cooperate, for example in rain and fog!

So how does this ‘Last Mile’ problem in telecom relate to Business Intelligence?  I would argue the same connection problem exists between the BI solution (including the data warehouse, the BI infrastructure and presentation tools, the cubes and reports) and the end user.  It has struck me that there are so many amazing BI solutions out there that provide so many potentially game-changing capabilities for their users but which still, well, fail.  They fail to make an impact, fail in their adoption by business users, and fail to meet the rosy expectations of their institutional sponsors.   In my experience, the most common reason for this is the inability to effectively bridge the ‘Last Mile’ of BI solution delivery, that is, to make the connection from the infrastructure of the solution to its end users.   Some of these solutions have amazing capabilities, just like those telecom networks, but are worthless if they cannot get their content into the hands of their end users for those users to make it part of the way they do business. 

So, having drawn this parallel, what insights for achieving success with BI solutions can be gained by looking at the problem in this light?

I think the fundamental one is the importance of building the BI ‘Last Mile’ into the overall design of the BI solution. It is always tempting to adopt the adage from the movie ‘Field of Dreams’: “If you build it, they will come!” But experience shows that they may not come, no matter how wonderfully it is built. The users of a BI implementation and what they are capable of must be considered from the outset. A realistic assessment must be made of what they will need in order to adopt the capabilities being rolled out.

Secondly, solutions to both of these problems are difficult to scale. The BI analogue of the telecom guy climbing a pole or digging a ditch is the one-on-one communication that has to take place to get BI users on board and invested in the solution. How do you bring along novice users to step up to what can be a daunting new challenge? How do you convince reporting users who have adapted to existing, less-capable but known solutions that they should expend the effort to learn a new system? How do you show everyone involved that the BI solution will help them, and is not just something imposed on them by upper management? This involves careful design of the user-facing artifacts of the BI system, but also careful documentation and training. When it comes right down to it, you really have to sell the solution to the user community. In our consulting practice, we have found that one of the best ways to engage and motivate new users of a system is to take reporting problems that they struggle with and solve them as sample problems in a training session. This obviously requires an individualized approach to training, tailored not just for a specific customer but for a particular set of users within an organization. But the enthusiasm that this engenders, and the buy-in that comes from demonstrating the system’s capabilities in a well-understood domain, make it worth the effort. Even when solving these problems requires advanced skills with the tool, skills users might not totally understand, it is a concrete demonstration that the time and effort needed to learn the system will have a meaningful payback.

And the final insight is the stark reality that having a flawed plan or no plan at all is going to be fatal to the success of the whole system, no matter how remarkable the technical solution or infrastructure underlying it may be.

The ‘Last Mile’ problem applied to the BI world is more insidious and more likely to be overlooked than the physical ‘Last Mile’ problem in the telecom world.  With all its thorniness, it is starkly obvious that some solution is necessary to bridge the physical gap from switch to home or office.  However, it is far easier to delude oneself that the BI baseball field needs only to be constructed in an Iowa cornfield and that users will emerge like ghostly Chicago White Sox and start running down fly balls, or rather discovering business insights from the BI solution.  With all due respect to Kevin Costner, that just isn’t likely to happen.

Playing the Numbers

June 25, 2012

And, no, I don’t mean playing the lottery numbers!  We’ve posted a couple of articles in the past about What Makes a Good Measure (Jan 3, 2012) and Telling Stories with Data (Jan 21, 2011).  These posts discuss the challenges of working with numbers in the analytics world.  I also emphasize with clients the importance of expressing the context of a report or data that people are seeing. What does it include? And sometimes much more importantly, what does it NOT include? What filters are applied? What are the values of those filters? What time frame does the data represent? Can users see this context when reviewing a report or is it likely they could apply their own assumptions about what data means?

A perfect example of this context issue came while reading an excellent commentary in today’s US edition of the Financial Times. (You can access it via this link, but have to sign up for a free account to read the full text.)   Let me summarize the gist of Steven Hill’s argument about inappropriate and misleading use of numbers.

The conventional thinking, particularly as purveyed by the media and governments themselves, is that youth unemployment in Europe is at crisis levels. There are demoralizing rates of nearly 50% in Spain and Greece, and over 20% in the rest of the Eurozone. But is this really the case? He argues that a very flawed methodology is used to calculate those rates. Namely, they do not include those youths who are in school or job training and not looking for a job anyway. The denominator is therefore a much smaller number of individuals, which drastically inflates the resulting unemployment rate. So, the “unemployment rate” conveyed most often to the public does not tell the whole story.

He suggests a better measure might be the ratio of youth who are actively seeking a job and can’t find one to the entire youth population 24 years old and younger (regardless of their intent to seek employment or schooling). Using that measure, Spain’s youth unemployment is only 19% (vs 48.9%) and Greece’s is only 13% (vs 49.3%). In the Eurozone as a whole it would only be 8.7%. That is a dramatic difference between the two measures. It tells a different story. The adult unemployment rate has the reverse problem: because it excludes those who have given up looking for work, the rate is commonly understated.
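The gap between the two measures comes entirely from the choice of denominator. A small Python sketch shows the arithmetic; the counts below are hypothetical round numbers chosen only for illustration and are not figures from the article.

```python
def unemployment_rate(unemployed, labor_force):
    """Conventional rate: unemployed as a share of those in the labor force."""
    return unemployed / labor_force


def unemployment_ratio(unemployed, population):
    """Alternative ratio: unemployed as a share of the whole youth population."""
    return unemployed / population


# Hypothetical counts, for illustration only.
youth_population = 1000  # everyone aged 24 and under
labor_force = 400        # working or actively seeking work
unemployed = 196         # actively seeking work and unable to find it

rate = unemployment_rate(unemployed, labor_force)
ratio = unemployment_ratio(unemployed, youth_population)
print(f"Unemployment rate:  {rate:.1%}")
print(f"Unemployment ratio: {ratio:.1%}")
```

The same 196 unemployed young people produce a figure of 49% or just under 20%, depending only on whether the students and trainees outside the labor force are counted in the denominator.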

As I read the article, I was thinking you could create a similar ratio of students in school or job training for comparison purposes, one that shares the same denominator and thus has a common foundation. Further, you could list the ratio of those not in the labor market. This is a much better way of expressing data on common grounds with common definitions, and therefore of making better informed policy decisions. Yet, traditionally, the unemployment “rate” has always excluded those in school and not looking for work, therefore providing a frightening unemployment picture to those who wish to use it that way.

Now take for instance the challenge of communicating graduation and completion rates for students. Completion is a hot button issue across all segments of higher education. The IPEDS numbers are famously unreliable since by definition the cohort only includes students who were first time to any college. Is that really helpful? In community colleges it leaves a large population out of the denominator since many students have attended other institutions before. Furthermore, for any institution, do they measure if the student completed somewhere else after they transferred? At least the Department of Education recognizes these issues and has formed a working group to address the limitations of current measure definitions.

What similar scenarios might you have in your institutional measurement structures? Are your rates and ratios on a common foundation?  Do people know the whole context of the ratios they are seeing? What might need to change to address the shortcomings of what is communicated? Discuss!
