When does predictive analytics go too far?
March 20, 2013
This recent news item about Target using analytics to target (pun intended) promotions to newly pregnant mothers, and the controversy surrounding it, illustrates a profound dilemma. How should data be used for predictive purposes? The privacy issues and the loss of control over the sharing of personal information, not to mention the risks of unexpected consequences, raise serious questions for those of us in the industry who develop the models to answer business questions of apparent importance to an organization.
Target’s business question seems innocent enough–determine as quickly as possible those customers who are likely to be pregnant and interested in certain products and promotions to capture their purchase and loyalty before losing it to the competition. But at what cost? In the case of the father who found out his teen daughter might be pregnant because of coupons sent to the home before she shared the information with the family, Target is facing more than just a public relations challenge. A false positive for this family might have created a bit of an awkward firestorm at home. In this case, the correct prediction did more than create a firestorm, it changed their lives and took the choice and control away from their customer. Is that really the desired result?
What should we be asking? What is the appropriate use of source data? What are the possible implications of accurate predictions and false positives? False positives in a predictive model to identify fraudulent tax refunds might only embarrass the taxpayer or delay processing while the return is scrutinized in a deeper review. There may be no lasting damage other than a frustrated taxpayer. Furthermore, correct predictions may have no negative consequences for the tax agency, but appropriately negative consequences for the perpetrator of refund fraud. Determining which students may be likely to drop out of university, accept an offer of admission to a program, or be delinquent in tuition payments also seems relatively innocuous.
What do we ever really know about what organizations might be doing with information collected about us? Very little. Should this level of use be disclosed and required in privacy notices? Should it depend on the type of use? Recently when my mortgage was sold to another servicer, I received a privacy disclosure that made it very clear that I had no rights or choice on how the bank used my personal and loan data for internal purposes. The notice pointed out this was legal under federal law. I only could indicate my preferences for how data was used with affiliates and how it was shared outside the bank. That still leaves the problem of how they might use personal data internally for their own predictive modeling that I may find inappropriate.
As BI professionals we should consider more than just the technical accuracy of a predictive model and the selected target variable. It would also seem appropriate to consider privacy, potential consequences, and whether the end customer has a say in how that data is used for decision-making purposes. Perhaps the Golden Rule would be a prudent test.
ASR Analytics Announces new Partnership with Achieving the Dream
December 18, 2012
Potomac, MD December 18, 2012 — ASR Analytics, a company specializing in data warehousing, business intelligence, and analytics solutions for Higher Education and Government, is pleased to partner with Achieving the Dream by sponsoring a scholarship that will help offset the cost of attendance to DREAM2013, the Annual Meeting on Student Success, for those who otherwise may not be able to attend. ASR works with numerous Achieving the Dream colleges to help them better manage their data and reporting infrastructure in support of their student success initiatives.
ASR will also be participating in the “Emerging Ideas Exchange” on Wednesday February 6th along with our client Davidson County Community College (DCCC), a recently announced Leader College, to demonstrate and share our Student Success and Completion Analytics solution. To keep abreast of all the activities at the conference follow @HigherEdBI and hashtag #DREAM2013 on Twitter or like “Higher Education Business Intelligence” on Facebook. For more information about Achieving the Dream and the DREAM2013 meeting, visit www.acheivingthedream.org/DREAM2013. For information about ASR Analytics visit our solutions page.
ASR Analytics, LLC
Decide with Intelligence. Act with Confidence.
BI, Uncertainty, and the Two Watch Paradox – Part 1
December 14, 2012
There is a well-known quote about certainty that goes something like “A person with one watch always knows what time it is; a person with two watches is never sure”. I was struck the other day by the pertinence of this quote to the process of working with clients who are implementing a new BI solution which replaces existing reporting tools.
Many organizations are like the man with one old watch – they may not have much in the way of reporting, and the reports may be labor-intensive and involve combining the results of multiple queries on different source systems, and may be of indeterminate accuracy, but the end results tend to take on a kind of hallowed authority and become the organization’s defining vision of reality. These organizations “know” what time it is, and accurately enough for everyday purposes, it seems.
Whatever the promise of a new BI system, whatever savings in staff effort, broader reporting scope, expanded data visualization capabilities and increased accuracy it delivers through state-of-the-art ETL, OLAP databases, and automated, scheduled report generation, it still will inevitably need to deal with the fact that, at least initially, it is the proverbial second watch, the one that shatters the certainty of “really” knowing what time it is. Because of the complexity of the organization’s business rules, and the effect of applying these complex rules to arcane data configurations, the values of accepted metrics will inevitably differ from the results of previously accepted reports, and this variance will cause angst.
And getting through this “two watch” phase is critical, and is made more difficult because the watch that is invariably believed is the one which has been depended upon for all the preceding years. In fact, projects can get permanently bogged down trying to exactly match these pre-existing reports as de facto acceptance criteria for the new system.
In the next post, I’ll discuss some ideas on getting through this problematic phase.
Moonlight and Other Correlated Factors
November 5, 2012
Today’s Financial Times “Weekly Review of the Fund Management Industry” has an interesting front page article that really struck me. The article describes how a firm which specializes in longevity research for pension funds recently discovered a spike in death rates when more than half of the moon is visible in the night sky.
My immediate reaction, and one that I think is relevant for any of us in the field of research and predictive modeling, is “Who even thought of data on the moon phases as an input variable to this research??” Some think predictive modeling is an automatic magical black box exercise. But it really is just math and depends on the capacity to throw the net wide, so to speak, across a range of seemingly completely unrelated data to see if patterns emerge.
Now, does knowing that there is a higher death rate at certain points in the lunar cycle help with predicting longevity? The article suggests not, but it does help with predicting payout patterns, which is of concern to pensions as well.
Now, let’s see… what kinds of things might cause students to drop out? Donors to increase giving? Students to default on Financial Aid payments?? Maybe that crazy full moon has something to do with it! And now I have the King Harvest “Dancing in the Moonlight” song stuck in my head…
Longer-term View of Student Success
October 11, 2012
Traditionally, most institutions have measured graduation and completion rates by looking only at those students who have been awarded a degree or certificate from that same institution. However, administrators have long complained that such a view does not take into account the dynamics of student “swirl”, where a student may take courses at one school and transfer to another, possibly multiple times in their educational career. Students often do so for a host of reasons including convenience, cost, and academic reputation. Furthermore, recent studies such as those by “Complete College America” and reported in the NY Times show dramatic gaps in both the completion and time to completion for students. The Chronicle of Higher Education has also tackled the issue of graduation rate measurement.
This issue is of particular concern at community colleges where to some extent the intention of a student may be to transfer to a four year institution from the outset. As accreditation and funding bodies demand more evidence of student success and completion, a more comprehensive picture is required to better measure outcomes. Many community colleges report graduation rates in the low teens on the IPEDS survey based on the federal cohort definitions. But is that apparently low performance really the case?
In a project with ASR Analytics, Howard Community College in Columbia, MD decided to tackle just this issue. After implementing an enrollment and retention analytics solution that provided a wealth of information to the President’s Team and Retention and Graduation Steering Team, Zoe Irvin, Executive Director of Planning and Research, decided the next critical step in enhancing service to students and their understanding of completion was to get a broader view of where students were going and what they were doing after leaving Howard. For some time they had been asking students upon entry what their educational goals were. But what really happens to them over the next 7 years?
The first challenge was to determine how to gather data about their students after they left. A small scale pilot program to track students who transferred to University of Maryland had been started, but something much broader, more sustainable, and less labor intensive was required to be of any value.
Howard, like many institutions, participates in the National Student Clearinghouse (NSC) EnrollVerify and DegreeVerify services, which entitle them to request subsequent data for all the students they have ever submitted. This was a natural starting point to integrate with the enrollment and degree information generated by their own internal administrative system. Together, this would give a more comprehensive picture of what the student is doing over time. Granted, there are some limitations. For example, not all institutions participate in the NSC service, leaving potential gaps in student completion. Also, reporting of actual degree codes and programs of study is not standardized but rather freeform text, making it impossible to know much more than that they graduated from a 2 year or 4 year institution. From this one can assume a 2 or 4 year degree, but not much more.
In any event, a significant step forward has been achieved by linking past and present cohorts of students to the NSC data to see a more comprehensive and longer term view of continued study and degree completion at other schools. This has provided the Howard team with a way to see patterns by demographic characteristics available in the existing enrollment and retention analytics extended to the completion data. It can be used to improve delivery of services and provide interventions as well as ways to reach out and recapture students who may not be continuing elsewhere. In the words of one Vice President: “this combination of data provide us with a wealth of information to analyze that has never been available before.”
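As a rough illustration of the linking step described above, here is a minimal Python sketch. It is not the Howard or ASR implementation. The record layout, field names, and outcome labels are all hypothetical and do not reflect the actual NSC file format. It shows how a cohort can be matched to external records by student ID, and how a student with no match stays ambiguous, which is the participation gap noted earlier.

```python
# Hypothetical records for illustration only; the field names do not
# reflect the actual National Student Clearinghouse file layout.
cohort = [
    {"student_id": "H001", "degree_here": True},
    {"student_id": "H002", "degree_here": False},
    {"student_id": "H003", "degree_here": False},
    {"student_id": "H004", "degree_here": False},
]
external_records = [
    {"student_id": "H002", "institution_type": "4-year", "graduated": True},
    {"student_id": "H003", "institution_type": "4-year", "graduated": False},
]

# Index the external records by student ID so each lookup is a dictionary hit.
external_by_id = {r["student_id"]: r for r in external_records}


def classify(student):
    """Assign one simplified outcome label to a cohort member."""
    if student["degree_here"]:
        return "completed here"
    record = external_by_id.get(student["student_id"])
    if record is None:
        # No match: the student may have stopped out, or may attend an
        # institution that does not report to the external service.
        return "no subsequent record"
    if record["graduated"]:
        return f"completed elsewhere ({record['institution_type']})"
    return "enrolled elsewhere"


outcomes = {s["student_id"]: classify(s) for s in cohort}
for student_id, outcome in outcomes.items():
    print(student_id, outcome)
```

A real solution would also have to handle students with several external records and join the outcomes back to demographic attributes; this sketch keeps one record per student to stay readable.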
This solution is now being evaluated by and implemented at several other institutions. For more information, please contact ASR Analytics.
What was the question?
September 24, 2012
The answer is 42. For those who are fans of “Hitchhiker’s Guide to the Galaxy”, you know the story of how interstellar traveller Arthur Dent, while visiting another planet, learns that another race of beings had created a supercomputer to answer the ultimate question of the meaning of Life, the Universe and Everything. It took millions of years to compute the answer, but by that time nobody really remembered the question.
This often happens when working with BI solutions that have been established for a while. It can also be an issue while trying to gather requirements. People often become so focused on the answer (and particularly if it is “right” or not) that they forget the question. What is the real purpose of the data? What do you do with the analysis? How will it change your behavior and decision making? What will you do differently in the interaction with the student or constituent knowing the answer you have?
Every so often in any Business Intelligence (BI) program it is essential to step back and consider: “What was the question?” If you cannot honestly come up with a purpose for a report, measure, or Key Performance Indicator (KPI) then maybe it is time to retire it and review what is really needed to answer new questions at hand. Don’t wait until everyone forgets and none of the information is meaningful.
Davidson County Community College – Data Informed Decisions
August 27, 2012
Last week an ASR client was featured in Community College Week with a comprehensive review of the use of technology tools, such as Business Intelligence (BI), to help manage and improve student success.
Click here to read how Stacy Holliday, Director, Campus Innovations and Student Success, sees systematic change within the college as a necessary pre-requisite to use technology effectively and how Davidson County Community College is moving forward with their plans.
The Last Mile Problem and Business Intelligence
July 18, 2012
The most expensive piece in a telecommunications grid is not the huge data pipes that make up national and international data networks, or even the incredibly specialized switching equipment that controls the staggering amount of data which moves across these networks, or even the back-office billing systems that somehow tally the charges for all this data. In each of these cases, economies of scale bring the per-user price of these components into a manageable cost structure. No, the most expensive and problematic aspect of the whole system is what is called the ‘Last Mile’, that critical connection from the telecom switch to the consumer’s home or place of business. This is pure infrastructure, and involves digging up streets, running wires, and working with customers one at a time. There are no economies of scale and every possible avenue to more efficiently make this connection has been explored. My favorite examples of the innovative solutions developed to solve the ‘Last Mile’ problem when I worked in the telecom field were, first, a company in the 90’s that built robots which would crawl through sewer lines to run telecom cable to access points within a customer premise, and second, another which used free-space optical beams to send information to receivers on customer rooftops, a solution with obvious challenges during periods when the environment is not cooperative, for example, rain and fog!
So how does this ‘Last Mile’ problem in telecom relate to Business Intelligence? I would argue the same connection problem exists between the BI solution (including the data warehouse, the BI infrastructure and presentation tools, the cubes and reports) and the end user. It has struck me that there are so many amazing BI solutions out there that provide so many potentially game-changing capabilities for their users but which still, well, fail. They fail to make an impact, fail in their adoption by business users, and fail to meet the rosy expectations of their institutional sponsors. In my experience, the most common reason for this is the inability to effectively bridge the ‘Last Mile’ of BI solution delivery, that is, to make the connection from the infrastructure of the solution to its end users. Some of these solutions have amazing capabilities, just like those telecom networks, but are worthless if they cannot get their content into the hands of their end users for those users to make it part of the way they do business.
So, having drawn this parallel, what insights can help BI solutions succeed when they are viewed in this light?
I think the fundamental one is to emphasize the importance of building the BI ‘Last Mile’ into the overall design of the BI solution. It is always tempting to adopt the adage from the movie ‘Field of Dreams’ – “If you build it, they will come!”. But experience shows that they may not come, no matter how wonderfully it is built. The users of a BI implementation and what they are capable of must be considered from the outset. A realistic assessment must be made of what they will need to be able to adopt the capabilities being rolled out.
Secondly, solutions to both of these problems are difficult to scale. The analogy to the telecom guy climbing a pole or digging a ditch is the one-on-one communication that has to take place to get BI users on board and invested in the solution. How do you bring along novice users to step up to what can be a daunting new challenge? How do you convince reporting users who have adapted to existing, less-capable but known solutions that they should expend the effort to learn a new system? How do you show everyone involved that the BI solution will help them, and is not just something imposed on them from upper management? This involves careful design of the user-facing artifacts of the BI system, but also careful documentation and training. When it comes right down to it, you really have to sell the solution to the user community. In our consulting practice, we have found that one of the best ways to engage and motivate new users of a system is to take reporting problems that they struggle with and solve them as sample problems in a training session. This obviously requires an individualized approach to training, tailored not just for a specific customer but for a particular set of users within an organization. But the enthusiasm that this engenders and the system buy-in that comes from the demonstration of the system’s capabilities in a well-understood domain make it worth the effort. Even when solving these problems requires advanced skills with the tool, skills users might not totally understand, it is a concrete demonstration that the time and effort needed to learn the system will have a meaningful payback.
And the final insight is the stark reality that having a flawed plan or no plan at all is going to be fatal to the success of the whole system, no matter how remarkable the technical solution or infrastructure underlying it may be.
The ‘Last Mile’ problem applied to the BI world is more insidious and more likely to be overlooked than the physical ‘Last Mile’ problem in the telecom world. With all its thorniness, it is starkly obvious that some solution is necessary to bridge the physical gap from switch to home or office. However, it is far easier to delude oneself that the BI baseball field needs only to be constructed in an Iowa cornfield and that users will emerge like ghostly Chicago White Sox and start running down fly balls, or rather discovering business insights from the BI solution. With all due respect to Kevin Costner, that just isn’t likely to happen.
Playing the Numbers
June 25, 2012
And, no, I don’t mean playing the lottery numbers! We’ve posted a couple of articles in the past about What Makes a Good Measure (Jan 3, 2012) and Telling Stories with Data (Jan 21, 2011). These posts discuss the challenges of working with numbers in the analytics world. I also emphasize with clients the importance of expressing the context of a report or data that people are seeing. What does it include? And sometimes much more importantly, what does it NOT include? What filters are applied? What are the values of those filters? What time frame does the data represent? Can users see this context when reviewing a report or is it likely they could apply their own assumptions about what data means?
A perfect example of this context issue came while reading an excellent commentary in today’s US edition of the Financial Times. (You can access it via this link, but have to sign up for a free account to read the full text.) Let me summarize the gist of Steven Hill’s argument about inappropriate and misleading use of numbers.
The conventional thinking, particularly purveyed by the media and governments themselves, is that youth unemployment in Europe is at crisis levels. There are demoralizing rates of nearly 50% in Spain and Greece, and over 20% in the rest of the Eurozone. But is this really the case? He argues that a very flawed methodology is used to calculate those rates. Namely, they do not include those youths who are in school or job training and not looking for a job anyway. The denominator is a much smaller number of individuals, and therefore drastically overstates the unemployment rate. So, the “unemployment rate” conveyed most often to the public does not tell the whole story.
He suggests a better measure might be an overall ratio of the youth who are actively seeking a job and can’t find one to the entire youth population 24 years old and younger (regardless of their intent to seek employment or schooling). Using that measure, Spain’s youth unemployment is only 19% (vs 48.9%) and Greece’s is only 13% (vs 49.3%). In the Eurozone as a whole it would only be 8.7%. That is a dramatic difference between the two measures. It tells a different story. The adult unemployment rate has the reverse problem: because it excludes those who have given up looking for work, the rate is commonly understated.
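The gap between the two measures comes down entirely to the denominator. Here is a minimal Python sketch of the arithmetic. The head counts are hypothetical, chosen only so that the results land near the Spanish figures quoted above; they are not data from the article.

```python
# Hypothetical head counts for illustration only (not from the article).
youth_population = 10_000  # everyone 24 years old and younger
in_labor_force = 3_885     # working, or actively seeking work
unemployed = 1_900         # actively seeking work and unable to find it


def pct(numerator, denominator):
    """Return numerator / denominator as a percentage, to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Conventional rate: the denominator excludes youths in school or training.
unemployment_rate = pct(unemployed, in_labor_force)

# Alternative ratio: the denominator is the entire youth population.
unemployment_ratio = pct(unemployed, youth_population)

print(f"unemployment rate:  {unemployment_rate}%")
print(f"unemployment ratio: {unemployment_ratio}%")
```

The numerator is the same in both cases. Only the choice of who counts in the denominator changes, which is why a report should state that choice alongside the number.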
As I read the article, I was thinking you could create a similar ratio of students in school or job training for comparison purposes that shares the same denominator and thus has a common foundation. Further, you could list the ratio of those not in the labor market. This is a much better way of expressing data on common grounds with common definitions, and therefore of making better informed policy decisions. Yet, traditionally, the unemployment “rate” has always excluded those in school and not looking for work, therefore providing a frightening unemployment picture to those who wish to use it that way.
Now take for instance the challenge of communicating graduation and completion rates for students. Completion is a hot button issue across all segments of higher education. The IPEDS numbers are famously unreliable since by definition the cohort only includes students who were first time to any college. Is that really helpful? In community colleges it leaves a large population out of the denominator since many students have attended other institutions before. Furthermore, for any institution, do they measure if the student completed somewhere else after they transferred? At least the Department of Education recognizes these issues and has formed a working group to address the limitations of current measure definitions.
What similar scenarios might you have in your institutional measurement structures? Are your rates and ratios on a common foundation? Do people know the whole context of the ratios they are seeing? What might need to change to address the shortcomings of what is communicated? Discuss!
When More Complicated is Actually Easier
June 14, 2012
Many institutions struggle with their reporting and analytics deployment. They face a dilemma about how to roll out self service to users and still meet complex reporting requirements. These requirements appear to need the “high cost and high touch” of IT support for those users to be successful.
I hear frequently from BI project leaders that they’ve tried to give users the ability to create their own reports, but most can’t figure out how to do it with the tools and training provided. The problem is, an assumption has been made (often perpetuated by the marketing and sales messages of BI and ERP vendors themselves) that their drag and drop reporting and available templates are easy for anyone to use.
Let’s break apart that conventional wisdom, however, and dig a little deeper into this paradox. True enough, conceptually, many of these environments such as SAP Business Objects WebIntelligence, which is the core technology in Datatel’s (now Ellucian) Reporting and Operating Analytics (DROA) solution, are designed for casual users and ease of creating reports with advanced interactivity. The features of Cognos used with the Banner Enterprise Data Warehouse (EDW) are similar and you’re faced with a similar dilemma. Pick your favorite BI tool to use with the complex data of a mature ERP system: Tableau, SAS, even Excel. You name it, soon you’ll come to a brick wall.
The problem is, many reports users need are quite complex and don’t fit neatly into the typical approach of trying to drag and drop all the necessary data and filters into a single query. Yet, that is the way that most projects approach the training and roll out. Everyone is lulled into the “it should be easy and all the data is here in one place to query” effect!
Take this real world example: the Graduate Studies division needs a report at the end of each term to determine those who are not eligible to continue. They need a list of students who are actively enrolled in the Graduate level, in certain programs, have more than one C grade in a 500 level course in the term, and alongside this info list their General Academic Advisor and not their program advisor.
There are at least 4 areas of data needed, namely, the student, registrations, enrolled program, and assigned advisors. Worse, many of the filters and conditions only apply to certain pieces of the data. One might think since they are all tied together (and they are, albeit loosely) by the student ID, shouldn’t all this data be able to be dragged at once into the query? Magically there should be a result. There will be one, but not at all what is expected. Suddenly, IT is needed to help build this report!
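To see why the all-in-one query surprises people, here is a minimal sketch using Python’s built-in sqlite3 module. It is a generic SQL illustration, not a reproduction of what WebIntelligence or any ERP reporting layer generates, and the table and column names are hypothetical. When registrations and advisors are both joined on the student ID alone, every registration row pairs with every advisor row, so counts are inflated.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (student_id TEXT, course TEXT, grade TEXT);
    CREATE TABLE advisors (student_id TEXT, advisor TEXT, advisor_type TEXT);
    INSERT INTO registrations VALUES
        ('S1', 'BIO-510', 'C'),
        ('S1', 'BIO-520', 'A');
    INSERT INTO advisors VALUES
        ('S1', 'Dr. Adams', 'GENERAL'),
        ('S1', 'Dr. Baker', 'PROGRAM');
""")

# Everything in one query: each registration row is repeated once per
# advisor row, so the student's single C grade is counted more than once.
naive_c_count = conn.execute("""
    SELECT COUNT(*)
    FROM registrations r
    JOIN advisors a ON a.student_id = r.student_id
    WHERE r.grade = 'C'
""").fetchone()[0]

# Counting from the registrations table alone gives the real figure.
true_c_count = conn.execute(
    "SELECT COUNT(*) FROM registrations WHERE grade = 'C'"
).fetchone()[0]

print("C grades seen by the combined query:", naive_c_count)
print("C grades actually on record:", true_c_count)
conn.close()
```

In this sample the student has a single C grade, yet the combined query would satisfy a “more than one C” condition purely because the student has two advisor rows.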
Instead of trying to do this all in one query, it is much easier to break it apart into four distinct pieces. In WebIntelligence, each piece can be defined in its own query and tied together with “advanced” query techniques. This approach actually expresses the reporting requirements more naturally. One query is the list of students in a program. Another is the count of C grades, but only for those in the query result of students in the graduate programs for that term. WebIntelligence allows you to do this type of filter from the results of another query quite easily. There are also sub-query and combined-query capabilities. Any report can combine data from more than one query and even more than one data source. These are very “complex” and powerful features, but they actually make the reporting problem simpler because they break it down into smaller, manageable pieces.
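The same decomposition can be sketched outside any particular BI tool. The plain Python below treats each of the four pieces as its own small “query” and links them by student ID. The data, field names, program code, and term code are all hypothetical, and this is only an illustration of the idea, not WebIntelligence syntax.

```python
from collections import Counter

# Hypothetical sample data; every name and value here is illustrative.
students = [
    {"id": "S1", "level": "GR", "status": "ACTIVE"},
    {"id": "S2", "level": "GR", "status": "ACTIVE"},
    {"id": "S3", "level": "UG", "status": "ACTIVE"},
]
programs = [
    {"student_id": "S1", "program": "MS-BIO"},
    {"student_id": "S2", "program": "MS-BIO"},
    {"student_id": "S3", "program": "BS-BIO"},
]
registrations = [
    {"student_id": "S1", "term": "2012FA", "course": "BIO-510", "grade": "C"},
    {"student_id": "S1", "term": "2012FA", "course": "BIO-520", "grade": "C"},
    {"student_id": "S2", "term": "2012FA", "course": "BIO-510", "grade": "C"},
    {"student_id": "S2", "term": "2012FA", "course": "BIO-520", "grade": "A"},
    {"student_id": "S3", "term": "2012FA", "course": "BIO-510", "grade": "C"},
    {"student_id": "S3", "term": "2012FA", "course": "BIO-530", "grade": "C"},
]
advisors = [
    {"student_id": "S1", "advisor": "Dr. Adams", "advisor_type": "GENERAL"},
    {"student_id": "S1", "advisor": "Dr. Baker", "advisor_type": "PROGRAM"},
    {"student_id": "S2", "advisor": "Dr. Chen", "advisor_type": "GENERAL"},
]

TERM = "2012FA"
TARGET_PROGRAMS = {"MS-BIO"}


def is_500_level(course):
    """True when the course number falls between 500 and 599."""
    return 500 <= int(course.split("-")[1]) <= 599


# Piece 1: actively enrolled graduate students.
active_grads = {s["id"] for s in students
                if s["level"] == "GR" and s["status"] == "ACTIVE"}

# Piece 2: narrow piece 1 to the programs of interest.
in_program = {p["student_id"] for p in programs
              if p["program"] in TARGET_PROGRAMS} & active_grads

# Piece 3: count C grades in 500-level courses for the term, restricted
# to the students who survived piece 2.
c_counts = Counter(
    r["student_id"] for r in registrations
    if r["student_id"] in in_program
    and r["term"] == TERM
    and r["grade"] == "C"
    and is_500_level(r["course"])
)
flagged = {sid for sid, n in c_counts.items() if n > 1}

# Piece 4: look up the General Academic Advisor only, not the program advisor.
general_advisor = {a["student_id"]: a["advisor"] for a in advisors
                   if a["advisor_type"] == "GENERAL"}

report = sorted((sid, general_advisor.get(sid, "")) for sid in flagged)
print(report)
```

Each intermediate set can be inspected on its own, which is what makes the staged version easier to verify than one large query.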
In fact, the seemingly more complex approach and teaching users how to do it actually makes them more likely to create the desired reports successfully and correctly. Part of the reason for this is that users think of their requirements in small chunks. Sometimes, they even forget to define very important little chunks and a query may run on the whole database! Any time you break a data problem down into smaller pieces it is easier to define and verify results. The query statements can be better expressed as subsets of data and filters that are linked together by some common identifier such as the Student ID or a term ID. For Datatel users familiar with UniQuery or QueryBuilder used with the UniData database, this is fundamentally the same concept as savedlists and using the results of one query to select for another.
Take your pick of example. Maybe you need only those students with a GPA of 3.0 or higher, or 20 credits after 18 months, or have a Pell FA award. The list goes on. This type of query problem is also not unique to higher education. In any case, don’t hesitate to introduce users to advanced query concepts. This will make them more self reliant and can reduce support requirements.

