October 11, 2012
Traditionally, most institutions have measured graduation and completion rates by looking only at those students who have been awarded a degree or certificate from that same institution. However, administrators have long complained that such a view does not take into account the dynamics of student “swirl”, where a student may take courses at one school and transfer to another, possibly multiple times in their educational career. Students often do so for a host of reasons including convenience, cost, and academic reputation. Furthermore, recent studies such as those by “Complete College America”, reported in the NY Times, show dramatic gaps in both completion rates and time to completion for students. The Chronicle of Higher Education has also tackled the issue of graduation rate measurement.
This issue is of particular concern at community colleges, where to some extent the intention of a student may be to transfer to a four-year institution from the outset. As accreditation and funding bodies demand more evidence of student success and completion, a more comprehensive picture is required to better measure outcomes. Many community colleges report graduation rates in the low teens on the IPEDS survey based on the federal cohort definitions. But is that apparently low performance really the case?
In a project with ASR Analytics, Howard Community College in Columbia, MD decided to tackle just this issue. After implementing an enrollment and retention analytics solution that provided a wealth of information to the President’s Team and Retention and Graduation Steering Team, Zoe Irvin, Executive Director of Planning and Research, decided the next critical step in enhancing service to students and their understanding of completion was to get a broader view of where students were going and what they were doing after leaving Howard. For some time they had been asking students upon entry what their educational goals were. But what really happens to them over the next 7 years?
The first challenge was to determine how to gather data about their students after they left. A small-scale pilot program to track students who transferred to the University of Maryland had been started, but something much broader, more sustainable, and less labor intensive was required to be of any value.
Howard, like many institutions, participates in the National Student Clearinghouse (NSC) EnrollVerify and DegreeVerify services, which entitle it to request subsequent enrollment and degree data for every student it has ever submitted. This was a natural starting point for integration with the enrollment and degree information generated by Howard’s own internal administrative system. Together, these sources give a more comprehensive picture of what a student is doing over time. Granted, there are some limitations. For example, not all institutions participate in the NSC service, leaving potential gaps in student completion data. Also, reporting of actual degree codes and programs of study is not standardized but is instead freeform text, making it impossible to know much more than that a student graduated from a 2-year or 4-year institution. From this one can assume a 2- or 4-year degree, but not much more.
In any event, a significant step forward has been achieved by linking past and present cohorts of students to the NSC data to see a more comprehensive and longer term view of continued study and degree completion at other schools. This has provided the Howard team with a way to see patterns by demographic characteristics available in the existing enrollment and retention analytics extended to the completion data. It can be used to improve delivery of services and provide interventions as well as ways to reach out and recapture students who may not be continuing elsewhere. In the words of one Vice President: “this combination of data provide us with a wealth of information to analyze that has never been available before.”
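As a rough illustration of the linking described above, the following Python sketch compares a "home institution only" completion rate with one that also counts degrees reported by other schools. All of the data, field names, and the `completion_summary` helper are hypothetical and chosen for illustration; they are not the NSC file layout or the actual Howard solution.

```python
# Hypothetical entering cohort for one term (student IDs are made up).
cohort = ["S1", "S2", "S3", "S4", "S5"]

# Students awarded a degree by the home institution (internal system).
internal_degrees = {"S1"}

# Subsequent activity at other institutions, in the spirit of NSC data.
# Field names are assumptions for this sketch only.
external_records = [
    {"student_id": "S2", "school_type": "4-year", "graduated": True},
    {"student_id": "S3", "school_type": "4-year", "graduated": False},
    {"student_id": "S4", "school_type": "2-year", "graduated": True},
]


def completion_summary(cohort, internal_degrees, external_records):
    """Summarize completion for a cohort using home and external data."""
    completed_elsewhere = {
        r["student_id"] for r in external_records if r["graduated"]
    }
    enrolled_elsewhere = {
        r["student_id"] for r in external_records if not r["graduated"]
    }
    total = len(cohort)
    home = sum(1 for s in cohort if s in internal_degrees)
    anywhere = sum(
        1 for s in cohort
        if s in internal_degrees or s in completed_elsewhere
    )
    still_enrolled = sum(
        1 for s in cohort
        if s not in internal_degrees
        and s not in completed_elsewhere
        and s in enrolled_elsewhere
    )
    return {
        "home_rate": home / total,          # traditional view
        "any_rate": anywhere / total,       # home or any other school
        "still_enrolled_elsewhere": still_enrolled,
        "unknown": total - anywhere - still_enrolled,
    }


summary = completion_summary(cohort, internal_degrees, external_records)
print(summary)
```

With this made-up cohort, the traditional view credits one completer out of five, while the combined view credits three. The "unknown" count is a reminder of the coverage gap noted earlier: a student missing from the external data may have left higher education or may be attending a non-participating school.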
This solution is now being evaluated by and implemented at several other institutions. For more information, please contact ASR Analytics.
September 24, 2012
The answer is 42. For those who are fans of “The Hitchhiker’s Guide to the Galaxy”, you know the story of how interstellar traveller Arthur Dent, while visiting another planet, learns that another race of beings had created a supercomputer to answer the ultimate question of the meaning of Life, the Universe and Everything. It took millions of years to compute the answer, but by that time nobody really remembered the question.
This often happens when working with BI solutions that have been established for a while. It can also be an issue while trying to gather requirements. People often become so focused on the answer (and particularly whether it is “right” or not) that they forget the question. What is the real purpose of the data? What do you do with the analysis? How will it change your behavior and decision making? What will you do differently in the interaction with the student or constituent knowing the answer you have?
Every so often in any Business Intelligence (BI) program it is essential to step back and consider: “What was the question?” If you cannot honestly come up with a purpose for a report, measure, or Key Performance Indicator (KPI), then maybe it is time to retire it and review what is really needed to answer the new questions at hand. Don’t wait until everyone forgets and none of the information is meaningful.
August 27, 2012
Last week an ASR client was featured in Community College Week with a comprehensive review of the use of technology tools, such as Business Intelligence (BI), to help manage and improve student success.
Click here to read how Stacy Holliday, Director, Campus Innovations and Student Success, sees systematic change within the college as a necessary pre-requisite to use technology effectively and how Davidson County Community College is moving forward with their plans.
July 18, 2012
The most expensive piece in a telecommunications grid is not the huge data pipes that make up national and international data networks, or the incredibly specialized switching equipment that controls the staggering amount of data moving across these networks, or even the back-office billing systems that somehow tally the charges for all this data. In each of these cases, economies of scale bring the per-user price of these components into a manageable cost structure. No, the most expensive and problematic aspect of the whole system is what is called the ‘Last Mile’, that critical connection from the telecom switch to the consumer’s home or place of business. This is pure infrastructure, and involves digging up streets, running wires, and working with customers one at a time. There are no economies of scale, and every possible avenue to make this connection more efficiently has been explored. My favorite examples of innovative ‘Last Mile’ solutions from my time in the telecom field were a company in the 1990s that built robots to crawl through sewer lines and run telecom cable to access points within a customer premise, and another that used free-space optical beams to send information to receivers on customer rooftops, a solution with obvious challenges when the environment is not cooperative, for example, in rain and fog!
So how does this ‘Last Mile’ problem in telecom relate to Business Intelligence? I would argue the same connection problem exists between the BI solution (including the data warehouse, the BI infrastructure and presentation tools, the cubes and reports) and the end user. It has struck me that there are so many amazing BI solutions out there that provide so many potentially game-changing capabilities for their users but which still, well, fail. They fail to make an impact, fail in their adoption by business users, and fail to meet the rosy expectations of their institutional sponsors. In my experience, the most common reason for this is the inability to effectively bridge the ‘Last Mile’ of BI solution delivery, that is, to make the connection from the infrastructure of the solution to its end users. Some of these solutions have amazing capabilities, just like those telecom networks, but are worthless if they cannot get their content into the hands of their end users for those users to make it part of the way they do business.
So, having drawn this parallel, what insights into achieving success with BI solutions can be gained by looking at them in this light?
I think the fundamental one is to emphasize the importance of building the BI ‘Last Mile’ into the overall design of the BI solution. It is always tempting to adopt the adage from the movie ‘Field of Dreams’ – “If you build it, they will come!”. But experience shows that they may not come, no matter how wonderfully it is built. The users of a BI implementation and what they are capable of must be considered from the outset. A realistic assessment must be made of what they will need to be able to adopt the capabilities being rolled out.
Secondly, solutions to both of these problems are difficult to scale. The analogy to the telecom guy climbing a pole or digging a ditch is the one-on-one communication that has to take place to get BI users on board and invested in the solution. How do you bring along novice users to step up to what can be a daunting new challenge? How do you convince reporting users who have adapted to existing, less-capable but known solutions that they should expend the effort to learn a new system? How do you show everyone involved that the BI solution will help them, and is not just something imposed on them from upper management? This involves careful design of the user-facing artifacts of the BI system, but also careful documentation and training. When it comes right down to it, you really have to sell the solution to the user community. In our consulting practice, we have found that one of the best ways to engage and motivate new users of a system is to take reporting problems that they struggle with and solve them as sample problems in a training session. This obviously requires an individualized approach to training, tailored not just for a specific customer but for a particular set of users within an organization. But the enthusiasm that this engenders and the system buy-in that comes from the demonstration of the system’s capabilities in a well-understood domain make it worth the effort. Even when solving these problems requires advanced skills with the tool, skills users might not totally understand, it is a concrete demonstration that the time and effort needed to learn the system will have a meaningful payback.
And the final insight is the stark reality that having a flawed plan or no plan at all is going to be fatal to the success of the whole system, no matter how remarkable the technical solution or infrastructure underlying it may be.
The ‘Last Mile’ problem applied to the BI world is more insidious and more likely to be overlooked than the physical ‘Last Mile’ problem in the telecom world. For all the thorniness of the physical problem, it is starkly obvious that some solution is necessary to bridge the gap from switch to home or office. However, it is far easier to delude oneself that the BI baseball field needs only to be constructed in an Iowa cornfield and that users will emerge like ghostly Chicago White Sox and start running down fly balls, or rather discovering business insights from the BI solution. With all due respect to Kevin Costner, that just isn’t likely to happen.
June 25, 2012
And, no, I don’t mean playing the lottery numbers! We’ve posted a couple of articles in the past about What Makes a Good Measure (Jan 3, 2012) and Telling Stories with Data (Jan 21, 2011). These posts discuss the challenges of working with numbers in the analytics world. I also emphasize with clients the importance of expressing the context of a report or data that people are seeing. What does it include? And sometimes much more importantly, what does it NOT include? What filters are applied? What are the values of those filters? What time frame does the data represent? Can users see this context when reviewing a report or is it likely they could apply their own assumptions about what data means?
A perfect example of this context issue came while reading an excellent commentary in today’s US edition of the Financial Times. (You can access it via this link, but have to sign up for a free account to read the full text.) Let me summarize the gist of Steven Hill’s argument about inappropriate and misleading use of numbers.
The conventional thinking, particularly purveyed by the media and governments themselves, is that youth unemployment in Europe is at crisis levels. There are demoralizing rates of nearly 50% in Spain and Greece, and over 20% in the rest of the Eurozone. But is this really the case? He argues that a very flawed methodology is used to calculate those rates. Namely, the calculation excludes youths who are in school or job training and not looking for a job anyway. The denominator is therefore a much smaller number of individuals, which makes the unemployment rate look drastically higher. So, the “unemployment rate” conveyed most often to the public does not tell the whole story.
He suggests a better measure might be the ratio of youth who are actively seeking a job and can’t find one to the entire youth population 24 years old and younger (regardless of their intent to seek employment or schooling). Using that measure, Spain’s youth unemployment is only 19% (vs 48.9%) and Greece’s is only 13% (vs 49.3%). In the Eurozone as a whole it would only be 8.7%. That is a dramatic difference between the two measures. It tells a different story. The adult unemployment rate has the reverse problem: because it excludes those who have given up looking for work, the rate is commonly understated.
As I read the article, I was thinking you could create a similar ratio of students in school or job training for comparison purposes that shares the same denominator and thus has a common foundation. Further, you could list the ratio of those not in the labor market. This is a much better way of expressing data on common grounds with common definitions, and therefore of making better informed policy decisions. Yet, traditionally, the unemployment “rate” has always excluded those in school and not looking for work, therefore providing a frightening unemployment picture to those who wish to use it that way.
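To make the denominator effect concrete, here is a small Python sketch that computes the conventional rate alongside ratios that all share the whole youth population as their denominator. The counts are invented round numbers, not figures from the Financial Times article, and the `youth_shares` function and its four mutually exclusive categories are simplifying assumptions for illustration.

```python
def youth_shares(employed, unemployed, in_school_or_training, other_inactive):
    """Return the conventional rate and common-denominator ratios, in percent.

    Assumes the four groups are mutually exclusive and together make up
    the whole youth population (a simplification for this sketch).
    """
    population = employed + unemployed + in_school_or_training + other_inactive
    labor_force = employed + unemployed
    not_in_labor_market = in_school_or_training + other_inactive
    return {
        # Conventional rate: denominator is only the labor force.
        "unemployment_rate": 100 * unemployed / labor_force,
        # Ratios below all share the full youth population as denominator.
        "unemployment_ratio": 100 * unemployed / population,
        "in_school_ratio": 100 * in_school_or_training / population,
        "not_in_labor_market_ratio": 100 * not_in_labor_market / population,
    }


# Hypothetical population of 1,000 youths.
shares = youth_shares(
    employed=200,
    unemployed=200,
    in_school_or_training=550,
    other_inactive=50,
)
print(shares)
```

In this invented population the same 200 unemployed youths produce a 50% "rate" but a 20% "ratio", simply because the first divides by 400 people and the second by 1,000. The ratios on the common denominator can be compared directly with one another, which the conventional rate cannot.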
Now take for instance the challenge of communicating graduation and completion rates for students. Completion is a hot button issue across all segments of higher education. The IPEDS numbers are famously unreliable since by definition the cohort only includes students who were first time to any college. Is that really helpful? In community colleges it leaves a large population out of the denominator since many students have attended other institutions before. Furthermore, for any institution, do they measure if the student completed somewhere else after they transferred? At least the Department of Education recognizes these issues and has formed a working group to address the limitations of current measure definitions.
What similar scenarios might you have in your institutional measurement structures? Are your rates and ratios on a common foundation? Do people know the whole context of the ratios they are seeing? What might need to change to address the shortcomings of what is communicated? Discuss!
June 14, 2012
Many institutions struggle with their reporting and analytics deployment. They face a dilemma about how to roll out self service to users and still meet complex reporting requirements. These requirements appear to need the “high cost and high touch” of IT support for those users to be successful.
I hear frequently from BI project leaders that they’ve tried to give users the ability to create their own reports, but most can’t figure out how to do it with the tools and training provided. The problem is that an assumption has been made (often perpetuated by the marketing and sales messages of BI and ERP vendors themselves) that their drag and drop reporting and available templates are easy for anyone to use.
Let’s break apart that conventional wisdom, however, and dig a little deeper into this paradox. True enough, conceptually, many of these environments such as SAP Business Objects WebIntelligence, which is the core technology in Datatel’s (now Ellucian) Reporting and Operating Analytics (DROA) solution, are designed for casual users and ease of creating reports with advanced interactivity. The features of Cognos used with the Banner Enterprise Data Warehouse (EDW) are similar and you’re faced with a similar dilemma. Pick your favorite BI tool to use with the complex data of a mature ERP system: Tableau, SAS, even Excel. You name it, soon you’ll come to a brick wall.
The problem is, many reports users need are quite complex and don’t fit neatly into the typical approach of trying to drag and drop all the necessary data and filters into a single query. Yet, that is the way that most projects approach the training and roll out. Everyone is lulled into the “it should be easy and all the data is here in one place to query” effect!
Take this real world example: the Graduate Studies division needs a report at the end of each term to determine those who are not eligible to continue. They need a list of students who are actively enrolled at the Graduate level, in certain programs, and have more than one C grade in a 500 level course in the term. Alongside this information, the report must list each student’s General Academic Advisor and not their program advisor.
There are at least 4 areas of data needed, namely, the student, registrations, enrolled program, and assigned advisors. Worse, many of the filters and conditions only apply to certain pieces of the data. One might think since they are all tied together (and they are, albeit loosely) by the student ID, shouldn’t all this data be able to be dragged at once into the query? Magically there should be a result. There will be one, but not at all what is expected. Suddenly, IT is needed to help build this report!
Instead of trying to do this all in one query, it is much easier to break it apart into four distinct pieces. In WebIntelligence, each piece can be defined in its own query and tied to the others with “advanced” query techniques. This approach actually expresses the reporting requirements more naturally. One query is the list of students in the relevant programs. Another is the count of C grades, but only for the students returned by the graduate program query for that term. WebIntelligence allows you to filter on the results of another query in this way quite easily. There are also sub-query and combined query capabilities. Any report can combine data from more than one query and even more than one data source. These are very “complex” and powerful features, but they actually make the reporting problem simpler because they break it down into smaller, manageable pieces.
In fact, the seemingly more complex approach and teaching users how to do it actually makes them more likely to create the desired reports successfully and correctly. Part of the reason for this is that users think of their requirements in small chunks. Sometimes, they even forget to define very important little chunks and a query may run on the whole database! Any time you break a data problem down into smaller pieces it is easier to define and verify results. The query statements can be better expressed as subsets of data and filters that are linked together by some common identifier such as the Student ID or a term ID. For Datatel users familiar with UniQuery or QueryBuilder used with the UniData database, this is fundamentally the same concept as savedlists and using the results of one query to select for another.
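The decomposition is not specific to any one tool. The Python sketch below mimics it with plain lists and dictionaries: each step is a small "query" whose result feeds the next, linked by student ID. The records, field names, program codes, term code, and advisor names are all hypothetical, and this is not WebIntelligence syntax, only an illustration of the idea.

```python
from collections import Counter

TERM = "2012FA"                   # hypothetical term code
GRAD_PROGRAMS = {"MBA", "MSED"}   # hypothetical "certain programs"

# Made-up source data standing in for three areas of the warehouse.
program_enrollments = [
    {"student_id": "1001", "program": "MBA", "level": "GR", "status": "active"},
    {"student_id": "1002", "program": "MSED", "level": "GR", "status": "active"},
    {"student_id": "1003", "program": "BIO", "level": "UG", "status": "active"},
    {"student_id": "1004", "program": "MBA", "level": "GR", "status": "withdrawn"},
]
registrations = [
    {"student_id": "1001", "term": TERM, "course": "ACCT-510", "grade": "C"},
    {"student_id": "1001", "term": TERM, "course": "MGMT-520", "grade": "C"},
    {"student_id": "1001", "term": TERM, "course": "MGMT-410", "grade": "C"},
    {"student_id": "1002", "term": TERM, "course": "EDUC-505", "grade": "C"},
    {"student_id": "1002", "term": TERM, "course": "EDUC-530", "grade": "B"},
    {"student_id": "1003", "term": TERM, "course": "BIOL-501", "grade": "C"},
    {"student_id": "1003", "term": TERM, "course": "BIOL-502", "grade": "C"},
    {"student_id": "1004", "term": TERM, "course": "ACCT-510", "grade": "C"},
    {"student_id": "1004", "term": TERM, "course": "MGMT-520", "grade": "C"},
]
advisors = [
    {"student_id": "1001", "type": "general", "name": "Dr. Lee"},
    {"student_id": "1001", "type": "program", "name": "Dr. Shah"},
    {"student_id": "1002", "type": "general", "name": "Dr. Ortiz"},
]


def is_500_level(course):
    """True when the course number (after the dash) is 500-599."""
    return 500 <= int(course.split("-")[1]) < 600


# Query 1: active graduate students in the selected programs.
grad_ids = {
    e["student_id"] for e in program_enrollments
    if e["level"] == "GR" and e["status"] == "active"
    and e["program"] in GRAD_PROGRAMS
}

# Query 2: count C grades in 500-level courses for the term,
# restricted to the students returned by query 1.
c_counts = Counter(
    r["student_id"] for r in registrations
    if r["student_id"] in grad_ids and r["term"] == TERM
    and r["grade"] == "C" and is_500_level(r["course"])
)

# Query 3: keep only students with more than one such C grade.
flagged = {sid for sid, n in c_counts.items() if n > 1}

# Query 4: look up the General Academic Advisor (not the program advisor).
general_advisor = {
    a["student_id"]: a["name"] for a in advisors if a["type"] == "general"
}

report = [
    (sid, c_counts[sid], general_advisor.get(sid)) for sid in sorted(flagged)
]
print(report)
```

Because each filter applies only to the piece of data it belongs to, each intermediate result (`grad_ids`, `c_counts`, `flagged`) can be inspected and verified on its own before the final report is assembled.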
Take your pick of example. Maybe you need only those students with a GPA of 3.0 or higher, or 20 credits after 18 months, or have a Pell FA award. The list goes on. This type of query problem is also not unique to higher education. In any case, don’t hesitate to introduce users to advanced query concepts. This will make them more self reliant and can reduce support requirements.
May 29, 2012
There has been a revolution going on in the Business Intelligence (BI) world in recent years. Those who follow the trends in BI and data warehousing are probably aware of the growing interest in a wave of database systems expressly developed to analyze the unbelievably huge data stores created by the maturing internet juggernauts. Companies such as Google, Facebook, Amazon, and Yahoo now want to analyze literally hundreds and thousands of terabytes of data that they find are essential to their business. Welcome to the brave new world of “Big Data”. Technologies such as NoSQL database systems and MapReduce algorithms, and products such as Hadoop, Hive, and Pig seem to be becoming more and more mainstream, and consequently more and more the topic of discussion on blogs and at conferences.
The question is, how much of this really pertains to the world of higher education management systems, i.e. the institutions that run SunGard HE, Datatel (both now Ellucian), Jenzabar, Campus Management, and PeopleSoft systems? Aren’t they also struggling to make sense of copious amounts of data? As someone who has worked with BI and reporting in this space for most of the last decade, I find the focus on the “Big Data” solutions a bit frustrating, because I see these tools addressing a different problem than that faced in the higher education BI world. This may seem a little counter-intuitive, as there certainly is more data than ever involved in running our campuses and institutional systems. Wouldn’t tools focused on “Big Data” help us too?
To illustrate, if you think of data as a swimming pool, the typical “Big Data” applications work with swimming pools that are very, very deep, and contain a whole lot of water. The ability to pump lots and lots of water volume is what the job is all about. On the other hand, I see our data in the higher education management space as being a swimming pool that is not very deep, comparatively, but which has an incredibly broad surface area. The overall volume of water is not comparable to those “Big Data” swimming pools, but the surface area may be much greater and the structure and interrelationships of the different parts of the “swimming pool” are very complex.
Typically, an institution is not dealing with mammoth volumes of administrative data (unless it is a really big school doing clickstream analysis on its websites and learning management system, perhaps). The total number of customers at our enterprises (our students) and the number of items they typically buy (classes, housing, meal plans) are relatively modest, again compared to the Amazons of the world. However, the variety of types of data we deal with is huge, and ranges from housing preferences to complicated faculty contract tenure payments to accounts payable records to course prerequisite and degree requirement rules, etc. The list of business transactions that occur in the management of an institution is incredibly diverse and complex. It is a wide swimming pool of data with a huge surface area, though as I said, perhaps not that deep at any point.
As someone working in the higher education reporting world, I am looking for support not for “Big Data”, but for what I think of as this “Wide Data” paradigm. Rather than tools that support incredible throughput on massive data sets, this implies a need for tools that help in the analysis of complex data sets. In particular, these tools should make us more nimble in quickly modeling and integrating new data sources into BI and data warehousing environments. This data must be readily available for our reporting and analytic delivery to our end-users.
There is another trend in the BI world which may prove much more fruitful for our future endeavors, in my view. This would be the emergence of in-memory databases such as SAP’s Hana or even Microsoft’s PowerPivot and BI Semantic Model that essentially are making the whole idea of pre-aggregated measures a thing of the past. But more on that in a future post.
April 18, 2012
Potomac, MD April 18, 2012 – ASR is pleased to announce the addition of a new Managing Consultant, John Marsh. He was formerly the Lead Software Designer/Developer for Business Intelligence and Reporting Solutions at Datatel (now Ellucian™). In his role at Datatel he created advanced data model, reporting, and analytic designs to support the needs of a diverse client base of nearly 800 colleges and universities across North America. Prior to his work at Datatel he worked at Qwest where he was heavily involved in data warehousing design and implementation.
At ASR John has joined the Higher Education practice to implement a variety of Business Intelligence (BI) solutions for our growing client base. Some of the projects currently underway include implementation and knowledge transfer of the popular SAP Business Objects Enterprise, SAS Enterprise BI, and Microsoft BI platforms.
Beyond the technology itself, many institutions are struggling with getting data across their numerous systems organized, defined, and delivered to internal and external constituents. Our “analytic accelerators” combine higher education expertise and data and systems knowledge with predefined template designs for student enrollment and retention analysis and for student outcomes analysis covering course completion and success. Coupled with recommendations for Data Governance processes, this approach covers three key success factors: people, process, and technology.
One of the more interesting projects currently underway begins to address the void of information about a student once they leave an institution. Student “swirl”, as it is called, presents even greater challenges for institutions trying to understand the ultimate success of a student in achieving their educational goal. This is particularly true for community colleges, where it is common for a student to come for a couple of semesters or years and then transfer to a four-year university. Or, they may have regular stop outs and take courses at other nearby institutions as convenience and the demands of life dictate.
By combining enrollment data with National Student Clearinghouse (NSC) enrollment and degree verification data, an institution can get a more comprehensive picture of the long term outcomes for their students after they leave. The results of this project will provide a whole new level of measurement of long term persistence and graduation rates while giving institutional leaders a broad range of student dimensional categories to help analyze their own student success and intervention initiatives.
April 17, 2012
I am at the Higher Ed Data Warehouse conference (HEDW), sitting in a session describing the process of selecting a BI platform where cost was one of the primary considerations, and I am amazed at the continued myth that open source is free.
Free as in “free puppies”, not “free beer”.
This is one of the best analogies I have ever heard, and I attribute it to Linda Hilton from the Vermont State Colleges.
Apparently the institution presenting this session was surprised at the actual cost to pay for support, maintenance, and training. If you are in the process of selecting a BI technology, be sure you research the full cost picture as well as the functionality requirements.
Open source may be a good option, but there are many commercial options, such as MicroStrategy, that give their full functionality for a given number of licenses before you have to pay license fees. SAP Business Objects online has very low per-user, per-month pricing where you don’t have to worry about the hardware infrastructure. There are many possibilities to consider.
March 5, 2012
I ran across an interesting issue the other day while developing a new report for a client from their data warehouse. As I was going through the validation process, I discovered that there were two pieces of data in some of the records that didn’t agree, yet by definition (on the surface at least) they should have.
The specific example was a college student with a term registration status of “first time student” in the “2009 Fall” term (as opposed to “returning” which is assigned to subsequent registration terms), yet the student had an entry term cohort value of “2001 Fall”. The entry term cohort value didn’t match the term value selected in my query of “first time” students registered in “2009 Fall”.
How could this happen? How could the data have ended up this way in the data warehouse? A quick data validation query showed a clear discrepancy on about 10% of records where the student had a “first time” term registration status, but that registration term was not their entry cohort term.
Further research and discussion with the client revealed that their business process dictates that after certain periods of absence the student is required to reapply and therefore be treated as a “first time student” again, even though the student’s original entry term cohort value is not changed, either manually or automatically. So what should be used to query for those entering for the first time? The Registration Status or the Entry Cohort? Each returns slightly different values. The question is now before the data governance team as to what to do with the “Entry Cohort” values in this case and what the impact is of making a business rule change given the apparent conflict of definition between these two data elements and the different purposes they serve, often for different business units at the college.
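A validation check of this kind can be sketched in a few lines of Python. The records, field names, and the `cohort_mismatches` helper below are hypothetical stand-ins for the warehouse tables; the point is only the shape of the comparison between the two data elements.

```python
# Made-up term registration rows with the two fields being compared.
registrations = [
    {"student_id": "A1", "term": "2009 Fall", "status": "first time",
     "entry_cohort": "2009 Fall"},
    {"student_id": "A2", "term": "2009 Fall", "status": "first time",
     "entry_cohort": "2001 Fall"},   # the kind of record that raised the flag
    {"student_id": "A3", "term": "2009 Fall", "status": "returning",
     "entry_cohort": "2007 Fall"},
    {"student_id": "A4", "term": "2009 Fall", "status": "first time",
     "entry_cohort": "2009 Fall"},
]


def cohort_mismatches(rows, term):
    """Find 'first time' registrations whose entry cohort is another term.

    Returns the mismatched rows and their share of all 'first time'
    registrations in the given term.
    """
    first_time = [
        r for r in rows
        if r["term"] == term and r["status"] == "first time"
    ]
    mismatched = [r for r in first_time if r["entry_cohort"] != r["term"]]
    share = len(mismatched) / len(first_time) if first_time else 0.0
    return mismatched, share


mismatched, share = cohort_mismatches(registrations, "2009 Fall")
for row in mismatched:
    print(row["student_id"], row["status"], row["entry_cohort"])
print(f"{share:.0%} of first-time registrations disagree with entry cohort")
```

A check like this only surfaces the disagreement. Whether a mismatched row is an error or a legitimate outcome of a business rule is a question the data itself cannot answer.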
Ultimately, it raises the question of what to do about the data in the data warehouse. Should it be changed? Or should it be left as is? It is an interesting dilemma that could apply to even simpler scenarios where data, such as a student’s ethnicity, is entered incorrectly and then corrected a week later. The data warehouse will capture these data changes and show the student with the “incorrect” ethnicity for that period of time. If a report showing the breakdown of students by ethnicity is requested for that very time period, the report would be “incorrect”.
This caused me to do some thinking and searching for other people’s wisdom on this topic. It seems there is no clear consensus on the best approach, with pros and cons on both sides. In the first example (the registration status mismatch), this type of discrepancy can be captured in data quality exception reports during ETL and either fixed to match an agreed upon business rule or left to the business users to review and correct manually in the source transaction system, to flow into the warehouse during the next load. The second example, however, has no automated way of correcting since there is no way for the system to truly know what the correct data value should be.
The decision to correct data anomalies of this nature depends somewhat on the business requirements and the general philosophy on data quality. There is a distinct argument for saying “leave it as is since the warehouse is meant to represent the historical state of transaction data and the errors will not create a variance to change decisions made by analysis of that data”. Others argue that data warehouses are meant to provide inputs to business decisions that are as clean as possible, and so data should be cleaned when known to be wrong. This blog post has a good summary of the perspectives with thoughts from two of the founders of data warehousing, Ralph Kimball and Bill Inmon.
What do you think? I’d be interested in your comments and experiences.