Random forests, grey fog and learning by doing

As I reflect on a wonderful afternoon of presentations from the 3rd year of the MIT A-Lab analytics seminar, I thought I’d chronicle this fun and exciting interaction with MIT post grad students, professors and client data.

For the last 3 years professors Erik Brynjolfsson & Sinan Aral have invited selected organizations to propose projects for teams of MIT post graduate students taking the A-Lab “Action learning seminar on analytics, machine learning and the digital economy”.

The objective of the seminar is for student teams to “design and deliver a project using analytics, machine learning, large data sets, and other digital innovations to address a business or organizational opportunity or issue.” The involvement of companies contributing real data-sets and real data challenges helps link the theory to practice.

“Data analytics and data science is not something you learn well from text books” -Prof. Sinan Aral

The process for submission of a project is rigorous … MIT are looking for projects across a variety of industries and sectors that address a wide range of problem types. Availability of, and access to, clean and rich data sets is a primary factor in having a submission accepted.

MIT’s RFP gives some guidance …

“The strongest projects often have descriptive, predictive, and occasionally causal components in the analysis. Well-defined projects with clear questions are typically successful, but the expectation is that students will explore the provided dataset to come up with new insights.”

Over the summer we engage in a collaborative and iterative process with the MIT faculty to select the right project that addresses each of these requirements and components. Many project submissions are discarded and the ones that make it through often look very different from the first draft! Of the hundreds of submissions, only 25 are selected for “Match Day”.

Match day is an action packed, fascinating afternoon of listening, learning, pitching, and as a presenter thinking, “wow, I’m going to have to really turn it on to impress”. As a presenter, I have 4 minutes – a strictly adhered to 4 minutes – to set the background, context and case for my project. In presenter slot 21 of 25, I feel the pressure mount; it seems that every other presenter is able to articulate a great case for their project and has outdone the previous one. The ace up my sleeve … or last desperate chance to win over these post grad students … and the final one of my visuals … a photo with the winning team mentored by us last year!

At the end of the marathon of 4 minute elevator pitches, presenters and students get to mingle as students ask questions and seek out details to understand more about the projects on offer. They then have a week to create their teams and select their project. There are more projects than teams, so not all the submissions that make it to match day find a student team … I feel like I am the student, waiting to see what my grade is and whether I make the cut!

“Projects start with good data and good sponsors.” -Prof. Erik Brynjolfsson

A week goes by, and great news! A group of 3 students has picked our project! Yay! I set up Slack for collaboration and a weekly conference call, and I get my data extraction team working on pulling 3 years of data and making it available to our students. I give them some background info and context of the industry and then let them loose! Mindful that they have just three months to complete the project and this is just one of five courses they are taking, we work to get as clean a dataset as possible. It sounds obvious and easy, but data is somehow never that clean; we find some strange things and many “rogue” delimiters.

Lesson #1, getting data is always more complicated than it sounds! Text books and algorithms always assume nice, clean, ready-to-use data.
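For readers who have not met a “rogue” delimiter, here is a minimal sketch of one way to spot them: count the fields in each row and flag any row that disagrees with the header. The function name `find_suspect_rows` and the sample rows are made up for illustration; they are not the project's actual data or tooling.

```python
import csv
import io


def find_suspect_rows(text, delimiter=","):
    """Return (row_number, field_count) for rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected_fields = None
    suspects = []
    for row_number, row in enumerate(reader, start=1):
        if expected_fields is None:
            expected_fields = len(row)  # treat the first row as the header
            continue
        if len(row) != expected_fields:
            suspects.append((row_number, len(row)))
    return suspects


# Hypothetical extract: row 3 has an unquoted comma inside the item name.
sample = (
    "item,facility,quantity\n"
    "gauze pad,clinic A,10\n"
    "surgical kit, sterile,clinic B,2\n"
    "tongue depressor,clinic C,50\n"
)

suspects = find_suspect_rows(sample)
print(suspects)
```

A check like this only finds the rows that need a human look; deciding whether the extra comma belongs to the item name or marks a genuinely missing field still takes someone who knows the data.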

Our project for this year is focused on the healthcare eco-system and the intersection of medical providers and suppliers.

A critical and often overlooked aspect of the healthcare eco-system is that of medical supplies. Think gauze pads, Band-Aids, surgical kits, tongue depressors, etc.: consumable supplies that are a critical part of treating patients through the continuum of care. Of particular interest are the usage patterns of consumables that depend on a particular condition. The often used example is that of gauze pads for burn victims. The mathematical average consumption may work out to be 5 pads per day, but the actual consumption pattern is 10 pads every 2 days. Scale this to multiple burn victims in a seemingly random fire and very quickly an easily treatable condition becomes a life threatening event due to the lack of a simple medical supply. Extend this to supplies of a simple tongue depressor used to help a pediatrician tell a child to say “ahhh” during an unexpected flu epidemic and we realize that the ability to forecast and understand usage patterns of mundane medical supplies plays a critical role for the health consumer. Balance that with the economic reality of cost reduction and value based payments; over-supply is not a solution.
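The gauze pad arithmetic can be made concrete with a few lines of code. This is only an illustrative sketch: the six-day horizon, the function name `shortfall_with_par_level` and the assumption that the shelf is restocked to a fixed “par level” every morning are mine, not details from the project.

```python
def shortfall_with_par_level(demand, par_level):
    """Return the pads missing each day if the shelf is restocked to par_level every morning."""
    shortfalls = []
    for needed in demand:
        on_shelf = par_level  # assumption: shelf is topped back up to par each morning
        shortfalls.append(max(0, needed - on_shelf))
    return shortfalls


# Consumption pattern from the example: 10 pads every 2 days.
demand = [10, 0, 10, 0, 10, 0]
average = sum(demand) / len(demand)  # 5.0 pads per day

# Stocking to the average leaves the patient 5 pads short on every dressing day.
print(average, shortfall_with_par_level(demand, par_level=5))

# Stocking to the peak removes the shortfall, at the cost of carrying twice the inventory.
print(shortfall_with_par_level(demand, par_level=10))
```

The average and the pattern describe the same total usage, yet only the pattern tells you how much needs to be on the shelf on the day it is needed.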

The impacts are most pronounced in the non-acute space. With the push to treat inpatients and outpatients in a non-acute environment, away from the traditional hospital setting, non-acute facilities do not have large storage facilities at their disposal. Supplies are often ordered and delivered on a weekly or twice weekly basis. The cost efficiency of not carrying large amounts of supply inventory is countered by the risk of not having life supporting items. An improvement in predicting usage patterns for individual medical supplies is critical. Additionally, the ability to predict supply usage at individual facilities with low units of measure is a science that is yet to be deployed.

Ultimately the industry needs to go from a supply side model to a consumption based model.

Lesson #2, understanding data is part art, part science. Lots of “what” and “why” questions. The end result of analytics is generally a nice looking chart, some impressive looking numbers, and a major, insightful breakthrough inspired and supported by data. But there is a whole process behind it that requires working in the weeds and details of text, words, numbers and uncooperative methods of data transfer.

The A-Lab is an advanced course for MIT graduate students, with admission for those with academic and work experience in data science, analytics, statistics, and information technology – and it is a core requirement for all students in the new MIT Masters in Business Analytics (MBAn) program. Our team is awesome. Each week they ask some really, really good questions about the data and the context, and provide us with charts and examples of the techniques they are using as they search out meaning in the dataset. This is the art of the science of data analytics: quizzing the meaning behind the collection of data points, and wondering “what if we did this”.

We come to our last scheduled call before “presentation day” … our student team had a late night, stepping back from the data, charts, algorithms, analysis and techniques used and putting it back into the broader business context. MIT’s mentors are on hand, helping with the story-line and talking points in what will be an 8 minute presentation.

“As a mentor, I was impressed with the team’s ability to dig in to the data early and explore alternative models …. in particular, they showed endurance and persistence in exploring the meaning of dead-ends among the many analytic paths they explored” – Cyrus Gibson, Center for Information Systems Research

The presentation looks great, the insight into the data, the conclusions and even recommendations are pertinent, challenging and provocative. If this is an indication of the next generation of thinkers and leaders, and their ability to use data to drive business, we are in for some exciting times. I want to work with these guys! Now! I draw on my in-the-field earned wisdom to make a couple of suggestions, but it’s just window dressing in the story telling. Data tells you a lot, but an audience remembers a story, so tie the data and the facts back to a story-line or image that people connect with. My 2 cents of contribution!

I’m thinking to myself, there is an entire generation of management that needs to get ready for early retirement to make way for these guys coming in. I take a sip from my secret fountain of youth so I can stay in the game.

Presentation day

I’m excited not only for our team, but for all the teams. On the flight to Boston I cast my mind back to “match day” and all the great companies, projects and presentations. What great things will I learn and see today? I let myself be fooled into thinking that, by being on the hallowed ground at MIT and around a bunch of really smart people, perhaps some of it will rub off on me. In my dreams!

“I’m blown away by quality of projects and the improvement in the 3 years since we started” -Prof. Erik Brynjolfsson

Our entry and student team from last year, 2015, actually won “the prize” for best project. They used advanced statistical modeling to create a risk model that predicts 30-day hospital readmission risks for a major healthcare system. It’s not really a competition, but any playoff situation ends up having a prize at the end. Bragging rights, but not really; we are all winners for going through this process and understanding how challenging assumptions and using data in new and wonderful ways can lead to new business models, customer experiences and ways of engaging with our workforce, suppliers, partners, and others around us.

“It’s important to consider the business impact of data science. It’s not just about running regression and random forest. You can’t be caught in the depths of techniques.” -Prof. Sinan Aral

My team steps up for their presentation. I have not had a chance to speak with them since I arrived … they’ve been off attending other classes, a regular day … they arrive shortly before their time slot, presentation #11 for the day. They open strongly, building the context and the business problem, then dive into the technical aspects and techniques of the analytics. Then, a surprise! A new slide has been added at the last minute, and they glance over at me with a big grin. I learn later they applied for, and were granted, a special extension from the 10am submission cut-off to incorporate an important new finding. Evidently Derek, one of our students, had an epiphany late Saturday night (actually, 2am makes it Sunday): “what if I run the data through a 2 step algorithm, one that feeds the other”. 9 hours later … yes, 9 hours … this analysis requires some serious computational power and a regular laptop is not the ideal tool. But after 9 hours, the results from this technique produce a much better predictive model! Woohoo! The team’s ability to communicate the complexities of what they are doing and tie it to business outcomes is outstanding. “Think of it as simple deep learning” is greeted with both nods and laughter.

I want to thank everyone at MIT, Erik, Sinan, Chuck, Susan, Christie; my student team, Christine, Derek, Michaela; my client, Tom; and my data extraction unit, Tina, Georgy and Joseph for making this such a rich, rewarding and fun endeavour.

As Prof. Aral put it, “learning by doing”.

Can’t wait for next year!
