Several of my colleagues and I recently had an opportunity to visit the Microsoft campus in Redmond, WA, to meet with their data scientists and their SQL Server 2016 and SQL Server R Services experts. Our objective was to collaborate on the machine learning models we are developing for our Advanced Reporting for Credit Unions™ (ARCU) business intelligence solution.
While machine learning algorithms have been around for quite some time, our team is looking to productize and operationalize predictive models for 240+ Jack Henry/Symitar customers. Predictive analytics is an area of data science that is getting more and more attention.
Why? Companies have accumulated a breadth and depth of available data and they want to maximize their investment by generating predicted outcomes that will help them make better decisions and take faster action.
Supervised machine learning is the process of taking a set of input data and the known responses to that data and using them to train a model so that it will generate reasonable responses for a new dataset. While that is a simple definition, creating supervised machine learning models is neither simple nor magic.
First and foremost, the model predictions will only be as good as the available data and the careful selection of the inputs. Our collaboration with Microsoft has resulted in the creation of several predictive models we are now installing and testing with a small group of ARCU customers. We created these models following the steps outlined below:
- Define the question the model should answer.
  - Does the question ask how much or how many (regression), or which category (classification)?
- Identify the available data sources.
  - Has relevant data been collected consistently and accurately over time?
- Explore the data.
  - Which data variables (features) correlate with the predicted result?
- Determine which input variables should be included or excluded.
  - Are the variables informative but not too informative, and has the number of unrelated variables been limited?
- Build the model.
  - Have one or more algorithms been created for the dataset where the prediction will be carried out?
- Test the model.
  - Has the algorithm been run against a dataset that has been split into a training dataset (for the model to learn from) and a test dataset (to test the learning)?
- Validate the model.
  - Has someone with a business data understanding validated the predicted results?
- Evaluate the model.
  - Is there consensus that the model is “right fitted” for the data?
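For readers who like to see the explore, build, and test steps in code, here is a minimal sketch in plain Python. It is only an illustration, not the ARCU models or anything built with SQL Server R Services: the data is synthetic, the single input variable is made up, and the helper names (`pearson`, `train_test_split`, `fit_line`, `mean_absolute_error`) are hypothetical. It checks how strongly one feature correlates with the response, splits the rows into training and test sets, fits a simple regression line on the training rows, and measures the error on the held-out test rows.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(model, rows):
    """Average absolute gap between predicted and known responses."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Synthetic (input, known response) pairs; an assumption made for illustration.
rng = random.Random(42)
rows = [(x, 3.0 * x + 10.0 + rng.gauss(0, 1.0)) for x in range(40)]

# Test step: hold back a quarter of the rows so the model never sees them.
train, test = train_test_split(rows)

# Explore step: does the feature correlate with the response?
train_xs, train_ys = zip(*train)
correlation = pearson(train_xs, train_ys)

# Build step: learn from the training rows only.
model = fit_line(train_xs, train_ys)

# Test step: measure the error on rows the model has not seen.
test_error = mean_absolute_error(model, test)

print(f"correlation={correlation:.3f} slope={model[0]:.2f} test MAE={test_error:.2f}")
```

A low error on the test rows is only part of the picture. The validate and evaluate steps still call for someone who understands the business data to judge whether the predictions make sense.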
As noted in SQL Server as a Machine Learning Model Management System, “Currently, there is no standard method for comparing, sharing or viewing models created by other data scientists, which results in siloed analytics work. Without a way to view models created by others, data scientists leverage their own private library of machine learning algorithms and datasets for their use cases.”
By using SQL Server 2016 for machine learning (ML) model management, our team can automate predictive models for our customers. The iterative aspect of machine learning means that, as models are exposed to new data, they are able to adapt and learn so that they continue producing reliable results.
As I mentioned in a previous blog post (“D Is for Data Is for Decision,” October 19, 2016), I am excited to help make predictive analytics a reality for our customers. As always, though, my focus is this question: “Does the solution help our customers make better decisions, and will those decisions lead to growth, increased revenue, and improved efficiency?”
My answer is: how could it not?