Data scientists are craftsmen who enjoy the recognition and autonomy that come from having a little-understood set of skills. They typically focus on the specific, tangible problem put before them and often demonstrate a high degree of self-reliance, neither asking for help nor welcoming guidance.
This artisanal approach can make sense when breaking new ground where there are no patterns to follow.
However, it is the wrong approach when you want to maximize the business value you generate from your data by broadening the use of established machine learning techniques across your organization. In that case, the focus should be on efficiency and repeatability. One might think of this as a more industrial approach.
Here are three examples of steps on this path.
Make it a team sport through specialization
As we recently discussed, the reasons that surgical teams specialize also apply to data science teams. Specialization enables stretching scarce resources, professionalizing and optimizing each role, and applying best practices more consistently.
Higher level tools to improve productivity and repeatability
It is surprising when you see a 25-year-old as set in his ways as a 75-year-old. Yet it is common to see twenty-something data scientists cling tightly to doing their work the same way they did at university: building from scratch in Python; pulling data to them rather than pushing the analytics to the data; not investing to make their models reusable by others in the future; etc.
The state of the art is improving rapidly: GUI tools for data cleaning; tools to manage shared datasets; tools to package and reuse exploratory models; etc. It is important to have a team culture that embraces these improvements.
Distinguish POC and production
It is appropriate for a machine learning project to initially be closer to a proof-of-concept than a real production project. It is only after you build the model and analyze the results that you can really know what business value you might get from a machine learning initiative.
However, too often this POC is treated as an end point when it is really just the “end of the beginning”. Recognizing that you can produce meaningful predictions from a given data set generates a whole new set of questions:
- How will we address privacy and security concerns?
- How will we provide explainability to the stakeholders?
- How will we achieve needed performance and scale in production?
- How will we respond to changes in the noise we are seeing?
- What monitoring and alerting systems are required?
- Shall there be bounds on what automated actions will be taken based on these predictions?
- How will the model be retrained on an ongoing basis?
- etc.
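To make two of the questions above concrete (monitoring and alerting, and bounds on automated actions), here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended production design: the function names `drift_alert` and `bounded_action`, the threshold of 3 standard errors, and the sample numbers are all hypothetical, and a real system would likely use a more robust drift test.

```python
from statistics import mean, stdev


def drift_alert(baseline, recent, threshold=3.0):
    """Return True when the mean of `recent` is more than `threshold`
    standard errors away from the mean of `baseline`.

    Hypothetical helper: a simple z-score check on one numeric feature.
    """
    base_mean = mean(baseline)
    base_sd = stdev(baseline)
    if base_sd == 0:
        # A constant baseline: any change in the mean counts as drift.
        return mean(recent) != base_mean
    standard_error = base_sd / len(recent) ** 0.5
    z_score = abs(mean(recent) - base_mean) / standard_error
    return z_score > threshold


def bounded_action(prediction, lower, upper):
    """Clamp a model-driven value to limits agreed with the business.

    Hypothetical helper: the limits are assumed to be set by people,
    not learned by the model.
    """
    return max(lower, min(upper, prediction))


# Illustrative values only: a feature observed at training time
# versus two recent production windows.
training_window = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
stable_window = [10, 11, 9, 10]
shifted_window = [15, 16, 14, 15]

stable_flag = drift_alert(training_window, stable_window)
shifted_flag = drift_alert(training_window, shifted_window)
capped_value = bounded_action(150, lower=0, upper=100)
```

In this sketch the stable window raises no alert, the shifted window does, and the automated action is capped at the agreed upper limit. The point is that these checks live outside the model itself, which is why they rarely appear in a POC.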
The above are some of the steps we will need to take to move from our historical mindset of artisanal craftsmanship to a culture of repeatable and scalable systems that deliver sustainable business value.