As data science becomes a critical value driver for organizations of all sizes, business leaders who depend on software development teams need to know how the two differ and how they should work together.
FREMONT, CA: Although data science and software development have many similarities, they differ in three main areas: process, tooling, and behavior. In practice, IT teams are usually responsible for providing data science teams with tools and infrastructure. Because data science looks so similar to software development (both involve writing code), many well-intentioned IT experts approach this task with mistaken assumptions.
1. Process: Software engineering has established techniques for tracking progress, such as agile story points and burndown charts. Managers can therefore forecast and organize the work using clearly defined metrics. Data science differs first in that its work is research and therefore more exploratory, so progress is harder to predict with those metrics.
The second distinctive feature of data science is the notion of hit rate: the percentage of models that are deployed and used by the business. Even when a model does not get used by the business, data science teams learn from their mistakes and record the insights in management systems.
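The hit rate described above is simple arithmetic, and a minimal sketch may make it concrete. The `ModelProject` record and its `used_by_business` flag are hypothetical names chosen for illustration; how a team actually decides that a model counts as "used" is an assumption left open here.

```python
from dataclasses import dataclass


@dataclass
class ModelProject:
    """Hypothetical record of one data science project."""
    name: str
    used_by_business: bool  # assumed flag: was the model deployed and used?


def hit_rate(projects):
    """Return the percentage of projects whose model the business uses."""
    if not projects:
        return 0.0  # no projects yet, so report 0 rather than divide by zero
    used = sum(1 for p in projects if p.used_by_business)
    return 100.0 * used / len(projects)


# Illustrative data only, not real results.
portfolio = [
    ModelProject("churn-model", True),
    ModelProject("pricing-model", False),
    ModelProject("demand-forecast", False),
    ModelProject("lead-scoring", False),
]
print(hit_rate(portfolio))  # one of four models is in use
```

Tracking the projects that did not get used alongside the ones that did keeps the unsuccessful work visible, which fits the point that teams still learn from it.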
The third crucial difference in process is the level of integration with other parts of the enterprise. Engineering can usually operate independently of other parts of the business. In comparison, a data science team is most effective when it works closely with the business units that will use its models. The data science team therefore needs to organize itself to enable seamless and frequent cross-organizational communication.
2. Tools and Infrastructure: There is tremendous scope for innovation in the data science ecosystem. Data scientists should be able to test new techniques and packages easily, without being blocked by IT or risking destabilizing existing systems. And they should not have to move to a different environment when switching languages.
3. Behavior: With software, there is a concept of prescribed functionality and a correct answer, which means it is possible to write tests that verify the intended behavior. This does not hold for data science work, because there is no “right” answer, only better or worse ones.
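The contrast can be sketched in a few lines. In the hedged example below, the software function has one correct output that a test can assert exactly, while the model check can only ask whether a candidate scores better than a baseline. The function names, the toy labels, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
def add_tax(price, rate):
    """Ordinary software: for given inputs there is one correct output."""
    return round(price * (1 + rate), 2)


# A conventional test asserts the exact expected answer.
assert add_tax(100.0, 0.2) == 120.0


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def is_improvement(candidate_preds, baseline_preds, labels):
    """A model check is relative: is the candidate better than the baseline?"""
    return accuracy(candidate_preds, labels) > accuracy(baseline_preds, labels)


# Toy data, invented for this sketch.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
baseline_preds = [1, 1, 1, 1, 1, 1, 1, 1]   # naive model: always predicts 1
candidate_preds = [1, 0, 1, 0, 0, 1, 0, 1]  # makes two mistakes

print(accuracy(baseline_preds, labels), accuracy(candidate_preds, labels))
print(is_improvement(candidate_preds, baseline_preds, labels))
```

Neither model here is "right"; the candidate is simply better than the baseline on this metric, and a different metric or dataset could rank them differently.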
Software development and data science often overlap, with software serving as the “delivery vehicle” for many models. The two disciplines, while different, should work alongside each other to ultimately drive business value.