How To Build A Data Science Dream Team



By Jared Dame, Global AI and Edge Director at HP Inc and Jim Duarte, Principal at LJ Duarte & Associates LLC.

A quote often attributed to Mark Twain goes, “Data is like garbage. You’d better know what you are going to do with it before you collect it.” This gives data science teams food for thought.


For many teams, the go-to approach to data collection has been to “collect it all” and sort it out later. This strategy, however, creates problems for managing, qualifying, and processing the data down the line.

The solution? Strong collaboration between data scientists and subject-matter experts (SMEs) is essential for building an infrastructure that captures data for rapid, accurate decisions. But where to start? First, it’s important to understand the role of a data scientist in order to determine the best way to build your data science “dream team.”

What does a data scientist do?

There is much discussion, and often confusion, around the term “data scientist.” In short, data science is the process of asking questions of data and getting answers from it. Breaking the role into four distinct categories, each with its own focus, may help clarify the different ways the term is used.

The first category of data scientist, referred to in this article as Data Scientist 1 (DS1), is responsible for creating the data strategy and the overall technological requirements for how data will be collected, stored, formatted, and accessed throughout the life cycle of rapid insight gathering. This type of data scientist will also be relied upon to develop AI and other code that enables the other groups to ask questions of the data and have them answered. Another key responsibility is making sure that users have ‘good data’: data that is free of errors and difficult formatting.
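To make ‘good data’ a little more concrete, here is a minimal sketch of the kind of automated check a DS1 might run before handing data to other users. It assumes a hypothetical sensor-reading record with the fields `machine_id`, `timestamp`, and `temperature_c`, and assumes the team has agreed on ISO 8601 timestamps; none of these names or rules come from the article. Records with missing values or inconsistent formatting are flagged rather than silently passed along.

```python
from datetime import datetime

# Hypothetical required fields for a sensor-reading record (assumption for illustration).
REQUIRED_FIELDS = ("machine_id", "timestamp", "temperature_c")


def find_problems(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    ts = record.get("timestamp")
    if ts:
        try:
            # Assumed agreed-upon format: ISO 8601 without time zone.
            datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            problems.append(f"bad timestamp format: {ts!r}")

    temp = record.get("temperature_c")
    if temp not in (None, ""):
        try:
            float(temp)
        except (TypeError, ValueError):
            problems.append(f"non-numeric temperature: {temp!r}")
    return problems


def split_good_bad(records):
    """Separate clean records from flagged ones (flagged records keep their issues)."""
    good, bad = [], []
    for rec in records:
        issues = find_problems(rec)
        if issues:
            bad.append((rec, issues))
        else:
            good.append(rec)
    return good, bad


# Hypothetical sample data.
records = [
    {"machine_id": "M1", "timestamp": "2019-03-18T08:00:00", "temperature_c": "71.5"},
    {"machine_id": "M2", "timestamp": "03/18/2019 8:00", "temperature_c": "70.1"},
    {"machine_id": "", "timestamp": "2019-03-18T08:05:00", "temperature_c": "hot"},
]

good, bad = split_good_bad(records)
print(len(good), "good,", len(bad), "flagged")
for rec, issues in bad:
    print(rec.get("machine_id") or "<no id>", issues)
```

Keeping the flagged records together with their specific issues, instead of dropping them, gives the DS1 something concrete to discuss with the SMEs who own the source systems.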

The DS1 has a critical, and often difficult, role engaging SMEs who have vital knowledge about how processes function and are measured. Understanding the needs of SMEs can be daunting when the DS1 is asked to help create meaningful algorithms. Getting the right data in the right format to the right people is the basis for a top-notch organization. The DS1 also plays a significant role in the technical work of making data rapidly available, since the volume and velocity of data can make analysis an overwhelming task for the data scientists described below.

The second category, Data Scientist 2 (DS2), works with SMEs to explore the types of data they need to perform advanced analytics. DS2s are supported by statisticians and by a new breed of professionals who graduate with a master’s degree in analytics; the latter focus on analyzing data rather than on the underlying mathematical theory. Both disciplines provide significant support to SMEs performing advanced analytics. The DS2 must also recognize that a major consideration for integrating analytics is how it plays out in the fourth Industrial Revolution. Terms like Industry 4.0, Manufacturing 4.0, Smart Manufacturing, and Quality 4.0 are being used, and sometimes misused, more and more often. One must consider how data creation, collection, and formatting have changed in this environment.

Defining the most relevant and useful data to be created and collected is valuable. Taking time to brainstorm what key questions need to be answered with data can also add value to the discussion around data sources and what data to collect.

The next exercise is to measure the volume of data desired and the velocity at which it is created. Working through the impact of that volume and velocity will often teach both the DS1 and DS2 how to optimize their respective tasks.
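A back-of-the-envelope calculation is often enough to start this conversation. The sketch below estimates velocity (bytes per second) and daily volume from three inputs: the number of data sources, how often each one produces a reading, and the average record size. All three values are hypothetical placeholders chosen for illustration; with these inputs the arithmetic works out to 1,000,000 bytes per second (about 1 MB/s), or roughly 86.4 GB per day.

```python
# All inputs are hypothetical placeholders; substitute your own measurements.
SENSORS = 500              # number of data sources
READINGS_PER_SECOND = 10   # readings produced by each source per second
BYTES_PER_READING = 200    # average serialized record size in bytes
SECONDS_PER_DAY = 24 * 60 * 60


def velocity_bytes_per_second(sources, rate_hz, record_bytes):
    """Velocity: how fast data arrives, in bytes per second."""
    return sources * rate_hz * record_bytes


def volume_gb(bytes_per_second, seconds):
    """Volume: how much data accumulates over a period, in (decimal) gigabytes."""
    return bytes_per_second * seconds / 1e9


v = velocity_bytes_per_second(SENSORS, READINGS_PER_SECOND, BYTES_PER_READING)
daily = volume_gb(v, SECONDS_PER_DAY)
yearly_tb = volume_gb(v, SECONDS_PER_DAY * 365) / 1000

print(f"velocity: {v:,} bytes/s")
print(f"daily volume: {daily:.1f} GB")
print(f"yearly volume: {yearly_tb:.1f} TB")
```

Even rough numbers like these help the DS1 size storage and pipelines, and help the DS2 decide whether every reading really needs to be kept or whether aggregates will answer the key questions.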

A conversation about the analysis methods that will turn data into meaningful information for decision-making will uncover both the technical nature of advanced analytics and the data formatting needed to accomplish the task. At this point the DS1 and DS2 can bring in the SMEs for meaningful discussion. Drawing on the questions about data needs that surfaced in earlier conversations, the DS2 can start to build an analytics strategy. That strategy begins by separating simple data rummaging with Business Intelligence (BI) software from more sophisticated methods such as machine learning.

At this point the DS1 and DS2 may collaborate on purchasing analytical software. Some general themes to consider when buying from a software vendor are: 1) Avoid software that is ‘overkill’. Software companies may ‘bundle’ many analytical procedures and features into a package, some of which may never be used. 2) Make sure the software is not overcomplicated. It should install and configure without significant time and resources; software that is difficult to install and configure may also be difficult to maintain. 3) Watch the price. This sounds obvious, but hidden costs can arise when the vendor starts charging for ‘consulting’ to install and configure its system.

The role of Data Scientist 3 (DS3) falls to SMEs in various disciplines. With the DS1 and DS2 getting the data properly prepared and accessible, the DS3 can look at the data through the lens of their particular discipline. When SMEs take time to rummage through the data using grouping, drilling, and trending techniques, everyone gains a better understanding of their role in a beneficial and consistent decision-making process, along with great insights into the data.
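The three rummaging techniques can be illustrated with a small sketch. It assumes a hypothetical table of monthly defect counts by plant and production line (all values invented for illustration) and uses only the Python standard library: grouping totals defects per plant, drilling narrows to one plant and breaks it down by line, and trending computes the month-over-month change for each plant and line. In practice an SME would do this in BI software; the code just makes the operations explicit.

```python
from collections import defaultdict

# Hypothetical defect counts: (plant, line, month, defects). Illustrative only.
rows = [
    ("Plant A", "Line 1", "2019-01", 12),
    ("Plant A", "Line 1", "2019-02", 15),
    ("Plant A", "Line 2", "2019-01", 4),
    ("Plant A", "Line 2", "2019-02", 5),
    ("Plant B", "Line 1", "2019-01", 7),
    ("Plant B", "Line 1", "2019-02", 6),
]


def group_total(rows, key_fn):
    """Grouping: sum defects for each key produced by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)


def month_over_month(rows):
    """Trending: change in defects between consecutive months per (plant, line)."""
    series = defaultdict(dict)
    for plant, line, month, defects in rows:
        series[(plant, line)][month] = defects
    changes = {}
    for key, by_month in series.items():
        months = sorted(by_month)  # "YYYY-MM" strings sort chronologically
        changes[key] = [by_month[b] - by_month[a] for a, b in zip(months, months[1:])]
    return changes


# Grouping: totals per plant.
by_plant = group_total(rows, lambda r: r[0])

# Drilling: restrict to one plant, then break it down by line.
plant_a = [r for r in rows if r[0] == "Plant A"]
by_line = group_total(plant_a, lambda r: r[1])

# Trending: month-over-month change per plant and line.
trend = month_over_month(rows)

print("by plant:", by_plant)
print("Plant A by line:", by_line)
print("trend:", trend)
```

Here the drill-down points to Plant A, Line 1 as the largest contributor, and the trend shows it rising, which is exactly the kind of observation an SME can then explain from knowledge of the process.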

Finally, Data Scientist 4 (DS4) is the gatekeeper of the overall data management, analysis, and reporting process. The DS4 optimizes the infrastructure to deliver the best and most complete information to key decision makers in the organization, and has ready access to the DS1 and DS2 to get answers to questions that have not been addressed before. Their goal is to ensure that data and analyses are “at the ready” for key decision makers. This may include verifying the accuracy of standard reports as well as handling, and possibly anticipating, “the next question” and being ready to find the answer.

All four types of data scientists are necessary for an organization to take full advantage of data for good, rapid, and consistent decision-making. With management support, this structure becomes a valuable resource that helps an organization run efficiently.

Giving Data Scientists the Tools to Succeed

Keeping these guidelines in mind will help you assemble a team of data scientists to take on your biggest data challenges. And just as racing teams pair highly skilled drivers with the latest technological innovations to achieve increasingly faster results, you should empower data scientists with tools that will help them do their groundbreaking work faster.

One such tool is RAPIDS, a set of open source data science and machine learning libraries that use GPUs to accelerate performance. Developed by NVIDIA, RAPIDS aims to make the power of data science accessible to every member of your team, with technology that simplifies and speeds their work. Built on NVIDIA CUDA, RAPIDS enables faster data preparation, model training, and data visualization for an accelerated workflow that can be deployed on a range of NVIDIA GPU systems, such as those from HP. As a result, businesses can improve data scientist productivity and model accuracy while driving down infrastructure total cost of ownership (TCO).

To learn more about doing state-of-the-art data science with HP systems, attend NVIDIA GTC 2019 and visit Booth 1339 to see how HP is making data science accessible.
