Design SDS
Introduction
Human beings exist in a world of infinite resolution. For as long as humans have existed, we have made approximations and abstractions of that world to facilitate the whole range of human endeavor. In the world of Data Science, this process is what we call Design, and we divide it into two major categories: Data Collection and Data Communication. The goal today is the same as it was millennia ago: we are trying to move information from one person to another. Today, in the 21st century, we also have the added challenge of crossing the divide between the infinite world of concrete human experience and the finite realm of abstract digital analysis.
This chapter begins with a description of the fundamentals of Data Collection, the process of converting observations made in the human world into a series of 1s and 0s in the digital world. This process necessarily involves the loss of information. The savvy scientist must make critical decisions and compromises to ensure that the necessary information is retained. The second part of the chapter focuses on the return journey, what we call Data Communication, the conversion of the 1s and 0s back into a story that can be told in the human world. At this stage our savvy scientist is now making decisions to mitigate the consequences of the preceding compromises[1].
Put another way, a Data Scientist captures information, shepherds it into the digital realm, analyzes it to create new knowledge, and brings that knowledge back to share with the rest of humanity. They are in a unique position, having one foot in the human world and one foot in the digital world. When information crosses the bridge into the digital realm, some of it is lost and context is stripped away. It is the role of the Data Scientist to restore that context on the return. This crucial step is the core of Design.
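To make the idea of lost information concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the temperature readings are made up, and the `quantize` helper and the one-degree step are hypothetical choices, not a method prescribed by this chapter. It shows how distinct observations can collapse into a single stored value once they are recorded at a fixed resolution.

```python
def quantize(value, step):
    """Round a measurement to the nearest multiple of `step`."""
    return round(value / step) * step


# Hypothetical temperature readings (degrees Celsius), invented for illustration.
readings = [20.04, 20.4, 19.6, 20.49]

# Assumption: the recording device only keeps whole degrees.
stored = [quantize(r, 1.0) for r in readings]

# Four different observations become one indistinguishable value.
distinct_before = len(set(readings))
distinct_after = len(set(stored))

print(stored)
print(distinct_before, "distinct readings became", distinct_after)
```

Nothing in the stored values says which original reading produced them. Deciding that whole degrees are "good enough" is exactly the kind of compromise the Data Scientist makes at collection time and must account for when communicating results.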
Our advice to the reader is to keep three themes in mind as they progress through this chapter. The first is the mindset of asking, “What should I leave in, and what should I leave out?” This question is fundamental to both pieces of Design and is always in the mind of a practicing Data Scientist, even when idly taking a walk around town. The second is the framework of “Data vs. Interpretation”: the Data Scientist is always asking, “What is Data? … and what is Interpretation?” This is not simply a binary. There are always compromises that bake in interpretation, and the key is to stay aware of where that interpretation is leaking in. The final theme is “Storytelling”. Stories are a tool as old as humans and are in fact the only tool for communicating an idea from one human to another. From the very start to the very end of the pipeline, a Data Scientist will have to tell a story to communicate their results. By keeping these themes in mind, you will be able to tell great stories and change the world.