This blog is mainly dedicated to showing different aspects of my professional life. As you may have noticed on my CV (here), my career is focused on data analysis and on presenting useful results to end users, suggesting the most valuable explanations for the data. Although this may sound novel, I am in fact part of the latest generation of a long tradition that reached maturity with the work of John Tukey. He defined data analysis as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."
The diagram shows where exploratory data analysis sits in the process. Bringing Tukey's words up to date, we can say that this stage looks for the parameters that capture, with a certain level of confidence, the underlying distribution of the data; once those parameters are known, they allow us to build a precise model. At the end of the process, this model can generate predictions and hence support better decisions (better from the point of view of the model's user). As you can see, if the dive into the data fails, it compromises our understanding of reality and produces inaccurate models that lead to wrong decisions.
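To make "parameters that capture, with a certain level of confidence, the underlying distribution" a bit more concrete, here is a minimal sketch using only the Python standard library. It assumes, purely for illustration, that the data follow a normal distribution; the dataset is synthetic and the helper name `estimate_normal` is hypothetical. It estimates the mean and standard deviation, attaches an approximate 95% confidence interval to the mean, and then uses the fitted parameters as a simple model that can answer a predictive question.

```python
import math
import random
import statistics


def estimate_normal(sample, z=1.96):
    """Hypothetical helper: estimate mean and standard deviation of a sample,
    ASSUMING it comes from a normal distribution, plus an approximate 95%
    confidence interval for the mean (normal approximation, z = 1.96)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    half_width = z * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Synthetic data for illustration only: 500 draws from a normal with mean 10, sd 2.
random.seed(42)
data = [random.gauss(10, 2) for _ in range(500)]

# Exploratory stage: estimate the parameters and how confident we are in them.
mean, sd, ci = estimate_normal(data)
print(f"mean = {mean:.2f}, sd = {sd:.2f}, 95% CI for mean = ({ci[0]:.2f}, {ci[1]:.2f})")

# Model stage: the estimated parameters define a model that can make predictions,
# e.g. the probability that a new observation exceeds 13.
model = statistics.NormalDist(mu=mean, sigma=sd)
p_above_13 = 1 - model.cdf(13)
print(f"P(X > 13) = {p_above_13:.3f}")
```

In real work the normality assumption is exactly the kind of thing the exploratory stage should check rather than take for granted; if it fails, the predictions built on these parameters inherit the error.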
I intentionally highlighted the role of exploratory data analysis, and I will explain it in more detail in later posts. But as a data scientist one should not lose touch with the ground, and that is why I decided to mark the decision stage in gold. What I mean is that the golden goal of our analysis is to make decisions about reality: sometimes to provide a deep understanding, sometimes to characterize a critical parameter in a company's production model, but in any case the analysis aims to have an impact on the way we do things. And things usually need to be accomplished as fast as possible, so exploratory data analysis cannot go on for years until it finds the golden parameter that characterizes a system. The way to proceed is through iterations within the data analysis stage and iterations of the whole diagram. Some people, especially in academia, prefer not to move on to the model stage and make decisions until everything in the analysis is polished... a nice ideal, but not worth it. Let's follow the Japanese tradition of kaizen (改善) and progress through small steps and many iterations.
As published this week in Science (issue 6324): "New tools are exciting. But using software packages off the shelf, without understanding them fully, can lead to disaster." And this is where the data scientist's work becomes valuable and critical. In future posts I will show you how to dig into the concepts and techniques of data analysis to provide reliable solutions to the people who have to make decisions.
Stay tuned, but now it's time to solve!