While the first generation of data scientists often came with PhDs in quantitative fields, more and more people are aware of data science as a viable career path earlier in their careers. Data science is increasing in popularity as an undergraduate major, and there are amazing programs exposing high school students to all of the opportunities that data science skills can unlock.
When I was introduced to data science, I knew it was the field for me. Data science combines my interest in computer science and data engineering with the analytical tools to make data actionable. While my studies are ongoing, I have had the rare opportunity to learn data science in the classroom and, at the same time, apply that knowledge in the real world through data science consulting.
As someone who’s studied data science in an academic context and worked in the field, I wanted to share what I’ve learned about the gaps between data science as an academic subject and what it’s actually like in a professional setting.
1. Real Data Is Messy
Technical skills like building and manipulating queries are extremely transferable to the day-to-day role of a data professional. However, I quickly realized that working in data science requires more than just technical skills. In the classroom, our data was always given to us and was generally high quality and comprehensive. In the real world, you can’t always expect your data to be readily available or usable.
My first step was to get familiar with the data warehousing tools, as well as all the different data applications that organizations use to run their businesses. There are cases where the data you need to answer a client’s request isn’t there (or at least not in a usable format), and it’s up to you to troubleshoot and get creative about how you can still deliver value in the face of messy or incomplete data. Being able to bounce between the UI of a production application and the database to understand how data is flowing and where the gaps are is just as useful as knowing how to extract data.
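To make the idea of "finding the gaps" a bit more concrete, here is a minimal sketch of the kind of quick check I might run on a fresh extract: counting how much of each column is missing. The sample rows, the column names, and the list of placeholder values are all made up for illustration; a real source will have its own quirks, so treat this as a starting point rather than a recipe.

```python
from collections import Counter

# Hypothetical sample rows, standing in for an extract from a client system.
rows = [
    {"order_id": "1001", "customer_email": "a@example.com", "amount": "25.00"},
    {"order_id": "1002", "customer_email": "", "amount": "40.50"},
    {"order_id": "1003", "customer_email": None, "amount": "N/A"},
    {"order_id": "1004", "customer_email": "d@example.com", "amount": None},
]

# Values treated as "missing" here (an assumption; real sources may use
# other placeholders).
MISSING_MARKERS = {"", "n/a", "null", "none"}


def is_missing(value):
    """Return True if a value is None or a known placeholder string."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_MARKERS


def missing_share(rows):
    """Return the fraction of missing values per column."""
    if not rows:
        return {}
    # Collect every column that appears in any row, so a column that is
    # absent from some rows still counts as missing there.
    columns = set()
    for row in rows:
        columns.update(row)
    counts = Counter()
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                counts[col] += 1
    return {col: counts[col] / len(rows) for col in sorted(columns)}


report = missing_share(rows)
for column, share in report.items():
    print(f"{column}: {share:.0%} missing")
```

A report like this won’t tell you why a field is empty, but it does tell you where to start asking questions, whether that means checking the source application or talking to the people who enter the data.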
2. Business Intelligence Tools
One of the most exciting things I’ve found from working in the real world is the plethora of BI tools. In the classroom, we mostly displayed our visualizations in Jupyter or on small-scale platforms like Streamlit. Once you learn the basics of how business intelligence tools work, it’s relatively easy to learn the nuances and quirks of different tools. I’ve now built reporting in Tableau, Looker, and Sigma Computing, and while they all have slightly different approaches, you can build compelling data visualizations in each.
3. Defining Problems and Client Relationships
In the classroom, our problems and tasks are clear-cut and well-defined. You’re almost always given a dataset and told what information to extract or analyze. In the real world, it’s never this simple. First, there is always the task of gathering the right data sources and setting up the necessary pipelines. The second, and most challenging, task is defining the problem you want to solve. Sometimes business stakeholders are very clear about the use cases they want to enable, but more often, they are looking for input on the kinds of questions they should be asking of their data. As data people, it’s our job to interpret open-ended requests and guide our stakeholders to the underlying business problem. Defining problems and use cases is the most important skill of any data professional, and it’s something that’s very hard to learn outside of a business or real-world context.
Learning data science in the classroom is just the beginning of the journey. Technical skills and syntax will certainly prove themselves to be useful, so definitely keep working on your SQL and Python! That said, I’ve learned that being curious, creative, and rigorous about defining problems is the most critical (and fun!) part of becoming a data scientist.