A Peek into a Data Scientist's Daily Routine: More Than Just Code
A Day in the Life of a Data Scientist
Spoiler alert—my day doesn’t revolve around developing intricate machine learning models!
Recently, I’ve encountered many individuals eager to transition into the field of data science. The first question they typically ask is, "What does a standard day entail?" While numerous articles outline the skills and tools Data Scientists use, far fewer show what the day-to-day work actually looks like.
Although each day presents new challenges, here’s a glimpse into what a typical day looks like for me as a Senior Data Scientist at a large financial institution.
Overview of My Day
- 8:30–9:00 — Kickoff
- 9:00–10:00 — Pair Programming
- 10:00–10:30 — Scrum Meeting
- 10:30–11:00 — Presentation Preparation
- 11:30–12:00 — One-on-One with Manager
- 12:00–1:00 — Feedback Session with Lead Data Scientist
- 1:00–4:30 — Coding Time
Starting My Day
I usually begin my workday around 8:30 AM, having gotten out of bed at 8:20. Working remotely since March 2020 has been transformative for me. I appreciate the tranquility of my home office, where I can work in comfort and manage personal tasks while running code.
The first item on my agenda is to check emails and Teams messages I may have overlooked the previous day. Each day, I receive an email detailing the status of one of my team’s production machine learning models. I verify that everything is functioning correctly and that no errors have occurred in the model or the associated data extraction processes.
If everything checks out, I respond to various messages and then log into Jira, our project management tool, to update the status of my tasks for the current sprint. A sprint is the fixed work cycle used in agile methodology; ours lasts three weeks. From there, I prioritize my responsibilities for the day.
Pair Programming
Over my five years as a Data Scientist, I’ve observed a notable change in my work dynamics. Initially, I was solely focused on my tasks, but now a significant portion of my time is devoted to mentoring and assisting less experienced colleagues.
Once a week, I engage in a pair programming session with a Junior Data Scientist on my team. In this agile practice, two developers collaborate on a task, either at one computer or by sharing a screen.
Initially, I was skeptical about pair programming. I thought it would be inefficient for two people to work on the same issue, and I feared I would appear incompetent if I struggled with coding. However, I soon realized that pair programming is an excellent opportunity to learn from one another. I consistently gain new insights from junior colleagues during these sessions.
In our 9:00–10:00 AM pairing, we tackled an issue I encountered while learning to use a new graph database tool. The Junior Data Scientist mentioned a useful plugin he had installed, but I had been unable to get it working on my machine. Working through it together, we discovered that my copy of the tool had been installed without internet access, which had prevented the plugin from installing. With his guidance, I successfully installed it.
We then reviewed a script I had written for creating nodes and relationships in the graph, where he offered suggestions to improve readability and adhere to best practices.
Following that, we addressed a challenge he faced in crafting a SQL query to extract metrics for various business lines. By breaking the problem down into manageable components, we devised a solution together.
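To illustrate what "breaking the problem down into manageable components" can look like in SQL, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and figures are hypothetical and are not the actual query or data from our session. Each common table expression (CTE) answers one small question, and the final SELECT combines them.

```python
import sqlite3

# In-memory database with made-up data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (business_line TEXT, amount REAL);
    INSERT INTO transactions VALUES
        ('retail', 100.0), ('retail', 50.0),
        ('commercial', 400.0), ('commercial', 200.0),
        ('wealth', 75.0);
""")

query = """
WITH per_line AS (          -- component 1: metrics per business line
    SELECT business_line,
           COUNT(*)    AS n_transactions,
           SUM(amount) AS total_amount
    FROM transactions
    GROUP BY business_line
),
overall AS (                -- component 2: the grand total
    SELECT SUM(amount) AS grand_total
    FROM transactions
)
SELECT p.business_line,     -- final step: combine the components
       p.n_transactions,
       p.total_amount,
       ROUND(p.total_amount / o.grand_total, 3) AS share_of_total
FROM per_line AS p
CROSS JOIN overall AS o
ORDER BY p.business_line;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Writing each component as its own CTE makes it possible to run and check the pieces one at a time before combining them.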
Scrum Meeting
At 10:00 AM, we transition to our team's daily scrum meeting.
Traditionally, the scrum master leads by asking three questions:
- What did you accomplish yesterday?
- What will you work on today?
- Are there any obstacles in your way?
I find minimal value in simple status updates, so I encourage my team to adopt a more interactive approach focused on learning. Rather than merely summarizing our work, we demonstrate it.
For instance, if I spent the previous day developing Python code to check if a date in my dataset was a holiday, I share my screen and walk the team through my code.
This method has numerous advantages. Often, team members propose better, faster, or more straightforward solutions. Catching issues early in a five-line code snippet is significantly easier than during a full code review. Additionally, I frequently discover that a teammate is tackling a similar problem, allowing us to save time by reusing code.
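A snippet of the kind I might walk through in such a demo could look like the sketch below. It is not my actual code: the holiday list is a hypothetical, hard-coded stand-in, and a real project would load an official calendar for the relevant country and year.

```python
from datetime import date

# Hypothetical holiday list for illustration only.
HOLIDAYS = frozenset({
    date(2024, 1, 1),    # New Year's Day
    date(2024, 7, 4),    # Independence Day (US)
    date(2024, 12, 25),  # Christmas Day
})


def is_holiday(day: date, holidays: frozenset = HOLIDAYS) -> bool:
    """Return True if `day` is one of the listed holidays."""
    return day in holidays


# Flag each date in a small sample "dataset".
dataset = [date(2024, 7, 3), date(2024, 7, 4), date(2024, 12, 25)]
flags = [is_holiday(d) for d in dataset]
print(flags)  # [False, True, True]
```

Something this short is easy for a teammate to challenge on the spot, for example by asking how holidays that fall on a weekend and are observed on a different day should be handled.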
Presentation Preparation
After the scrum, I have about 30 minutes before my next meeting—typically insufficient for in-depth "data science" tasks like data cleaning or modeling. Instead, I use this time to respond to new emails or prepare for upcoming presentations.
A substantial part of my role involves creating presentations to clarify the nuances of data science. Many executives hear buzzwords like artificial intelligence and machine learning and respond with, "We should implement that!" However, the reality is that machine learning isn’t always the best solution; sometimes, straightforward reporting or basic automation effectively resolves most issues without overcomplicating matters.
When presenting results from initial models to stakeholders, I used to overload my slides with technical jargon in an effort to impress. This approach backfired, as my audience lacked the necessary mathematical background to grasp concepts like F1 scores. Over time, I learned to tailor my communication to my audience’s understanding, opting for terms like "accuracy" instead of technical metrics.
As a saying often attributed to Albert Einstein goes, "If you can’t explain it simply, you don’t understand it well enough."
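For readers curious about the F1 score mentioned above, here is a small sketch in plain Python that computes it next to accuracy. The labels are invented for illustration. The F1 score is the harmonic mean of precision and recall, so it can differ a lot from accuracy when one class is rare. This is worth keeping in mind when simplifying the wording for an audience.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced example: 2 positives out of 10 cases,
# and the model finds only one of them.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)  # 9 of 10 correct
f1 = f1_score(y_true, y_pred)   # precision 1.0, recall 0.5
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

In this made-up case the model looks strong on accuracy while missing half of the positive cases, which is the kind of nuance a plain-language summary still needs to convey.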
One-on-One with Manager
Following my presentation edits, I have a one-on-one meeting with my manager. These discussions are crucial for addressing career aspirations, recent achievements, and any challenges.
Today, I needed to discuss difficulties I’d faced while setting up a tool in our development environment. After weeks of working through permissions with another team, we had reached an impasse over the configuration of security settings. My manager, with his broader network, was able to suggest contacts who might assist.
Feedback from Lead Data Scientist
Next, I meet with the Lead Data Scientist to review a proof of concept I’ve been working on. For the past month, I’ve been exploring a dataset using the graph database tool.
As most of my previous experience was in natural language processing, I had to familiarize myself with graph databases. This project has been enjoyable, blending research and practical application.
I began by presenting a sample of the data and its representation in a network graph. We then examined some basic queries I had written and discussed built-in algorithms I attempted. The Lead Data Scientist suggested restructuring my graph to test additional algorithms, a perspective I hadn’t considered. This experience highlighted the value of feedback—sometimes, a fresh viewpoint is invaluable.
Coding Time!
To wrap up my day, I finally get to code! I shifted my focus from network graphs to a document classification project for another business area. My team functions like contractors, engaging in diverse projects across various areas. This variety keeps me stimulated; if I encounter frustration with one project, I can easily pivot to another.
For this project, we lack labeled data and are experimenting with a technique called active learning, in which the model helps choose which examples get labeled next so that it can reach good performance with fewer labeled examples. We selected a sample of documents for internal annotators to label, and my task is to review their annotations for consistency.
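One common way to pick which documents to send to annotators in active learning is uncertainty sampling: label the items the current model is least sure about. The sketch below shows that idea in plain Python. It is an assumption for illustration, not necessarily the selection strategy we used, and the document ids and probabilities are made up.

```python
def least_confident(probabilities, k):
    """Return the k document ids whose top predicted probability is lowest.

    `probabilities` maps a document id to the model's predicted class
    probabilities for that document.
    """
    ranked = sorted(probabilities, key=lambda doc_id: max(probabilities[doc_id]))
    return ranked[:k]


# Hypothetical model outputs over three classes.
predictions = {
    "doc_a": [0.95, 0.03, 0.02],  # model is confident
    "doc_b": [0.40, 0.35, 0.25],  # model is unsure
    "doc_c": [0.55, 0.40, 0.05],
    "doc_d": [0.80, 0.15, 0.05],
}

to_label = least_confident(predictions, k=2)
print(to_label)  # ['doc_b', 'doc_c']
```

The newly labeled documents are then added to the training set, the model is retrained, and the selection step repeats.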
Before reviewing, I consolidated data from five Excel files into a single dataframe, ensuring a column for each labeler’s results. This process took longer than anticipated.
I decided to employ a majority-rule method for determining agreement: if three or more labelers chose the same label, that would be the final decision for training our model. Some cases were straightforward, while others presented challenges where labelers disagreed. I documented my decisions and outlined tasks for the next day.
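The majority rule described above can be sketched with the standard library alone. The labeler names, document ids, and labels below are hypothetical, and plain dictionaries stand in for the consolidated dataframe with one column per labeler.

```python
from collections import Counter

# One entry per labeler, mapping document id to the label they chose.
labels_by_labeler = {
    "labeler_1": {"doc_1": "invoice", "doc_2": "contract", "doc_3": "invoice"},
    "labeler_2": {"doc_1": "invoice", "doc_2": "contract", "doc_3": "letter"},
    "labeler_3": {"doc_1": "invoice", "doc_2": "letter", "doc_3": "letter"},
    "labeler_4": {"doc_1": "letter", "doc_2": "contract", "doc_3": "contract"},
    "labeler_5": {"doc_1": "invoice", "doc_2": "invoice", "doc_3": "contract"},
}


def majority_label(votes, threshold=3):
    """Return the most common label if at least `threshold` labelers chose it.

    Returns None when no label reaches the threshold, which marks the
    document for manual review.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= threshold else None


doc_ids = sorted(next(iter(labels_by_labeler.values())))
final = {
    doc_id: majority_label([labels[doc_id] for labels in labels_by_labeler.values()])
    for doc_id in doc_ids
}
print(final)
# {'doc_1': 'invoice', 'doc_2': 'contract', 'doc_3': None}
```

Documents that come back as None are the hard cases where labelers disagreed, and those are the ones I reviewed by hand.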
Conclusion
You might be surprised to learn that a significant portion of my day as a Data Scientist doesn’t involve coding. Even when I do code, it’s often focused on data cleaning and analysis rather than building complex machine learning models. When I first entered the field, I envisioned my days filled with algorithm creation, but the reality is that much of my time is spent preparing data and understanding the processes behind it.
Data Scientists are often perceived as having glamorous and thrilling careers, but the truth is far more nuanced. This isn’t necessarily negative; it’s simply essential for newcomers to understand what they are stepping into.
If you found this article insightful and are an aspiring Data Scientist eager to learn about the practical aspects of the field, consider joining my workshop, where I teach skills often overlooked in academic settings.