# A Professional's Take on Reddit's Data Science Discussions
## Chapter 1: Introduction to Data Science Insights
Harvard Business Review famously called data scientist "the sexiest job of the 21st century." As a hopeful graduate, I chased the dream of a lucrative career in this field. Now, over ten years later, my perspective has matured, and I've gained valuable insights along the way.
Rather than solely sharing my own experiences, I decided to compile detailed responses to six of the most relevant discussions on the r/datascience subreddit. This platform serves as a gathering place for data professionals to engage in meaningful conversations about career-related questions. The selected posts encapsulate common themes I've encountered in my interactions with fellow data scientists.
You might wonder why my perspective matters. To provide some context, I've spent over a decade in data science, primarily within the financial services sector, progressing from a junior analyst to a senior manager. I have led initiatives, such as implementing an AI strategy for a bank and building data teams. Currently, I am the founder and Chief Data Scientist at Data-Centric Solutions, a consultancy I launched in early 2023. While my insights are not definitive, I hope they offer some valuable takeaways.
Now, let's delve into these discussions.
### Section 1.1: The Right Programming Language for Data Science
One recurring question I encounter, especially from graduates and career changers, is about the programming languages essential for data scientists. However, I believe this question is misguided. A more effective inquiry would be, "What is the most suitable programming language for [specific task]?" This could range from data visualization to model deployment.
This approach makes more sense. As a data scientist, you'll likely tackle various tasks in any given project, each requiring its own preferred language or framework. Often, the best language is simply the one you are most proficient in, though project constraints can also dictate your choice. Early in my career, I had to develop optimizers using VBA, a challenging task I haven't faced since.
Currently, I primarily work in Python for machine learning while utilizing other languages as needed. Ultimately, a successful data scientist should embrace flexibility in their programming language choices. The more languages you are open to using, the greater your opportunities to contribute meaningfully to diverse projects.
## Chapter 2: Understanding Data Scientist Roles
The radar chart depicting the roles of data scientists is theoretically accurate, but I’ve found that job titles often blur the lines between analyst and data scientist. In reality, job roles are rarely as neatly defined as we might wish.
Reflecting on my experience building a data science team at a major UK bank, I faced the expectation that our new hire would handle "end-to-end" data science tasks. This encompasses the entire model-building process from inception to production. However, after a year of waiting, business priorities shifted, and we found ourselves focusing on ad hoc analyses and dashboard creation instead.
This ambiguity highlights the chaotic nature of businesses. As a data scientist, it’s crucial to adapt and accept that your role may evolve based on organizational needs. Think of yourself as a professional capable of deriving value from data, rather than merely a machine learning specialist.
### Section 2.1: The Challenge of Interview Processes
The data science interview process can often be distressing, characterized by unrealistic expectations and a lack of empathy from interviewers. I’ve experienced this firsthand as both a candidate and an interviewer.
One particularly grueling experience was interviewing for a VP of Data Science role at a major US bank, which involved five rounds with eleven interviewers. In one round, I was asked to build a machine learning model on unfamiliar data within a strict time limit, which was an unrealistic expectation. After enduring hostility during the final interview, I received no feedback, only silence.
As an interviewer, I acknowledge my own shortcomings. Hiring processes for data scientists are often inefficient, which makes for a frustrating candidate experience. To improve this, companies should drop take-home tasks, which unfairly penalize candidates who have little spare time. Instead, they should ask for a portfolio of previous work to assess a candidate's abilities.
## Chapter 3: The Data Science Hierarchy of Needs
The data science hierarchy of needs is essential knowledge for anyone working in this field. My own experience taught me this lesson the hard way when I was tasked with building a data science team from scratch, armed with a budget that seemed substantial at the time.
I quickly learned that our organization's data management was lacking, with no clear data dictionary or access to quality data. The ambitious goals I had set were quickly tempered by the reality of our infrastructure limitations.
As you explore job opportunities, take time to gauge a company’s data infrastructure. Understanding where the organization stands on the data science hierarchy can help you align your expectations with reality.
## Conclusion: Reflections on Data Science Discussions
The r/datascience subreddit serves as a rich resource for insights, debates, and shared experiences within the data science community. My reflections on these discussions aim to provide fresh perspectives and foster further dialogue about the complexities and rewards of this dynamic field.
Feel free to connect with me on LinkedIn for more insights or schedule a complimentary consultation if you’re interested in integrating AI or data science into your business operations.