What’s in store for Data and Machine Learning in 2021?

Data Culture
Jan 29, 2021


By Nicole Maffeo and Catherine Williams, Google AI Research team

Stealth-mode Startup, 2016. Left (Catherine Williams) | Right (Nicole Maffeo)

Disclaimer: The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of their employer.

What are your predictions for the field of data in the new year?

Data fairness, the democratization of AI, and increased investment in integrating data management into ML-Ops will likely all be key in 2021.

Fairness is the cornerstone of responsible ML, and fair models start with fair datasets. Algorithmic bias is one of the biggest problems in building AI-driven products, and it is very hard to fix as an afterthought. This is why data teams should be involved early and meaningfully in the dataset creation process.

We are also seeing an explosion of tools in the ML-Ops world. Everyone is trying to make AI easier, more manageable, and self-serve, many with open-source solutions. Lowering the barrier to entry is crucial to democratizing AI. But as part of that, people need to look at the data side of things: how to effectively and efficiently store, manage, annotate, and version datasets. The tools and organizations that make data management an integral part of their ML-Ops solutions will have a major advantage.

What are the next big hurdles (trends, issues, etc.)?

There is a lot of work to be done, and it is difficult to narrow it down. If we had to choose two topics, they would be data privacy and ML in the “real world.”

As technology becomes more pervasive, user privacy has become an increasingly hot topic (one of our favs!). Over the last few years, we have seen the industry grapple with the appropriate level of “explainability” around the use of personal data. Ensuring that users are not only informed but also able to understand and engage with privacy agreements is of utmost importance. Setting the right standards and expectations across big tech will earn user trust and catalyze the adoption of increasingly personalized tech.

What else? ML in the “real world.” We see the gap between proof of concept and deployment as one that still needs to be bridged. Many organizations still struggle to apply AI/ML to real-world use cases; there have been a lot of wins, but there is so much potential left to unlock. By some industry estimates, around 90% of ML projects fail, often because of siloed teams and fractured infrastructure and tooling. The next decade is where the rubber meets the road in making AI trusted and ubiquitous, not just for big tech but for businesses everywhere.

What will data as a culture look like in 2021?

We are not mind readers, BUT we see data culture in 2021 as being increasingly accessible, operationalized, and cool (duh!).

First, with the e-learning boom, the barrier to entry into data has become much lower. Companies such as Coursera have democratized learning, which in turn has widened the funnel. Data skills have become table stakes in nearly every sector and job function. With that in mind, we hope and expect to see more women and members of underrepresented groups in this traditionally male-dominated field.

Second, we see data culture rapidly evolving and growing in scope. Data is half the battle; organizations are investing heavily in the tools and processes required to manage data effectively while maximizing its potential. From startups to large orgs, robust end-to-end data platforms are gaining traction.

Third, data is cool again and here to stay! As former nerds (err, current nerds), we are thrilled that data is making a grand comeback. AI and big data are no longer exclusive to big tech and fintech; they are pushing into industries such as agrotech, manufacturing, and construction. With AI entering the mainstream, data increasingly informs the world around us.

Interested in learning more about how to scale data and build data culture at your company? Reach out to Leah and Gabi at Data Culture.
