By Jimmy Chan, co-founder and CEO at Dropbase, a platform that automates manual data work and turns CSVs, Excel files and online data into analytics-ready databases.
Open-source software startups building data products are having a moment. In just the last few months, we’ve seen startups such as Airbyte, ClickHouse, PostHog and RudderStack quickly grow strong communities and collectively raise over $100 million. These startups are open-source alternatives to closed-source products such as Fivetran, Snowflake, Amplitude and Segment, which are collectively valued at over $100 billion at the time of writing.
This spike isn’t surprising given that the amount of data produced and consumed is growing exponentially. In this article, I will talk about the underlying reasons for this trend and how entrepreneurs can find opportunities to build successful data startups.
What’s A Data Startup And Open-Source Software?
A data startup develops software and tools to help individuals and organizations make their data more useful. This includes helping collect, clean, centralize, store, transform and analyze data.
Open-source software (OSS) refers to software that people can view, use, download, study, modify, distribute and share. The specific permissions and obligations depend on the license it is released under, such as the MIT License, GPL 3.0 or Apache 2.0.
What Explains The Spike In Open-Source Companies Building Data Products?
Fundamentally, a shift in market needs. Companies increasingly want finer control of their data, the ability to adapt products to custom use cases and the freedom to choose which tools to integrate with. This is compounded by the increasing amount of data companies generate and their desire to extract more insight from it to grow the business.
People who work with data are familiar with the following pattern: They want to capture marketing data to understand customer trends. They sign up for a commercially available product developed by a vendor they trust. Then their data volume increases, workflows become more complex and they need to customize the product to suit specific use cases. To do this, they need access to the underlying data, which they don’t get unless they pay the vendor five figures in additional recurring fees. Most companies go through this pattern. While the requirements of many companies won’t outgrow the features offered by closed-source vendors, those of the most successful ones certainly will, creating large opportunities for startups.
Why Do Organizations Turn To Open-Source Data Tools?
Because open-source tools offer benefits that closed-source products can’t:
• Access: You get maximum access to your underlying data.
• Control: You can control what you want to do with your data, how it’s stored or backed up and who else can access it.
• Choice: You can choose to use your data with any downstream tool that requires data access. You can also choose which tool to integrate it with.
• Customization: You can modify and customize the data product to suit your teams’ or business’s specific needs, allowing maximum flexibility.
• Portability: You can take your data somewhere else without time-intensive and expensive effort.
• Compliance And Regulation: When you host the product yourself, you control all of your own data and don’t need to establish data processor relationships with external parties.
• Speed Of Innovation: Innovation efforts are brought by a distributed community, all working to improve the product, as opposed to being handled by a single team or company.
What Matters When Building An Open-Source Data Startup?
Community, flexibility, speed of development and customization are the most important factors that determine the success of data startups built as OSS. Surprisingly, product-market fit in the traditional sense is not the biggest challenge for these startups. That’s because data products built as OSS are often based on successful commercial products that already have product-market fit. Open-source companies have the advantage of observing how a closed-source product works, from its pricing, go-to-market, strengths and weaknesses to its product documentation, and they often model their own offering one-for-one on it.
There are other difficult challenges, however. First, you can’t just pick any closed-source software, create an open-source alternative to it and launch a startup. Not all products can or will succeed as open source. Second, you still need to pick a go-to-market strategy, business model and pricing scheme that work well for the target market. Third, building community and trust is difficult with open-source data products. Customers may not choose a project that lacks sufficient or fast-growing community support. Community is probably the hardest part of building an open-source data startup, but also the most important.
Products That Have Or Would Have Worked Well As Open-Source
While there is no exact formula, here are some examples of products that have worked or would work well as OSS. These are typically products that:
• Query your data warehouse (business intelligence tools, data cataloging, data observability)
• Require a very high level of engineering customization
• Are a building block for other products (authentication, security)
• Extract insights from your data (machine learning, product analytics)
• Help you build custom applications based on your company’s data (internal tools builders, data automation tools)
• Capture or stream data logs (customer data platforms, event streaming)
• Enhance team collaboration (Figma, Jira, Google Sheets)
There is a huge opportunity in recognizing closed-source products that are already successful and building open-source alternatives to them. We’ll likely see more companies commercializing open-source software for data products due to increasing market demand and the intrinsic benefits of open-source data products.
Over the next few years, we’ll likely see the emergence of a standard modern “data stack,” a set of core tools for data-related work in which each underlying component is based on OSS. The future of data products based on OSS is bright, filled with opportunities for entrepreneurs and just around the corner.