RevenueBase Blog

How Gigasheet Powers RevenueBase Data Delivery

Guest post by Jason Hines, CEO of Gigasheet

RevenueBase is redefining how B2B data gets delivered. Instead of gating access through credit-based pricing or limiting users to narrow API queries, they make the entire dataset available to customers in one shot. In practice that means serving up thousands of searches across hundreds of millions of high-quality contact and company records, open for exploration without restrictions.

That kind of access is powerful, but it also creates a real challenge: how do you deliver data at that scale, in an efficient way that’s actually usable for sales and marketing teams? That’s where Gigasheet comes in.

RevenueBase has built one of the largest, most accurate, and continuously refreshed B2B intelligence databases in the market. And they’ve been working with us at Gigasheet almost since day one. As one of our earliest customers, they helped shape how we approach large-scale data delivery.

Today, we support hundreds of RevenueBase users each month, delivering 57 billion data points of fresh B2B sales intelligence through our platform. Users regularly run highly complex queries across the entire dataset, often with more than 100 filter clauses. That level of flexibility and performance wouldn’t be possible without the infrastructure we’ve built behind the scenes.

Big Data Built for Business Users

At Gigasheet, we believe that just because you’re working with huge datasets doesn’t mean you should need a SQL console to explore them. Most of the people who need access to this kind of data (revenue operations, SDR managers, demand gen, and sellers) don’t live in Snowflake or the command line. They live in business tools, like spreadsheets.

So that’s what we built. Gigasheet gives you a familiar spreadsheet-like interface backed by a database that can handle billions of rows. No SQL, no scripts. Just flexible access to data, no matter how big.

For RevenueBase customers, this means they can explore the full dataset of contacts and companies, apply filters, build segments, and export what they need. It’s full, unfettered access to every row and column, without needing to involve the data team.

How It Works: A Look Under the Hood

Every month, RevenueBase refreshes their data. This is not a simple append or diff. It is a full replacement: stale contacts are removed, millions of new ones are added, and new columns appear as RevenueBase continues to add features to the data. Here’s how Gigasheet handles the ingestion and publishing process on our side.

We start by running data checks against the source tables in RevenueBase’s Snowflake environment. We look for consistency in schema, expected row counts, and other structural indicators to ensure the data is sound before ingestion.
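As a rough sketch, these pre-ingestion checks can be expressed as ordinary Snowflake queries. The database, schema, and table names below are hypothetical stand-ins, not the actual RevenueBase objects:

```sql
-- Confirm the schema matches what we expect before exporting
-- (names are illustrative)
SELECT column_name, data_type
FROM revenuebase.information_schema.columns
WHERE table_schema = 'PUBLIC'
  AND table_name = 'CONTACTS'
ORDER BY ordinal_position;

-- Sanity-check the row count against a baseline from the prior refresh
SELECT COUNT(*) AS row_count
FROM revenuebase.public.contacts;
```

If either result deviates from the expected structure or historical counts, ingestion is held until the discrepancy is resolved.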

Next, we connect directly to their Snowflake instance using API credentials they have provisioned for us and our own data connectors. Using Snowflake’s SQL API, we issue a COPY INTO command to export the full contact table in Parquet format to an S3 bucket we manage. In this setup Snowflake writes directly to our storage.
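The export step boils down to a single Snowflake statement. This is a simplified sketch: the bucket path, storage integration name, and table name are placeholders, and the real job also sets sizing and partitioning options:

```sql
-- Unload the full contact table to our S3 bucket as Parquet
-- (bucket, integration, and table names are illustrative)
COPY INTO 's3://gigasheet-ingest/revenuebase/contacts/'
FROM revenuebase.public.contacts
STORAGE_INTEGRATION = gigasheet_s3
FILE_FORMAT = (TYPE = PARQUET)
HEADER = TRUE        -- preserve column names in the Parquet files
OVERWRITE = TRUE;
```

Because Snowflake writes the Parquet files straight to our storage, there’s no intermediate download or re-upload step on our side.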

At Gigasheet we use ClickHouse as our backend, because it’s an absolute beast when it comes to analytical workloads at scale. Its performance with columnar data and ability to run fast, complex queries makes it the perfect engine for delivering massive datasets like RevenueBase’s. And we don’t hold back. We’re likely running the largest single-node ClickHouse deployment anywhere, with over 827,000 tables and more than 288 billion rows added so far this year. It handles that kind of volume reliably and with speed, which is exactly what our customers need.

Once the export lands in S3, we load it into our ClickHouse backend using a CREATE TABLE AS command. This allows us to bring in the dataset efficiently using Parquet’s columnar structure. After the load, we inspect the table to check column types, row counts, and overall structure. We then create matching metadata entries in our Postgres catalog, which powers the user-facing views and filters in Gigasheet.
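A minimal version of that load, using ClickHouse’s `s3` table function, looks like this. The table name, sort key, and bucket URL are assumptions for illustration:

```sql
-- Load the exported Parquet files from S3 into a staging table
-- (names and URL are illustrative)
CREATE TABLE revenuebase.contacts_new
ENGINE = MergeTree
ORDER BY contact_id
AS SELECT *
FROM s3(
    'https://gigasheet-ingest.s3.amazonaws.com/revenuebase/contacts/*.parquet',
    'Parquet'
);
```

Loading into a separate staging table, rather than the live one, is what makes the later atomic swap possible.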

We follow that with another round of data quality checks on the newly imported table. These checks compare distributions, look for unusual shifts, and validate that the data aligns with historical norms. Once everything passes, we use ClickHouse’s EXCHANGE TABLES command to swap in the new version. This preserves all saved views, filters, and shares while updating the data underneath. We also update the relevant metadata in Postgres to keep everything consistent.
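The swap itself is a one-liner in ClickHouse (it requires the Atomic database engine, which is the default). Table names here are again placeholders:

```sql
-- Atomically swap the freshly loaded table in for the live one
EXCHANGE TABLES revenuebase.contacts_new AND revenuebase.contacts;
-- The previous month's data now lives in contacts_new,
-- available for rollback until the next refresh
```

Because the exchange is atomic, users never see a partially updated dataset, and saved views and filters keep pointing at the live table name throughout.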

There’s one final step. Sometimes RevenueBase includes fields that should remain in the backend but not be visible in the UI. We apply those column-level changes, drop any excluded fields from the interface, and run a final round of QA on the production table. Once that is complete, we notify the RevenueBase team that the update is live.

To ensure rollback is always possible, we retain the prior version of the table until the next monthly refresh.

Scale Without the Complexity

The result is a scalable, self-service experience for every RevenueBase customer. Right now, users have access to:

  • 294 million business contacts
  • 63 million companies
  • 53 million deliverable emails
  • 35 million mobile numbers

And they can explore all of it without needing engineering support or dealing with restrictive credit/usage caps. No more submitting a ticket to get a list of companies with 5+ Android developers in Fintech. Just open up Gigasheet, apply a filter, and export your list.

Could you build something like this using Snowflake and a BI tool? Possibly, but should you? That would mean burning cycles from your data team to model, permission, and maintain a front-end. Gigasheet gives you a turnkey experience that’s purpose-built for raw data delivery at scale.

If you’re a RevenueBase customer, the latest dataset is already live in Gigasheet. If you’re not, now’s a good time to take a look.