Benchmarking datasets


When you provision a Lakehouse node, it comes preconfigured to point to a public S3 bucket in the same region as the node. This bucket contains sample benchmarking datasets.

You can query tables in these datasets by referencing them with their schema name.

Schema Name      Dataset
tpcds_sf_1       TPC-DS, Scale Factor 1
tpcds_sf_10      TPC-DS, Scale Factor 10
tpcds_sf_100     TPC-DS, Scale Factor 100
tpcds_sf_1000    TPC-DS, Scale Factor 1000
tpch_sf_1        TPC-H, Scale Factor 1
tpch_sf_10       TPC-H, Scale Factor 10
tpch_sf_100      TPC-H, Scale Factor 100
tpch_sf_1000     TPC-H, Scale Factor 1000
clickbench       ClickBench, 100 million rows
brc_1b           Billion row challenge

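
As a minimal sketch of schema-qualified access, the queries below count rows in one TPC-H table at two scale factors. The table name lineitem is an assumption based on the standard TPC-H schema; this page doesn't list the tables in each dataset.

```sql
-- Assumption: the schema exposes the standard TPC-H table names (here, lineitem).
select count(*) from tpch_sf_1.lineitem;

-- The same query against a larger scale factor; only the schema name changes.
select count(*) from tpch_sf_10.lineitem;
```

Qualifying the table with its schema name each time makes it easy to rerun the same query at a different scale factor.
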
Notes about ClickBench data:

Date columns (EventDate) are stored as integers, not dates.

You must double-quote ClickBench column names because they contain uppercase letters, and Postgres folds unquoted identifiers to lowercase. For example:

select "Title" from clickbench.hits;

🚫 select Title from clickbench.hits;
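
To work with the integer date columns as dates, one option is to convert them in the query. The sketch below assumes that the column is named "EventDate" and that it holds the number of days since 1970-01-01, which is the usual ClickBench Parquet encoding. This page confirms neither detail, so check a few values before relying on it.

```sql
-- Assumption: "EventDate" holds days since the Unix epoch (1970-01-01).
-- In Postgres, date + integer yields a date; the cast guards against
-- the column being stored as smallint or bigint.
select date '1970-01-01' + "EventDate"::integer as event_date,
       count(*) as hits
from clickbench.hits
group by 1
order by 1
limit 10;
```
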

