Benchmarking datasets

When you provision a Lakehouse node, it comes preconfigured to point to a public S3 bucket in the same region as the node. The bucket contains sample benchmarking datasets.

You can query the tables in these datasets by qualifying each table name with its schema name.

Schema Name	Dataset
`tpcds_sf_1`	TPC-DS, Scale Factor 1
`tpcds_sf_10`	TPC-DS, Scale Factor 10
`tpcds_sf_100`	TPC-DS, Scale Factor 100
`tpcds_sf_1000`	TPC-DS, Scale Factor 1000
`tpch_sf_1`	TPC-H, Scale Factor 1
`tpch_sf_10`	TPC-H, Scale Factor 10
`tpch_sf_100`	TPC-H, Scale Factor 100
`tpch_sf_1000`	TPC-H, Scale Factor 1000
`clickbench`	ClickBench, 100 million rows
`brc_1b`	Billion row challenge
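
As a rough illustration of schema-qualified access, the sketch below queries the smallest TPC-H dataset. It assumes the schema exposes the standard TPC-H table and column names (`lineitem`, `l_returnflag`, `l_quantity`), which this page does not list, so adjust the names if your node shows different ones.

```sql
-- Assumption: tpch_sf_1 contains the standard TPC-H "lineitem" table.
-- The schema name comes from the table above; the table and column
-- names are taken from the TPC-H specification, not from this page.
select l_returnflag,
       count(*)        as line_count,
       sum(l_quantity) as total_quantity
from tpch_sf_1.lineitem
group by l_returnflag
order by l_returnflag;
```

To run the same query at a larger scale factor, change only the schema name, for example to `tpch_sf_10`.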

Notes about ClickBench data:

Date columns (such as "EventDate") are stored as integers, not dates.

You must wrap ClickBench column names in double quotes because they contain uppercase letters, and Postgres folds unquoted identifiers to lowercase. For example:

✅ select "Title" from clickbench.hits;

🚫 select Title from clickbench.hits;
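
The two notes above can be combined in a short sketch. It assumes the `hits` table carries the usual ClickBench columns "CounterID" and "EventDate", and that the integer date holds days since 1970-01-01; neither assumption is confirmed on this page, so check a few values before relying on the conversion. The date arithmetic is standard Postgres syntax and may need adjusting for the Lakehouse query engine.

```sql
-- Every mixed-case column is double-quoted; the schema and table
-- names are lowercase and need no quotes.
select "CounterID",
       count(*) as hit_count
from clickbench.hits
group by "CounterID"
order by hit_count desc
limit 10;

-- Assumption: "EventDate" stores days since the Unix epoch.
-- In Postgres, adding an integer to a date yields a date.
select date '1970-01-01' + "EventDate" as event_date
from clickbench.hits
limit 5;
```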
