Loading data (sync or bring your own)
Loading data with Lakehouse sync
If you have a transactional database running in EDB Postgres AI Cloud Service, then you can sync tables from this database into a Managed Storage Location. See "How to lakehouse sync" for further details.
Bringing your own data
It's possible to point your Lakehouse node at an arbitrary S3 bucket with Delta Tables inside it. However, this comes with some major caveats (which will eventually be resolved):
Caveats
- The tables must be stored as Delta Lake Tables within the location.
  - A "Delta Lake Table" (or "Delta Table") is a folder of Parquet files along with some JSON metadata.
- Each table must be prefixed with `$schema/$table/`, where `$schema` and `$table` are valid Postgres identifiers (i.e., fewer than 64 characters).
  - For example, this is a valid Delta Table that will be recognized by Beacon Analytics:
    `my_schema/my_table/{part1.parquet, part2.parquet, _delta_log}`
  - These `$schema` and `$table` identifiers will be queryable in the Postgres Lakehouse node, e.g.: `SELECT count(*) FROM my_schema.my_table;`
  - This Delta Table will NOT be recognized by the Postgres Lakehouse node (missing a schema):
    `my_table/{part1.parquet, part2.parquet, _delta_log}`
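If you're generating these files yourself, the sketch below shows one way to write a correctly prefixed Delta Table with the Python `deltalake` package (delta-rs). This package isn't part of the EDB tooling, and the bucket name, identifiers, and credentials are placeholders you'd replace with your own:

```python
# Minimal sketch: write a Delta Table under the $schema/$table/ prefix that the
# Lakehouse node expects. Bucket, identifiers, and credentials are placeholders.
import pyarrow as pa
from deltalake import write_deltalake

# Sample data standing in for whatever your pipeline produces.
data = pa.table({
    "id": [1, 2, 3],
    "name": ["alpha", "beta", "gamma"],
})

# "my_schema" and "my_table" must be valid Postgres identifiers (< 64 characters).
write_deltalake(
    "s3://my-bucket/my_schema/my_table",
    data,
    storage_options={
        # Placeholder credentials; omit these to use ambient AWS credentials.
        "AWS_REGION": "us-east-1",
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "...",
        # delta-rs may require this (or a DynamoDB locking provider) for S3 writes.
        "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
    },
)
```

The resulting `my_schema/my_table/` folder contains Parquet files plus a `_delta_log/` directory of JSON metadata, matching the layout described in the caveats above.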
Loading data into your bucket
You can use the `lakehouse-loader` utility to export data from an arbitrary Postgres instance to Delta Tables in a storage bucket.
See Delta Lake Table Tools for more information on how to obtain and use that utility.
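To illustrate what such an export involves, here is a rough, unofficial sketch that reads a table from Postgres with `pandas`/SQLAlchemy and writes it to S3 with the Python `deltalake` package. All connection details and names are placeholders, and the `lakehouse-loader` utility remains the supported approach for real exports:

```python
# Unofficial sketch of exporting one Postgres table to a Delta Table in S3.
# Connection string, bucket, and table names are placeholders.
import pandas as pd
from sqlalchemy import create_engine
from deltalake import write_deltalake

engine = create_engine("postgresql://user:password@source-host:5432/mydb")

# Read the source table; for large tables you would stream it in chunks instead.
df = pd.read_sql("SELECT * FROM public.my_table", engine)

# Write under the $schema/$table/ prefix so the Lakehouse node can query it
# as my_schema.my_table.
write_deltalake("s3://my-bucket/my_schema/my_table", df, mode="overwrite")
```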
For further details, see the External Tables documentation.