ljishen/tpch-data

tpch-data

$ pip install pyarrow duckdb
$ python3
>>> import duckdb
>>> import pyarrow.parquet as pq
>>> con = duckdb.connect(database=':memory:')
>>> # Install and load DuckDB's TPC-H extension, then generate data at scale factor 10.
>>> con.execute("INSTALL tpch; LOAD tpch")
>>> con.execute("CALL dbgen(sf=10)")
>>> print(con.execute("show tables").fetchall())
[('customer',), ('lineitem',), ('nation',), ('orders',), ('part',), ('partsupp',), ('region',), ('supplier',)]
>>> tables = ["customer", "lineitem", "nation", "orders", "part", "partsupp", "region", "supplier"]
>>> # Write each table to <table>.parquet in the current directory.
>>> for t in tables:
...     res = con.query("SELECT * FROM " + t)
...     pq.write_table(res.to_arrow_table(), t + ".parquet")
...
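
The session above pulls each table into an Arrow table before writing it. As a hedged alternative, the sketch below lets DuckDB write the Parquet files itself with its `COPY ... TO ... (FORMAT PARQUET)` statement, which avoids holding a full Arrow copy of each table in Python. The names `TPCH_TABLES`, `copy_statement`, and `export_tpch` are hypothetical helpers introduced here for illustration; they are not part of this repository, DuckDB, or PyArrow.

```python
from pathlib import Path

# The eight tables created by DuckDB's dbgen(), as listed by "show tables" above.
TPCH_TABLES = ["customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"]


def copy_statement(table, out_dir="."):
    """Build a DuckDB COPY statement writing `table` to <out_dir>/<table>.parquet."""
    if table not in TPCH_TABLES:
        raise ValueError(f"unknown TPC-H table: {table}")
    target = (Path(out_dir) / f"{table}.parquet").as_posix()
    # Double any single quote so the path stays a valid SQL string literal.
    target = target.replace("'", "''")
    return f"COPY {table} TO '{target}' (FORMAT PARQUET)"


def export_tpch(sf=1, out_dir="."):
    """Generate TPC-H data at scale factor `sf` and export every table to Parquet."""
    # Imported lazily so copy_statement() can be used without duckdb installed.
    import duckdb

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    con = duckdb.connect(database=":memory:")
    con.execute("INSTALL tpch; LOAD tpch")
    con.execute(f"CALL dbgen(sf={sf})")
    for table in TPCH_TABLES:
        con.execute(copy_statement(table, out_dir))
    con.close()
```

Calling `export_tpch(sf=10, out_dir="data")` would correspond to the scale factor used above. Trying a small scale factor first, such as `sf=0.1`, is a cheap way to check the setup before generating the full dataset.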

About

Generate TPC-H data in Parquet format
