Parquet file

Parquet came from a common need for on-disk columnar representation, and it’s inspired by a lot of work in academia and the Google Dremel paper, and you know, a lot of databases, like Vertica, are using columnar representation to speed up analysis. Arrow is similar, coming from a common need for in-memory columnar, so if you look at papers like MonetDB, papers that are the beginning of vectorized execution, it’s the next step in making SQL execution and all those things much faster. - The Columnar Roadmap

Predicate Pushdown in Parquet and Apache Spark

Written on September 23, 2020, Last update on September 23, 2020

parquet sql archive