Parquet file

Parquet came from a common need for an on-disk columnar representation, and it’s inspired by a lot of work in academia and the Google Dremel paper. A lot of databases, like Vertica, use columnar representation to speed up analysis. Arrow is similar, coming from a common need for an in-memory columnar representation. If you look at papers like MonetDB, the papers at the beginning of vectorized execution, it’s the next step in making SQL execution and all those things much faster. - The Columnar Roadmap

Predicate Pushdown in Parquet and Apache Spark
The two versions of Parquet - how the engines that process Parquet files as SQL tables are blocking the evolution of the format. These engines do not fully support the latest specification, and without that support the rest of the ecosystem has no incentive to adopt it.

Written on September 23, 2020, Last update on November 8, 2025

parquet sql archive