File formats for tabular data
There are several high-quality, well-developed formats - HN
csv - Simple for simple use cases and text-based, but has many edge cases (quoting, delimiters, encodings) and lacks features such as types
ndjson - Every line is a JSON object
xlsx - Works in Excel; a ubiquitous format with a standard, but complicated and missing scientific features
sqlite - Designed for relational data, somewhat ubiquitous; column types are declared but not enforced
parquet / hdf5 / feather (apache arrow) / etc - Designed for scientific use cases; robust and efficient, but less ubiquitous
  - DuckDB is a lightweight and very fast library/CLI for working with Parquet. It's like SQLite for columnar formats
  - Arrow also has its own on-disk format called Feather
Cap'n Proto, Protocol Buffers, Avro, Thrift - Have specific features for data communication between systems
xml - Useful if you are programming in the early 2000s
GDBM, Kyoto Cabinet, etc - Useful if you are programming in the late 1990s
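
Two of the claims above are easy to check with the Python standard library: that ndjson is just one JSON object per line, and that sqlite declares column types without enforcing them. This is a minimal sketch; the sample rows and the table name `t` are made up for illustration.

```python
import io
import json
import sqlite3

# Hypothetical sample rows, made up for this sketch.
rows = [
    {"id": 1, "name": "Ada", "score": 9.5},
    {"id": 2, "name": "Grace", "score": None},
]

# NDJSON: one JSON object per line, so a reader can process
# the data line by line without loading the whole file.
buf = io.StringIO()
for row in rows:
    buf.write(json.dumps(row) + "\n")

buf.seek(0)
decoded = [json.loads(line) for line in buf if line.strip()]

# SQLite: a declared column type is an affinity, not a constraint.
# Text that does not look like a number is stored as text even in
# a REAL column (unless the table is declared STRICT).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, score REAL)")
con.execute("INSERT INTO t VALUES (?, ?)", (1, "not a number"))
stored = con.execute("SELECT score, typeof(score) FROM t").fetchone()
con.close()

print(decoded)
print(stored)
```

If enforcement matters, newer SQLite versions (3.37 and later) accept a `STRICT` keyword at the end of `CREATE TABLE`, which rejects the insert above instead of storing it.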
see also
- Working with CSV files on shell/terminal
- A love letter to the CSV format / HN
- dead simple + text + streamable
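
The "text + streamable" point and the csv edge cases mentioned earlier can both be seen in a few lines. This is a minimal sketch using Python's standard `csv` module, with made-up sample data: one field contains a comma, quotes, and a newline, which breaks naive line splitting but not a real CSV parser.

```python
import csv
import io

# Hypothetical data: the second record has a field containing
# a comma, double quotes, and an embedded newline.
rows = [
    ["id", "comment"],
    ["1", 'said "hi", then left\nnext line'],
    ["2", "plain"],
]

# Write the records; the writer quotes and escapes as needed.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Naive approach: splitting on line breaks miscounts the records,
# because the quoted field spans two physical lines.
naive_lines = text.splitlines()

# csv.reader pulls from its input lazily, one record at a time,
# so the same loop works on a large file without loading it whole.
parsed = []
for record in csv.reader(io.StringIO(text, newline="")):
    parsed.append(record)

print(len(naive_lines), "physical lines,", len(parsed), "records")
```

The same pattern applies to a real file opened with `open(path, newline="")`, which is the mode the `csv` module documentation recommends.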