## Apache Parquet
For excellent all-round load times and portability to non-Python tools (see the sketch after this list).
- Write: `df.to_parquet(file_name)`
- Read: `df = pd.read_parquet(file_name)`
- You should have either the [PyArrow](https://arrow.apache.org/docs/python/install.html) (`pyarrow`, recommended) or [fastparquet](https://github.com/dask/fastparquet) (`fastparquet`) engine installed.
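A minimal round-trip sketch (`data.parquet` is a placeholder file name):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# engine="auto" (the default) uses pyarrow if installed, else fastparquet
df.to_parquet("data.parquet")
df2 = pd.read_parquet("data.parquet")  # dtypes survive the round trip, unlike CSV
```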
## CSVs on GPUs with [cuDF](https://docs.rapids.ai/api/cudf/stable/)
For ultra-fast loading of even huge CSV files (see the sketch after this list):
- Write: `df.to_csv(file_name)` (where `df` is a `cudf.DataFrame`)
- Read: `df = cudf.read_csv(file_name)`
- You need to [install and set up cuDF](https://github.com/rapidsai/cudf?tab=readme-ov-file#installation), which requires a CUDA-capable NVIDIA GPU
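A minimal sketch, assuming cuDF is installed and a CUDA-capable GPU is available (`data.csv` is a placeholder file name):

```python
import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
df.to_csv("data.csv", index=False)  # written from the GPU-backed DataFrame

df2 = cudf.read_csv("data.csv")  # parsed on the GPU
pdf = df2.to_pandas()            # copy back to host as a pandas DataFrame if needed
```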
## [PyTables](https://www.pytables.org/usersguide/introduction.html) HDF5
Use PyTables for speed with large datasets (>50 MB); see the sketch after this list.
- Write: `df.to_hdf(file_name, key='df')` (`to_hdf` requires a `key` naming the object within the HDF5 file)
- Read: `df = pd.read_hdf(file_name)`
- You need to [install](https://www.pytables.org/usersguide/installation.html#id1) the package `tables`
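A minimal sketch (`data.h5` and the key `"df"` are placeholder names):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": list("abcde")})

df.to_hdf("data.h5", key="df", mode="w")  # needs the `tables` package installed
df2 = pd.read_hdf("data.h5", key="df")    # key is optional if the file holds one object
```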
Ref: [Loading data into a Pandas DataFrame – a performance study](https://www.architecture-performance.fr/ap_blog/loading-data-into-a-pandas-dataframe-a-performance-study/)