polars
A DataFrame library written in Rust, with Python bindings.
Installation
Polars is available on PyPI as polars:
uv add polars
Usage
Load a dataset
import polars as pl

df = pl.read_csv("csv_filepath.csv")  # CSV
df = pl.read_excel("excel_filepath.xlsx")  # Excel; needs an optional engine package such as fastexcel
Column calculations
The alias method defines the name of the new column to be created.
Arithmetic
result = df.select(
    (pl.col("nrs") + 5).alias("nrs + 5"),
    (pl.col("nrs") - 5).alias("nrs - 5"),
    (pl.col("nrs") * pl.col("random")).alias("nrs * random"),
    (pl.col("nrs") / pl.col("random")).alias("nrs / random"),
    (pl.col("nrs") ** 2).alias("nrs ** 2"),
    (pl.col("nrs") % 3).alias("nrs % 3"),
)
Comparisons
result = df.select(
    (pl.col("nrs") > 1).alias("nrs > 1"),  # .gt
    (pl.col("nrs") >= 3).alias("nrs >= 3"),  # .ge
    (pl.col("random") < 0.2).alias("random < .2"),  # .lt
    (pl.col("random") <= 0.5).alias("random <= .5"),  # .le
    (pl.col("nrs") != 1).alias("nrs != 1"),  # .ne
    (pl.col("nrs") == 1).alias("nrs == 1"),  # .eq
)
Combine multiple comparisons with & and |. Applied to integer columns, the same operators (along with ~ and ^) act bitwise:
result = df.select(
    pl.col("nrs"),
    (pl.col("nrs") & 6).alias("nrs & 6"),
    (pl.col("nrs") | 6).alias("nrs | 6"),
    (~pl.col("nrs")).alias("not nrs"),
    (pl.col("nrs") ^ 6).alias("nrs ^ 6"),
)
Uniqueness
result = long_df.select(
    pl.col("numbers").n_unique().alias("n_unique"),
    pl.col("numbers").approx_n_unique().alias("approx_n_unique"),
)
Lists
Columns in Polars can have a list type, and a list column can be built on the fly, for example by splitting a string:
weather = weather.with_columns(
    pl.col("temperatures").str.split(" "),
)
Elements of the list can then be accessed by position:
result = weather.with_columns(
    pl.col("temperatures").list.head(3).alias("head"),
    pl.col("temperatures").list.tail(3).alias("tail"),
    pl.col("temperatures").list.slice(-3, 2).alias("two_next_to_last"),
)
Missing data
Polars uses null for missing values. In Python, None creates a null value.
Missing data can be filled with a particular value:
fill_literal_df = df.with_columns(
    pl.col("col2").fill_null(3),
)
Forward and backward filling are also supported:
fill_forward_df = df.with_columns(
    pl.col("col2").fill_null(strategy="forward").alias("forward"),
    pl.col("col2").fill_null(strategy="backward").alias("backward"),
)