Marks Notes

polars

A DataFrame library written in Rust.

Source code

Installation

Polars is available on PyPI as polars:

uv add polars

Usage

Load a dataset

import polars as pl

df = pl.read_csv("csv_filepath.csv")  # CSV file
df = pl.read_excel("excel_filepath.xlsx")  # Excel file (needs an optional Excel engine installed)

Column calculations

The alias method defines the name of the new column to be created.

Arithmetic

result = df.select(
    (pl.col("nrs") + 5).alias("nrs + 5"),
    (pl.col("nrs") - 5).alias("nrs - 5"),
    (pl.col("nrs") * pl.col("random")).alias("nrs * random"),
    (pl.col("nrs") / pl.col("random")).alias("nrs / random"),
    (pl.col("nrs") ** 2).alias("nrs ** 2"),
    (pl.col("nrs") % 3).alias("nrs % 3"),
)

Comparisons

result = df.select(
    (pl.col("nrs") > 1).alias("nrs > 1"),  # .gt
    (pl.col("nrs") >= 3).alias("nrs >= 3"),  # .ge
    (pl.col("random") < 0.2).alias("random < .2"),  # .lt
    (pl.col("random") <= 0.5).alias("random <= .5"),  # .le
    (pl.col("nrs") != 1).alias("nrs != 1"),  # .ne
    (pl.col("nrs") == 1).alias("nrs == 1"),  # .eq
)

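Combine multiple comparisons with & and |, and negate with ~. On integer columns the same operators, together with ^, perform bitwise operations: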

result = df.select(
    pl.col("nrs"),
    (pl.col("nrs") & 6).alias("nrs & 6"),
    (pl.col("nrs") | 6).alias("nrs | 6"),
    (~pl.col("nrs")).alias("not nrs"),
    (pl.col("nrs") ^ 6).alias("nrs ^ 6"),
)

Uniqueness

result = long_df.select(
    pl.col("numbers").n_unique().alias("n_unique"),
    pl.col("numbers").approx_n_unique().alias("approx_n_unique"),
)

Lists

Columns can have a list type in Polars, and lists can be created dynamically, for example by splitting a string.

weather = weather.with_columns(
    pl.col("temperatures").str.split(" "),
)

Elements of the list can then be accessed by position.

result = weather.with_columns(
    pl.col("temperatures").list.head(3).alias("head"),
    pl.col("temperatures").list.tail(3).alias("tail"),
    pl.col("temperatures").list.slice(-3, 2).alias("two_next_to_last"),
)

Missing data

Polars uses null for missing values. In Python, None can be used to create a null value.

Missing data can be filled with a particular value:

fill_literal_df = df.with_columns(
    pl.col("col2").fill_null(3),
)

Forward and backward filling are also supported:

fill_forward_df = df.with_columns(
    pl.col("col2").fill_null(strategy="forward").alias("forward"),
    pl.col("col2").fill_null(strategy="backward").alias("backward"),
)