MIN_BY and MAX_BY in Snowflake

MIN_BY and MAX_BY in Snowflake

Have you ever wanted to look up the minimum or maximum value that corresponds to another column? Common questions like this come up often:

  • What is the latest order date per customer
  • What is the latest price by product
Sample results of data we want to pull using MAX_BY

What many engineers don't know is that Snowflake (and most modern DBMSs like DuckDB, Databricks, BigQuery, etc.) include two powerful aggregate functions, MAX_BY and MIN_BY, that drastically simply this query pattern.

In this blog we'll cover:

  • What you used to do without these functions
  • How MIN_BY and MAX_BY work

Let's begin by walking through an example, but feel free to skip to how it's used via the table of contents on the left.

Use Case

Consider the following table:

┌────────────┬──────────────┬────────────┬────────┬───────────────┐
│ product_id │ product_name │    date    │ price  │ has_promotion │
│   int64    │   varchar    │    date    │ double │    boolean    │
├────────────┼──────────────┼────────────┼────────┼───────────────┤
│          1 │ lettuce      │ 2025-07-12 │   3.12 │ true          │
│          1 │ lettuce      │ 2025-11-03 │   3.99 │ false         │
│          2 │ bread        │ 2016-06-02 │   5.99 │ false         │
│          3 │ milk         │ 2023-03-23 │   5.94 │ true          │
│          5 │ apple        │ 2021-02-01 │   0.54 │ true          │
│          5 │ apple        │ 2023-01-29 │   1.21 │ true          │
│          5 │ apple        │ 2025-10-11 │   0.99 │ false         │
└────────────┴──────────────┴────────────┴────────┴───────────────┘

We want to find the latest price for each product and the date in which that price came into effect:

┌──────────────┬─────────────┬──────────────┐
│ product_name │ latest_date │ latest_price │
│   varchar    │    date     │    double    │
├──────────────┼─────────────┼──────────────┤
│ apple        │ 2025-10-11  │         0.99 │
│ bread        │ 2016-06-02  │         5.99 │
│ lettuce      │ 2025-11-03  │         3.99 │
│ milk         │ 2023-03-23  │         5.94 │
└──────────────┴─────────────┴──────────────┘

The Two Old Ways

Normally, we would use a self join, ending up with a query that is unnecessarily complex.

WITH latest_dates AS (
    -- find the latest date for each product
    SELECT
        product_id,
        MAX(date) as latest_date
    FROM test
    GROUP BY 1
)

-- join each product to its latest date
SELECT
    t.product_name,
    ld.latest_date,
    t.price as latest_price
FROM test AS t
JOIN latest_dates AS ld
    ON t.product_id = ld.product_id
    AND t.date = ld.latest_date;

Alternatively, you could also have used a window function.

SELECT
    DISTINCT
    product_name,
    MAX(date) OVER (PARTITION BY product_name ORDER BY date DESC) as latest_date,
    FIRST_VALUE(price) OVER (PARTITION BY product_name ORDER BY date DESC) as latest_price
FROM test;

The New Way

With Snowflake's MAX_BY, the query can be rewritten as:

  SELECT
      product_name,
      MAX(date) as latest_date,
      MAX_BY(price, date) as latest_price
  FROM test
  GROUP BY product_name;

How to use MAX_BY and MIN_BY

For simplicity's sake, we will only cover MAX_BY as the descriptions below apply to MIN_BY as well.

In Snowflake and DuckDB, MAX_BY and MIN_BY accepts 3 parameters.

MAX_BY(
    <column_to_return>,
    <column_containing_maximum_value>,
    [<number_of_values_to_return>]
)

For example, if a table contains the columns product_id and date, MAX_BY(product_id, date) returns the value of the product_id column for the row that has the highest value in the date column.

Arguments

<column_to_return>

This specifies the column to return. In our previous example, we want the price associated with the latest date. If we instead wanted to determine whether the latest date includes a promotion, we would replace this with has_promotion.

  SELECT
      product_name,
      MAX_BY(has_promotion, date) as has_promotion
  FROM test
  GROUP BY product_name;
┌──────────────┬───────────────┐
│ product_name │ has_promotion │
│   varchar    │    boolean    │
├──────────────┼───────────────┤
│ apple        │ false         │
│ bread        │ false         │
│ lettuce      │ false         │
│ milk         │ true          │
└──────────────┴───────────────┘

<column_containing_maximum_value>

This specifies how to choose the column to return above. This is the column whose maximum value determines which value is selected.

┌──────────────┬────────────┬────────┬───────────────┐
│ product_name │    date    │ price  │ has_promotion │
│   varchar    │    date    │ double │    boolean    │
├──────────────┼────────────┼────────┼───────────────┤
│ lettuce      │ 2025-07-12 │   3.12 │ true          │
│ lettuce      │ 2025-11-03 │   3.99 │ false         │ <-- this is selected
│ bread        │ 2016-06-02 │   5.99 │ false         │ <-- this is selected
│ milk         │ 2023-03-23 │   5.94 │ true          │ <-- this is selected
│ apple        │ 2021-02-01 │   0.54 │ true          │
│ apple        │ 2023-01-29 │   1.21 │ true          │
│ apple        │ 2025-10-11 │   0.99 │ false         │ <-- this is selected
└──────────────┴────────────┴────────┴───────────────┘

<number_of_values_to_return>

This optional integer specifies how many values to pull from <column_to_return> based on the <column_containing_maximum_value>.

When this argument is provided, MAX_BY returns an array of values ordered by the <column_containing_maximum_value> in descending order.

For example, if we want to see last two product prices based on the most recent dates, we can write:

SELECT
    product_name,
    MAX_BY(price, date, 2) as latest_price
FROM test
GROUP BY product_name;
┌──────────────┬──────────────┐
│ product_name │ latest_price │
│   varchar    │   double[]   │
├──────────────┼──────────────┤
│ apple        │ [0.99, 1.21] │
│ bread        │ [5.99]       │
│ lettuce      │ [3.99, 3.12] │
│ milk         │ [5.94]       │
└──────────────┴──────────────┘

This means for apples, the last two prices, ordered by the latest dates, are $0.99 and $1.21.

Additional notes:

This function ignores NULLs in <column_containing_maximum_value> and returns NULL if all values in <column_containing_maximum_value> are NULL.

Wrapping Up

Whenever you're tempted to grab the latest, highest, lowest, or top N values per group, you can now do it in a single clean expression.

Give these functions a try in your own warehouse, they’re one of those features that make SQL feel surprisingly elegant.

Kyle Cheung

Kyle Cheung

optimization for you, you, you, you, and you