data2al Snowflake, Databricks, and SQL Engineering

Search articles

Concept note

Deduplicate Rows With Row Number

A warehouse-friendly SQL pattern for keeping the latest record per business key.

2025-04-02

Alan

SQL

Beginner

SQL Data Quality Warehousing

Example
Why this pattern helps
Notes

Use this when ingestion or upstream systems can resend the same business event more than once.

Example

with ranked_events as (
    select
        event_id,
        customer_id,
        event_timestamp,
        payload,
        row_number() over (
            partition by event_id
            order by event_timestamp desc
        ) as row_num
    from raw.customer_events
)
select
    event_id,
    customer_id,
    event_timestamp,
    payload
from ranked_events
where row_num = 1;

Why this pattern helps

removes duplicate records deterministically
keeps the query easy to adapt for staging models
works well in warehouses that support window functions efficiently

Notes

choose an ordering column that reflects the latest valid version of a row
use additional tie-breakers if timestamps are not unique
test the deduped result against known duplicate cases

Similar Posts

Use Cortex LLM Functions for Support Triage

Previous Article Use Databricks Assistant to Speed Up SQL and Notebook Work

Next Article Check For Null And Duplicate Keys