data2al icon data2al Data Engineering Notes and Code Patterns

Concept note

Deduplicate Rows With Row Number

A warehouse-friendly SQL pattern for keeping the latest record per business key.

2025-04-02
Alan
SQL
Beginner
SQL Data Quality Warehousing

Use this when ingestion or upstream systems can resend the same business event more than once.

Example

with ranked_events as (
    select
        event_id,
        customer_id,
        event_timestamp,
        payload,
        row_number() over (
            partition by event_id
            order by event_timestamp desc
        ) as row_num
    from raw.customer_events
)
select
    event_id,
    customer_id,
    event_timestamp,
    payload
from ranked_events
where row_num = 1;

Why this pattern helps

  • removes duplicate records deterministically
  • keeps the query easy to adapt for staging models
  • works well in warehouses that support window functions efficiently

Notes

  • choose an ordering column that reflects the latest valid version of a row
  • use additional tie-breakers if timestamps are not unique
  • test the deduped result against known duplicate cases

Similar Posts

Previous Article Data Cleanup Before Loading