Reference Index

This is the site-wide alphabetical index for all published concept notes.

All concept notes

Build Incremental Snowflake Pipelines With Streams, Tasks, and Dynamic Tables: A directional guide to choosing between streams, tasks, and dynamic tables when designing refresh-aware transformation pipelines in Snowflake.
Build Running Totals With Window Functions: A clean cumulative-sum pattern for trend lines, balances, and progressive totals.
Build a Hugging Face AI Stack With MCP and Python Clients: An infrastructure-oriented example that uses Hugging Face models behind an MCP server and calls task-specific models from a Python MCP client.
Build an MCP Server in Python: A reference example for building a simple MCP server in Python with the official MCP Python SDK.
Call an MCP Server From a Python Service: A follow-up example showing how a Python service can connect to an MCP server and call its tools over streamable HTTP.
Check for Null and Duplicate Keys: A fast validation query for checking whether a model has null keys or duplicate identifiers.
Choose Views for Flexibility and Tables for Stability: A guide to how views support flexibility, how tables support stable outputs, and how both can be used in a pipeline.
Choose the Right Data File Format for the Job: A practical guide to when CSV, JSON, XML, YAML, Avro, Parquet, and ORC are the right fit for data engineering and analytics work.
Choose the Right Ranking Function for Ties: A side-by-side ranking pattern for deciding between ROW_NUMBER, RANK, and DENSE_RANK.
Compare Each Row to Its Group Average: A practical pattern for flagging records that outperform or underperform their peer group.
Data Cleanup Before Loading: A lightweight dataframe cleanup pattern for standardizing files before loading them into a warehouse.
Deduplicate Rows With Row Number: A warehouse-friendly SQL pattern for keeping the latest record per business key.
Design Snowflake Ingestion Patterns for Latency, Scale, and Control: A professional guide to choosing between batch loads, continuous ingestion, and streaming patterns in Snowflake for advanced data engineering scenarios.
Detect Missing Values in a Sequence: A sequence-gap check for invoice numbers, tickets, or any ordered identifier stream.
Engineer Trustworthy Snowflake Data Products With Governance and Data Quality: A directional article on why secure access design, data quality controls, and governed delivery are essential for Snowflake advanced data engineering work.
Estimate Cost Per Query in Snowflake With Better Cost Attribution: A practical way to estimate Snowflake cost per query by allocating actual hourly warehouse credits instead of relying on execution time alone.
Find Customers Active in Consecutive Months: A retention-style query for identifying customers who returned in back-to-back months.
Find the Nth Highest Value With Dense Rank: A durable pattern for returning the nth highest value while handling ties correctly.
Keep the Top Products Within Each Category: A top-n-per-group pattern for ranking products without mixing categories together.
Manage Python Virtual Environments With venv: A reference workflow for creating, activating, updating, and removing Python virtual environments with venv.
Move Large Datasets With Resumable and Verified Transfers: A practical guide to moving large datasets safely, estimating transfer time, and choosing between transfer tools, warehouse-native loading patterns, and Python orchestration.
Move, Share, and Protect Data Across Snowflake Regions and Platforms: A professional guide to understanding when to use data sharing, replication, or failover capabilities in Snowflake advanced data engineering scenarios.
Operate SQL Server Triggers and Stored Procedures Safely: A practical SQL Server pattern for using triggers carefully and writing transaction-safe stored procedures.
Optimize Snowflake Performance With Warehouse and Storage Design: A professional overview of how warehouse sizing, workload isolation, pruning, and storage-aware design affect Snowflake performance and cost.
Organize AI Workflows as Files, Not Frameworks: A file-oriented AI architecture pattern where one agent connects to multiple workflows, and each workflow is broken into tasks, prompts, data, and tools.
Pivot Rows Into Reporting Columns: A conditional-aggregation pattern for turning row values into side-by-side reporting columns.
Prompt Engineering, Context Engineering, and Harness Engineering: A practical distinction between prompt engineering, context engineering, and harness engineering, and why all three matter when moving from demos to reliable AI systems.
Salesforce Agentforce: A quick reference on Salesforce Agentforce, its Atlas reasoning engine, and how autonomous agent flows are structured.
Schedule SQL Server Agent Jobs by Time and Trigger: A SQL Server Agent guide for time-based schedules, trigger-driven job starts, and separating scheduling from job logic.
Set Up a Snowflake CLI Connection With config.toml: A practical guide to defining a Snowflake CLI connection in config.toml, setting a default connection, and validating access locally or in CI/CD pipelines.
Setting Up a Snowflake Connector in Python: A clean starting point for loading environment variables and opening a Snowflake connection from Python.
Snowflake Dynamic Tables: How Snowflake dynamic tables help you build refresh-aware transformation pipelines with less orchestration.
Standardize Text Values With SQL String Functions: A lightweight cleanup pattern for normalizing free-text fields before analysis.
Use Docker to Run Services and Edit Files in Containers: A reference pattern for starting Docker containers, inspecting them, and editing files that live inside a running container.
Use a CTE to Stage a Complex Query: A readable SQL pattern for splitting one large transformation into named steps.