Estuary

How to use the Snowflake Primary Key Constraint

Snowflake's primary key constraint ensures you don't abandon data integrity when you migrate to the cloud. Here's how to use it.

How to use the Snowflake Primary Key Constraint
Share this article

Storing and managing data in the cloud instead of on-premises comes with benefits such as easy accessibility, scalability, and data governance. However, compatibility with existing data systems remains critical: you want to be sure you’re not abandoning data integrity with the migration to the cloud. That’s where paradigms like Snowflake constraints come in. 

In this article, I’ll show you how Snowflake’s constraints keep this cloud warehouse compatible with traditional database systems that support similar conventions. This is useful when you have a data pipeline from a traditional database; for example, MySQL to Snowflake.

Then, we’ll take a deep dive into how to create a primary key constraint in Snowflake and the different approaches to achieving that. 

At the end of the article, you will have a clear understanding of the different types of constraints in Snowflake, how to manage primary key constraints in Snowflake, and how they can be enforced using the Snowflake Unistore workload.

What is Snowflake?

Snowflake is a global data cloud platform designed as a one-stop solution for data workloads. It can act as a data warehouse or data lake. It emphasizes seamless data collaboration as data is stored in a centralized repository. 

Snowflake’s architecture is unique when compared to other data providers or database systems. Its architecture is cloud native and it is not built on any existing database technology. 

Snowflake’s main innovation is that it separates computation and storage while retaining support for Structured Query Language (SQL). Snowflake combines the benefits of a shared disk architecture (in that it utilizes a centralized data repository) with the advantages of a shared-nothing architecture (as data is split and stored in virtual data warehouses). Warehouses can be scaled out and power fast query computation times through the concept of Massively Parallel Processing (MPP) compute clusters. 

Another major advantage of Snowflake is that it’s a fully managed cloud service. You don’t need to install, configure, or manage any components. The optimization and functionality of the platform are Snowflake’s responsibility, so you can focus on your data modeling needs.

What are Snowflake Constraints?

In database design, there are times when you need to place constraints on the database model to follow certain expectations and requirements. For example, you might need to make sure that every data point in a given column of a table is unique, or that new records cannot be created in a table if certain specified columns do not contain values. 

The most prominent of such constraints is the primary key constraint, which uniquely identifies every row in a database table. The values ascribed in the primary key must be unique and a database table can only contain one primary key. 

A primary key can be used in conjunction with other constraints such as a foreign key constraint, which is used to create directional relationships between tables. Snowflake supports the primary key constraint, unique key constraint, foreign key constraint, and NOT NULL constraint. 

At this point, it is important to note that while Snowflake supports the above-mentioned constraints, it only enforces the NOT NULL constraint, which means that duplicate records or non-unique entries can be created in Snowflake even when these constraints are specified. It is the application’s job to verify that entries obey the constraints. 

Recently, Snowflake introduced a new workload called Unistore that natively enforces Snowflake constraints, which we’ll discuss in a later section. Still, it is vital to remember that all other workloads on Snowflake do not enforce constraints and the examples in this article assume you are using the vanilla version of Snowflake unless otherwise stated.

Types of Snowflake Constraints

Snowflake supports four constraints from the ANSI SQL standard namely UNIQUEPRIMARY KEYFOREIGN KEY, and NOT NULL. Snowflake’s constraints can be defined either on temporary tables, transient tables, or permanent tables. You can classify Snowflake constraints based on the way a constraint is defined and the column it targets. Below are the types of constraints in Snowflake.

Single Column Constraints: This type of constraint, as the name implies, is defined on a single column in a Snowflake table. Single-column constraints are specific to a particular column and the constraint may be a primary key constraint, foreign key constraint, unique key constraint, or a NOT NULL constraint.

Multi-Column Constraints: Multi-column constraints are used to place constraints on more than one column in a Snowflake table. You can think of them as compound constraints that combine the corresponding key sequence on a column with other columns in the constraint definition. The order of the columns in this compound constraint is important. You can use multi-column constraints to create compound unique keys or primary keys.

Inline Constraints: Constraints that are defined as part of a column definition are termed inline constraints and can only be used to create single-column constraints in Snowflake.

Out-of-Line Constraints: You can use Out-of-Line constraints to create both single-column and multi-column constraints. A constraint is said to be out of line in Snowflake when it is defined using a separate clause that specifies the column or columns on which the constraint will take effect. In essence, out-of-line constraints are defined as an add-on clause to specify the constraint properties of specific columns.

Creating a Snowflake Primary Key Constraint

You can define a primary key constraint in Snowflake using the column-level or table-level syntax. You can also add a primary key constraint to an existing table using the ALTER TABLE command. 

When tables with constraints are copied in Snowflake via the CREATE TABLE … LIKE or CREATE TABLE … CLONE commands, the constraints defined on the source table are copied to the new table. Below are the various methods of creating a primary key constraint in Snowflake.

Column Level: In this method, the primary key constraint is specified inline as part of the column definition when creating a table. An example of the relevant syntax is shown below.

plaintext
CREATE TABLE pk_products_table  (     id  INT PRIMARY KEY,     NAME  VARCHAR(10),     tagline VARCHAR(100)  );

Table Level: Here the primary key constraint definition is included as an out-of-line clause that refers to a specific column in the table. The syntax is shown below.

plaintext
CREATE TABLE pk_products_table  (     id  INT,     NAME  VARCHAR(10),     tagline VARCHAR(100),     PRIMARY KEY (id)  );

ALTER TABLE: A primary key constraint definition can be added to an existing table using the ALTER TABLE command. To do so, the column to be specified must exist. You can add a primary key constraint definition to an existing table like so.

ALTER TABLE pk_my_table ADD PRIMARY KEY (id);

How to Show Snowflake Primary Keys

You can use the Snowflake SHOW PRIMARY KEYS command to check if you have primary keys enabled on a table. You can also use it to list all primary keys in a specified schema, account, or database. The command returns the primary key properties and metadata from your database. 

Examples of how to show primary keys in Snowflake are shown below. 

Using a generic command:

show primary keys;

Using a command that targets a specific table:

show primary keys in my_table;

Using a command to show all primary keys in a database:

show primary keys in database my_database;

Snowflake Unistore Workload and Primary Key Constraints

In June 2022, Snowflake introduced a new workload that could power fast analytical queries on transactional data. The workload, called Unistore, aims to provide a single solution for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads. 

The idea is to have a data store that can cater to the relationship-heavy data that is typically present in OLTP systems while at the same time being able to act as a store for the complex, analysis-rich workloads in OLAP systems. 

Another major difference of Unistore is that, unlike the base offering of Snowflake, it enforces adherence to primary key constraints and unique key constraints while also ensuring referential integrity with foreign keys. As of the time of this writing, Unistore is only available in private preview and is not open to the general public.  

Snowflake CDC is an essential component in achieving this new level of constraint enforcement and data integrity in Unistore, enabling seamless data synchronization between operational systems and analytical platforms.

Snowflake Hybrid Tables

Hybrid tables are the new table type introduced in Snowflake and serve as the building blocks of Unistore. Using hybrid tables, you can create transactional applications and query them with the speed and flexibility of analytical workloads.

Conclusion

This article showed you why you may want to use Snowflake as your go-to cloud solution because of its compatibility with SQL. It introduced the concepts espoused by Snowflake, the constraints supported for data modeling, and how you can create and use a primary key constraint in Snowflake. 

Furthermore, it highlighted Unistore - the new workload available to select Snowflake customers and how it enforces constraints out of the box. However, you may want a solution that simplifies the process of managing Snowflake constraints and transparently meets your data requirements, Estuary, is a managed cloud service that can facilitate the creation of streaming data pipelines and is fully integrated with Snowflake as a data destination.

Start streaming your data for free

Build a Pipeline
Share this article

Table of Contents

Build a Pipeline

Start streaming your data for free

Build a Pipeline

About the author

Picture of Jeffrey Richman
Jeffrey Richman

With over 15 years in data engineering, a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.

Popular Articles

Streaming Pipelines.
Simple to Deploy.
Simply Priced.
$0.50/GB of data moved + $.14/connector/hour;
50% less than competing ETL/ELT solutions;
<100ms latency on streaming sinks/sources.