There are several methods to select distinct values from query results in PostgreSQL. One of the ways is by using the DISTINCT
keyword. You can use the PostgreSQL DISTINCT
keyword in conjunction with the SELECT
statement. It eliminates all duplicate records and fetches only unique records.
This article provides a brief explanation of PostgreSQL and its features. You will also gain information about the PostgreSQL DISTINCT
clause and the different syntaxes. Finally, you’ll be presented with some examples of this PostgreSQL clause.
Before diving into the PostgreSQL DISTINCT
clause, let’s get an overview of PostgreSQL.
What is PostgreSQL?
PostgreSQL, commonly known as Postgres, is an enterprise-class, open-source object-relational database management system (ORDBMS). It’s one of the most compliant, stable, and mature relational databases available today.
It was initially developed in 1986 as a follow-up to INGRES (an open-source SQL relational database project of the early 1970s). SQL support was added to the project in 1994, leading to PostgreSQL.
You can use PostgreSQL for both SQL (relational) and JSON (non-relational) queries. It also provides support for different SQL functions, like triggers, subqueries, foreign keys, and different user-defined functions.
The strong reputation of Postgres is owing to its reliability, extensibility, proven architecture, data integrity, and robust feature set. PostgreSQL is used as the primary database or data warehouse for many web, mobile, and analytics applications.
Postgres isn’t controlled by any corporation or private entity. Being an open-source software, you can access its source code under the PostgreSQL License.
Features of PostgreSQL
There are a few features of the PostgreSQL database that make it widely favored over other databases. Here are those features that have earned PostgreSQL a spot in the top-3 most popular databases in the world:
- Point-in-time recovery (PITR): PostgreSQL logs every database change at all times with a write-ahead log. You can use PITR to restore databases to a specific moment in time when you run data recovery initiatives. It helps restore file systems to a stable starting point.
- Stored procedures: PostgreSQL has built-in support for multiple procedural languages. You can create custom subroutines called stored procedures and call them on a given database.
- Open source: PostgreSQL source code, under an open-source license, is freely available for you to use, modify, and implement as you see fit.
- Extensions: PostgreSQL has foreign data wrappers that can fetch data from an external source or transmit data to it for update commands. Other robust features include granular access controls, asynchronous replication, and multi-version concurrency control.
- Community support: PostgreSQL has a dedicated community of contributors and enthusiasts who regularly find bugs and fixes. They contribute to the overall system security and are readily available to support other users.
- Standards compliance: PostgreSQL is ACID-compliant and runs on all major operating systems. It also tries to conform with the SQL standard. This means that most of the features required by the SQL standard are supported, though with slightly different syntax or function.
- Dynamic Loading: You can include manually written code into the PostgreSQL server via dynamic loading. This can be done by specifying an object code file to implement a new type or function, and PostgreSQL will load it as required.
What is Postgres DISTINCT Clause?
In PostgreSQL, the DISTINCT
clause is used to fetch only unique records. The use of the DISTINCT
clause retains only one row from a set of duplicated rows.
You can use the DISTINCT
clause on single or multiple columns. For multiple columns, it uses a combination of the targeted columns’ values to determine the duplicates. The clause skips only those records that have duplicates in the selected columns.
The DISTINCT
clause must be used only with the SELECT
statement.
Syntax
The syntax for the DISTINCT clause in PostgreSQL is:
plaintextSELECT DISTINCT | DISTINCT ON (distinct_expressions)
expressions
FROM tables
[WHERE conditions];
Parameters or Arguments
Here are what the different parameters or arguments in the syntax imply:
distinct_expressions
: The expressions used to remove duplicates.expressions
: The columns to evaluate for unique values.tables
: The name of the tables to retrieve the records. You must specify at least one table with theFROM
operator.WHERE
conditions: This is optional. It represents the conditions that must be satisfied for the selection of records.
Important Points
You must keep some things in mind when you use the DISTINCT
clause syntax.
- If you provide only one expression in the
DISTINCT
clause, the query returns the unique values for that expression. - If you provide more than one expression in the
DISTINCT
clause, the query retrieves unique combinations for the listed expressions. - The
DISTINCT
clause doesn’t ignoreNULL
values. When you use the clause in your SQL statement,NULL
gets included as a distinct value in the result set.
How To Use the Postgres DISTINCT Clause?
Now that you know the Postgres DISTINCT
clause syntax, here are a few different ways to use it.
Different Postgres with DISTINCT Syntaxes
Depending on the different values of distinct_expressions
and the use of DISTINCT
or DISTINCT_ON
, there are three possible syntaxes.
Syntax #1
plaintextSELECT
DISTINCT column1
FROM
table_name;
This syntax uses the DISTINCT
PostgreSQL clause with a single expression. The DISTINCT
clause evaluates the duplicate rows in column1
of the table_name
. The output will only show the list of unique values in column1
.
Syntax #2
plaintextSELECT
DISTINCT column1, column2
FROM
table_name;
This syntax uses the DISTINCT
Postgres clause with multiple expressions. The clause is applied to the multiple columns of the specified table. It checks the combination of the columns of the table for unique values. After evaluation, the Postgres DISTINCT
clause displays the output for the combination of values in column1
and column2
.
Syntax #3
plaintextSELECT
DISTINCT ON (column1) column_alias,
column2
FROM
table_name
ORDER BY
column1,
column2;
When using the DISTINCT
clause in PostgreSQL, you have one more option called DISTINCT ON
. It is used with the SELECT
statement for removing duplicates from the query set result.
The DISTINCT ON
clause returns only the first row of each group of duplicates. The SELECT
statement returns the rows in random order. So, the use of DISTINCT ON
with the SELECT
statement will return a random first row of each group of the duplicate.
For a more organized result set, use the ORDER BY
clause with the DISTINCT ON
clause. The first row returned by the DISTINCT ON
will be based on the ORDER BY
clause provided in the query. Also, any other fields listed within the SELECT
statement will be returned for that first row.
Examples of Postgres DISTINCT Clause
To help you understand the use cases of the different syntaxes of the Postgres SELECT DISTINCT
clause, let’s look into a few examples. Consider the following table titled colorproperties
. It has two columns: fore_color
and back_color
.
Image Source: Educba
Postgres DISTINCT with Single Column
The PSQL SELECT DISTINCT
statement can be used to fetch the unique column values of fore_color
from the colorproperties
table. The following statement will help execute this. By using the ORDER BY
clause, the values in the fore_color
column will be sorted in alphabetical order.
plaintextSELECT
DISTINCT fore_color
FROM
ColorProperties
ORDER BY
fore_color
Image Source: Educba
Postgres DISTINCT with Multiple Columns
DISTINCT
in PSQL can also be used for multiple columns. The following statement is used for PSQL Distinct multiple columns. Here, the DISTINCT
clause is used for two columns i.e., fore_color
and back_color
. After evaluating the combination of the values in these two columns, it returns the unique values.
plaintextSELECT
DISTINCT fore_color,
Back_color
FROM
ColorProperties
ORDER BY
fore_color,
back_color;
Image Source: Educba
Since the SELECT DISTINCT
clause contains both fore_color
and back_color
, the values of both these columns are combined. The duplicate rows of the result set are then removed, and only the unique values of both columns are returned.
Postgres DISTINCT with DISTINCT ON
For a sorted result set after the removal of duplicate rows, here’s the syntax to use:
plaintextSELECT
DISTINCT ON
(back_color) backgroundcolor,
fore_color
FROM
ColorProperties
ORDER BY
back_color,
fore_color;
Image Source: Educba
The result set is sorted by back_color
and fore_color
. Then, for each group of duplicates, only the first row is kept in the returned result set.
Postgres DISTINCT for Multiple Tables
You can use the DISTINCT
PgSQL clause for multiple tables. Let’s consider an example of two tables named students
and departments
.
First, we use the CREATE TABLE
statement to create the students
table that consists of two columns: name
and department_id
.
plaintextCREATE TABLE students (
name text not null,
department_id text not null,
CONSTRAINT students_pk PRIMARY KEY (name)
);
Next, we insert some rows into the students
table by using the INSERT
statement.
plaintextINSERT INTO students (name, department_id) VALUES
(‘Jacob’, ‘101’),
(‘David’, ‘102’),
(‘John’,’103’);
Then, we create a table for departments
with two columns: department_id
and department_name
.
plaintextCREATE TABLE departments (
department_id text not null,
department_name text not null
);
Next, we insert values into the table.
plaintextINSERT INTO departments (department_id, department_name) VALUES
(‘101’, ‘Computer’),
(‘102’, ‘Electrical’),
(‘103’, ‘Mechanical’);
To query for unique values by joining the students and departments tables, here’s the statement to use:
plaintextSELECT DISTINCT ON (s.department_id) s.department_id, s.name,
d.department_name
FROM students s
JOIN departments d ON d.department_id = s.department_id
ORDER BY s.department_id DESC
The (s.department_id)
expression is defined for the DISTINCT ON
clause to order the result set.
Image Source: Educba
Conclusion
Whether you're working with a single column, multiple columns, or multiple tables, the Postgres DISTINCT
clause, coupled with the SELECT
statement, enables you to retrieve unique rows efficiently. For more intricate scenarios where you need a sorted result set, the DISTINCT ON
clause, combined with the ORDER BY
clause, offers an effective solution.
Additionally, for those engaging with Postgres in conjunction with other systems, there's an opportunity to enhance data movement. The Estuary Flow data integration and pipelining platform streamlines data transfer to and from PostgreSQL, ensuring efficiency and accuracy.
Moreover, in the realm of real-time data synchronization, the concept of Postgres Change Data Capture (CDC) emerges as a valuable technique. CDC enables the capture and tracking of database changes, facilitating seamless replication to other systems or enabling analytical insights. By identifying new, modified, or deleted records, CDC ensures data consistency across platforms, making it a vital tool for data-driven applications.
In summary, this article introduced PostgreSQL as a robust database system with features like the DISTINCT
clause for querying unique data. It also touched upon the potential of Change Data Capture for real-time data synchronization. By leveraging these tools, you can bolster your data management capabilities and empower your applications with accurate, up-to-date information.
Work with Postgres and other systems? Consider incorporating the Estuary Flow platform to facilitate swift and efficient data movement between PostgreSQL and various destinations.
Furthermore, explore the realm of Change Data Capture for real-time synchronization, elevating the performance of your data-intensive applications.
Check out the blog for tutorials to connect Postgres to a variety of source and destination systems, including:
About the author
With over 15 years in data engineering, a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.