When working with databases, duplicate records can sometimes clutter your results. To retrieve only the distinct or unique values from your database, the SELECT DISTINCT
statement in SQL is your best friend. It helps you eliminate duplicate rows from your result set and ensures that you retrieve only the unique data you need.
The SELECT DISTINCT
statement is used in SQL to return only distinct (unique) values in the result set. It filters out duplicate rows and ensures that only one instance of each distinct value appears in the query result.
SELECT DISTINCT column1, column2, ...
FROM table_name;
column1, column2, ...
: The columns from which you want to retrieve unique values.table_name
: The table from which you are querying data.Using DISTINCT
will ensure that the result set includes only unique combinations of values based on the selected columns.
Let’s walk through an example to see how DISTINCT
works in action.
Consider the following employees
table:
id | first_name | last_name | city |
---|---|---|---|
1 | John | Doe | New York |
2 | Jane | Smith | Chicago |
3 | Alice | Johnson | New York |
4 | Mark | Lee | Chicago |
5 | Sarah | Connor | New York |
6 | John | Doe | Chicago |
If you want to retrieve a list of all unique cities from the employees
table, you can use the DISTINCT
keyword:
SELECT DISTINCT city
FROM employees;
Result:
city |
---|
New York |
Chicago |
In this case, the query returns only the unique cities—New York and Chicago—eliminating the duplicate entries.
You can use SELECT DISTINCT
with multiple columns to retrieve unique combinations of data. When you use multiple columns, SQL will return distinct rows based on the combination of all the columns specified in the query.
If you want to retrieve unique combinations of first_name
and city
from the employees
table, you can do so by selecting both columns with DISTINCT
:
SELECT DISTINCT first_name, city
FROM employees;
Result:
first_name | city |
---|---|
John | New York |
Jane | Chicago |
Alice | New York |
Mark | Chicago |
Sarah | New York |
This query returns the distinct pairs of first_name
and city
. Notice that John appears in both New York and Chicago, but each pair of first name and city is unique.
You should use the SELECT DISTINCT
statement when:
Imagine you have a sales
table and you want to find all the unique products sold:
SELECT DISTINCT product_name
FROM sales;
This query will return a list of all unique products sold, without any repetitions.
While SELECT DISTINCT
is a powerful tool for eliminating duplicates, there are a few things to consider:
Performance: Using DISTINCT
can be performance-intensive on large tables because SQL needs to compare each row with others to find duplicates. If you're working with large datasets, be mindful of performance.
Distinct on Multiple Columns: When you use DISTINCT
with multiple columns, SQL returns unique combinations of those columns. This means that if you select more than one column, SQL will return only rows where the combination of all selected columns is unique.
For example, this query:
SELECT DISTINCT first_name, city
FROM employees;
Will return unique combinations of both first_name
and city
, not just distinct first_name
values or city
values individually.
DISTINCT
treats NULL
values as unique. So if a column contains multiple NULL
values, they will all be considered distinct and will be displayed in the result set.