SQL SELECT DISTINCT


When working with databases, duplicate records can sometimes clutter your results. To retrieve only the distinct or unique values from your database, the SELECT DISTINCT statement in SQL is your best friend. It helps you eliminate duplicate rows from your result set and ensures that you retrieve only the unique data you need.


What is SQL SELECT DISTINCT?

The SELECT DISTINCT statement is used in SQL to return only distinct (unique) values in the result set. It filters out duplicate rows and ensures that only one instance of each distinct value appears in the query result.

Syntax:

SELECT DISTINCT column1, column2, ...
FROM table_name;
  • column1, column2, ...: The columns from which you want to retrieve unique values.
  • table_name: The table from which you are querying data.

Using DISTINCT will ensure that the result set includes only unique combinations of values based on the selected columns.


How Does SELECT DISTINCT Work?

Let’s walk through an example to see how DISTINCT works in action.

Example Data:

Consider the following employees table:

id first_name last_name city
1 John Doe New York
2 Jane Smith Chicago
3 Alice Johnson New York
4 Mark Lee Chicago
5 Sarah Connor New York
6 John Doe Chicago

If you want to retrieve a list of all unique cities from the employees table, you can use the DISTINCT keyword:

SELECT DISTINCT city
FROM employees;

Result:

city
New York
Chicago

In this case, the query returns only the unique cities—New York and Chicago—eliminating the duplicate entries.


Using DISTINCT with Multiple Columns

You can use SELECT DISTINCT with multiple columns to retrieve unique combinations of data. When you use multiple columns, SQL will return distinct rows based on the combination of all the columns specified in the query.

Example:

If you want to retrieve unique combinations of first_name and city from the employees table, you can do so by selecting both columns with DISTINCT:

SELECT DISTINCT first_name, city
FROM employees;

Result:

first_name city
John New York
Jane Chicago
Alice New York
Mark Chicago
Sarah New York

This query returns the distinct pairs of first_name and city. Notice that John appears in both New York and Chicago, but each pair of first name and city is unique.


When to Use SELECT DISTINCT

You should use the SELECT DISTINCT statement when:

  • You want to eliminate duplicate values: When a query returns duplicate rows and you only need unique values.
  • You need unique combinations: When you want to find unique pairs or combinations of values in your data.
  • You are summarizing data: If you need to filter out redundant values and focus on unique entries for analysis.

Example:

Imagine you have a sales table and you want to find all the unique products sold:

SELECT DISTINCT product_name
FROM sales;

This query will return a list of all unique products sold, without any repetitions.


Considerations and Limitations of SELECT DISTINCT

While SELECT DISTINCT is a powerful tool for eliminating duplicates, there are a few things to consider:

  1. Performance: Using DISTINCT can be performance-intensive on large tables because SQL needs to compare each row with others to find duplicates. If you're working with large datasets, be mindful of performance.

  2. Distinct on Multiple Columns: When you use DISTINCT with multiple columns, SQL returns unique combinations of those columns. This means that if you select more than one column, SQL will return only rows where the combination of all selected columns is unique.

For example, this query:

SELECT DISTINCT first_name, city
FROM employees;

Will return unique combinations of both first_name and city, not just distinct first_name values or city values individually.

  1. Does Not Eliminate NULL Values: DISTINCT treats NULL values as unique. So if a column contains multiple NULL values, they will all be considered distinct and will be displayed in the result set.