When working with databases, it's often useful to summarize or group your data to gain insights and perform analysis. The GROUP BY
clause in SQL allows you to group rows that share the same values in one or more columns and then apply aggregate functions such as SUM()
, COUNT()
, AVG()
, MAX()
, and MIN()
to these groups.
The GROUP BY
clause is essential when you need to aggregate data by categories (e.g., total sales by product, average salary by department) and is typically used in conjunction with aggregate functions.
The basic syntax of the GROUP BY
clause is quite simple. You first use the GROUP BY
keyword followed by the column(s) that you want to group your data by.
SELECT column1, column2, AGGREGATE_FUNCTION(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2;
column1, column2, ...
: The columns you want to select (typically the ones you're grouping by).AGGREGATE_FUNCTION(column3)
: An aggregate function such as SUM()
, COUNT()
, AVG()
, etc., applied to another column (typically numeric).table_name
: The table you're selecting from.WHERE condition
: A condition to filter the rows before applying the grouping.GROUP BY column1, column2
: The column(s) you want to group by.Let’s begin with a simple example using the COUNT()
function. Suppose you have a sales
table that records each sale transaction. You want to know how many sales occurred for each product.
SELECT product_id, COUNT(*) AS total_sales
FROM sales
GROUP BY product_id;
Explanation:
COUNT(*)
function counts the number of rows for each unique product_id
in the sales
table.GROUP BY product_id
groups the rows based on the product_id
, so the result will show the total number of sales for each product.Result:
product_id | total_sales |
---|---|
101 | 50 |
102 | 75 |
103 | 30 |
This query returns the count of sales per product, grouped by product_id
.
The SUM()
function is another common aggregate function used with the GROUP BY
clause. It allows you to calculate the total sum of a column’s values for each group. For example, to calculate the total sales revenue for each product, you can use the following query:
SELECT product_id, SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY product_id;
Explanation:
SUM(sale_amount)
function calculates the total revenue for each product by summing the sale_amount
values within each product_id
group.GROUP BY product_id
groups the data by product, so you get the total revenue for each product.Result:
product_id | total_revenue |
---|---|
101 | 5000 |
102 | 12000 |
103 | 4500 |
This query gives you the total revenue generated by each product.
The AVG()
function calculates the average value of a numeric column for each group. This is particularly useful for finding the average score, salary, price, etc. Let’s say you want to find the average salary for each department in your company.
SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id;
Explanation:
AVG(salary)
function calculates the average salary for each department.GROUP BY department_id
groups employees by their department, so the result shows the average salary per department.Result:
department_id | average_salary |
---|---|
1 | 55000 |
2 | 65000 |
3 | 48000 |
This query returns the average salary for each department.
You can also group by multiple columns. When you do this, SQL will first group the data by the first column, and if there are ties, it will then group by the second column, and so on.
Suppose you have a sales
table with product_id
, salesperson_id
, and sale_amount
. You want to know how much each salesperson sold per product.
SELECT product_id, salesperson_id, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id, salesperson_id;
Explanation:
product_id
, and then within each product, it groups by salesperson_id
.SUM(sale_amount)
calculates the total sales for each salesperson within each product group.Result:
product_id | salesperson_id | total_sales |
---|---|---|
101 | 1 | 1000 |
101 | 2 | 2000 |
102 | 1 | 1500 |
102 | 3 | 3000 |
This query shows the total sales per product, broken down by salesperson.
The HAVING
clause is used to filter the results after the GROUP BY
operation. It is similar to the WHERE
clause, but while WHERE
filters rows before grouping, HAVING
filters groups after the aggregation has been applied. It is useful when you want to filter groups based on aggregate functions like SUM()
, COUNT()
, or AVG()
.
Let’s say you want to find the products that have total sales greater than $5000:
SELECT product_id, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id
HAVING SUM(sale_amount) > 5000;
Explanation:
SUM(sale_amount)
calculates the total sales for each product.HAVING
clause filters out products where the total sales are less than or equal to $5000.Result:
product_id | total_sales |
---|---|
102 | 12000 |
103 | 8000 |
This query returns only the products whose total sales exceed $5000.
You can combine the GROUP BY
clause with the ORDER BY
clause to sort the result after grouping. This is useful when you want to present the grouped results in a specific order, such as sorting by the total revenue or average salary.
Let’s say you want to find the total revenue for each product and order the results by the total revenue in descending order:
SELECT product_id, SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC;
Explanation:
SUM(sale_amount)
calculates the total revenue for each product.GROUP BY product_id
groups the data by product.ORDER BY total_revenue DESC
sorts the results by total revenue in descending order.Result:
product_id | total_revenue |
---|---|
102 | 12000 |
101 | 5000 |
103 | 4500 |
This query shows the total revenue per product, ordered by revenue from highest to lowest.