SQL GROUP BY Clause


When working with databases, it's often useful to summarize or group your data to gain insights and perform analysis. The GROUP BY clause in SQL allows you to group rows that share the same values in one or more columns and then apply aggregate functions such as SUM(), COUNT(), AVG(), MAX(), and MIN() to these groups.

The GROUP BY clause is essential when you need to aggregate data by categories (e.g., total sales by product, average salary by department) and is typically used in conjunction with aggregate functions.


1. Basic Syntax of the SQL GROUP BY Clause

The basic syntax of the GROUP BY clause is quite simple. You first use the GROUP BY keyword followed by the column(s) that you want to group your data by.

SELECT column1, column2, AGGREGATE_FUNCTION(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2;
  • column1, column2, ...: The columns you want to select (typically the ones you're grouping by).
  • AGGREGATE_FUNCTION(column3): An aggregate function such as SUM(), COUNT(), AVG(), etc., applied to another column (typically numeric).
  • table_name: The table you're selecting from.
  • WHERE condition: A condition to filter the rows before applying the grouping.
  • GROUP BY column1, column2: The column(s) you want to group by.

2. Simple Example of GROUP BY with COUNT()

Let’s begin with a simple example using the COUNT() function. Suppose you have a sales table that records each sale transaction. You want to know how many sales occurred for each product.

Example:

SELECT product_id, COUNT(*) AS total_sales
FROM sales
GROUP BY product_id;

Explanation:

  • The COUNT(*) function counts the number of rows for each unique product_id in the sales table.
  • The GROUP BY product_id groups the rows based on the product_id, so the result will show the total number of sales for each product.

Result:

product_id total_sales
101 50
102 75
103 30

This query returns the count of sales per product, grouped by product_id.


3. Using GROUP BY with SUM()

The SUM() function is another common aggregate function used with the GROUP BY clause. It allows you to calculate the total sum of a column’s values for each group. For example, to calculate the total sales revenue for each product, you can use the following query:

Example:

SELECT product_id, SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY product_id;

Explanation:

  • The SUM(sale_amount) function calculates the total revenue for each product by summing the sale_amount values within each product_id group.
  • The GROUP BY product_id groups the data by product, so you get the total revenue for each product.

Result:

product_id total_revenue
101 5000
102 12000
103 4500

This query gives you the total revenue generated by each product.


4. Using GROUP BY with AVG()

The AVG() function calculates the average value of a numeric column for each group. This is particularly useful for finding the average score, salary, price, etc. Let’s say you want to find the average salary for each department in your company.

Example:

SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id;

Explanation:

  • The AVG(salary) function calculates the average salary for each department.
  • The GROUP BY department_id groups employees by their department, so the result shows the average salary per department.

Result:

department_id average_salary
1 55000
2 65000
3 48000

This query returns the average salary for each department.


5. GROUP BY with Multiple Columns

You can also group by multiple columns. When you do this, SQL will first group the data by the first column, and if there are ties, it will then group by the second column, and so on.

Example:

Suppose you have a sales table with product_id, salesperson_id, and sale_amount. You want to know how much each salesperson sold per product.

SELECT product_id, salesperson_id, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id, salesperson_id;

Explanation:

  • The query groups the data first by product_id, and then within each product, it groups by salesperson_id.
  • The SUM(sale_amount) calculates the total sales for each salesperson within each product group.

Result:

product_id salesperson_id total_sales
101 1 1000
101 2 2000
102 1 1500
102 3 3000

This query shows the total sales per product, broken down by salesperson.


6. Using GROUP BY with HAVING

The HAVING clause is used to filter the results after the GROUP BY operation. It is similar to the WHERE clause, but while WHERE filters rows before grouping, HAVING filters groups after the aggregation has been applied. It is useful when you want to filter groups based on aggregate functions like SUM(), COUNT(), or AVG().

Example:

Let’s say you want to find the products that have total sales greater than $5000:

SELECT product_id, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY product_id
HAVING SUM(sale_amount) > 5000;

Explanation:

  • The SUM(sale_amount) calculates the total sales for each product.
  • The HAVING clause filters out products where the total sales are less than or equal to $5000.

Result:

product_id total_sales
102 12000
103 8000

This query returns only the products whose total sales exceed $5000.


7. GROUP BY with ORDER BY

You can combine the GROUP BY clause with the ORDER BY clause to sort the result after grouping. This is useful when you want to present the grouped results in a specific order, such as sorting by the total revenue or average salary.

Example:

Let’s say you want to find the total revenue for each product and order the results by the total revenue in descending order:

SELECT product_id, SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC;

Explanation:

  • The SUM(sale_amount) calculates the total revenue for each product.
  • The GROUP BY product_id groups the data by product.
  • The ORDER BY total_revenue DESC sorts the results by total revenue in descending order.

Result:

product_id total_revenue
102 12000
101 5000
103 4500

This query shows the total revenue per product, ordered by revenue from highest to lowest.