Top SQL Functions Every Data Analyst Should Know
Knowing key functions can elevate your work, helping you extract insights faster, automate processes, and communicate findings more effectively.
Here’s a rundown of some of the top SQL functions every data analyst should master, along with examples to demonstrate their impact on real-world scenarios.
1. Aggregates: SUM, AVG, MIN, MAX, COUNT
Aggregate functions form the backbone of SQL analytics, allowing analysts to summarise data. They are invaluable when creating reports or gaining a quick overview of key performance metrics.
SUM: Adds up values in a column.
AVG: Calculates the average value of a numeric column.
MIN/MAX: Finds the smallest/largest value.
COUNT: Counts rows, useful for determining volume.
Example:
Imagine a retail business wanting to analyse monthly sales. You could use these functions to calculate total revenue, average order size, or determine the highest and lowest sales day.
SELECT
SUM(sales_amount) AS total_revenue,
AVG(sales_amount) AS average_order,
MIN(sales_amount) AS lowest_sale,
MAX(sales_amount) AS highest_sale,
COUNT(order_id) AS total_orders
FROM sales
WHERE MONTH(sale_date) = '2024-10';
2. Strings: CONCAT, LENGTH, LOWER, UPPER, SUBSTRING
String functions are helpful for cleaning and standardising text data, which may be messy or inconsistently formatted.
CONCAT: Joins two or more strings together.
LENGTH: Returns the length of a string.
LOWER/UPPER: Converts text to lowercase or uppercase.
SUBSTRING: Extracts a portion of a string.
Example:
If customer data is stored inconsistently, with some names in uppercase and others in lowercase, you can standardise them before analysis.
SELECT
CONCAT(UPPER(LEFT(first_name, 1)), LOWER(SUBSTRING(first_name, 2)))
AS formatted_first_name,
LENGTH(email) AS email_length
FROM customers;
3. Dates: DATEADD, DATEDIFF, YEAR, MONTH, DAY
Date functions are essential for any time-based calculations, such as tracking performance over time, identifying trends, seasonality analysis, etc.
DATEADD: Adds a specified interval to a date.
DATEDIFF: Calculates the difference between two dates.
YEAR/MONTH/DAY: Extracts parts of a date.
Example:
In a scenario where an e-commerce company wants to measure the time between a customer’s sign-up date and their first purchase, you can use `DATEDIFF`:
SELECT
customer_id,
DATEDIFF(first_purchase_date, signup_date) AS days_to_first_purchase
FROM customer_activity
WHERE first_purchase_date IS NOT NULL;
4. Windows: ROW_NUMBER, RANK, DENSE_RANK, NTILE
Window functions are advanced SQL tools that allow calculations across rows related to the current row, while still preserving row-by-row details. They’re particularly useful in ranking and sorting data, creating cumulative summaries, and working with moving averages.
ROW_NUMBER: Assigns a unique number to each row within a partition.
RANK: Assigns a ranking to rows within a partition, with gaps for ties.
DENSE_RANK: Similar to RANK, but without gaps.
NTILE: Divides rows into a specified number of approximately equal groups.
Example:
For a business wanting to reward their top-performing sales representatives, the following query ranks employees based on their sales:
SELECT
employee_id,
sales_amount,
RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM employee_sales;
5. Nulls: COALESCE and NULLIF
These functions handle NULL values, a common issue in datasets. NULL values can lead to unexpected results if not properly managed.
COALESCE: Returns the first non-NULL value in a list of arguments.
NULLIF: Returns NULL if two values are equal; otherwise, returns the first value.
Example:
If a company wants to provide a default value for missing data in a customer’s address:
SELECT
customer_id,
COALESCE(address, 'Address not provided') AS customer_address
FROM customers;
6. Logic: CASE
The CASE function enables conditional logic within SQL, similar to IF-ELSE statements in programming. It’s useful for creating new categories or segmenting data.
Example:
If you want to categorise customers based on their purchase amounts, CASE can help:
SELECT
customer_id,
purchase_amount,
CASE
WHEN purchase_amount >= 1000 THEN 'High Spender'
WHEN purchase_amount >= 500 THEN 'Medium Spender'
ELSE 'Low Spender'
END AS spending_category
FROM customer_purchases;
7. Mathematical: ROUND, CEIL, FLOOR, ABS
Mathematical functions are essential for rounding values, calculating percentages, and more. They’re especially useful for financial reporting, where decimal precision matters.
ROUND: Rounds a number to a specified decimal place.
CEIL/FLOOR: Rounds up or down to the nearest integer.
ABS: Returns the absolute value, useful in handling potential negative values.
Example:
If a company wants to calculate discounts but needs values to be rounded to the nearest dollar:
SELECT
product_id,
price,
ROUND(price * 0.9, 0) AS discounted_price
FROM products;
8. JSON Functions: JSON_VALUE, JSON_QUERY
With the rise of semi-structured data, JSON functions in SQL have become indispensable for analysts. They allow analysts to parse and manipulate JSON data directly within SQL queries, an asset when dealing with complex datasets from APIs.
JSON_VALUE: Extracts a value from a JSON string.
JSON_QUERY: Returns an object or an array from JSON data.
Example:
Imagine a dataset of customer orders with each order’s details stored in JSON format:
SELECT
order_id,
JSON_VALUE(order_details, '$.product_name') AS product_name,
JSON_VALUE(order_details, '$.price') AS price
FROM orders;
9. Pivoting Data: CASE and GROUP BY Combination
SQL doesn’t have a built-in pivot function, but you can use a combination of CASE and GROUP BY to achieve pivot-like results, which is useful in summarising data by multiple dimensions.
Example:
If a company wants to view monthly sales broken down by product category, you could use:
SELECT
category,
SUM(CASE WHEN MONTH(sale_date) = 1 THEN sales_amount ELSE 0 END) AS Jan_Sales,
SUM(CASE WHEN MONTH(sale_date) = 2 THEN sales_amount ELSE 0 END) AS Feb_Sales
FROM sales
GROUP BY category;