Top SQL Functions Every Data Analyst Should Know

 
monitor with blurred code

SQL (Structured Query Language) remains one of the most essential tools for data analysts, enabling data manipulation, retrieval, and aggregation to make sense of raw data.

Knowing key functions can elevate your work, helping you extract insights faster, automate processes, and communicate findings more effectively.

Here’s a rundown of some of the top SQL functions every data analyst should master, along with examples to demonstrate their impact on real-world scenarios.


1. Aggregates: SUM, AVG, MIN, MAX, COUNT

Aggregate functions form the backbone of SQL analytics, allowing analysts to summarise data. They are invaluable when creating reports or gaining a quick overview of key performance metrics.

  • SUM: Adds up values in a column.

  • AVG: Calculates the average value of a numeric column.

  • MIN/MAX: Finds the smallest/largest value.

  • COUNT: Counts rows, useful for determining volume.

Example:

Imagine a retail business wanting to analyse monthly sales. You could use these functions to calculate total revenue, average order size, or determine the highest and lowest sales day.

SELECT
    SUM(sales_amount) AS total_revenue,
    AVG(sales_amount) AS average_order,
    MIN(sales_amount) AS lowest_sale,
    MAX(sales_amount) AS highest_sale,
    COUNT(order_id) AS total_orders
FROM sales
WHERE MONTH(sale_date) = '2024-10'; 

  

2. Strings: CONCAT, LENGTH, LOWER, UPPER, SUBSTRING

String functions are helpful for cleaning and standardising text data, which may be messy or inconsistently formatted. 

  • CONCAT: Joins two or more strings together.

  • LENGTH: Returns the length of a string.

  • LOWER/UPPER: Converts text to lowercase or uppercase.

  • SUBSTRING: Extracts a portion of a string.

Example:

If customer data is stored inconsistently, with some names in uppercase and others in lowercase, you can standardise them before analysis.

SELECT 
    CONCAT(UPPER(LEFT(first_name, 1)), LOWER(SUBSTRING(first_name, 2))) 
         AS formatted_first_name,
    LENGTH(email) AS email_length
FROM customers;

 

3. Dates: DATEADD, DATEDIFF, YEAR, MONTH, DAY

Date functions are essential for any time-based calculations, such as tracking performance over time, identifying trends, seasonality analysis, etc.

  • DATEADD: Adds a specified interval to a date.

  • DATEDIFF: Calculates the difference between two dates.

  • YEAR/MONTH/DAY: Extracts parts of a date. 

Example:

In a scenario where an e-commerce company wants to measure the time between a customer’s sign-up date and their first purchase, you can use `DATEDIFF`:

SELECT 
    customer_id,
    DATEDIFF(first_purchase_date, signup_date) AS days_to_first_purchase
FROM customer_activity
WHERE first_purchase_date IS NOT NULL;


4. Windows: ROW_NUMBER, RANK, DENSE_RANK, NTILE

Window functions are advanced SQL tools that allow calculations across rows related to the current row, while still preserving row-by-row details. They’re particularly useful in ranking and sorting data, creating cumulative summaries, and working with moving averages.

  • ROW_NUMBER: Assigns a unique number to each row within a partition.

  • RANK: Assigns a ranking to rows within a partition, with gaps for ties.

  • DENSE_RANK: Similar to RANK, but without gaps.

  • NTILE: Divides rows into a specified number of approximately equal groups.

Example:

For a business wanting to reward their top-performing sales representatives, the following query ranks employees based on their sales:

SELECT 
    employee_id,
    sales_amount,
    RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM employee_sales;

 

5. Nulls: COALESCE and NULLIF

These functions handle NULL values, a common issue in datasets. NULL values can lead to unexpected results if not properly managed.

  • COALESCE: Returns the first non-NULL value in a list of arguments.

  • NULLIF: Returns NULL if two values are equal; otherwise, returns the first value.

Example:

If a company wants to provide a default value for missing data in a customer’s address:

SELECT 
    customer_id,
    COALESCE(address, 'Address not provided') AS customer_address
FROM customers;

 

6. Logic: CASE

The CASE function enables conditional logic within SQL, similar to IF-ELSE statements in programming. It’s useful for creating new categories or segmenting data.

Example:

If you want to categorise customers based on their purchase amounts, CASE can help:

SELECT 
    customer_id,
    purchase_amount,
    CASE 
        WHEN purchase_amount >= 1000 THEN 'High Spender'
        WHEN purchase_amount >= 500 THEN 'Medium Spender'
        ELSE 'Low Spender'
    END AS spending_category
FROM customer_purchases;

7. Mathematical: ROUND, CEIL, FLOOR, ABS

Mathematical functions are essential for rounding values, calculating percentages, and more. They’re especially useful for financial reporting, where decimal precision matters.

  • ROUND: Rounds a number to a specified decimal place.

  • CEIL/FLOOR: Rounds up or down to the nearest integer.

  • ABS: Returns the absolute value, useful in handling potential negative values.

Example:

If a company wants to calculate discounts but needs values to be rounded to the nearest dollar:

SELECT 
    product_id,
    price,
    ROUND(price * 0.9, 0) AS discounted_price
FROM products;

 

8. JSON Functions: JSON_VALUE, JSON_QUERY

With the rise of semi-structured data, JSON functions in SQL have become indispensable for analysts. They allow analysts to parse and manipulate JSON data directly within SQL queries, an asset when dealing with complex datasets from APIs.

  • JSON_VALUE: Extracts a value from a JSON string.

  • JSON_QUERY: Returns an object or an array from JSON data.

Example:

Imagine a dataset of customer orders with each order’s details stored in JSON format:

SELECT 
    order_id,
    JSON_VALUE(order_details, '$.product_name') AS product_name,
    JSON_VALUE(order_details, '$.price') AS price
FROM orders;

 

9. Pivoting Data: CASE and GROUP BY Combination

SQL doesn’t have a built-in pivot function, but you can use a combination of CASE and GROUP BY to achieve pivot-like results, which is useful in summarising data by multiple dimensions.

Example:

If a company wants to view monthly sales broken down by product category, you could use:

SELECT 
    category,
    SUM(CASE WHEN MONTH(sale_date) = 1 THEN sales_amount ELSE 0 END) AS Jan_Sales,
    SUM(CASE WHEN MONTH(sale_date) = 2 THEN sales_amount ELSE 0 END) AS Feb_Sales
FROM sales
GROUP BY category;


 
Lachlan McKenzieSQL