The SQL CASE
Statement is a conditional expression that allows me to create a new temporary column in my result set based on If/Then logic. A CASE
statement is added to the SELECT
statement; it begins with CASE
and ends with END AS new_field_name
. Don't forget the comma after the last item in your SELECT
list! This is a common error for me, so I want to point it out.
CASE
Statements allow me to create a categorical value from a continuous one by binning my data or creating buckets.CASE
Statements comes in very handy when I want to create new fields from existing data in my dataset using conditionals, but I don't have database permissions to create a new table. Temporary tables are useful in this situation, as well. I'll dig into those next.For example, if I have a field called
high_temp
containing integers that represent the high temperature recorded for a particular day, I can add aCASE
statement to my query that will create a new field calledweather
classifying values inhigh_temp
as one of three categorical variables,cold
,warm
, orhot
, based on temperature ranges that I set as conditionals.
I can create categorical variables from a numeric one. I can even add a GROUP BY
and a COUNT(*)
to find out which category in the weather
field has the highest count.
USE us_weather_db;
SELECT high_temp,
COUNT(*) AS days,
CASE
WHEN high_temp > 90 THEN 'hot'
WHEN high_temp BETWEEN 70 AND 90 THEN 'warm'
WHEN high_temp < 70 THEN 'cold'
ELSE 'idk'
END AS weather
FROM us_weather
GROUP BY high_temp
ORDER BY days DESC;
My query can have conditionals that use values from more than one column or field to create a new column. By throwing in a COUNT(*) AS number_customers
, I have a useful result set for making comparisons.
SELECT COUNT(*) AS number_customers,
CASE
WHEN internet_service_type_id = 1 AND device_protection = 'Yes'
THEN 'DSL_Protected'
WHEN internet_service_type_id = 1 AND device_protection = 'No'
THEN 'DSL_Unprotected'
WHEN internet_service_type_id = 2 AND device_protection = 'Yes'
THEN 'Fiber_Optic_Protected'
WHEN internet_service_type_id = 2 AND device_protection = 'No'
THEN 'Fiber_Optic_Unprotected'
WHEN internet_service_type_id = 3
THEN 'No_Internet_Service'
ELSE internet_service_type_id
END AS internet_types
FROM customers
GROUP BY internet_types
ORDER BY number_customers;
-> 1065 DSL_Protected
1356 DSL_Unprotected
1357 Fiber_Optic_Protected
1526 No_Internet_Service
1739 Fiber_Optic_Unprotected
Sometimes my dataset may be messy (hahaha, sometimes), and I will need to normalize the values. CASE
can come in handy here, as well. Notice that my CASE Statement syntax is slightly different here; since I'm only referencing a single column in my logic, and I'm not only testing for equality, I can write my statement like you see below.
SELECT title,
salary,
gender,
CASE gender
WHEN 'M' THEN 'male'
WHEN '0' THEN 'male'
WHEN 'F' THEN 'female'
WHEN '1' THEN 'female'
ELSE 'neither'
END AS gender_norm
FROM us_employees;
Sometimes it's easier to work with a boolean value, and CASE
can be helpful here, as well. Here I create a new column called is_female
containing the value 1
if female
and 0
if male
USE titanic_db;
SELECT sex,
CASE sex
WHEN 'female' THEN 1
ELSE 0
END AS is_female
FROM passengers;
-> male 0
female 1
female 1
female 1
male 0
I can also do this using the IF(condition, val_if_true, val_if_false)
function in my query.
SELECT sex,
IF(sex = 'female', True, False) AS is_female
FROM passengers;
-> male 0
female 1
female 1
female 1
male 0
I can add a condition in my SELECT
statement that will return 1
if the condition evaluates to True and a 1
if the condition evaluates to False.
SELECT sex,
sex = 'female' AS is_female
FROM passengers;
-> male 0
female 1
female 1
female 1
male 0
Temporary Tables exist only until you end your database session; if you lose your connection or close your database session, these tables will disappear.
I can create a temporary table when I want to query a subset of a dataset. I can modify a temporary table even when I don't have the needed permissions to modify data in a database.
-- Create a temporary table using my database access.
USE bayes_123;
-- I want a table that brings over only the current employees in the Customer Service department.
CREATE TEMPORARY TABLE dept_employees AS
SELECT emp_no,
CONCAT(first_name, ' ', last_name) AS full_name,
dept_no,
dept_name,
salary
FROM employees.employees
JOIN employees.salaries USING(emp_no)
JOIN employees.dept_emp USING(emp_no)
JOIN employees.departments USING(dept_no)
WHERE employees.departments.dept_name = 'Customer Service'
AND employees.dept_emp.to_date > CURDATE()
AND employees.salaries.to_date > CURDATE();
-- Once I have created my dept_employees table, I can query this subset of the employees dataset.
SELECT ROUND(AVG(salary),2) AS avg_salary
FROM dept_employees;
After you create a temporary table, you can make changes.
USE bayes_123;
-- Create temporary table
CREATE TEMPORARY TABLE temp_sakila AS
SELECT *
FROM sakila.payment;
-- Check out my data.
SELECT *
FROM temp_sakila;
-- Check out information about temp table structure.
DESCRIBE temp_sakila;
-- Transform the amount column from float to integer data type.
ALTER TABLE temp_sakila MODIFY amount INT UNSIGNED;
-- I can add new columns to a temp table.
ALTER TABLE temp_sakila ADD new_col_1 INT UNSIGNED;
ALTER TABLE temp_sakila ADD new_col_2 VARCHAR(25);
-- Insert records into a tamp table.
INSERT INTO temp_sakila (new_col_1, new_col_2) VALUES (val_1, val_2);
-- Update existing data in the temp table.
UPDATE temp_sakila SET new_col_1 = amount * 100;
-- I can update my temptable using a `WHERE` clause if I don't want to update all values in a field or fields.
UPDATE temp_sakila SET new_col_1 = some_val, new_col_2 = other_val WHERE condition;
-- I can drop columns from my temp table.
ALTER TABLE temp_sakila DROP COLUMN new_col_1;
-- You can remove a temporary table before ending a session or terminating a connection.
DROP TEMPORARY TABLE table_name;