🔍 SQL Fundamentals¶
Prerequisites: 19_data_processing/01_sqlite_basics
Estimated Time: 2-2.5 hours
Master the essential SQL commands for querying and manipulating data.
What is SQL?¶
SQL (Structured Query Language) is the standard language for interacting with relational databases. It allows you to:
- Query data with
SELECT - Insert data with
INSERT - Update data with
UPDATE - Delete data with
DELETE
Think of it like this: If the database is a filing cabinet, SQL is the language you use to ask the librarian for specific files.
The SELECT Statement¶
The most commonly used SQL command. It retrieves data from one or more tables.
Basic Syntax¶
Filtering with WHERE¶
SELECT * FROM employees WHERE department = 'Sales';
SELECT * FROM products WHERE price > 100;
SELECT * FROM users WHERE age BETWEEN 18 AND 65;
Comparison Operators¶
| Operator | Meaning |
|---|---|
= |
Equal to |
!= or <> |
Not equal to |
< |
Less than |
> |
Greater than |
<= |
Less than or equal |
>= |
Greater than or equal |
BETWEEN |
Within a range |
IN |
Match any in a list |
LIKE |
Pattern matching |
IS NULL |
Is NULL value |
Logical Operators¶
-- AND: Both conditions must be true
SELECT * FROM products WHERE price > 50 AND stock > 0;
-- OR: At least one condition must be true
SELECT * FROM employees WHERE department = 'Sales' OR department = 'Marketing';
-- NOT: Negates a condition
SELECT * FROM products WHERE NOT category = 'Electronics';
-- Combining them
SELECT * FROM employees
WHERE (department = 'Sales' OR department = 'Marketing')
AND salary > 50000;
Sorting and Limiting Results¶
ORDER BY¶
-- Ascending (default)
SELECT * FROM products ORDER BY price;
-- Descending
SELECT * FROM products ORDER BY price DESC;
-- Multiple columns
SELECT * FROM employees ORDER BY department ASC, salary DESC;
LIMIT¶
-- Top 10 most expensive products
SELECT * FROM products ORDER BY price DESC LIMIT 10;
-- Skip first 20, get next 10 (pagination)
SELECT * FROM products LIMIT 10 OFFSET 20;
Aggregation Functions¶
Aggregate functions perform calculations on a set of values.
-- Count rows
SELECT COUNT(*) FROM employees;
SELECT COUNT(DISTINCT department) FROM employees;
-- Sum values
SELECT SUM(salary) FROM employees;
-- Average
SELECT AVG(price) FROM products;
-- Minimum and Maximum
SELECT MIN(price), MAX(price) FROM products;
GROUP BY¶
Group rows that have the same values into summary rows.
-- Employees per department
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
-- Average salary by department
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
HAVING Clause¶
Filter groups (used instead of WHERE for aggregate conditions):
-- Departments with more than 5 employees
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
WHERE vs HAVING:
WHEREfilters rows before grouping,HAVINGfilters groups after aggregation.
Pattern Matching with LIKE¶
-- Names starting with 'J'
SELECT * FROM employees WHERE name LIKE 'J%';
-- Emails ending with '@company.com'
SELECT * FROM users WHERE email LIKE '%@company.com';
-- Titles containing 'Manager'
SELECT * FROM jobs WHERE title LIKE '%Manager%';
-- Single character wildcard
SELECT * FROM products WHERE code LIKE 'A__'; -- A followed by 2 chars
| Wildcard | Meaning |
|---|---|
% |
Zero or more characters |
_ |
Exactly one character |
Joining Tables¶
Combine rows from two or more tables based on related columns.
INNER JOIN¶
Returns only matching rows from both tables.
SELECT employees.name, departments.name as dept_name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.id;
LEFT JOIN¶
Returns all rows from the left table, and matching rows from the right.
-- All employees, even those without a department
SELECT employees.name, departments.name as dept_name
FROM employees
LEFT JOIN departments ON employees.dept_id = departments.id;
Join Types Visual¶
INNER JOIN: LEFT JOIN: RIGHT JOIN: FULL JOIN:
A B A B A B A B
◆====◆ ◆====◆ ◆====◆ ◆====◆
◆====◆ ◆====◆ ◆====◆ ◆ ◆
◆ ◆ ◆====◆ ◆====◆
Subqueries¶
A query within another query.
-- Employees earning above average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Products in the same category as 'Laptop'
SELECT name FROM products
WHERE category = (SELECT category FROM products WHERE name = 'Laptop');
SQL Command Quick Reference¶
| Task | SQL |
|---|---|
| Select all columns | SELECT * FROM table |
| Select specific columns | SELECT col1, col2 FROM table |
| Filter rows | SELECT * FROM table WHERE condition |
| Sort results | SELECT * FROM table ORDER BY col ASC/DESC |
| Limit results | SELECT * FROM table LIMIT n |
| Count rows | SELECT COUNT(*) FROM table |
| Sum column | SELECT SUM(col) FROM table |
| Average | SELECT AVG(col) FROM table |
| Group by | SELECT col, COUNT(*) FROM table GROUP BY col |
| Join tables | SELECT * FROM t1 JOIN t2 ON t1.id = t2.id |
Common Mistakes¶
- Forgetting GROUP BY - Must include all non-aggregated columns in GROUP BY
- Using WHERE with aggregates - Use HAVING instead of WHERE for aggregate conditions
- Case sensitivity - SQL keywords are not case-sensitive, but string comparisons might be
- Missing quotes - String literals need single quotes:
'value'notvalue - Ambiguous column names - Specify table name when joining:
table.column
✅ Before You Continue¶
- What's the difference between WHERE and HAVING?
- When would you use a LEFT JOIN instead of an INNER JOIN?
- How do you find duplicate values in a table?
- What's the difference between COUNT(*) and COUNT(column)?
Next Steps¶
- Run
examples.pyto see SQL queries in action - Practice with
exercises.py - Learn data cleaning techniques next!