SQL for beginners
There are plenty of articles on how to get going with SQL (or MySQL) for beginners. However, there aren’t many articles which take you back to basics describing the building blocks used and help you understand the concept of what a database actually is.
This document will hopefully introduce you to the kinds of things that SQL is capable of. Each example has the command and the results so hopefully you can just read and understand it. I’m aiming this document at people who may come across SQL as part of someone else’s job and feel like they should know a bit about it, but don’t really know where to begin.
So let’s get going…
Structured Query Language or SQL is a computer language used to get stuff out of a database. To understand this statement we need to know what a computer language is and what a database is. Simple again! A computer language is just a set of commands that will make the computer do stuff and a database is multiple groups of stuff.
Now that we have got that out of the way, let’s begin. For this lesson we’ll be using MySQL, partly because it’s free and partly because there are some instructions here to set up MySQL and PHP on a test system.
To understand SQL you have to understand a little bit more about the structure of a database. The groups of stuff I mentioned earlier are called tables. A table is a grid of rows and columns. A database is a collection of tables. Now you have, everything you need to know! So let’s do an example we can all relate to.
We’ll create a table to hold information on people. Table names tend to be plural names like people, parts, events, etc so in this case people and not person because the table holds lots of persons! We collect some relevant information together: first name, sex, age. Then we’ll try to do some thing useful with them.
Before we create the people table we must have a think about the best way of storing the information in the table. Each attribute, field or column needs to have a type. First name is text, sex is an enumeration of male or female and age is an integer. You could make all columns text but we would like to do sums with the age column so we’ll need to use a column type that allows this.
In the early days of programming the commands were separated by new lines but as the languages got more complex people wanted to spread the code over several lines and so a semi-colon was used to denote the end of a statement.
CREATE TABLE people ( first_name VARCHAR(20) ,sex ENUM(’male’, ‘female’) ,age INT );The words in capitals are keywords, they are reserved for use by the computer language. i.e. you can’t have a table called table! Test columns need to have a maximum length so you need to specify it. Now that we have a table let’s put some stuff in it:
INSERT INTO people(first_name, sex, age) VALUES (’David’, ‘male’, ‘40′) ,(’Jon’, ‘male’, ‘30′) ,(’Tom’, ‘male’, ‘20′) ,(’Sophie’, ‘female’, ‘10′);It’s only a small table so let’s list the contents.
SELECT first_name, sex, age FROM people;Produces the output:
+------------+--------+------+ | first_name | sex | age | +------------+--------+------+ | David | male | 40 | | Jon | male | 30 | | Tom | male | 20 | | Sophie | female | 10 | +------------+--------+------+Here you can definitely see the rows and columns structure I mentioned earlier. Each row is a complete set of information, that is there is an entry for every column. We didn’t put any restrictions on the table so we could have not supplied information for some of the columns if we wanted. In which case it might have looked something like this.
+------------+--------+------+ | first_name | sex | age | +------------+--------+------+ | NULL | male | 40 | | Jon | NULL | 30 | | NULL | NULL | NULL | | Sophie | female | 10 | +------------+--------+------+Where NULL denotes the absence of data. In database terms NULL values are a bit of a bane. This special NULL symbol says the information is not provided and so we can’t assume anything about it. Most applications usually filter out the rows where there are gaps in the information. Do you remember SETs from school? All of SQL is based on that idea. So lets look at our data and rephrase our question using SET-speak. Show me all the information for a person where the person is over 21 is translated into
SELECT first_name, sex, age FROM people WHERE age > 21;
+------------+------+------+ | first_name | sex | age | +------------+------+------+ | David | male | 40 | | Jon | male | 30 | +------------+------+------+Show me all the first names of all the females is translated into
SELECT first_name FROM people WHERE sex = ‘female’;
+------------+ | first_name | +------------+ | Sophie | +------------+You may think that we are using the WHERE clause to filter the data to get results but what we are doing is far more subtle than that. We are creating new sets by dividing up existing sets. The sets may be added together in a union or subtracted from each other in an intersection. When we create our subset of data we can use it as the bases for another query. This can be shown in the following query:
SELECT first_name FROM people WHERE age < 35 AND first_name IN ( SELECT first_name FROM people WHERE sex = ‘male’ );The inner SELECT is the sub-query and produces a set of males names. That set of people is used with another set filter to reduce the set even further.
+------------+ | first_name | +------------+ | Jon | | Tom | +------------+The previous query could have been written as the following but quite often you’ll find that the data is not conveniently placed in the same table! it’s in another table or is the result of a calculation.
SELECT first_name FROM people WHERE age < 35 AND sex = ‘male’;Now that we understand a little more about the concepts of sets and creating them let’s look at doing something (more) useful. We will start simple and count the number of males and females in the data set. To the average MySQL’er this will seem like an odd choice to talk about first but it will hopefully be obvious (later) that it is the only starting point.
SELECT COUNT(first_name), sex FROM people GROUP BY sex;Produces the output:
+-------------------+--------+ | COUNT(first_name) | sex | +-------------------+--------+ | 3 | male | | 1 | female | +-------------------+--------+The GROUP BY command is used all the time to collapse sets of information so that they can be aggregated. In this case we are collapsing around the sex column. To the SQL engine that gathers the rows the GROUP BY effectively makes the data look like this.
+------------+--------+------+ | first_name | sex | age | +------------+--------+------+ | ????? | male | 40 | | ??? | male | 30 | | ??? | male | 20 | | ?????? | female | 10 | +------------+--------+------+There aren’t many aggregate operations we can do on text columns like first_name, so the information is meaningless and unpredictable. The data is treated as a separate set of each unique entry in sex. Conceptually the data set has been split into two:
+------------+--------+------+ | first_name | sex | age | +------------+--------+------+ | ????? | male | 40 | | ??? | male | 30 | | ??? | male | 20 | +------------+--------+------+and
+------------+--------+------+ | first_name | sex | age | +------------+--------+------+ | ?????? | female | 10 | +------------+--------+------+The GROUP BY will allow you to create aggregate operations. For example:
SELECT COUNT(first_name), sex FROM people GROUP BY sex;shows the number of rows counted in each of the two groups of data, giving the result:
+----------+--------+ | COUNT(1) | sex | +----------+--------+ | 3 | male | | 1 | female | +----------+--------+or the total age and average for those sets.
SELECT SUM(age), AVG(age), sex FROM people GROUP BY sex;giving
+----------+----------+--------+ | SUM(age) | AVG(age) | sex | +----------+----------+--------+ | 90 | 30.0000 | male | | 10 | 10.0000 | female | +----------+----------+--------+When the GROUP BY is applied to the whole table the GROUP BY keywords may be omitted but it is there implicitly. For example:
SELECT SUM(age), AVG(age) FROM people;
+----------+----------+ | SUM(age) | AVG(age) | +----------+----------+ | 100 | 25.0000 | +----------+----------+This document introduces the concepts of sets, creating subsets and then treating a set as an entire entity and doing something useful. A table is one big set and next time we’ll look at joining tables together to see what matches and what’s missing.
3 comments
Comment from: Sidney [Visitor]
Comment from: Ben [Visitor]
I believe you mean “intersection” rather than “union” when describing the behaviour of WHERE in terms of set operations.
Comment from: davidnewcomb [Member]
I meant we are using a union of rules that makes a set.
I’ll think about re-wording it.
Form is loading...
Nice, if I have time I will have a play making my own table. Never knew how to actually create a table of my own.
Next step would be to have a simple interface to view the Database and do searches on that :-)
Thanks, very usefull
Sidney