Eternalistic Designs

JOIN the Dark Side

Aug 14, 2007
0 comments
Submitted By: Justin Stanley
Filed Under:
Share our glory:

When you've been playing around with SQL for awhile, you start to take your knowledge of JOINs for granted. When someone asks you about JOINing data from two or more tables together, you get this stunned look on your face... You know, the same one you get when you actually hear someone mutter the words "I've never seen 'Star Wars'." You forget that you, too, once didn't know a JOIN from a Jedi, and that everyone has to start somewhere.

So let's talk about JOINs. Like the name implies, a JOIN is simply a method for connecting two tables in a database, usually through some piece of data common to records in both tables.

And they are pretty simple once you figure them out. Before you bother to start writing queries, you need to ask yourself two questions: what kind of data do you want from each table, and how are those two tables related?

The first question will help you figure out which type of JOIN you're going to use. Do you only care about records in each table when they have a matching result in the other table or do you want all records from one or both tables, regardless of matches found?

And once you've answered that one, you can start thinking about that second question. SQL doesn't know exactly how those two tables are related on its own, so your well-designed database will likely take advantage of some foreign keys or other data that somehow links the tables together.

How 'bout an example? Let's say you're a DBA working in Star Fleet Command (hey, the site's called "Nerdliness," isn't it?). You store information about specific vessels in the STARSHIP table and info about different people manning those ships is in the OFFICER table. Occasionally, some admiral or another asks you to pull information about the ships and their crews. Should be pretty clear by now what SQL operation we'll be performing, eh?

We're going to make the tables pretty simple for this example. Our STARSHIP table looks something like this:

shipid registry name class
1 NCC-1701 Enterprise Constitution
2 NCC-1701D Enterprise Galaxy
3 NCC-1864 Reliant Miranda
4 NCC-50666 Justinian Nerdliness

And our OFFICER table looks like this:

officerid firstname lastname rank currentship
1 James Kirk Admiral 1
2 Jean-Luc Picard Captain 2
3 Khan Noonien Singh Captain 3
4 Pavel Chekov Cdr 3
5 NULL Spock Captain 1
6 William Ryker Cdr NULL

First, no. I didn't know all the information off the top of my head. Only about 95% of it...

And second, for all the Trekkers and Trekkies out there, in this example, Ryker's had a spat with Picard and is no longer assigned to his (or any other) ship. And, because Star Fleet's a government organization and, therefore, probably dirty as all Hell, they've put a positive spin on the whole oh-snap-this -300-year-old-super-dude-just-hijacked-one-of-our-ships thing by issuing a press release stating that Khan was actually given command of the Reliant and all records were forged to jive with the party line.

Oh, and the USS Justinian was just finished, but Star Fleet has yet to find a captain worthy of sitting at the helm of such a fantastic machine.

Anyway...

So there are three basic types of JOINs:

1. INNER JOIN: An INNER JOIN will return all rows from both tables involved so long as there's a matching row in each table. For example, if I did a straight INNER JOIN of my two tables, the query would probably look something like this:

SELECT * FROM officer o
INNER JOIN starship s
ON o.currentshipid = s.shipid

And my output:

officer
id
first
name
last
name
rank current
ship
ship
id
registry name class
1 James Kirk Admiral 1 1 NCC-1701 Enterprise Constituion
2 Jean-Luc Picard Captain 2 2 NCC-1701D Enterprise Galaxy
3 Khan Noonien Singh Captain 3 3 NCC-1864 Reliant Miranda
4 Pavel Chekov Cdr 3 3 NCC-1864 Reliant Miranda
5 NULL Spock Captain 1 1 NCC-1701 Enterprise Constitution

That might be useful if I wanted to find the names of all the crew members for all this ships. Or, if I just wanted to know about a specific ship, I could use the WHERE statement to slim down the result set (WHERE s.registry = 'NCC-1707', for example).

Notice a couple of things... First, I didn't specify columns, so my result set contains every piece of data from both tables. In all likelihood, I'd probably only need some of that info and would adjust my query accordingly in real Star Fleet life.

Second, the INNER keyword is actually optional. I like to use it for clarity's sake, but you'll end up with the same results if you leave it out. In other words, the default JOIN with MS SQL is the INNER JOIN.

Finally, see what's missing? Because I used an INNER JOIN, Ryker and the USS Justinian are missing from the result set. I only asked SQL to give me the records that met the "o.currentshipid = s.shipid" criteria, and they didn't fit the bill. And if I actually did want to see all of the records from a table, even if there wasn't a match? In that case, I'd need to use an...

2. OUTER JOIN: Unlike an INNER JOIN, the OUTER JOIN will return all the records from one (or more) of the tables involved. Let's say I wanted to know the names of all current officers and their currently assigned ships. However, unlike with the INNER JOIN, I also want them listed even if they aren't on a ship. That might look like this:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
LEFT OUTER JOIN starship s
ON o.currentshipid = s.shipid

The result?

firstname lastname registry name
James Kirk NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL Spock NCC-1701 Enterprise
William Ryker NULL NULL

Suddenly, Ryker matters. We pulled his info from the OFFICE table like we wanted, giving us the complete list of personnel. And for those people assigned a ship, we have all the data we asked for from the STARSHIP table.

Notice that the two columns from the STARSHIP table are both NULL for Ryker. That'll happen for every record that doesn't have a matching result from your OUTER JOINed table.

Also note that the USS Justinian still isn't listed anywhere. The way we structured our query, we only asked SQL for all the records in one of the two tables involved (the one on the LEFT of the equals sign). Guess what word we'd replace LEFT with if we wanted all the records in the table on the RIGHT side? Yep. RIGHT. So if we instead wanted our list to include all the STARSHIP records and, when possible, the assigned officers, we could use the exact same query, only changing that one word:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
RIGHT OUTER JOIN starship s
ON o.currentshipid = s.shipid

Gives us this:

firstname lastname registry name
James Kirk NCC-1701 Enterprise
NULL Spock NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL NULL NCC-50666 Justinian

No more Ryker, but now the Justinian shows up. Imagine that. SQL did what we told it to do. And, again, we see NULL values when there aren't any matching records from the JOINed table.

I know your next question: "What if I want everything from both columns, regardless of matches?" Easy. You raise up and get your FULL OUTER JOIN on:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
FULL OUTER JOIN starship s
ON o.currentshipid = s.shipid

And...

firstname lastname registry name
James Kirk NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL Spock NCC-1701 Enterprise
William Ryker NULL NULL
NULL NULL NCC-50666 Justinian

Piece of cake, eh? All records from both tables, matching when you can, NULL values where you can't.

Like the "INNER" in "INNER JOIN," the word "OUTER" is actually optional in each of the above queries. You'll see the exact same results whether you type "LEFT JOIN" or "LEFT OUTER JOIN." Again, I think using the full syntax makes your queries more readable and that you're better off with it than without (especially while you're still learning).

Ok, so you've got your INNER JOIN and your OUTER JOIN, and they pretty much cover 99.999999% of the JOINs you'll ever write. However, I promised you a third JOIN when we started this adventure, and I'm a nerd who delivers. I bring you...

3. CROSS JOIN: The CROSS JOIN gives you every possible combination of results from the two tables involved, and in all honesty, I've never ever had to use it in real life. I only mention it thoroughness's sake and I can't even think of a situation in which it might be useful. If you've got one, please please let me know about it. Dying to hear.

Syntax for the CROSS JOIN is a little different. Because you don't really care if any records from table A match anything in table B, you omit the ON keyword and everything after it:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
CROSS JOIN starship s

I'm not going to bother with the result set. With CROSS JOINs, they get really big, really fast, returning the number of records equal to the total in table A multiplied by the total in table B. For instance, our example with six guys in our OFFICER table and four clunkers in the STARSHIP table, the result set of our CROSS JOIN yields 24 records. Now imagine a CROSS JOIN between a table with 100,000 customers and 200,000 orders. For you folks playing along at home, that little query would give you 20,000,000,001 problems (the 20 billion records returned, plus one pissed off DBA).

And that's where our story ends, at least for today. Next time, in a much shorter post, maybe we'll talk about how you might use JOINs in UPDATE and INSERT statements. Or maybe I'll just bust our "Wrath of Khan" and ignore this place for awhile. Who knows.



Make people smarter:

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <div><span><br /><blockquote><table><thead><th><tr><td><form><input><h1><h2> <h3> <h4> <h5> <h6> <img> <p> <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd><i>
  • Lines and paragraphs break automatically.
  • You may post block code using <blockcode [type="language"]>...</blockcode> tags. You may also post inline code using <code [type="language"]>...</code> tags.
  • You may post code using <code>...</code> (generic) or <?php ... ?> (highlighted PHP) tags.

More information about formatting options