Oracle

My Oracle Sequence/Datapump Shenanigans

I love sharing my screw-ups with you all.  One of my favorite things, right up there with raindrops on roses and whiskers on kittens.  It's therapeutic for me and, hopefully, helpful for you in the God-I-hope-I-can-avoid-HIS-idiotic-mistakes way.  Win/win.

Thankfully, it happens so often I've got enough material to keep this blog afloat for months.  Take our recent Oracle issues at the office.

We have a fairly large Oracle database in one of our remote offices that's home to about 200GB of GIS data.  In addition to that production version, we maintain a development copy of the data for, well, development. 

Occasionally, we need to move data between the two instances, and my prefered method for doing so involves using the Oracle Datapump utilities (expdp for exporting, impdp for importing).  My process looks something like this:

  1. Dump source database to disk via expdp.
  2. Copy dump file to destination server.
  3. Drop the schemas I need to refresh.
  4. Import the specific schemas via impdp.

I could just export the specific schemas that I'm interested in, but I like to get the full backup just in case there's some object I forgot I needed, etc.

Couple quick explanations for any non-Oracle types out there.  First, in Oracle, a "schema" and a "user" are essentially interchangeable terms.  Schemas own objects, can have roles/privileges granted to them, etc. 

For example, the TestUser schema could be granted the CONNECT role, thereby giving it the ability to, well, connect to the database.  The TestUser role might also be granted the ability to create objects like tables, views, etc., so you might find yourself querying the TestUser.TestTable table.

Second, Oracle has an object called a Sequence that doesn't exist in certain other databases (<cough>MS SQL<cough>).  Like the name implies, a Sequence is an object you can use to create unique, sequential-ish* values to insert into tables.  Something you might insert, for example, into an ID field.

Got it?  Everything making sense?  Ok, on to My Screw Up.

The other day, I needed to refresh several Schemas in our Dev environment with Production data and, as usual, I did so using those 4 basic steps I mentioned earlier.  Export, copy, drop, import.  Seemed to work like a champ.

Until, that is, users started playing with Dev again.

Suddenly, they'd receive various contraint violations when trying to insert into tables. In a nutshell, some of the tables had unique constraints on ID fields so a user or application couldn't accidentally insert records with the same ID.  The users and applications would get those ID values by using the NextVal function of existing sequences.

Now, as I mentioned, Oracle GUARANTEES that a sequence will never ever give out the same value twice (unless, of course, you went and fiddled with the Sequence by reseeding it, dropping/recreating, etc).  And yet, a database that worked fine in Production seemed to be doing just that. 

Or rather, it seems that somehow the Sequences and Tables using them were getting out of sync.

The weird thing was that it didn't happen all the time.  We had some 2000 sequences in this database, and not all of them were acting like this.  Some applications would work fine after one database refresh, and fail completely after another.  No rhyme or reason.

We hit our heads against this for days, trying to get to the bottom of it before we figured out what was happening.

Remember how I mentioned that the Production database was pretty large, about 200GB in size?  Well, with our hardware, Oracle Data Dump would take about 1 hour to completely export the data.  And, because we were lazy, we were performing the export during normal working hours, meaning that users were actively hitting the database during that hour long export process.

Furthermore, after looking at the export logs, we realized that the expdp process dumped out the Sequence data toward the beginning of the process and dumped the table data toward the end.

See what was happening?

Let's say that we started the export at noon.  In our mythical database, we've got a TestUser schema with a TestSequence object used to populate the TestId field in TestTable. 

Before the export begins, TestSequence's Current Value would be 100, it's NextVal would be 101, and the MAX(TestId) in TestUser.TestTable should be 100. 

So the export kicks off and dumps the TestSequence info to disk at around 12:01.  It then goes on to dump other objects to disk.  Constraints, views, etc. 

Now it gets to the tables, maybe 15 minutes later.  It starts processing those tables, starting with the largest ones.  TestUser.TestTable is a smaller table, so maybe expdp doesn't dump it to disk until 12:45.

Meanwhile, endusers have been actively using the database and inserting rows into TestUser.TestTable, using TestUser.TestSequence to get the TestID data.  So maybe now TestSequence's Current Value is up to 105.  More importantly, the MAX(TestId) in TestTable is also up to 105 when it's finally written to the dumpfile.

Now, when I copy that dump file over to the destination server and import it, it imports TestSequence with the data it had when it was first exported (CurrVal: 100, NextVal: 101) and the TestTable with the data it had when it was exported (Max(TestID): 105).

See the problem?  When someone in Dev next tries to insert into the copy of TestTable, it gets the value 101 from TestSequence.  However, that table already had that value inserted.  User ends up with an ORA-00001: unique constraint violated message, and I get my lunch interrupted with phone calls.

Simple fix, of course, once we figured out the problem: don't export while users are connected.  Either schedule the dump to occur during after hour maintenance windows, or lock everyone out for that hour.

Alternatively, we could have taken a second export of the source database, one that only dumps the Sequence info.  Then, after importing the first, full file, import the second sequence-only one.

*Oracle guarantees that Sequence values will be unique, but not necessarily sequential. Has to do with how Oracle handles caching sequence values for different sessions, etc.  If you need the values to be sequential, you'll need to write your own function.

SQL:

Oh, Joy! More JOINs!

JOINing Tables in INSERT, UPDATE, and DELETE Queries

In our last SQL post, we talked about the three basic types of JOINs enterprising young SQL junkies might use to gather data from different tables in their databases.

"Enterprising..." Oh, how I crack myself up some times. Seriously. Don't make me explain why that's funny.

Anyway, in between Star Trek-related examples, we blabbed on and on about INNER JOINs, three kinds of OUTER JOINs, and the almost completely useless CROSS JOIN. You might remember, though, that every one of those brilliant examples involved querying existing data with SELECT statements. That's fine and dandy. After all, that's also how you'll use those JOINs about 98% of the time. However, JOINs aren't just confined to SELECT statements...

No, sir. You can use JOINs in your INSERT, UPDATE, and DELETE statements, too. Sit back and watch.
First, let's resurrect our sample tables from the last post. If you'll recall, we had a STARSHIP table that contained the following data:

shipid registry name class
1 NCC-1701 Enterprise Constitution
2 NCC-1701D Enterprise Galaxy
3 NCC-1864 Reliant Miranda
4 NCC-50666 Justinian Nerdliness

And an OFFICER table that looked a little something like this:

officerid firstname lastname rank currentship
1 James Kirk Admiral 1
2 Jean-Luc Picard Captain 2
3 Khan Noonien Singh Captain 3
4 Pavel Chekov Cdr 3
5 NULL Spock Captain 1
6 William Ryker Cdr NULL

So let's have some fun. First, we're going to add another record to our OFFICER table.

Let's say that we have a dashing new OFFICER who we want to assign to the USS Justinian. Star Fleet HR gave us the info we need for the firstname, lastname, and rank fields (Justin, Stanley, and Grand Poobah, respectively), but we still need to know the right value to insert into the currentshipid field.

One way to go about that would be to simply look it up first by throwing a SELECT statement at our STARSHIP table, writing down the shipid value that corresponds to the USS Justinian, then manually adding that to our normal INSERT statement:

INSERT INTO officer 
    (firstname, lastname, rank, currentshipid)
VALUES
    ('Justin', 'Stanley', 'Grand Poobah', 4)

Obviously, that works just fine and you'll probably end up doing that most of the time. But damn it, Jim! This post is about using JOINs (or will be eventually...), so that's what we're bloody hell gonna do. Before we get to that part, though, we need to get familiar with the INSERT...SELECT syntax.

So let's write that query. The first part looks like your regular ol' INSERT statement. You'll notice, though, that the syntax changes a bit right after we specify the column names of the destination table:

INSERT INTO officer
    (firstname, lastname, rank, currentshipid)
SELECT 'Justin', 'Stanley', 'Grand Poobah', s.shipid
    FROM starship s
    WHERE s.name = 'Justinian'

(Note: I didn't need to specify a value for the officerid column because we set it up as an IDENTITY field that'll autoincrement when new records are added when the table was initially created.)

Almost seems to overcomplicate things, doesn't it? Maybe you thought that you could get away with something like this instead:

INSERT INTO officer
    (firstname, lastname, rank, currentshipid)
VALUES
    ('Justin', 'Stanley', 'Grand Poobah', 
        (SELECT shipid FROM starship WHERE name = 'Justinian')
    )

A lot of people probably try that one first then get pissed when it doesn't work. I'm one of them. Thing you have to remember is that the SELECT statement doesn't necessarily return a scalar value, but that's what the INSERT statement (written in this format) requires. Even if you write the SELECT part so that it only returns one record (using TOP, for instance), this query will still poo itself.

And that brings us to a big warning: when you're using the INSERT...SELECT statement, be 100% certain that the SELECT part only returns a single row. Otherwise, there's a pretty good chance that you'll end up INSERTing more rows than you intended.

For example, what if I was trying to assign this new officer to Picard's Enterprise instead? If I just changed the WHERE statement in my query to 'WHERE s.name = 'Enterprise',' I'd be in for a rude awakening. That query would return results for both ships named Enterprise in my table and, therefore, would insert two rows into the OFFICER table with my new guy's info, one with shipid = 1 and another with shipid = 2 (the shipid's for the two ships named 'Enterprise' in my STARSHIP table).

So let's get to the JOIN example. Let's say that we have another new OFFICER to add to the table. Again, Star Fleet HR gives us his firstname, lastname, and rank ('Luke', 'Skoubo', 'Junior Lackey'), but this time they also tell us that he's to serve under Grand Poobah Stanley.

Well, again, we could just use a SELECT statement to find the currentshipid for Stanley's record and use that to create your INSERT. Boring. We're going to use a JOIN so we only have to write one query and don't have to bother fishing through our desk for a pen and Post-It Note:

INSERT INTO officer
    (firstname, lastname, rank, currentshipid)
SELECT 'Luke', 'Skoubo', 'Junior Lackey', s.shipid
    FROM starship s
    INNER JOIN officer o
    ON s.shipid = o.currentshipid
    WHERE o.lastname = 'Stanley'

Simple, eh? Again, though, just have to make sure that the SELECT part only pulls a single record so you don't end up INSERTing more than you expected.

UPDATEs work pretty much the same way and, frankly, I can think of many more situations in which I've used a JOIN in UPDATEs than INSERTs.

For example, what if Star Fleet decided to implement some cost-cutting measures by decommissioning the old Constitution class Enterprise and transferring all it's personnel over to... oh, what's it called? That one ship that Grand Poobah Stanley runs?

As the Star Fleet DBA, we need to ensure that the data in the OFFICER table reflects that change and adjust our currentshipid fields accordingly:

UPDATE o
SET o.currentshipid = 
    (SELECT currentshipid 
    FROM officer 
    WHERE lastname = 'Stanley')
FROM starship s  
INNER JOIN officer o
    ON s.shipid = o.currentshipid
WHERE s.registry = 'NCC-1701'

Now Kirk and Spock are Stanley's minions. Like the INSERT example, I still had to be careful that the subquery I used to find Stanley's currentshipid only returned on row. However, unlike that previous query, wacky things won't happen if my subquery returns multiple results. Instead, it'll just fail altogether with a warning that it can only return a single record if I expect things to actually work.

Finally, a DELETE example. It's very similar to the others, but I find it a little counter-intuitive in its redudancy. With the DELETE...JOIN statement, you have to list the table twice using the FROM statement to begin the JOIN. Check out the example.

Let's say that the Reliant is destroyed while Chekov and Khaaaaaan are both still on board. Star Fleet, anxious to sweep that dirty little hijacking under the rug, asks you to delete all records of OFFICERs assigned to that boat at the time:

DELETE officer
FROM officer o
INNER JOIN starship s
    ON s.shipid = o.currentshipid
WHERE s.name = 'Reliant'

You might try something like "DELETE FROM officer INNER JOIN...", but you'll faily miserably. Sorry, but I didn't write the language or standards.

Well, I don't know about you, but that's about all the JOINing I can take for awhile. As always, happy to hear any questions, comments, or suggestions.

JOIN the Dark Side

When you've been playing around with SQL for awhile, you start to take your knowledge of JOINs for granted. When someone asks you about JOINing data from two or more tables together, you get this stunned look on your face... You know, the same one you get when you actually hear someone mutter the words "I've never seen 'Star Wars'." You forget that you, too, once didn't know a JOIN from a Jedi, and that everyone has to start somewhere.

So let's talk about JOINs. Like the name implies, a JOIN is simply a method for connecting two tables in a database, usually through some piece of data common to records in both tables.

And they are pretty simple once you figure them out. Before you bother to start writing queries, you need to ask yourself two questions: what kind of data do you want from each table, and how are those two tables related?

The first question will help you figure out which type of JOIN you're going to use. Do you only care about records in each table when they have a matching result in the other table or do you want all records from one or both tables, regardless of matches found?

And once you've answered that one, you can start thinking about that second question. SQL doesn't know exactly how those two tables are related on its own, so your well-designed database will likely take advantage of some foreign keys or other data that somehow links the tables together.

How 'bout an example? Let's say you're a DBA working in Star Fleet Command (hey, the site's called "Nerdliness," isn't it?). You store information about specific vessels in the STARSHIP table and info about different people manning those ships is in the OFFICER table. Occasionally, some admiral or another asks you to pull information about the ships and their crews. Should be pretty clear by now what SQL operation we'll be performing, eh?

We're going to make the tables pretty simple for this example. Our STARSHIP table looks something like this:

shipid registry name class
1 NCC-1701 Enterprise Constitution
2 NCC-1701D Enterprise Galaxy
3 NCC-1864 Reliant Miranda
4 NCC-50666 Justinian Nerdliness

And our OFFICER table looks like this:

officerid firstname lastname rank currentship
1 James Kirk Admiral 1
2 Jean-Luc Picard Captain 2
3 Khan Noonien Singh Captain 3
4 Pavel Chekov Cdr 3
5 NULL Spock Captain 1
6 William Ryker Cdr NULL

First, no. I didn't know all the information off the top of my head. Only about 95% of it...

And second, for all the Trekkers and Trekkies out there, in this example, Ryker's had a spat with Picard and is no longer assigned to his (or any other) ship. And, because Star Fleet's a government organization and, therefore, probably dirty as all Hell, they've put a positive spin on the whole oh-snap-this -300-year-old-super-dude-just-hijacked-one-of-our-ships thing by issuing a press release stating that Khan was actually given command of the Reliant and all records were forged to jive with the party line.

Oh, and the USS Justinian was just finished, but Star Fleet has yet to find a captain worthy of sitting at the helm of such a fantastic machine.

Anyway...

So there are three basic types of JOINs:

1. INNER JOIN: An INNER JOIN will return all rows from both tables involved so long as there's a matching row in each table. For example, if I did a straight INNER JOIN of my two tables, the query would probably look something like this:

SELECT * FROM officer o
INNER JOIN starship s
ON o.currentshipid = s.shipid

And my output:

officer
id
first
name
last
name
rank current
ship
ship
id
registry name class
1 James Kirk Admiral 1 1 NCC-1701 Enterprise Constituion
2 Jean-Luc Picard Captain 2 2 NCC-1701D Enterprise Galaxy
3 Khan Noonien Singh Captain 3 3 NCC-1864 Reliant Miranda
4 Pavel Chekov Cdr 3 3 NCC-1864 Reliant Miranda
5 NULL Spock Captain 1 1 NCC-1701 Enterprise Constitution

That might be useful if I wanted to find the names of all the crew members for all this ships. Or, if I just wanted to know about a specific ship, I could use the WHERE statement to slim down the result set (WHERE s.registry = 'NCC-1707', for example).

Notice a couple of things... First, I didn't specify columns, so my result set contains every piece of data from both tables. In all likelihood, I'd probably only need some of that info and would adjust my query accordingly in real Star Fleet life.

Second, the INNER keyword is actually optional. I like to use it for clarity's sake, but you'll end up with the same results if you leave it out. In other words, the default JOIN with MS SQL is the INNER JOIN.

Finally, see what's missing? Because I used an INNER JOIN, Ryker and the USS Justinian are missing from the result set. I only asked SQL to give me the records that met the "o.currentshipid = s.shipid" criteria, and they didn't fit the bill. And if I actually did want to see all of the records from a table, even if there wasn't a match? In that case, I'd need to use an...

2. OUTER JOIN: Unlike an INNER JOIN, the OUTER JOIN will return all the records from one (or more) of the tables involved. Let's say I wanted to know the names of all current officers and their currently assigned ships. However, unlike with the INNER JOIN, I also want them listed even if they aren't on a ship. That might look like this:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
LEFT OUTER JOIN starship s
ON o.currentshipid = s.shipid

The result?

firstname lastname registry name
James Kirk NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL Spock NCC-1701 Enterprise
William Ryker NULL NULL

Suddenly, Ryker matters. We pulled his info from the OFFICE table like we wanted, giving us the complete list of personnel. And for those people assigned a ship, we have all the data we asked for from the STARSHIP table.

Notice that the two columns from the STARSHIP table are both NULL for Ryker. That'll happen for every record that doesn't have a matching result from your OUTER JOINed table.

Also note that the USS Justinian still isn't listed anywhere. The way we structured our query, we only asked SQL for all the records in one of the two tables involved (the one on the LEFT of the equals sign). Guess what word we'd replace LEFT with if we wanted all the records in the table on the RIGHT side? Yep. RIGHT. So if we instead wanted our list to include all the STARSHIP records and, when possible, the assigned officers, we could use the exact same query, only changing that one word:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
RIGHT OUTER JOIN starship s
ON o.currentshipid = s.shipid

Gives us this:

firstname lastname registry name
James Kirk NCC-1701 Enterprise
NULL Spock NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL NULL NCC-50666 Justinian

No more Ryker, but now the Justinian shows up. Imagine that. SQL did what we told it to do. And, again, we see NULL values when there aren't any matching records from the JOINed table.

I know your next question: "What if I want everything from both columns, regardless of matches?" Easy. You raise up and get your FULL OUTER JOIN on:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
FULL OUTER JOIN starship s
ON o.currentshipid = s.shipid

And...

firstname lastname registry name
James Kirk NCC-1701 Enterprise
Jean-Luc Picard NCC-1701D Enterprise
Khan Noonien Singh NCC-1864 Reliant
Pavel Chekov NCC-1864 Reliant
NULL Spock NCC-1701 Enterprise
William Ryker NULL NULL
NULL NULL NCC-50666 Justinian

Piece of cake, eh? All records from both tables, matching when you can, NULL values where you can't.

Like the "INNER" in "INNER JOIN," the word "OUTER" is actually optional in each of the above queries. You'll see the exact same results whether you type "LEFT JOIN" or "LEFT OUTER JOIN." Again, I think using the full syntax makes your queries more readable and that you're better off with it than without (especially while you're still learning).

Ok, so you've got your INNER JOIN and your OUTER JOIN, and they pretty much cover 99.999999% of the JOINs you'll ever write. However, I promised you a third JOIN when we started this adventure, and I'm a nerd who delivers. I bring you...

3. CROSS JOIN: The CROSS JOIN gives you every possible combination of results from the two tables involved, and in all honesty, I've never ever had to use it in real life. I only mention it thoroughness's sake and I can't even think of a situation in which it might be useful. If you've got one, please please let me know about it. Dying to hear.

Syntax for the CROSS JOIN is a little different. Because you don't really care if any records from table A match anything in table B, you omit the ON keyword and everything after it:

SELECT o.firstname, o.lastname, s.registry, s.name FROM officer o
CROSS JOIN starship s

I'm not going to bother with the result set. With CROSS JOINs, they get really big, really fast, returning the number of records equal to the total in table A multiplied by the total in table B. For instance, our example with six guys in our OFFICER table and four clunkers in the STARSHIP table, the result set of our CROSS JOIN yields 24 records. Now imagine a CROSS JOIN between a table with 100,000 customers and 200,000 orders. For you folks playing along at home, that little query would give you 20,000,000,001 problems (the 20 billion records returned, plus one pissed off DBA).

And that's where our story ends, at least for today. Next time, in a much shorter post, maybe we'll talk about how you might use JOINs in UPDATE and INSERT statements. Or maybe I'll just bust our "Wrath of Khan" and ignore this place for awhile. Who knows.

Subscribe to RSS - Oracle