Let's say you have a table that contains the monthly figures for your army of sales people. At the end of every month, the sales manager asks you to pull a list of his top ten performers from the previous period. What do you do? Well, you could just pull the entire list, ORDER BY the month, and manually count off the top ten. That query might look something like this:
SELECT salesperson_name, sales_total FROM salesdata
WHERE sales_month = 'June'
ORDER BY sales_total DESC
Or maybe you feed the result set into some other application that programatically extracts only the top ten for you. Whatever. Point is that, as a general rule, we don't want to pull any more information from the server than we need to. What if there are 50 salespeople? Or 500? If your queries extract data you aren't even going to use, you're wasting horsepower and bandwidth that could be used for more important tasks (like downloading music). It's the same reason we specify columns in our queries instead of always SELECTing *. Thankfully, there's an easy way to filter out the result set and only return the records you really need. If you use a flavor of Microsoft SQL, you can use the TOP clause in your SELECT query to specify either a certain number or certain percentage of records to return. Just slap it in your query after SELECT and before your column selections. In our previous example, it would look something like so:
SELECT TOP 10 salesperson_name, sales_total FROM salesdata
WHERE sales_month = 'June'
ORDER BY sales_total DESC
Want the top 5% instead? Simple:
SELECT TOP 5 PERCENT salesperson_name, sales_total FROM salesdata
WHERE sales_month = 'June'
ORDER BY sales_total DESC
What if you don't use the MS version of SQL? For instance, if you're cheap like I am and use a hosting service that gives you a MySQL or PostgreSQL DB instead? The syntax is a little different, but you can get the same results using the LIMIT clause:
SELECT salesperson_name, sales_total FROM salesdata
WHERE sales_month = 'June'
ORDER BY sales_total DESC
LIMIT 10Same idea, but the LIMIT clause appears at the end instead of the beginning like the TOP version in MS SQL. And here's where it gets fun. Both methods have a slight edge over the other, depending on what you need to return. I mentioned that the TOP clause in MS SQL can also be used to return a percentage of rows instead of just a set number. MySQL and PostgreSQL don't have that ability natively (you could write a function to do it, though). On the flip side, MySQL and PostgreSQL have a handy clause you can use with LIMIT called OFFSET. With OFFSET, you can actually specify a range of rows to return instead of just always returning the top X number. For example, let's say your sales team has an incentive program that rewards the top eleven through fifteen salespeople with steak knives each month. With MySQL and PostgreSQL, we can use LIMIT and OFFSET together to select only those specific five records:
SELECT salesperson_name, sales_total FROM salesdata
WHERE sales_month = 'June'
ORDER BY sales_total DESC
LIMIT 5 OFFSET
Pretty handy for writing code to paginate results, too. In that case, you'd probably start off with an OFFSET of zero, then increment it by whatever value you use for the LIMIT each time. So if you wanted to return results 1-10, then 11-20, then 21-30, etc., you'd use LIMIT 10 OFFSET 0, LIMIT 10 OFFSET 10, and LIMIT 10 OFFSET 20, respectively. Getting that same functionality out of MS SQL is trickier. If I just want salespeople ranked 11-25 like I did before, I'd have to use subqueries in my WHERE statement. Exactly how those worked would depend on exactly what information I was trying to pull and how my tables were set up. In this case, it might look something like this:
SELECT TOP 5 salesperson_name, sales_total FROM
(SELECT TOP 15 salesperson_name, sales_total
FROM salesdataWHERE sales_month = 'June'
ORDER BY sales_total DESC) TopFifteen
ORDER BY sales_total ASC
So we have an subquery called TopFifteen that returns just that: the top 15 salespeople ranked accordingly to their sales_total. Then, in our outer query, we take the top 5 of those to give us only the people we want. Notice that in the subquery we ORDER BY sales_total DESC, but in the outer query we ORDER BY sales_total ASC. If our outer query used the DESC keyword, we'd simply end up with the top 5 altogether. Instead, we flip the order around to make sure we're getting the five we want. The downside of this approach, though, is that, while we'll have the records we wanted, they'll be in the wrong order:
| sales_person | sales_total |
| Billy Jean | 44500.0000 |
| Super Freak | 47283.5000 |
| Sweet Caroline | 59277.7500 |
| Lovely Rita | 59548.8800 |
| Ruby Tuesday | 65292.0000 |
If we're just that anal and sadistic, we could always create yet another subquery to give us the proper order:
SELECT * FROM
(SELECT TOP 5 salesperson_name, sales_total FROM
(SELECT TOP 15 salesperson_name, sales_total
FROM salesdataWHERE sales_month = 'June'
ORDER BY sales_total DESC) TopFifteen
ORDER BY sales_total ASC) FifteenToTen
ORDER BY sales_total DESC
Oh, by the way, there's another way you might be able to get the same results, depending on what you're looking for... The other day, I wanted to return only the 2-9 most recent entries in a table. That particular table had an ID column that was incremented each time, so I knew that the most recent record would always have the largest ID. In that case, my query could be simplified my just making sure I wasn't pulling the one record with the MAX(ID) value. Looked something like this:
SELECT TOP 9 * FROM orders
WHERE id <>(SELECT MAX(id)
FROM ORDERS)ORDER BY id DESC
Could have written the same query using the same format as my other example:
SELECT * FROM
(SELECT TOP 9 * FROM
(SELECT TOP 10 *
FROM ordersORDER BY id DESC) InnerMostQuery
ORDER BY id ASC) InnerQuery
ORDER BY id DESC
Nice, huh? Here's hoping the good folks in Redmond are working to add that OFFSET-type functionality to a future version. In the meantime, if you know any other tricks, let us know.