Pulling Data from Excel

Enter OPENROWSET

I love ad hoc SQL tasks.

Seriously. Call me sadistic, but I think they're kinda fun. Not exactly kickin'-back-with-an-IPA-and-watchin'-Rome-on-DVD fun, but certainly a lot more entertaining than the usual day-to-day grind.

More often than not, those ad hoc tasks somehow involve an Excel spreadsheet. Maybe someone has a spreadsheet with the ID's of various customers who need to be deleted from the database, or maybe they need to update the area codes for certain people. Who knows. Point is that you need to know how to incorporate data from an Excel file just like you would from any other table.

When I first started working with SQL and Excel, I'd usually use DTS to import the Excel spreadsheet into a table in my existing database and access it like normal (JOINing data, etc).

That worked just fine, but it has a couple of downsides. First, it adds more work. You have create a table, insert the data, and usually remember to drop that table after you're done with it (unless you're using temporary tables). Granted, DTS can take care of those first steps for you, but why go through the extra steps if you don't have to? Plus, there's always the chance you might drop the wrong table in the last step. Not good times.

After I starting learned enough about coding to be dangerous, I started writing little scripts to loop through the input file line-by-line, connect to my SQL instance, and run the appropriate parameterized SQL query against the database. Again, it works in most cases, but for most cases that ends up causing more work than necessary. You need to write the code to connect to your database, create the Excel objects, read through the file, parse each line, execute the queries, etc. Throw in the time you'll need to debug it and you're looking at what could be an unnecessarily time-consuming process.

There's a much easier way to go about it: OPENROWSET. The OPENROWSET function (and the OPENDATASOURCE function) basically let you connect to the data in your Excel spreadsheet as though it was just another table in your database.

For example, recently I needed to update two date columns in a products table to new values contained in a spreadsheet. Rather than loop through that spreadsheet line-by-line, parsing each one and querying the database repeatedly, I used OPENROWSET to execute a single UPDATE query:

   1:  UPDATE products
   2:  SET date1 = Import.ExpDate, date2 = Import.SupDate
   3:  FROM INVENTOR INNER JOIN
   4:      (select ITEMID, Convert(datetime, [Expiration Date]) as ExpDate,
   5:          Convert(datetime, [Support Through Date]) as SupDate
   6:      FROM
   7:      OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;DATABASE=c:\newdates.xls',
   8:          'Select * from [Sheet1$] WHERE [Expiration Date] <> ''none''')
   9:      ) Import
  10:  ON INVENTOR.ITEM_ID = Import.ITEMID

The query itself is pretty straight-forward. The UPDATE syntax is the same you might use for any other update (though this might be a little more complex since it required a JOIN). The fun part is in the OPENROWSET function.

The Excel spreadsheet had three columns (ITEMID, Expiration Date, and Support Through Date). You can see how they're referenced just like any other column in the right half of my JOIN. The OPENROWSET function takes three arguments. The first is the provider name (in my case, 'Microsoft.Jet.OLEDB.4.0'). You have to use something here. OPENROWSET can be used to access any OLEDB data source, not just Excel spreadsheets, so you need to tell SQL how to talk to whatever it needs to talk to.

The second argument is the datasource you want to connect to. For my Excel spreadsheet called "newdates.xls" located in the root of my C:\ drive, that datasource was 'Excel 8.0;DATABASE=c:\newdates.xls'. Obviously, the file name and path need to change to match what you're using. Keep in mind, though, that you might need to provide additional info if you're connecting to a different OLEDB datasource. For instance, if your source is another database, you might need to provide a username and password.

The final argument can be either the name of the object you're connecting to or a specific query. In my case, the spreadsheet they gave me had information for products that didn't need to be updated along with the ones that did, so I decided only to import only the records that had a valid date. Alternatively, I could have just imported the whole page by replacing "'Select * from [Sheet1$] WHERE [Expiration Date] <> ''none''' with simply the name of the worksheet ([Sheet1$]) and filtered out the unnecessary records in my JOIN statement, but why bothering wasting bandwidth on records you aren't going to use?

One last thing... while OPENROWSET is extremely useful for ad hoc queries, Microsoft recommends using linked servers when you're talking about connections you'll use more often. Read all about that (and other ways to connect to Excel) in the References section.

SQL:

MS Office: