andrew burke





Three ways to avoid disaster with manual SQL statements

Posted on: 2006-11-13

I spend a lot of time avoiding SQL. I've only done one project where every SQL statement was hand-coded - ever since then I've either written my own tools to generate and manage database queries and udpates, or used existing ones like Rails' ActiveRecord. However, every so often I need to clean things up manually in a database - usually from some kind of obscure error that wasn't worth handling with buttons and code - and I've had a few close calls with disaster.

Note From The Future: You generally shouldn't be poking around in a live database anyhow - more details in tomorrow's post.

  1. SQL is powerful and unforgiving, and specifying what to change or delete goes from broad to specific - so leaving something out in your code will make it affect more than you expected. If you wanted to update the tax rate for all items from Nova Scotia (with id 5), you should write

    update lineitems set taxrate = 0.14 where province_id = 5;

    but since I had been concentrating on getting the tax rate correct, I've found myself writing

    update lineitems set taxrate = 0.14;

    and then just hitting enter - which goes and changes the tax rate for every item in the system. When this happens, usually the database takes a few seconds to run through the massive update, which is just enough time to realize that I've made a hideous mistake, but not enough time to do anything about it.

    To defend against this, I've learned to type my updates/deletes like this:

    update lineitems where province_id = 5;

    this doesn't run - but then I go back to the middle of the statement and insert the 'set taxrate = 0.14' part afterwards. This maps more cleanly with the way that my brain works, and I haven't made any of those mistakes since I started using this technique.

  2. The other risk is in deleting table rows with complicated criteria. For example, if you're cleaning the invoice line items from all invoices that belong to all jobs with 'cancelled' status that were created since September 1, you need to do something like:

    delete from invoicelines where invoice_id in (select id from invoices where job_id in (select id from jobs where status_id = 7) and created_at > '2006-09-01 00:00:00');

    (yes that's inefficient SQL, but these are one-off handwritten statements that need to be clear rather than elegant)

    These statements can get out of hand very easily - put a '<' where there should be a '>' and all the wrong data can be deleted.

    What helps for me is to create a select statement first:

    select * from invoicelines where invoice_id in (select id from invoices where job_id in (select id from jobs where status_id = 7) and created_at > '2006-09-01 00:00:00');

    and check the results to make sure they look right. This way it's easy to catch conditions that inadvertently refer to everything in the database. Once I'm sure the statement is fine, I bring up the previous statement (up arrow in most terminal-based clients) and just change the 'select * ' into 'delete '. Then you know exactly what you've just deleted, since it's right there in the previous results - you can even compare the number of rows returned from the select with the number of rows deleted as an extra check. You can also try running the previous select statement and it should now return nothing.

  3. Always have backups and event tracking! I've learned these techniques after a few real close calls with live business data (every order now going to Nova Scotia, for example). Each of these incidents was kept in the 'learning experience' range, rather than 'career-limiting move' because of nightly cron jobs that backup the entire database, and thorough tracking of all business activities in an 'event_tracking' table in the database. The times when I did inadvertently mess things up, instead of calling the client and saying 'Well, all of your business data has been corrupted - we'll have to take the next month to manually fix everything' I said 'Sorry, I made a mistake in that cleanup and we've lost the changes from this morning - but all that happened this morning was the admin confirmed these 3 orders, and invoiced this other one - I've already redone these actions, so everything should be fine.'

    That's still a bad thing to have to say - but with these safety techniques I haven't had to in a long time.

    If you have lots of money and time, there are fancier database solutions that provide journaling and rollbacks etc. - but these techniques work on any system without any extra software or hiring teams of expensive DBAs.

Previous: Email - From a Mountain to a Molehill
Next: More On Avoiding SQL Disasters