by Kasper B. Graversen, 2008

Easy Database Refactoring

By Kasper B. Graversen, last revision 21/04-2008

This article explains how to achieve easy database refactoring from Java code, and how to have fun when programming against a database (in a safe and very easy manner). While the techniques can be used with ORMs in general, such as TopLink and Hibernate, the article's starting point is raw SQL or low-level ORMs such as SqlORM.

History of Refactoring

Before going into the technical details, I feel it's important to first take a glimpse at history as it has unfolded. Refactoring was introduced to the world map of programming in 1999 by Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts in the seminal book "Refactoring: Improving the Design of Existing Code". The book showed how to abstract over and contemplate blocks of code sliced differently than just classes and methods. Further, the book showed how to change code slices to be more readable and extensible, but without changing the semantics of the code. This fueled the agile movements and other anti-waterfall philosophies, which prescribe that it is natural for code to change over time as requirements change, as developers better understand the problem domain, etc.

While the ideas of refactoring were great, carrying out the meticulous step-by-step recipes from the book was a tedious affair. Fortunately, it was soon discovered that these steps could be automated by a language-aware editor. In turn, such editors (IDEs) came to support the refactorings, and this is now a standard feature in any decent IDE.

While the refactoring book gave programmers and managers alike a conceptual tool to think of software as a dynamic artifact, it was the automation of refactorings through tools such as IDEs that impacted the world of programming. Refactoring changed from being a concept, a wishful mind-exercise, to being something that is carried out in practice every day.

Seven years later, the book "Refactoring Databases: Evolutionary Database Design" by Scott W. Ambler and Pramodkumar J. Sadalage was released (http://www.ambysoft.com/books/refactoringDatabases.html). The main message of this book is that while refactoring code has to an extent become fully automated, refactoring databases and database code is a painful and altogether manual activity. However, with the proper approach this activity can become manageable; in fact, it too can be a part of the ordinary activities of a programmer. If you are not familiar with the book, a short version is available at http://www.agiledata.org/essays/databaseRefactoring.html and is worth a read.

So why would you want to refactor your database? Just as there exist code smells in code, there exist database smells in databases.

  1. Badly chosen names or inconsistent naming conventions for column/table names.
  2. Properly implementing changed requirements or new functionality may require refactoring.
  3. Multi-purpose tables/columns, which may be split into several tables/columns.
  4. Redundant data in the form of redundant columns.
  5. Tables with many columns.
  6. Tables with many rows, which may be split "vertically" by moving columns into other tables, or "horizontally" by moving rows into other tables.
  7. Dreading change. As Scott W. Ambler puts it, if you're afraid to change your database schema because you're afraid to break something, then that's the surest sign that you need to refactor your schema.

Taking a generative approach

Before IDEs supported refactorings, refactorings were tedious to carry out. Automatic search/replace strategies did not suffice, for obvious reasons: the lack of context (the semantics of the language under refactoring) prevented the search/replace from knowing when to replace and when not to. The problems we face when refactoring databases are quite analogous (from a coding perspective at least; there are many more evil pitfalls in database programming).

Our thesis is that in order for database refactoring to be part of the every-day practices of a programmer, tools need to support the programmer. SQL Orm is one such tool. The rest of the article will explain how SQL Orm can support your refactorings.

One of the reasons database interaction is hard is that the link between the code and the database schema is so loosely defined. It is easy to make spelling mistakes in the SQL or in the column names of ResultSets. I therefore suggest a stronger relationship between code and database. Establishing such a link must be automated in order to be feasible. In the past I have, quite unproductively, manually defined constants mapping between database tables and ORM mapping names. The advantages of the generative approach used by SQLOrm are manifold:

  • IDE support (Auto-completion) on DB table and column names.
  • IDE support for compile-time checking of DB names (these errors even show up during editing in most IDEs). No more accidental misspellings.
  • Easy refactoring of databases using the IDE's refactoring capabilities.
Hence programming towards the database layer has the same qualities in terms of fluency and integration as programming towards other classes has.
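To make these points concrete, here is a small hand-written sketch of the idea. The constants below are stand-ins I made up for illustration; the real generated MyappMetaData class appears later in the article.

```java
// Hand-written stand-ins for the generated constants (illustration only):
public class ConstantsTaste {
    static final String ADDRESS = "address";       // table name lives in one place
    static final String ADDRESS_ID = "address.id"; // column name lives in one place

    public static void main(String[] args) {
        // A typo such as ADRESS would be caught by the compiler,
        // not discovered at runtime as an SQLException:
        String sql = "SELECT * FROM " + ADDRESS + " WHERE " + ADDRESS_ID + " = 1";
        System.out.println(sql); // SELECT * FROM address WHERE address.id = 1
    }
}
```

The generated constants give you exactly this, but without the unproductive manual bookkeeping.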

In this article we'll go through a very simple database schema (called myapp) and show how SQL Orm can boost your productivity, and ensure quality and agility, through a very slight change in your work process.

Database refactoring in 4 easy steps

The core of the idea is that:
  1. You use the ConstantsDump tool to generate Java constant classes from your database schemas. This should be a part of your build script.
  2. All your interactions with the DB in the code must use the generated constants, e.g. when building SQL queries or handling ResultSets.
  3. Every time you want to refactor your DB, you first change the generated constants and resolve any compilation errors that may occur, then change your database.
  4. Finally, you delete all generated constants and re-generate them during the next build, double-checking that the changes you have made in the code are equivalent to the changes performed in the database.
The process is far from fully automatic, and I strongly advise you to familiarize yourself with the DB refactoring book and its online resources (http://www.agiledata.org/essays/databaseRefactoringCatalog.html). While the skills taught in the book are not overwhelmingly complicated, they are good practices needed in the programming trenches.

The fact that SQLOrm only provides limited support should not put you off. Would you really trust a tool that promised fully automated changes in both your DB and all your applications? I sure wouldn't! My concerns are backed up by other people I've met in industry; I've even met DBAs who are afraid of cascading effects on foreign key tables, such as cascading deletes. Often, the more a tool does for you, the more it is in your way when you want to do things in a way not envisaged by the tool. At least, that's one of the principles on which SQLOrm is designed.

Example 1. Taking baby steps

Let's explain the refactoring process through concrete examples. While the examples should be easy to digest, just reading them won't give you the full flavor of control, or convey the experience of compile-time safety and auto-completion smoothness in database interactions. Thus you should seriously consider trying out the examples in your favorite IDE as you go along. We assume the following project structure, which should be of little surprise to most seasoned programmers: a src folder with sensible packages, a lib folder containing sqlorm.jar and your database driver, and finally a bin folder for the compiled classes.

I've deliberately put all generated code under the self-explanatory folder generated/dbinfo. The name clearly states that the files in this folder probably shouldn't be put in the repository, and can be deleted as often as needed.

Step 1. Initial setup

Assume you have a schema myapp and a table address within it. The following code is MySQL specific, but I'm sure you'll be able to fit this into your current DB.
CREATE TABLE myapp.address (
  id INT(10)  UNSIGNED NOT NULL AUTO_INCREMENT,
  street1     VARCHAR(45) NOT NULL,
  street2     VARCHAR(45) NOT NULL,
  PRIMARY KEY (id)
);
Along with this you have the domain object com.project.domain.Address
package com.project.domain;
public class Address {
    int id;
    String street1;
    String street2;
    
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getStreet1() {
        return street1;
    }
    public void setStreet1(String street1) {
        this.street1 = street1;
    }
    public String getStreet2() {
        return street2;
    }
    public void setStreet2(String street2) {
        this.street2 = street2;
    }
}

1.1 Build script

Although we don't really have anything to compile as of now, any sensible project can be built automatically. We use ANT as our build tool. Here's our build.xml:
<project name="db_refactoring_example" default="compile" basedir=".">
    <property name="dir.src" value="src/" />
    <property name="dir.build" value="bin/" />
    <property name="dir.lib" value="lib/" />

    <path id="classpath">
        <fileset dir="${dir.lib}">
            <include name="**/*.jar" />
        </fileset>
    </path>

    <target name="generate_db_constants">
        <java classname="org.sqlorm.metadatadumper.ConstantsDump" >
            <classpath refid="classpath"/>
            <arg line="-driver com.mysql.jdbc.Driver"/>
            <arg line="-url jdbc:mysql://127.0.0.1:3306"/>
            <arg line="-user root"/>
            <arg line="-password greatPassword"/>
            <arg line="-schema myapp"/>
            <arg line="-outputdir src/"/>
            <arg line="-javapackage com.project.generated.dbinfo"/>
            <arg line="-verbose "/>
        </java>
    </target>

    <target name="compile">
        <delete dir="${dir.build}" />
        <mkdir dir="${dir.build}" />
        <javac fork="true" srcdir="${dir.src}" destdir="${dir.build}" classpathref="classpath" />
    </target>
</project> 
Most important to notice is the target generate_db_constants. This is where we generate the constant files based on our DB layout. We are basically calling the class org.sqlorm.metadatadumper.ConstantsDump in sqlorm.jar, which could equally well have been done from the command line, in Maven, etc. While the class takes many arguments, they should all be self-explanatory.
Notice that if you need to dump the DB layout from more than one schema, you must comma-separate (without spaces) the schema names.
Also notice that -verbose is optional and is most useful when setting up the tool for the first time.
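For instance, dumping two schemas at once (the second schema name, reporting, is a made-up example) would change the schema argument to:

```xml
<arg line="-schema myapp,reporting"/>
```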

The only real obstacles should be finding the name of your database driver class and the connection URL. To help you get started, the following table shows databases and driver/connection information (please contact me with info on how to connect to your favorite DB):

Database   | Driver                               | Example URL
Derby      | org.apache.derby.jdbc.EmbeddedDriver | jdbc:derby:myDB
H2         | org.h2.Driver                        | jdbc:h2:file:C:/coding/h2db
MySQL      | com.mysql.jdbc.Driver                | jdbc:mysql://127.0.0.1:3306
Oracle     | oracle.jdbc.driver.OracleDriver      | jdbc:oracle:thin:@127.0.0.1:1526:orcl
PostgreSQL | org.postgresql.Driver                | jdbc:postgresql://127.0.0.1:5432

Quite deliberately, the compile target does not depend on the generate_db_constants target. This mainly has to do with efficiency on larger builds, as re-generating the constants may provoke a re-compilation of most of your application. Conversely, smaller projects, or build targets that always build from scratch (i.e. no incremental compilation), can benefit from always generating the constants to ensure consistency. Remember, the quicker you get an error after making a change, the easier it is to deduce possible causes and find possible solutions. Experiment and find out what best suits your needs.

Step 2. Generating the constants

In order to build the constants, you should execute both of the build script's targets:

> ant generate_db_constants


> ant

If everything went well, you should now have the class
package com.project.generated.dbinfo;

/**
 * This class is auto generated by SQL ORM.
 * Generated: Tue Apr 15 00:45:55 GMT 2008
 * 
 * DO NOT EDIT UNLESS REFACTORING!!
 */
public class MyappMetaData {
    public static final MyappADDRESS_TABLE ADDRESS = new MyappADDRESS_TABLE("address");
    public static class MyappADDRESS_TABLE implements org.sqlorm.metadatadumper.ITableName {
        public final ADDRESS_COL _STAR           = new ADDRESS_COL("address","*");
        public final ADDRESS_COL ID              = new ADDRESS_COL("address","id");
        public final ADDRESS_COL STREET2         = new ADDRESS_COL("address","street2");
        public final ADDRESS_COL STREET1         = new ADDRESS_COL("address","street1");
        // implementation
        private String tableName;
        private MyappADDRESS_TABLE(String tableName) { this.tableName = tableName; }
        public String toString() { return tableName; }
    }

    // uninteresting stuff below... 
     
    public static class ADDRESS_COL implements org.sqlorm.metadatadumper.IColumnName {
        private String tableName, columnName;
        private ADDRESS_COL(String tableName, String columnName) { 
            this.tableName = tableName;
            this.columnName = columnName;
        }
        public String _() { return columnName; }
        public String at(String tablePrefix) { return tablePrefix+"."+columnName; }
        public String toString() { return tableName +"."+columnName; }
    }
}
As can be seen from the generated code, time has been spent on making the generated code as accessible as possible. After all, you will need to edit/navigate it when refactoring.
  • Tables and columns are alphabetically ordered.
  • Visually each table is assembled in one solid block of code to make it easily distinguishable.
  • Formatting of code and special indentation on columns.
  • Sensible and concrete class names for the generated classes (e.g. ADDRESS_COL rather than COLUMNNAME).
  • Uninteresting implementation stuff is moved to the bottom of the code - out of sight, out of mind.
  • Header contains date of generation in order to quickly spot if the generated constants are out of date.
Looking closer at what has been generated we see:
  • A constant ADDRESS for the table address.
  • A constant for each column in the table, as fields of the ADDRESS table.
  • A special _STAR constant as a shortcut for address.*. We'll see it in use in a moment.
  • Each column can return its qualified name (i.e. table.column) or its short name (column) using the _() method.
  • Finally, a column can be prefixed with a different table name using the at() method. This is needed when aliasing tables.
While it may look daunting, in actuality you'll be using these generated constants quite naturally when programming. You'll find that writing SQL just "flows", as names are suggested/auto-completed by your IDE. You only need to look at the generated classes when refactoring, and in such a situation you know which class/column is of interest and should be able to find it quickly.
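To see exactly what the three accessors return, here is a stripped-down, self-contained copy of the generated ADDRESS_COL logic. (The short-name method is renamed col() in this sketch, since _ is no longer a legal method name in recent Java versions; in the generated code it is called _().)

```java
// A stripped-down copy of the generated ADDRESS_COL class, for illustration only.
public class ColumnNameDemo {
    public static class Col {
        private final String table, column;
        public Col(String table, String column) { this.table = table; this.column = column; }
        public String col() { return column; }                          // short name, as _() does
        public String at(String alias) { return alias + "." + column; } // aliased name
        @Override public String toString() { return table + "." + column; } // qualified name
    }

    public static void main(String[] args) {
        Col id = new Col("address", "id");
        System.out.println(id);               // address.id
        System.out.println(id.col());         // id
        System.out.println(id.at("billing")); // billing.id
    }
}
```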

Step 3. Creating the DAO

The next step is to create a DAO implementation. The first thing we do is statically import the generated ADDRESS table. Then we simply create an SQL string and execute it through JDBC. Notice how the string sql is constructed using the ADDRESS table constant, and specifically the shortcut ADDRESS._STAR.

We use the _() method when interacting with the ResultSet in order to fetch the column name. While a bit clumsy, using the constants proves useful: not only do we get compile-time checks on the spelling, it also helps us when refactoring (as we shall see in due time).

Simple String + JDBC

package com.project.dao;

import static com.project.generated.dbinfo.MyappMetaData.ADDRESS;
import com.project.domain.Address;
import java.sql.*;

public class AddressDao {
    Connection conn;
    public void setConnection(Connection conn) { this.conn = conn; }

    public Address getAddress(int id) throws SQLException {
        final String sql = " SELECT " + ADDRESS._STAR
                + " FROM " + ADDRESS
                + " WHERE " + ADDRESS.ID + " = " + id;

        ResultSet rs = conn.createStatement().executeQuery(sql);
        if (rs.next()) {
            Address adr = new Address();
            adr.setId(rs.getInt(ADDRESS.ID._()));
            adr.setStreet1(rs.getString(ADDRESS.STREET1._()));
            adr.setStreet2(rs.getString(ADDRESS.STREET2._()));
            return adr;
        }
        return null;
    }
}
Note that the above code lacks error handling, proper resource handling, etc.; however, none of this is the focus of the article, so bear with me. What's interesting is the easy integration, even with the most low-level database interaction.

The SQLOrm and ResultSetForMetaData alternatives

There are two things we can do to the above code to make it a bit safer and a bit more fluent (to read, write, and later change). Firstly, we can generate the SQL using the SQL Orm dynamic query builder. Secondly, we can use the helper class ResultSetForMetaData, which improves ResultSet interactions. The two improvements are independent of each other.

Using the query builder brings some sanity checks into the picture, e.g. you can't use a table constant in a position in the SQL that requires a column name. Furthermore, it allows auto-completion and some compile-time checks, since the SQL is constructed from method calls rather than raw string concatenation. See this link for more information on the query builder.
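To illustrate the principle, here is a toy fluent builder I wrote for this article; it is not SQLOrm's actual SelectBuilder implementation, just a minimal sketch of how method calls can assemble the SQL string:

```java
// Toy fluent query builder (illustration only, NOT SQLOrm's real SelectBuilder):
public class ToySelectBuilder {
    private final StringBuilder sb = new StringBuilder();
    public ToySelectBuilder select(String cols) { sb.append("SELECT ").append(cols); return this; }
    public ToySelectBuilder from(String table)  { sb.append(" FROM ").append(table); return this; }
    public ToySelectBuilder where(String cond)  { sb.append(" WHERE ").append(cond); return this; }
    public String toSql() { return sb.toString(); }

    public static void main(String[] args) {
        String sql = new ToySelectBuilder()
                .select("address.*").from("address").where("address.id = 1").toSql();
        System.out.println(sql); // SELECT address.* FROM address WHERE address.id = 1
    }
}
```

Because each clause is a method call taking typed arguments, an IDE can auto-complete and the compiler can type-check what raw string concatenation could not.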

The ResultSetForMetaData class is simply a wrapper around the ResultSet. Let's have a look at the alternative code:

package com.project.dao;

import static com.project.generated.dbinfo.MyappMetaData.ADDRESS;

import java.sql.*;
import org.sqlorm.metadatadumper.ResultSetForMetaData;
import org.sqlorm.querybuilder.*;
import com.project.domain.Address;

public class AddressDao {
    Connection conn;
    public void setConnection(Connection conn) { this.conn = conn; }

    public Address getAddressDynQ(int id) throws SQLException {
        // Dynamic query builder
        String sql = new SelectBuilder().select(ADDRESS._STAR)
                .from(ADDRESS)
                .where().and("# = #", ADDRESS.ID, id)
                .toSql();

        // ResultSetForMetaData
        ResultSetForMetaData rs = new ResultSetForMetaData(conn.createStatement().executeQuery(sql));
        if (rs.next()) {
            Address adr = new Address();
            adr.setId(rs.getInt(ADDRESS.ID));
            adr.setStreet1(rs.getString(ADDRESS.STREET1));
            adr.setStreet2(rs.getString(ADDRESS.STREET2));
            return adr;
        }
        return null;
    }
}

Step 4. Refactor!

Phew! We have finally come to the point in the example where we get to have some fun: let's refactor the code. Remember, if you're afraid to change your database schema because you're afraid to break something, then that's the surest sign that you need to refactor your schema. Learn to embrace change.

Let's start out with some really simple stuff.

Database refactoring: Rename column

Possibly the simplest database refactoring you can perform is renaming a column. Assume your application is the only one using the database table, and assume no use of stored procedures, triggers, or views (yes, let's keep things simple for now :-). We can then safely rename the column address.street2 to address.zip following the recipe below.
  • Rename the generated constant (not its content!) using the IDE refactoring.
  • Rename the column in the database.
  • Delete the generated constant classes.
  • Regenerate the constant classes as part of the compilation, and verify that the compilation and test suite do not fail.

Let's take it one step at a time.

  1. We rename the constant in our IDE and take advantage of its context awareness (i.e. it won't just rename all occurrences of "street2" in our code).
    [Screenshot: renaming the street2 constant in the IDE]
    The result is:
    public static class MyappADDRESS_TABLE implements org.sqlorm.metadatadumper.ITableName {
        ...
        public final ADDRESS_COL ZIP = new ADDRESS_COL("address", "street2");
        ...
    
    Magically, other places in the code have also changed, namely our DAO implementation:
    ...
    public Address getAddressDynQ(int id) throws SQLException {
        ItfSelectBuilder sql = new SelectBuilder().select(ADDRESS._STAR) //
                .from(ADDRESS) //
                .where().and("# = #", ADDRESS.ID, id);
    
        ResultSetForMetaData rs = new ResultSetForMetaData(conn.createStatement().executeQuery(sql.toSql()));
        if(rs.next()) {
            Address adr = new Address();
            adr.setId(rs.getInt(ADDRESS.ID));
            adr.setStreet1(rs.getString(ADDRESS.STREET1));
            adr.setStreet2(rs.getString(ADDRESS.ZIP));
            return adr;
        }
        return null;
    }
    
    Notice how we do not change the content of the constant! This is quite important, as it assists us in remembering to re-generate the constants afterwards, keeping everything consistent. If we forget to re-generate the constants, our queries will ultimately fail.
  2. Rename the column in the database
      ALTER TABLE myapp.address 
      CHANGE COLUMN street2 zip VARCHAR(45) NOT NULL;
    
    Remember to consult the database refactoring book and the steps it recommends you take, rather than relying on the very direct way suggested here.
  3. Re-generate and re-compile. In order to ensure consistency, we need to re-generate the constants. We do this by first deleting the generated source folder (hence the explicit naming of the folder), and then running our two ANT targets.

    Since this step is detached from the code base, compilation errors will occur if we have renamed the constant differently in the Java code than in the DB.
  4. After an error-free compilation, re-run your test suite and you're done! :-)
Example 2. Adding more meat

We have now been through the basic idea of using generated constants, seen how programming SQL became easier, and witnessed that renaming columns is straightforward. Now let's add some more meat to the example and explore other kinds of refactorings.

Adding an extra table

We now add an extra table to our database. This table has been carefully crafted to cause a bit of trouble ;-). It has two references to ADDRESS: a shipping address and a billing address. This requires us to make acquaintance with the at() method.
CREATE TABLE myapp.person (
  id          INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  name        VARCHAR(30) NOT NULL,
  age         INT(3) DEFAULT NULL,
  fk_bill_adr INT(10) DEFAULT NULL,
  fk_ship_adr INT(10) DEFAULT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY fk_adr1 (fk_bill_adr) REFERENCES address (id),
  FOREIGN KEY fk_adr2 (fk_ship_adr) REFERENCES address (id)
);

To fetch a PERSON with both addresses attached, we need to join the three tables and use aliasing for the two address tables. While I prefer to use INNER JOINs, many less SQL-skilled programmers prefer the plain join (which I've chosen for this example implementation).
public String getPersonWithAddress_dqb(int id) {
    ItfSelectBuilder sql = new SelectBuilder()
            .select(PERSON._STAR).select("shipping.*").select("billing.*")
            .from(PERSON).from(ADDRESS, "shipping").from(ADDRESS, "billing")
            .where().and("# = #", PERSON.ID, id)
            .and("# = #", PERSON.FK_BILL_ADR, ADDRESS.ID.at("billing"))
            .and("# = #", PERSON.FK_SHIP_ADR, ADDRESS.ID.at("shipping"));
    return sql.toSql();
}

We can experiment a bit with the extract constant refactoring in Eclipse and end up with something like:
private static final String SHIPPING = "shipping";
private static final String BILLING = "billing";

public String getPersonWithAddress_dqb(int id) {
    ItfSelectBuilder sql = new SelectBuilder()
            .select(PERSON._STAR).select(SHIPPING + ".*").select(BILLING + ".*")
            .from(PERSON).from(ADDRESS, SHIPPING).from(ADDRESS, BILLING)
            .where().and("# = #", PERSON.ID, id)
            .and("# = #", PERSON.FK_BILL_ADR, ADDRESS.ID.at(BILLING))
            .and("# = #", PERSON.FK_SHIP_ADR, ADDRESS.ID.at(SHIPPING));
    return sql.toSql();
}

which gives the following result when binding the variable id = 12:
SELECT person.*, shipping.*, billing.*
FROM person, address AS shipping, address AS billing
WHERE person.id = 12
  AND person.fk_bill_adr = billing.id
  AND person.fk_ship_adr = shipping.id

The code is not perfect, but still very readable and easy to refactor in the future. The example shows that the current framework has its limits, but also that much can be achieved through the IDE's existing refactoring facilities.

Using views

The ConstantsDump tool generates constants for views the same way it generates constants for tables. Hence you can use and access views in the same way we have explained for tables.

Download

SQLOrm is freely available under the Apache license. Download it here...

More info

I'm currently working on several articles on how to efficiently make use of the ConstantsDump tool. These, however, are still works in progress. I'll publish them as soon as they are finished.
• "Database refactoring patterns", which goes through an arsenal of refactorings and how they can be done using ConstantsDump.
• "Automatic refactoring of schema setup".

Feedback

Feedback is very welcome; I need your input to further improve the product. You can reach me, for example, in the SourceForge discussion forum.