Using Reflection to Automatically Map Objects to a Database

John M. Hammer, Charlie Hubbard, and Raveendra Gella

Internet Security Systems
6303 Barfield Road
Atlanta, GA 30328

{jhammer, chubbard, rgella @ iss.net}

404 236 2805

February 16, 2001

Introduction

What is the least amount of code needed to read and write objects from and to a database? Consider the simple class in Listing 1. With just a few lines of code (Listing 2), a programmer can de-serialize an object from a file. Listing 3 shows how our persistence manager can be used to read the same object from a relational database. The persistence manager uses reflection to discover the attributes of classes and thus generate automatically the SQL statements needed for inserting, deleting, updating, and selecting objects in a database. This article discusses how reflection can be used for persistence management and the issues that arise from this simple, serialization-like approach.

class Person implements Serializable, Persistable {
    private String name;
    private int age;
    public Person() {}
}

Listing 1. A simple Person class.

InputStream in = new FileInputStream("person.ser");
ObjectInputStream objectStream = new ObjectInputStream(in);
Person father = (Person) objectStream.readObject();

Listing 2. Reading a serialized object.

PersistenceManager pm = new PersistenceManager(
    Person.class, "name", "jdbc:odbc:database");
Person father = (Person) pm.select("George");

Listing 3. Reading an object using a reflective persistence manager.

Applicability of the Reflective Persistence Manager

Many software systems have a fair number of classes that require persistent storage. A large fraction of these classes has relatively modest throughput requirements. Examples include classes for configuration data, error logging, etc. Often, there seem to be more classes with modest throughput requirements than classes with high throughput requirements. This suggests a role for automatically generated code, which, while possibly less efficient than hand-written code, is still fast enough given the relatively low frequency of use. It is these classes for which the persistence manager was intended.

Goals for the Persistence Manager

The goal for our persistence manager was to build a simple, straightforward mechanism for persisting objects. In particular, persistence should require the same amount of code as serialization. The goals for this article are to explain the design, compare and contrast with serialization, and explore the many consequences of using reflection for persistence. Some obscure aspects of reflection will be discussed. The code in the article was taken from a production product. To limit the discussion to essentials, the code was modified to eliminate error handling, exception handling, and JDBC resource cleanup.

Persistence with Serialization versus Relational Databases

One possible, simple solution to the persistence problem is to store serialized instances in a file. While simple, it has several disadvantages. First, it will not easily scale to distributed processors, each with its own file system. The resulting file system replication problem outweighs the programmatic simplicity of the code.

Another problem with serialization is versioning. As the class evolves, reading old versions or multiple old versions can be troublesome. The difficulty is that serialization, for all its simplicity, has a rigid mapping to serialized form. Influencing this mapping is impossible without writing additional code. A properly designed persistence layer can decouple the persistent storage layout from the object layer.

An advantage of a relational database is that it scales better to distributed processors. The disadvantage of the relational database is that reading and writing objects is more complicated and requires developers to write code. Specifically, the developer must create embedded SQL statements or equivalent stored procedures to read and write objects. While this SQL is generally not complicated to write, there is a lot of it. Further, each class has similar code. This violates the principle of once and only once [Beck 1999]. Further, good programmers are lazy. Repeatedly writing similar code, and worse yet, maintaining it, is not an especially attractive task.

Review of Reflection

This section reviews those aspects of reflection, particularly java.lang.reflect.Field, that are used by the persistence mechanism. It can be safely skipped by those well versed in reflection.

The best way to understand Figure 1 is through an example. Consider the class Person in Listing 1. Once this class is loaded by the virtual machine, an instance of Class is created to describe Person. Class.getDeclaredFields can be used to return Fields that describe name and age. The Field.getModifiers method can be used to discover that name and age are private. After calling AccessibleObject.setAccessible, the methods Field.get and Field.set can be used to read and write the values in underlying instances or static variables. An instance of Constructor can be obtained to describe the no-argument constructor in Person. Calling Constructor.newInstance invokes the underlying constructor Person.Person() and thus indirectly creates a new instance of Person.
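The reflection calls reviewed above can be tried out in a few lines. The following is a minimal sketch, not code from the product: the class ReflectionReview and its helper methods are names invented for illustration, and the nested Person is a stand-in for Listing 1 with the interfaces omitted.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class ReflectionReview {
    // Stand-in for the Person class of Listing 1 (interfaces omitted).
    static class Person {
        private String name;
        private int age;
        Person() {}
    }

    // Creates a Person through its no-argument constructor and fills in
    // the private fields directly, bypassing the normal access checks.
    static Object buildPerson(String name, int age) {
        try {
            Constructor<Person> constructor = Person.class.getDeclaredConstructor();
            constructor.setAccessible(true);
            Object person = constructor.newInstance();
            field(Person.class, "name").set(person, name);
            field(Person.class, "age").set(person, Integer.valueOf(age));
            return person;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up a declared field and suppresses the access check on it.
    static Field field(Class<?> aClass, String fieldName) throws NoSuchFieldException {
        Field field = aClass.getDeclaredField(fieldName);
        field.setAccessible(true);
        return field;
    }

    // Reads a field value, whether or not the field is private.
    static Object read(Object target, String fieldName) {
        try {
            return field(target.getClass(), fieldName).get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Uses Field.getModifiers to report whether a field is private.
    static boolean isPrivate(Class<?> aClass, String fieldName) {
        try {
            return Modifier.isPrivate(aClass.getDeclaredField(fieldName).getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object person = buildPerson("George", 54);
        for (Field f : Person.class.getDeclaredFields()) {
            System.out.println(Modifier.toString(f.getModifiers()) + " "
                + f.getName() + " = " + read(person, f.getName()));
        }
    }
}
```

The checked reflection exceptions are wrapped in an unchecked exception here only to keep the sketch short.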

The Manager Design Pattern

One of our frequently used design patterns is the Manager [Sommerland 1998]. The manager is responsible for the lifecycle (e.g., creation, deletion, and retrieval) of the objects it manages. We have found it especially useful for objects that are stored persistently in a database. One major advantage of managers is that they isolate application logic from backend storage concerns. For example, objects may have been stored initially in a flat file in XML or in serialized form, then transitioned to an RDBMS. This can be done by revisiting the internals of the manager. One of the most common shortcomings of software is direct coupling to the persistent storage, typically an RDBMS. Once this has been done from a large enough number of places, database schema changes become nearly impossible to make because of the amount of resulting broken code. Managers are one way to de-couple designs and thus avoid this problem. Managers should not be used in situations where high performance or specialized queries are required.

An RDBMS can be considered to be a set of classes with only public data members. In C, this would be a set of structs. We would never code an application with a set of structs for the important, persistent data; it is unclear to us why we would want to interact with a database that is logically no different.



Figure 1. Partial reflection class diagram.

The Persistence Manager

The persistence manager is an adapter to a relational database and uses reflection to map objects to and from the database. Only the PersistenceManager and Persistable interfaces are seen by client classes. The responsibilities of the classes are as follows. The PersistenceManager interface describes the methods that clients have for reading and writing objects to/from a database. Persistable is a marker interface (i.e., interface Persistable {}) that marks objects capable of being managed by the PersistenceManager. It is analogous to the Serializable interface. PersistenceManagerImpl is responsible for implementing PersistenceManager. TypeField and CollectionTypeField adapt and extend Field to meet the needs of PersistenceManagerImpl. SqlClauseGenerator generates strings that are common to a variety of SQL statements.



Figure 2. The persistence manager.

The PersistenceManagerImpl Constructor

The constructor is the most complex method of the entire class. It is here that all reflection and SQL generation take place.

The constructor has three arguments: the Class, the string name of the key attribute, and the string name of the database. The Class is used to retrieve reflection data. The key name is used to identify the key in the Fields that are returned by the reflection mechanisms. The database name is used to open a JDBC database connection.

The constructor finds all of the Fields and processes them in a series of three steps.

Collecting Fields

The first step is to find all of the attributes/Fields of the class and its superclasses. This is a simple iteration that climbs the class hierarchy.

Vector allFields = new Vector();
for (Class climber = aClass;
     climber != null;
     climber = climber.getSuperclass()) {
    // getDeclaredFields returns only the fields declared by this class itself
    Field[] fields = climber.getDeclaredFields();
    for (int i = 0; i < fields.length; i++)
        allFields.add(fields[i]);
}

Listing 4. Gathering Fields up the class hierarchy.

Processing the Primitive and Collection Fields

Once all Fields have been found, three things must happen. The key is found by matching its name to the name passed to the constructor. The key must be segregated from the other primitive attributes (the ordinary attributes). Finally, the Fields representing collections (e.g., hash table, list, vector) must be found. All Fields are wrapped in an adapter, either TypeField for primitive types (e.g., int) or CollectionTypeField for collections.

TypeField keyTypeField = null;
Vector ordinaryTypeFields = new Vector();
Vector collectionTypeFields = new Vector();

for (int j = 0; j < allFields.size(); j++) {
    Field field = (Field) allFields.elementAt(j);
    // The key is kept apart from the ordinary attributes.
    if (field.getName().equals(keyFieldName)) {
        keyTypeField = TypeField.makeTypeField(field);
        continue;
    }
    // makeTypeField returns null for anything that is not a persistable primitive.
    TypeField typeField = TypeField.makeTypeField(field);
    if (typeField != null) {
        ordinaryTypeFields.add(typeField);
        continue;
    }
    CollectionTypeField collectionField =
        CollectionTypeField.makeCollectionField(field, keyTypeField);
    if (collectionField != null) {
        collectionTypeFields.add(collectionField);
    }
}

Listing 5. Processing the Fields: key, primitives, and collections.

Generating SQL Statements

Once the Fields have been converted into TypeFields and CollectionTypeFields, SQL statements can be generated. This section will discuss only the generation for TypeFields; CollectionTypeField generation is more complex and will be covered later. The first step is to create a SqlClauseGenerator that produces the strings common to a variety of SQL statements.

SqlClauseGenerator generator = new SqlClauseGenerator(
aClass,
keyTypeField,
ordinaryTypeFields);
String selectStatement = selectSqlStatement(generator);

Listing 6. The SQL clause generator.

The actual SQL generation is fairly straightforward, as shown below.

private String selectSqlStatement(SqlClauseGenerator generator) {
    return "SELECT " + generator.columnNameList() +
        " FROM " + generator.tableName() +
        " WHERE " + generator.keyName() + " = ? ";
}

Listing 7. Generating a select statement using the SQL clause generator.

The method columnNameList generates a string consisting of the key's name, a comma, and all of the ordinary attributes (without a trailing comma). The method tableName returns the class name (alternative mappings from class name to table name are discussed later in the paper). The method keyName returns, of course, the string name of keyTypeField. The generated SQL statement is intended to be used in a JDBC PreparedStatement; hence, the string " = ? " leaves a way to insert the desired value of the key using the method PreparedStatement.setObject(1,someObject). The initial version of the PersistenceManager generated SQL for JDBC Statements at every database operation. Later, this was changed to generate SQL for PreparedStatements once in the constructor, and this turned out to be simpler and almost certainly faster.
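To make the generated string concrete, the following is a rough, self-contained sketch of the same idea. It is not the authors' SqlClauseGenerator: the class SelectSqlSketch and the method selectSql are invented for illustration, the sketch looks only at the fields declared by the class itself (it does not climb the hierarchy), and it assumes that column names equal field names and that the table name is the simple class name.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class SelectSqlSketch {
    // Stand-in for the Person class of Listing 1.
    static class Person {
        private String name;
        private int age;
    }

    // Builds "SELECT key, column, ... FROM Table WHERE key = ?" from the
    // declared fields of aClass. The key column is always listed first.
    static String selectSql(Class<?> aClass, String keyName) {
        List<String> columns = new ArrayList<String>();
        columns.add(keyName);
        for (Field field : aClass.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            // Skip fields that the article treats as not persistable.
            if (Modifier.isStatic(modifiers) || Modifier.isFinal(modifiers)
                    || Modifier.isTransient(modifiers) || field.isSynthetic()) {
                continue;
            }
            if (!field.getName().equals(keyName)) {
                columns.add(field.getName());
            }
        }
        return "SELECT " + String.join(", ", columns)
            + " FROM " + aClass.getSimpleName()
            + " WHERE " + keyName + " = ?";
    }

    public static void main(String[] args) {
        // For Person with key "name" this yields:
        // SELECT name, age FROM Person WHERE name = ?
        System.out.println(selectSql(Person.class, "name"));
    }
}
```

The resulting string would be passed once to Connection.prepareStatement, after which each lookup only needs to bind the key with setObject(1, key).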

Although the actual mapping of data to/from an object is not done in the PersistenceManager constructor, it is worth seeing here how this is done.

public Persistable select(Object key) {
    selectStatement.setObject(1,key);
    ResultSet resultSet = selectStatement.executeQuery();
    resultSet.next();
    Persistable persistable = createPersistable();
    keyTypeField.set(persistable,key);
    for(int k=0; k<ordinaryTypeFields.size(); k++) {
        TypeField field = (TypeField)ordinaryTypeFields.elementAt(k);
        String fieldName = field.getName();
        Object columnValue = resultSet.getObject(fieldName);
        field.set(persistable,columnValue);
    }
    return persistable;
}

Listing 8. Reading and moving data from ResultSet to Persistable.

The setObject method (line 2) stores the key's value into the position occupied by the first question mark (Listing 7). The createPersistable method (line 5) calls a Constructor.newInstance method (not shown) to create a new Persistable object. The keyTypeField.set method (line 6) stores the key's value into the Persistable. The for loop (starting at line 7) stores the remaining values into the object.

TypeField

The responsibility of this class is to provide services surrounding naming and access for primitive data types. This class adapts and extends the built-in class java.lang.reflect.Field. There are two related types of names: the Java language name and the database column name. While the mapping from Java name to database name is fairly trivial in the sample code, it need not always be. It is easy to imagine names being completely different between the program and the database, with the PersistenceManager responsible for bridging the difference. We have assumed here that the attribute maps to a single column. If this were not true, the easiest way to handle it with this technology would be to read the database columns as they are and map them to the desired form after reading and before writing.

The second service is access. The chosen approach to access was direct reading and writing, which is permitted if Field.setAccessible(true) succeeds (does not throw a SecurityException). If successful, the methods Field.get(Object) and Field.set(Object o, Object value) will directly access memory to get and set values even if the underlying attributes are declared to be private or protected. In other words, private and protected attributes are not really hidden if reflection is used.

For private and protected attributes to be truly hidden, a security manager must be installed. The implementation of Field.setAccessible checks for the ReflectPermission named "suppressAccessChecks". If the permission is lacking, setAccessible throws a SecurityException and no changes are made. Normally, if a SecurityManager were to be used, this permission would be granted only to the TypeField class. Were that to be the case, the call to setAccessible would be written as:

AccessController.doPrivileged(
    new PrivilegedAction() {
        public Object run() {
            foo.setAccessible(true);
            return null;  // run() must return a value; none is needed here
        }
    }
);

Listing 9. Special invocation of setAccessible.

The reason for this style of call is that only TypeField must have the appropriate permission, rather than TypeField and all of the classes calling into it.

An alternative to direct read/write access would be access through get/set methods on any object implementing Persistable, which would be more consistent with Java beans. We elected not to do this because it is more consistent with serialization not to require getters and setters. It would be a simple matter, however, to make these changes within TypeField, and doing so would eliminate the security issues already raised. We did not experiment with the performance implications of get/set versus reflection.
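For comparison, the get/set alternative mentioned above might look roughly like the following. This is only a sketch of the idea, not code from TypeField: the class AccessorSketch, its methods, and the assumption that accessors follow the getXxx/setXxx naming convention are all illustrative.

```java
import java.lang.reflect.Method;

class AccessorSketch {
    // Bean-style stand-in for Person: the attribute is reached only
    // through public accessors.
    static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Derives an accessor name such as "getName" from prefix and attribute.
    static String accessorName(String prefix, String attribute) {
        return prefix + Character.toUpperCase(attribute.charAt(0))
            + attribute.substring(1);
    }

    // Reads an attribute through its public getter.
    static Object get(Object target, String attribute) {
        try {
            Method getter = target.getClass().getMethod(accessorName("get", attribute));
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable getter for " + attribute, e);
        }
    }

    // Writes an attribute through its public setter; the parameter type
    // must be supplied because setters can be overloaded.
    static void set(Object target, String attribute, Class<?> type, Object value) {
        try {
            Method setter = target.getClass().getMethod(accessorName("set", attribute), type);
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable setter for " + attribute, e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        set(person, "name", String.class, "George");
        System.out.println(get(person, "name"));
    }
}
```

Because only public methods are invoked, no call to setAccessible is needed, which is why this variant sidesteps the permission question. The cost is that every persistent attribute must then expose a getter and a setter.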

A third responsibility contemplated for TypeField proved to be unnecessary. We originally expected that the primitive types float, int, boolean, etc. would each require a subclass of TypeField. This was expected because the second argument to the set method was expected to be one of these primitive types. As it turned out, JDBC allows the elements of a result set to be retrieved as Object, even when the underlying values are primitives. Thus, differences between primitives can be ignored. There are probably performance hits here because JDBC is forced to construct Float, Integer, Boolean, etc. objects for each access.
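The reason a single TypeField suffices is that Field.set accepts a wrapper object for a primitive field and unwraps it. The sketch below illustrates this behavior of java.lang.reflect.Field; the class BoxedSetSketch and its helpers are invented names. It also shows the limit of the convenience: a wrapper that would require a narrowing conversion is rejected, so a driver that returns a wider type than the field declares would still need explicit conversion.

```java
import java.lang.reflect.Field;

class BoxedSetSketch {
    static class Sample {
        private int age;
        private double weight;
    }

    // Returns true if Field.set accepted the value, and false if it was
    // rejected with an IllegalArgumentException.
    static boolean trySet(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads a field back; primitives come back wrapped (int as Integer).
    static Object read(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sample sample = new Sample();
        System.out.println(trySet(sample, "age", Integer.valueOf(54)));    // Integer unwraps to int
        System.out.println(trySet(sample, "weight", Integer.valueOf(80))); // int widens to double
        System.out.println(trySet(sample, "age", Long.valueOf(54L)));      // long to int is narrowing
    }
}
```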

The methods persistablePrimitive and makeTypeField have some interesting aspects. If the Field is defined as final, static, or transient, it is considered unsuitable for persisting in a relational database. Final variables are constants; static variables are part of a class, not part of instances. The transient modifier is used to mark attributes that serialization is supposed to ignore. We thought it sensible to ignore them as well in the PersistenceManager.

public static TypeField makeTypeField(Field field) {
if (!persistablePrimitive(field))
return null;
Class fieldType = field.getType();
if (fieldType == java.lang.String.class ||
fieldType == java.lang.Integer.class ||
fieldType == java.lang.Integer.TYPE ||
fieldType == java.lang.Long.class ||
fieldType == java.lang.Long.TYPE ||
fieldType == java.lang.Byte.class ||
fieldType == java.lang.Byte.TYPE ||
fieldType == java.lang.Character.class ||
fieldType == java.lang.Character.TYPE ||
fieldType == java.lang.Float.class ||
fieldType == java.lang.Float.TYPE ||
fieldType == java.lang.Double.class ||
fieldType == java.lang.Double.TYPE ||
fieldType == java.lang.Boolean.class ||
fieldType == java.lang.Boolean.TYPE ||
fieldType == java.sql.Timestamp.class ) {
return new TypeField(field);
}
return null;
}

Listing 10. makeTypeField factory method.

private static boolean persistablePrimitive(Field field) {
int modifier = field.getModifiers();
return ! (Modifier.isFinal(modifier)) &&
! (Modifier.isStatic(modifier)) &&
! (Modifier.isTransient(modifier));
}

Listing 11. Checking for a Field that should be persisted.

The other unusual aspect of makeTypeField is the constant java.lang.Integer.TYPE and its parallels. This constant is the Class object that corresponds to the primitive type int. It is necessary for int to have a class because the reflection system must be able to describe every Field with a type. Therefore, while int is a primitive, it has a class.
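The distinction between the class of the primitive and the class of its wrapper is easy to check. In this small sketch, PrimitiveClassSketch and Sample are illustrative names; Integer.TYPE and the class literal int.class are standard Java and denote the same Class object.

```java
class PrimitiveClassSketch {
    static class Sample {
        private int count;      // described by Integer.TYPE
        private Integer boxed;  // described by Integer.class
    }

    // Returns the declared type of one of Sample's fields.
    static Class<?> typeOf(String fieldName) {
        try {
            return Sample.class.getDeclaredField(fieldName).getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeOf("count") == Integer.TYPE);   // true
        System.out.println(typeOf("count") == Integer.class);  // false
        System.out.println(typeOf("boxed") == Integer.class);  // true
        System.out.println(Integer.TYPE == int.class);         // true
    }
}
```

This is why a type test such as the one in makeTypeField has to list both the wrapper class and the TYPE constant for each primitive.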

Handling Collections and the CollectionTypeField

Collections are concerned with references from one object to many objects. Examples of collections include java.util.Vector and java.util.HashMap. The collection itself is handled with a resolution table, which looks like:

FromKey    ToKey      ClassName    Ordering
able       baker      B            0
able       charlie    C            1

Table 1. Resolution table example for class A, vector foo

This resolution table example depicts the situation shown in Figure 3. In it, an instance able has a Vector foo. Element 0 of foo refers to baker, and element 1 of foo refers to charlie.



Figure 3. Situation described by resolution table example.

The responsibility of the CollectionTypeField class is primarily that of managing the resolution table. That is, when creates, inserts, updates, and deletes occur, the resolution table must be modified to reflect the state of the object. The problem of reading the objects named in the container is delegated to the PersistenceManager for that class. For example, in the situation of Figure 3, the retrieval call to A's persistence manager causes a retrieval call to the persistence manager for B and for C. The need for delegation explains the third column in the resolution table and the need to map this class name into the appropriate persistence manager.
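The write side of the resolution table is not shown in the article, but the mapping from a collection to rows can be sketched as follows. Everything here is hypothetical: the Row class, the Keyed interface (a stand-in for however the real code obtains the key of an element), and the use of the simple class name in the ClassName column.

```java
import java.util.ArrayList;
import java.util.List;

class ResolutionRowsSketch {
    // One row of the resolution table: FromKey, ToKey, ClassName, Ordering.
    static final class Row {
        final Object fromKey;
        final Object toKey;
        final String className;
        final int ordering;

        Row(Object fromKey, Object toKey, String className, int ordering) {
            this.fromKey = fromKey;
            this.toKey = toKey;
            this.className = className;
            this.ordering = ordering;
        }

        public String toString() {
            return fromKey + " " + toKey + " " + className + " " + ordering;
        }
    }

    // Hypothetical hook: how an element reports its key.
    interface Keyed {
        Object key();
    }

    static class B implements Keyed {
        private final String key;
        B(String key) { this.key = key; }
        public Object key() { return key; }
    }

    static class C implements Keyed {
        private final String key;
        C(String key) { this.key = key; }
        public Object key() { return key; }
    }

    // Flattens an ordered collection into rows; the list position becomes
    // the Ordering column so that the order can be restored on reading.
    static List<Row> rowsFor(Object fromKey, List<? extends Keyed> elements) {
        List<Row> rows = new ArrayList<Row>();
        for (int i = 0; i < elements.size(); i++) {
            Keyed element = elements.get(i);
            rows.add(new Row(fromKey, element.key(),
                element.getClass().getSimpleName(), i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Keyed> foo = new ArrayList<Keyed>();
        foo.add(new B("baker"));
        foo.add(new C("charlie"));
        for (Row row : rowsFor("able", foo)) {
            System.out.println(row);
        }
    }
}
```

Each Row would become one INSERT into the resolution table; an update could be handled by deleting the rows for the FromKey and inserting the current ones.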

Consider the code for creating a SQL select statement and for reading a Vector.

private String createSelectSql() {
    return "SELECT FromKey, ToKey, ClassName, Ordering" +
        " FROM " + resolutionTableName +
        " WHERE FromKey = ? " +
        " ORDER BY Ordering ";
}

private void read(Persistable persistable) throws Exception {
    Collection collection = (Collection) field.get(persistable);
    if (collection == null) collection = new Vector();
    field.set(persistable, collection);
    collection.clear();
    selectStatement.setObject(1, keyTypeField.get(persistable));
    ResultSet rs = selectStatement.executeQuery();
    while (rs.next()) {
        // The class name selects the manager that reads the referenced object.
        String toClass = rs.getString("ClassName");
        PersistenceManager toManager = this.getManager(toClass);
        Object toKey = rs.getObject("ToKey");
        Persistable to = toManager.read(toKey);
        collection.add(to);
    }
}

Listing 12. Creating SQL code for reading a Vector.

The following points are worth noting. We cannot assume that the no-argument constructor allocated containers; hence, there is a check for a null Vector. Second, all managers are registered and retrievable based on a class name. Third, the instances stored in the Vector must be instances of Persistable and have managers defined; otherwise, an error will occur. Fourth, there is no difficulty handling collections of mixed types as long as all are Persistable and have managers. Finally, the Vector after reading is not identical in every respect to the Vector that was present at writing. For example, if the Vector was pre-allocated with large amounts of extra space, that space is not guaranteed to be present when read back in. Similar distortions take place with hash tables.

Issues related to Object-Relational Mapping and Serialization-like Mapping

Caching and Object Identity

Object-oriented programs typically assume a form of object identity in which an object is uniquely identified by its address in memory. Refer to Figure 3. No matter what, object B:baker appears only once in memory. Thus, when the read is made to the manager, it is likely that a cache is consulted first to see if the object has already been read in. If so, that object is returned. Serialization and this persistence mechanism differ somewhat. Serialization guarantees that the same object will be read and written exactly once in a single input/output operation. In other words, if the same object is serialized three times in one output operation, only a single instance will be created when it is read back in. Repeating the read operation will, however, create a second instance. Our persistence manager offers the option of preserving object identity across many reads and writes through the use of a cache. Caching should also improve performance, but it presents other problems.

The first caching problem is discarding references to objects that are no longer needed. The easiest way to handle this is to use java.util.WeakHashMap to implement the cache. If a Persistable has no references to it other than that of the WeakHashMap, the garbage collector will remove the weak reference in the hash map and delete the object from memory. Recall that the marker interface Persistable imposes no requirements whatsoever on the object, including requirements for reference counting. This is perhaps the only way out of this problem given this starting position.
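An identity-preserving cache along these lines might be sketched as follows. The layout of the authors' cache is not shown, so this sketch makes its own assumption: the cache is keyed by database key. Because java.util.WeakHashMap holds its keys weakly and its values strongly, the sketch uses an ordinary map whose values are wrapped in WeakReference instead. The class IdentityCacheSketch and the loader argument are illustrative names.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class IdentityCacheSketch {
    // Database key -> weakly held object. The map never keeps an object
    // alive on its own; only references held by clients do.
    private final Map<Object, WeakReference<Object>> cache =
        new HashMap<Object, WeakReference<Object>>();

    // Returns the cached instance for key if it is still alive; otherwise
    // calls the loader (standing in for a database read) and caches the result.
    synchronized Object get(Object key, Function<Object, Object> loader) {
        WeakReference<Object> reference = cache.get(key);
        Object cached = (reference == null) ? null : reference.get();
        if (cached == null) {
            cached = loader.apply(key);
            cache.put(key, new WeakReference<Object>(cached));
        }
        return cached;
    }

    public static void main(String[] args) {
        IdentityCacheSketch cache = new IdentityCacheSketch();
        Object first = cache.get("baker", key -> new StringBuilder("B:" + key));
        Object second = cache.get("baker", key -> new StringBuilder("B:" + key));
        // Same instance while a strong reference to it exists.
        System.out.println(first == second);
    }
}
```

Entries whose referent has been collected stay in the map until the same key is read again; a production version would probably prune them, for example with a ReferenceQueue.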

The second problem with caching is the cache coherence problem. Suppose that multiple clients of the database cache copies of the database rows. There is a problem if one client changes a row and other clients have no way of knowing this. One way to solve this problem is to centralize the server for objects and allow clients to lock at the central server. The central server can also notify clients via a subscribe-publish mechanism that cached objects have been changed by another client.

Dirty Bits

There is no indication in the persistence manager that an instance is dirty and therefore needs to be written to the database. This is an implication of the marker interface of Persistable. This responsibility is left to the client of the PersistenceManager. This is also consistent with how serialization works.

Management of Memory Lifetime

One limitation of reflection is that determining responsibility for memory lifetime is impossible. For example, if object able refers to an object baker (Figure 3), reflection cannot determine if the deletion of able requires the deletion of baker. Although garbage collection eliminates this problem for in-memory Java objects, a relational database has no garbage collector, nor could it. In UML, this difference is termed association (no lifetime management) and composition (identical lifetime management). This difference is another example of the impedance mismatch between garbage-collected, object-oriented languages and relational databases. Because relational databases require explicit deletion, programming using the PersistenceManager becomes somewhat like C++ in that explicit deletes are required.

Our initial solution to this problem was to add a method deepDelete to the persistence manager. This method deleted an object and all objects referenced by the original object. While this worked, it disallowed mixing associations and compositions in the same object. A more general solution would be to use a Java bean approach. A persistence descriptor class could be created for every class that was persistence managed. If the class designer required other than the default behavior (probably association), then the descriptor would be written to describe those attributes that were composition. This persistence descriptor approach has the advantage of being able to answer a variety of questions that reflection is incapable of answering.

Error Handling while Reading Referenced Objects

When an error occurs deep in the PersistenceManager while reading an object referenced in a collection, throwing an exception is not always desirable. It is almost impossible for the client to recover from such an exception, and there are a variety of ways that a client might want to have such an error handled.

The motivation for more subtle error handling is that retrieval from persistent memory may need to be robust in the presence of unexpected errors.

When simply throwing an exception won't suffice, another approach would be for the client to supply an error-handling strategy [Gamma et al. 1995]. If the desired object cannot be read from the database, the strategy's method handleMissingObject could deal with the situation.

interface MissingObjectStrategy {
    public Object handleMissingObject(
        PersistenceManager toPM,
        String fromKey,
        String fromClass,
        String toKey,
        String toClass);
}

Listing 13. Interface for a strategy to handle missing object errors.

This strategy's purpose is to supply a pluggable algorithm, which in this case is for handling an error.
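Two possible strategies are sketched below to show how the choice might be plugged in. This is an illustration under stated assumptions, not the product code: the interface is a trimmed version of Listing 13 (the PersistenceManager argument is dropped to keep the sketch self-contained), a Map stands in for the database, and the names FAIL, SKIP, and readAll are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MissingObjectSketch {
    // Trimmed version of the interface in Listing 13.
    interface MissingObjectStrategy {
        Object handleMissingObject(String fromKey, String fromClass,
                                   String toKey, String toClass);
    }

    // Strategy 1: treat a missing object as a fatal error.
    static final MissingObjectStrategy FAIL =
        (fromKey, fromClass, toKey, toClass) -> {
            throw new IllegalStateException(fromClass + " " + fromKey
                + " refers to missing " + toClass + " " + toKey);
        };

    // Strategy 2: return null, which the reader interprets as
    // "leave this element out of the collection".
    static final MissingObjectStrategy SKIP =
        (fromKey, fromClass, toKey, toClass) -> null;

    // Hypothetical reader loop. The class names "A" and "B" are fixed
    // here only to keep the example small.
    static List<Object> readAll(String fromKey, List<String> toKeys,
                                Map<String, Object> store,
                                MissingObjectStrategy strategy) {
        List<Object> result = new ArrayList<Object>();
        for (String toKey : toKeys) {
            Object found = store.get(toKey);
            if (found == null) {
                found = strategy.handleMissingObject(fromKey, "A", toKey, "B");
            }
            if (found != null) {
                result.add(found);
            }
        }
        return result;
    }
}
```

With SKIP, a collection that refers to one missing row is read back with the remaining elements; with FAIL, the same read aborts. A third strategy could return a placeholder object instead.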

Reflection Exceptions

When reflection is used, many types of exceptions can occur. To avoid requiring the client to catch these exceptions, they were caught inside the PersistenceManager and a PersistenceException was thrown. None of the reflection exceptions would have made any sense to the clients; all of them represented internal errors within the PersistenceManager.

Key Value Generation

Some objects required values to be generated for keys. Although it is possible to require the database to generate key values, several difficulties arise. First, caching the object becomes impossible without having its key value. Second, reading the object from the database to determine its key is problematic. A select statement employing all of the non-key attributes could be used to read a result set, but there is no guarantee that only one row will be returned. Consequently, it is simpler when creating an object in a database to first check for a key value and generate one automatically in the PersistenceManager if needed. Ambler [2001] discusses a number of strategies for generating key values.
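The check-then-generate step could look roughly like this. The sketch is not the authors' generator: it assumes a String key and uses java.util.UUID purely as a convenient stand-in for whichever key strategy is chosen, and the names KeyGenerationSketch, Ticket, and ensureKey are invented.

```java
import java.lang.reflect.Field;
import java.util.UUID;

class KeyGenerationSketch {
    // Example class whose key is assigned on first write.
    static class Ticket {
        private String id;
        private String summary;
    }

    // Assigns a generated key if the key field is still null and returns
    // the key that is in effect afterwards. An existing key is never changed.
    static Object ensureKey(Object target, String keyFieldName) {
        try {
            Field keyField = target.getClass().getDeclaredField(keyFieldName);
            keyField.setAccessible(true);
            Object value = keyField.get(target);
            if (value == null) {
                value = UUID.randomUUID().toString();
                keyField.set(target, value);
            }
            return value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Ticket ticket = new Ticket();
        Object key = ensureKey(ticket, "id");
        // The key is now known before the insert, so the object can be cached.
        System.out.println(key.equals(ensureKey(ticket, "id")));
    }
}
```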

Synchronization

Due to reflection, a very conservative practice of synchronization was adopted in the persistence manager. The practice was to allow only one thread to be running inside the entire set of persistence managers. The reason for this can be seen in the following example:

Thread 1              Thread 2
deepDelete(able)      read all Bs where B.date <= 1999.12.31.

If thread 2 retrieves Bs while thread 1 is deleting a B, the results could be non-deterministic.

Reflection forces a conservative synchronization policy because it is impossible to determine with reflection what classes can be referenced from a given class. Recall that these references may be contained in Vectors or HashMaps that cast everything as Object. One solution to this problem would be for designers to partition the persistence managed classes into non-interacting sets. While reflection cannot accomplish this, designers can provide the following information:

class PersistenceManagerImpl implements PersistenceManager {
    public PersistenceManagerImpl(Class aClass, String keyName) {
        this(aClass, keyName, defaultStrategy);
    }
    public PersistenceManagerImpl(Class aClass, String keyName,
                                  PersistenceStrategy strategy) { ... }
}

interface PersistenceStrategy extends
    SynchronizationStrategy, MissingObjectStrategy { ... }

interface SynchronizationStrategy {
    public Object synchronizedObject();
}

class PersistenceStrategyImpl implements PersistenceStrategy {
    public Object handleMissingObject(...) throws Exception {
        throw new Exception();
    }
    public Object synchronizedObject() {
        // Default: a single lock shared by every persistence manager
        return PersistenceManagerImpl.class;
    }
}

Listing 14. Interfaces for specifying synchronization.
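How a manager might use the object returned by synchronizedObject is sketched below. The Manager class and its perform method are hypothetical and exist only to show the locking; SynchronizationStrategy mirrors the interface in Listing 14, and GLOBAL mirrors the default of locking on a single class object.

```java
class SynchronizationSketch {
    // Same shape as the interface in Listing 14.
    interface SynchronizationStrategy {
        Object synchronizedObject();
    }

    // Conservative default: every manager shares one lock.
    static final SynchronizationStrategy GLOBAL = () -> SynchronizationSketch.class;

    // Hypothetical manager that runs each operation while holding the
    // lock named by its strategy.
    static class Manager {
        private final SynchronizationStrategy strategy;
        private int operations;

        Manager(SynchronizationStrategy strategy) {
            this.strategy = strategy;
        }

        void perform(Runnable operation) {
            synchronized (strategy.synchronizedObject()) {
                operation.run();
                operations++;
            }
        }

        int operations() {
            return operations;
        }
    }

    public static void main(String[] args) {
        // Managers for interacting classes share GLOBAL and exclude each other.
        Manager managerForA = new Manager(GLOBAL);
        Manager managerForB = new Manager(GLOBAL);

        // A class known not to interact with the others gets its own lock.
        final Object separateLock = new Object();
        Manager managerForLog = new Manager(() -> separateLock);

        managerForA.perform(() -> System.out.println("deepDelete(able)"));
        managerForB.perform(() -> System.out.println("read all Bs"));
        managerForLog.perform(() -> System.out.println("write log entry"));
    }
}
```

Managers that share a lock object serialize their operations, so the deepDelete and the read in the example above cannot interleave. Handing a separate lock to a non-interacting set of classes is how a designer would relax the policy.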

Conclusion

This article has described a small class library for mapping objects to and from a relational database. To simplify development, reflection is used to discover the attributes and thus generate SQL statements. The library was intended to be a simple, 80% solution that would cover a fair number of classes with modest input/output performance needs. The code itself is not significantly larger than that required to map a single class to a database; however, it is abstract due to the use of reflection. Of course, the database schema must still be laid out as expected by the PersistenceManager. It is possible to use reflection to generate SQL for defining tables; however, more issues arise, such as how big a varchar should be to hold a String.

Reflective object-relational mappings are not new [Castor, VisualBSF, JDBCStore, Webgain]. Doubtless, other products use reflection internally without explicitly stating this in their documentation. What is less well known is that creating a persistence mapping with reflection can be a relatively easy undertaking. An initial version of this software was written in about ten days, and subsequent development was less than a few person-months. Source code for reflective persistence mapping is also freely available and can be a useful example. In some cases, this functionality can be cost effective compared to writing lots of SQL by hand.

There are alternatives to using reflection. Some solutions manipulate graphic representations and then generate Java code to map the objects to a database. Another approach would be to use Enterprise Java Beans. These solutions are somewhat heavier than the approach taken here, but they can offer significantly more functionality.

References

Ambler, Scott (2001). Mapping Objects to Relational Databases (http://www.ambysoft.com/mappingObjects.pdf)

Beck, Kent. (1999). eXtreme Programming Explained: Embrace Change. Addison-Wesley: Reading, MA.

Castor (http://castor.exolab.org/)

Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995) Design Patterns: Elements of Reusable Object Oriented Software. Addison-Wesley: Reading, MA.

JDBCStore (http://www.ilap.com/lpc/html/features.html)

Sommerland, Peter (1998). Manager. Pattern Languages of Program Design 3. pp. 19-28. Addison-Wesley: Reading, MA.

VisualBSF (http://www.objectmatter.com/vbsf/docs/maptool/guide.html)

Webgain (http://www.webgain.com/products/toplink/ )

Author Information

John M. Hammer (jhammer@iss.net) is an engineering manager at Internet Security Systems. Charlie Hubbard (chubbard@iss.net) and Raveendra Gella (rgella@iss.net) are software engineers at Internet Security Systems.
