[ajug-members] OFFTOPIC: Referential Integrity via Foreign Keys
Jeff
JeffL at pobox.com
Mon Sep 13 12:04:24 EDT 2004
I would agree with most of the responses here that in general, unless you
have some good strong reason not to (like unusual performance constraints,
as have been mentioned), the more structure you can enforce inside your
database, the better off you are. It's unlikely that your application will
be the only one accessing this database. Even if it is now, it may not be
the only one for the life of the data, data almost always outlives
applications by a large margin.. In many shops, you have a mix of toolsets
and language products being used to code apps which access data. It becomes
extremely difficult to verify that the same constraints are coded into the
DAO layers in each toolset or language product, and very difficult to share
such code across disparate toolsets and languages. Even if you can manage to
do it via CORBA or something, it's much more work that simply defining some
constraints in the database, and maybe a few triggers. Along this same
line, even in homogeneous shops with all Java (or all whatever), you
probably will not be the only one writing DB code. Even if you are now,
again, you probably will not be for the life of the data. It's a big risk
to assume that everyone will always (a) know about, (b) understand well
enough to use properly, and (c) never be tempted by deadlines, etc. to
shortcut use of, your DAO code which enforces the data constraints in lieu
of referential integrity. If you have a shop full of disciplined,
methodical, mid-to-high level staff that has little turnover, excellent
communication, and little schedule or budget pressure, you could probably
make the code-based approach work rather well. Then again, given such a
dream team, you could probably make anything you cared to do work well.
Doesn't sound like too many shops I've seen to me, especially these days.
It's really a percentage game. Putting the integrity constraints in the
database guards the best against a wide variety of possible data corruption
issues with very few drawbacks, and does not depend on optimal coding habits
or conditions, but rather on simple, inexpensive, easy-to-implement,
well-understood, industry standard technology (SQL 92). It provides a
single choke point for data integrity to maintain, instead of one-per-app.
On the downside, there are drawbacks, rarely in my view serious enough to
merit jettisoning RI in your DB, but they do exist and should be
acknowledged in the interest of fairness. You will have to be more careful
in the coding of your DAO code once RI is in place. You'll no longer be
able to delete customer-related information from the database in whatever
order when deleting a customer, for example. You can't delete the customer
and then their orders. You'll get an FK violation when trying to delete a
customer who still has orders. You'll need to delete orders for the
customer first, then the customer. The more related tables, the more
careful you have to be about the order. Updates and inserts can be
similarly affected. This can be mitigated rather easily be simply defining
a stored procedure to delete/add/update a customer and having the code make
that one call. Then the DBA can do the operations in the right order inside
the SP. When stored procedures are used in this way, to batch up SQL
statements, I see little drawback. When they're used to execute business
logic, you begin to have legitimate design gripes about that. Heavy RI can
also slow down updates, deletes, and inserts for large batch-type
operations. Unless your application is full of such operations, this can be
dealt with without dumping your RI by a little extra coding around the large
batch process. Before you start it, obtain a table lock on the affected
tables (you should do this anyway for performance, if you can't, then the
table is contested and you're going to have performance problems anyway due
to concurrent activity and locking), drop the RI constraints, run your
operation on the unconstrained tables, then add the constraints back again
and release your locks. I've used this strategy successfully to accelerate
3,000,000 row inserts from a batch process on an Oracle database over 75%
while still maintaining the RI on the database for the other 99% of the
applications using that data which did not require such extreme performance,
but instead preferred data integrity.
I did see a couple of mentions of specific problems that some folks had seen
with the use of RI constraints. That at times they were poorly implemented
by non-knowledgeable DBAs who botched things up by half-baked restores and
other errors related to incomplete understanding of or implementation of,
the RI. I heartily agree that badly done, RI is going to hurt more than it
helps. But I'd suggest that's true with any technology. How'd you like to
work on a J2EE app written by people who really didn't understand or
implement the involved technologies that well? It's going to be a train
wreck, but that isn't the fault of J2EE. If you have a DBA who isn't fully
knowledgeable about RI, you'll have the same problems, and it's
staff-related, not technology-related. Also, that some table relationships
cannot be expressed by foreign keys. This is absolutely true, and not just
with the example of summary data like year-to-date totals for transactional
data in other tables and so on that was given. In those cases, sometimes
triggers can work well to provide the additional needed intelligence. In
other cases, there may be other options to deal with the issue that don't
involve either RI or code-level constraints in the code you write. In this
example, putting summary data inside database tables explicitly violates the
normalization in a hard-to-prevent-corruption way. It's pretty easy for
those summary totals to get out of agreement with the underlying data
they're intended to summarize if modifications occur to that data in any
code that doesn't also remember to update the totals. For this reason, you
often see "integrity checker" type functions built into apps that do this,
to fix totals that don't agree with underlying data any longer. Yet there
is a legitimate problem with recalculating summary data every single time
it's asked for, especially as there becomes more raw data and the process
gets slower. Yet there are other options. You could do application-level
caching of such calculated data, perhaps even inside a framework like
Hibernate to keep it out of your code. You'd gain most of the performance
without the duplicate data, with the exception of the first query from which
the cached data was generated. If that were a big enough concern, you could
load the caches at application startup on a separate thread if need be, or
with a cron job or some other method that did not require the first
interactive user to wait. Or you could even look at a product like Oracle
that supports materialized views and use those to get maximum performance
while letting the database continue to maintain the integrity of related
data outside your app.
If you've got a smallish app in which the data is less important than the
code, or are forced to use a low-end database and run locally on the user's
system, often of limited processing power, then in those cases, doing more
in the code and less in the database can actually make sense. But for
enterprise-level apps, I'd have to say, the database is the place for RI.
Jeff
----- Original Message -----
From: "Scott Smith" <support at scott-smith.com>
To: <ajug-members at ajug.org>
Sent: Sunday, September 12, 2004 2:30 PM
Subject: [ajug-members] OFFTOPIC: Referential Integrity via Foreign Keys
> Sorry for the DB question here, but I respect the opinions of the people
> here -
>
> I want to get opinions regarding the lack of referential integrity being
> enforced at the DB level. I think (in 99% of the cases) foreign keys
> should be defined to allow the DB server to enforce RI. Except for
> cases where large DBs and huge volume is pushing the DB speed limits.
> In that case, I can see doing away with FKs for efficiency. The project
> I just came into does all RI at the Data Access Layer. And I think
> that's risky for many reasons I would get into here.
>
> I don't need a lot of detail in your response. Just let me know if you
> agree that RI enforced at the DB server level is the norm and that not
> using FKs is unconventional.
>
> Thanks,
>
> Scott
>
> _______________________________________________
> ajug-members mailing list
> ajug-members at ajug.org
> http://www.ajug.org/mailman/listinfo/ajug-members
>
More information about the ajug-members
mailing list