I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 637 posts at DZone.

Object-relational mapping: seriously

10.07.2013

After this metaphorical article by Uncle Bob on the impedance mismatch between objects and relational schemas, I figured I would write a technical explanation of the mismatch and of the advantages of using ORMs that try to solve it (hint: by severely limiting what you can do).

Data model

As Uncle Bob correctly states, the mapping performed by an ORM does not fill relational tables by starting from objects, but from the internal data model a language uses to represent objects in memory. This means an ORM usually requires access to all the private fields of objects, compromising encapsulation.

However, it is misleading to think that an ORM does not work with objects as input and output, since it often does a lot of work to keep the object model intact. It accesses fields by reflection to avoid forcing your properties to be public; it substitutes Proxies for real objects to avoid instantiating a large object graph; and after all, it recreates instances of your favorite classes, conforming to your API instead of handing you a generic ResultSet object.
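As a rough illustration of the Proxy substitution mentioned above, here is a minimal Python sketch. The names `Author`, `LazyProxy`, and `load_author` are invented for this example and do not come from any particular ORM; real ORMs typically generate proxy subclasses rather than using a generic wrapper like this one.

```python
class Author:
    """Plain domain class; it knows nothing about persistence."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class LazyProxy:
    """Stands in for a real object and loads it on first use (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader   # callable that fetches the real object
        self._target = None     # real object, filled in lazily

    def __getattr__(self, attribute):
        # Python calls __getattr__ only for attributes missing on the proxy
        # itself, i.e. anything that belongs to the real object.
        if self._target is None:
            self._target = self._loader()
        return getattr(self._target, attribute)


loads = []


def load_author():
    # Stands in for a query against the data store (assumption for the sketch).
    loads.append("author#1")
    return Author("Giorgio")


author = LazyProxy(load_author)   # nothing is loaded yet
first = author.name()             # first access triggers the load
second = author.name()            # later accesses reuse the loaded object
```

The client code calls `author.name()` as if it held a real `Author`, which is the point: the object graph is only instantiated where it is actually traversed.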

Power

Let's try a thought experiment: suppose you are able to take a periodic snapshot of RAM (every second, for example, could be enough for many web applications with non-critical data). This means that in languages such as Java and .NET you can keep, with enough memory, your whole object graph as instantiated objects instead of as rows on the disk of a database server.

Here are the relational mapping features that you lose in this situation:

  • the memory savings: you have to allocate as much RAM as is needed for all the objects, even if you use only a small working set of them at any time.
  • incidentally, this means you have to keep your objects on a single machine, as the various pointers/handles connecting them cannot cross between machines without some abomination such as RMI/SOAP/insert your favorite remote procedure call protocol. I think secondary servers (a read-only copy of your object graph) and sharding (partitioning the Aggregates of your object model) could still be possible.
  • querying on B-trees or other indexing data structures cannot be performed. By default, every search on your object model other than finding by id is a linear search, unless you take the time to introduce and maintain several additional data structures in your Repository classes.
  • you cannot perform transactions over multiple objects, even inside a single Aggregate containing just a few of them. Not only does this mean that in case of exceptions your objects may end up in an inconsistent state, but you also have to resort to synchronization to avoid exposing changes to one object in the Aggregate while the others still have to be updated.
  • whenever you update the code of a class on a production system, you need to hot swap the code in while ensuring backward compatibility with the other classes involved. How to swap it in is a non-trivial problem and requires an API in your object model (Erlang does this with functions, safely).
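To make the indexing point concrete, here is a minimal sketch of what "additional data structures in your Repository classes" could look like. The `User` class, the `city` field, and the method names are assumptions made up for the example; the index is a plain hash map rather than a B-tree, so it only supports equality lookups, not range queries.

```python
from collections import defaultdict


class User:
    def __init__(self, user_id, city):
        self.user_id = user_id
        self.city = city


class InMemoryUserRepository:
    """Keeps every User in RAM, plus one hand-maintained index on city."""

    def __init__(self):
        self._by_id = {}                    # primary lookup: id -> User
        self._by_city = defaultdict(set)    # secondary index: city -> ids

    def add(self, user):
        self._by_id[user.user_id] = user
        self._by_city[user.city].add(user.user_id)

    def move(self, user_id, new_city):
        # Every change to an indexed field must also update the index;
        # this is the maintenance cost a database would otherwise carry.
        user = self._by_id[user_id]
        self._by_city[user.city].discard(user_id)
        user.city = new_city
        self._by_city[new_city].add(user_id)

    def find_by_city_scan(self, city):
        # Without an index: visit every object in the repository.
        return sorted(u.user_id for u in self._by_id.values() if u.city == city)

    def find_by_city_indexed(self, city):
        # With the index: a single dictionary lookup.
        return sorted(self._by_city.get(city, ()))


repo = InMemoryUserRepository()
repo.add(User(1, "Milan"))
repo.add(User(2, "Rome"))
repo.add(User(3, "Milan"))
repo.move(3, "Rome")
```

Both finders return the same ids; the difference is that the indexed one does not grow with the number of users, at the price of keeping `_by_city` consistent in every method that touches `city`.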

Limitations

On the other hand, to provide all these features ORMs and relational databases put strong constraints on your object model. Thinking of your object model first is a very good discipline for limiting the influence of these relational constraints; but if you fail to adapt the object model to its datastore (be it relational or NoSQL-based), you fail at reality and may lose the powerful ORM features listed above (such as *querying*). This is indeed intended in some architectural styles, but all architectural styles have limits of applicability.

ORMs impose the following constraints:

  • Only Entities and Value Objects can be persisted in them: the objects that model the state of your application, along with the behavior that can be kept in them.
  • This also means there is a standard bag of tricks to establish outward references from Entities and Value Objects when they are thawed from the data store (such as a Service Locator, __wakeup() methods, or hooks in the reconstitution process).
  • Data structures which were unlimited in size while in memory, such as strings, usually have to be given a maximum size for performance reasons (char(16), varchar(255) instead of longtext).
  • Objects modelling machine resources cannot be persisted since they cannot be synchronized with the machine freeing them; say goodbye to Memcache connections, or open files.
  • There has to be a way to build an object from state instead of its public API, such as a no-arguments constructor (some languages are more flexible on this).
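The last two points of the list (building an object from state, and re-establishing outward references when it is thawed) can be sketched together. This is a minimal sketch under stated assumptions: `Customer`, `Mailer`, `extract_state`, and `reconstitute` are hypothetical names, and a real ORM would drive this from mapping metadata rather than from a hard-coded list of transient fields.

```python
class Mailer:
    """A service: an outward reference that is not persistable state."""

    def __init__(self):
        self.sent = []

    def send(self, address, text):
        self.sent.append((address, text))


class Customer:
    def __init__(self, email, mailer):
        # The public API validates its input ...
        if "@" not in email:
            raise ValueError("invalid email")
        self._email = email
        self._mailer = mailer

    def welcome(self):
        self._mailer.send(self._email, "Welcome!")


def extract_state(entity, transient=("_mailer",)):
    """Read private fields directly, skipping the ones that are not state."""
    return {k: v for k, v in vars(entity).items() if k not in transient}


def reconstitute(cls, state, services):
    """Rebuild an instance from stored state without calling __init__."""
    entity = cls.__new__(cls)        # allocate, bypassing the constructor
    vars(entity).update(state)       # restore the private fields
    vars(entity).update(services)    # hook: re-attach outward references
    return entity


mailer = Mailer()
row = extract_state(Customer("ada@example.com", Mailer()))
customer = reconstitute(Customer, row, {"_mailer": mailer})
customer.welcome()
```

Note that `reconstitute` skips the validation in `__init__`: the mapper trusts the stored state, which is exactly the encapsulation trade-off described in the Data model section.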

There are good reasons for ORMs to require this from object models they have to persist: otherwise ORMs would be impossible to build or even more complex than now, which is telling.

Conclusion

Despite all our conceptual discussions, Von Neumann doesn't care: machines only execute a list of instructions for the CPU, whether they violate the architecture or not. Using ORMs to persist objects is a violation of "pure" object modeling; but so are all the nice features we get from this architectural style, and we usually don't want to give them up.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

László Fülöp replied on Tue, 2013/10/08 - 2:46am

I found the link to 8light useful. Great share, thanks!

The rest however reminds me of the other day, when I was downloading the Windows 8 trial from Microsoft. Before downloading they asked me a few questions; my favorite was: "*What operating system are you running currently?", marked with "*" so it is required. And the choices were: Windows XP / Windows Vista / Windows 7 / Windows 8 / Windows 8.1 Preview. Nothing else. They could not imagine that I might have any different OS (and I'm not talking about Win'95).

Dear Giorgio, I really liked the pros and cons you collected in your post, great considerations. But I think this is news for you: there are databases that aren't RDBMSs (with a high probability there is a picture with a link on the right side of the article pointing to the NoSQL zone, http://dzone.com/mz/nosql ).

After reading your post I had to reread the 8light article, since your post gave me the feeling that it was suggesting not to use a database.

Then again, to me it reads like: when using an OO language design your app by OO principles - and of course have in mind what database you use.

And if I use only memory to store data that does not limit me to a single machine. If in doubt google for "glassfish cluster memory".

I think that Uncle Bob is someone to listen to if he speaks, I respect him. On the other hand I really liked that you did not accept his opinion blindly, you listened and formed your own, which is a really great thing to do!

Seriously yours:

László


Ivano Pagano replied on Wed, 2013/10/09 - 5:14am in response to: László Fülöp

Hi there!

I'll start by saying that Uncle Bob is often a "loud voice speaker", in the sense that he shares his opinions in strong tones, so his article could sound a bit too extreme at first read.

But reading your reply I'd say that you're missing the point of the 8light article, which is not about the technical representation of objects on the machine and in the language you're using, but much more about the conceptual side of things.

It's first about the contemporary notion that a relational db is a given part of any modern application. A notion that, according to what Bob says, has been pushed for marketing reasons for the past 20+ years as a given truth.

And then it's about how ER and OO models are not compatible, in the sense that the concept behind each model is orthogonal to the other: the idea of data stored in a table and objects passing messages are not related.

And this leads to the notion that ORM is a sort of misnomer for what actually is a storing facility for the "data"-part of your object model.

Maybe you never found yourself in a situation where the application model is so forcibly based on the persistent data schema that the consequent object model comes out fragmented and possibly incoherent. Or that you need to resort to directly using SQL to bypass the ORM relations between objects, since the queries you need would be too convoluted or needlessly expensive to execute using the ORM-language.

To avoid such situations Uncle Bob stresses the fact that an RDBMS is just one possible choice among many, especially these days. And that to create a sound object model you should think about the OO side first, and only afterwards about how to persistently store the data you need, in some cases avoiding an ORM entirely.

This is what I get from the original article.

bye

Paul Merlin replied on Wed, 2013/10/09 - 5:31am in response to: Ivano Pagano

I may rephrase the conclusion of Ivano's comment by saying that Uncle Bob says ORMs are not easy to write/read/understand/debug, because the impedance mismatch implies complexity.

Sort of a KISS message.

Genzer Hawker replied on Thu, 2013/10/10 - 11:03pm

If you read another article of Uncle Bob's, No DB (http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html), you will realize that he is strongly against designing the application by modeling the database first.

Mohammad Nour El-Din replied on Wed, 2013/10/16 - 12:48pm

Hi Giorgio

First of all, thanks for initiating this discussion and for sharing the article from Uncle Bob.

Despite the valuable information you added in your article, allow me to point out that you missed the point of Uncle Bob's article. All that Uncle Bob was trying to say is that software designers and developers should not allow the rules imposed by the RDBMS, and hence by the usage of an ORM, to affect the design of the Object Model of the system under development. He did not say that ORMs are bad or not useful; he just pointed out that they might be used in a way that turns things upside down. That's how I understand what Uncle Bob says.
