Monday, August 13, 2012

Datalog revival (for database geeks only)


In research, sometimes, a new topic rises, blooms, slows down, and perhaps dies. I have worked for many years on two such topics, deductive databases and object databases. These topics never died, but at some point people would laugh when you submitted a paper on one of them. There was something like the feeling of being a dinosaur coming directly from before the Web, i.e. from the Stone Age.
I was invited last year to give a talk at a Dagstuhl workshop on Relationships, Objects, Roles, and Queries in Modern Programming Languages. I discovered a new community interested in object databases. The success of systems such as DB4o also demonstrates that object databases are back. I am not surprised: this was a great idea. (Interestingly, I was not attending that workshop but another one on workflow, because of some work on Active XML, a language in the Datalog spirit.)
Deductive databases with Datalog were also a great idea. I am writing about this here in answer to a request from a friend (Dave Maier): "I'm working with Todd Green on a contribution to the book for David Warren's symposium, on the history of Datalog. One of the things we want to address is the reasons behind the resurgence of Datalog. To set the stage for that, we probably need to talk about why interest declined in Datalog and deductive databases after the 1980's. We're asking around for insight…"
What caused the decline of Datalog? What is causing its revival?
Warning: I am not sure I am the right person to ask, since I never left the boat. I have been a constant fan. Ask those who deserted why they stopped caring about Datalog. Ask the new converts why they are discovering it now.
I can see 3 reasons:
1. The language is a scam.
2. The lack of killer applications.
3. The guru system guys shied away (because of 1-2?).
Let us elaborate on (1): the scam. This goes back to the claimed advantages of “declarative programming”. The first scam was Prolog: the language is not really declarative. The second scam was Datalog: it is declarative, but there is not much you can do with it.
Datalog is simple and beautiful: Horn clauses. We theory guys had a ball with it. There were beautiful results to obtain, even at the cost of further simplifications (e.g., restricting to monadic programs to be able to decide containment). But the scam is that if you want to do anything serious beyond your stupid positive first-order queries, you need more.
There was no fix that I know of for Prolog. There were fixes for Datalog: extend the language. And this was done during the last 30 years: Updates [e.g. SA and Vianu], Skolem [e.g. Gottlob], Constraints [e.g. Revesz], Time [e.g. Chomicki], Distribution and Trees [e.g. SA in ActiveXML], Aggregations [e.g. Consens, Mendelzon], Delegation [e.g. SA in Webdamlog]. I am sure I am missing some.
Now we get to (2): the lack of killer apps. The main argument for Datalog was the computation of transitive closure. This was stupid. Transitive closure could easily be expressed in versions of SQL that were already supported. The odd thing is that although the language was simplistic, the killer apps had to be intense. They had to be such that they could not easily be supported by the good old relational systems. The jury is still out, but we now have candidates: Declarative networking [e.g. Loo, Hellerstein et al], Data integration [e.g. Clio, Orchestra], Program verification [e.g. Semmle], Data extraction from HTML [e.g. Gottlob, Lixto], Knowledge representation [e.g. Gottlob], Business artifacts and workflows [e.g. SA, ActiveXML], Web data management [e.g. SA, Webdamlog]…
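For readers who have not seen it, the transitive closure example mentioned above is usually written as two Datalog rules. The snippet below is only a minimal sketch in Python of how such a program can be evaluated bottom-up (so-called naive evaluation); the function name and the toy graph are made up for illustration, and no real Datalog engine works exactly this way.

```python
# Textbook Datalog program being evaluated (shown here as comments):
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).

def transitive_closure(edges):
    """Naive bottom-up evaluation: apply the rules until no new fact appears."""
    path = set(edges)  # rule 1: every edge is a path
    while True:
        # rule 2: extend each known path by one more edge
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # fixpoint reached, nothing left to derive
        path |= new_facts

# Hypothetical toy graph with a cycle: a -> b -> c -> a
edges = {("a", "b"), ("b", "c"), ("c", "a")}
closure = transitive_closure(edges)
print(sorted(closure))
```

The loop terminates even on a cyclic graph because the set of possible facts is finite and each round only adds facts.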
Finally, let us now consider (3): the guru system guys. These guys were often working or at least consulting for relational vendors. They were quick to denigrate any break from the good old SQL engines. They did the same for object databases. It is interesting to see that some of the renewed interest in Datalog engines comes from the work of Hellerstein: a top system guy, who once wrote with Stonebraker that Datalog was trash, is now developing a Datalog system. This is nothing but Oedipus killing his father and bedding his mother.
Now, beyond the real pleasure that fans like me take in reading Hellerstein's mea culpa, it is important to observe that Joe Hellerstein (1) used many known extensions to pure Datalog in his systems and (2) promoted his work with beautiful applications such as networking in the thesis of Boon Thau Loo.
In Webdamlog, we propose data management on the Web as a killer app. In brief, the reasons are:
1. The Web is a graph so recursion is built in: you ask someone, who asks someone who asks you.
2. Web users don’t want to write in a programming language. Declarative languages seem the right way to go.
But of course, Datalog is too simplistic. This is why I spent years studying extensions of Datalog for Web data management.
Wouldn’t it be cool if Datalog (properly extended) were the data language of the Web?

2 comments: