pmuellr is Patrick Mueller

Tuesday, January 15, 2008

steve vinoski on schema

2007/01/15: if you happen to read through this blog entry, please also read the comments. Steve felt a bit misquoted here, especially in the title, and I offer up my rationale, and apology, but more importantly, we continue the argument :-)

As someone who believes there is some value for 'schema' in the REST world, I'd like to respond to Steve Vinoski's latest blog post "Lying Through Their Teeth: Easy vs. Simple".

Since I'm more firmly in the REST camp than the WS-* camp, and since I love an underdog, I'll side with schema just for the halibut. It's probably more correct to say that I haven't made up my mind w/r/t schema than to say I believe there is some value - my 'pro' stance on schema is a gut feel - I've seen no significant evidence to sway me either way.

I definitely feel like schema has gotten a bad rap from its association with WS-*. Well-deserved, to some extent; XML schema is overly complex, resulting in folks having differing interpretations of some of the structures; WSDL itself is so overly indirect it's horribly un-DRY, while on the other hand being terribly brittle at points like hardcoding URLs to your service in the document.

So? Yeah, they got it wrong. They got a lot of stuff wrong. Anyone consider we might be throwing the baby (schema) out with the bath water (WS-*)?

I'd say my general reaction to the typical RESTafarian's rant against schema is that it's misplaced, guilt-by-association.

Let's look at some of Steve's arguments:

"After all, without such a language, how can you generate code, which in turn will ease the development of the distributed system by making it all look like a local system?"

That's making a bit of a leap - "making it all look like a local system?". That's pretty much what the WS-* world did. Why would he think someone who is pro-schema would necessarily want to repeat the mistakes of the WS-* world? I don't want to!

Anyone who's done any amount of client-side HTTP programming is going to realize there's a fair amount of boilerplate-ish code here. Setting up connections, marshalling inputs, making requests, demarshalling outputs, etc. Want caching? How about connection pooling? You're going to be writing a little framework, my friend, if you don't have one handy. And once you've done that, perhaps you'd like to wrap some of the per-resource operation bits in some generated code, just to make your life easier. And note, I count things like dynamic proxies as code generation; it's just that it isn't static code generated by a tool that you have to re-absorb back into your application.

And guess what? Your generated code doesn't have to hide the foibles of the network, if it doesn't want to. And as Steve implies, it shouldn't.
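To make the dynamic proxy idea concrete, here is a minimal sketch in Python. The description format, the operation names, and the use of JSON for payloads are all assumptions made up for illustration, not any standard. Note that the proxy only builds requests; sending them, and coping with whatever the network does, is left to the caller.

```python
import json
import urllib.request

# Hypothetical, hand-written description of one resource. The field names
# ("method", "path") are assumptions for this sketch, not a standard format.
DESCRIPTION = {
    "get_widget": {"method": "GET", "path": "/widgets/{id}"},
    "put_widget": {"method": "PUT", "path": "/widgets/{id}"},
}


class ResourceProxy:
    """Builds HTTP requests for the operations named in a description."""

    def __init__(self, base_url, description):
        self._base_url = base_url.rstrip("/")
        self._description = description

    def __getattr__(self, name):
        # __getattr__ is only consulted when normal lookup fails, so each
        # described operation shows up as a method with no generated source.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            op = self._description[name]
        except KeyError:
            raise AttributeError(name) from None

        def build(body=None, **params):
            url = self._base_url + op["path"].format(**params)
            data = None if body is None else json.dumps(body).encode("utf-8")
            request = urllib.request.Request(url, data=data, method=op["method"])
            if data is not None:
                request.add_header("Content-Type", "application/json")
            return request

        return build
```

A call like `ResourceProxy("http://example.com/api", DESCRIPTION).get_widget(id=42)` returns a `urllib.request.Request` that the caller passes to `urllib.request.urlopen` itself, so timeouts and HTTP errors stay visible rather than hidden.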

"trying to reverse-map your programming language classes into distributed services, such as via Special Object Annotations, is an attempt to turn local design artifacts into distributed ones"

Again, no. Schema and "hiding remoteness" are two separate things, they aren't necessarily related.

"Similarly, with REST, you read the documentation and you write your applications appropriately, but of course the focus is different because the interface is uniform."

The documentation? What documentation? I'm picturing here that Steve has in mind a separate document (separate from the code, not generated from the code) written by a developer. But human-generated documentation like this is still schema, only it's not understandable by machines, pretty much guaranteed to get out of sync with the code, and probably incomplete and/or imprecise. Not that machine-generated schema might fare any better, but it couldn't be any worse.

But there are more problems with this thought. The notion of hand-crafted documentation for an API is quaint, but impractical if I'm dealing with more than a handful of APIs. In fact, the "uniform interface" constraint of REST pretty much implies you are going to have a greater number of "interfaces" than had you defined the functionality as a service (or set of services) in WS-*. Though presumably each REST interface has a smaller number of operations. Of course it would be nice to have this documentation highly standardized (at least given a related 'set' of documented interfaces), available electronically, etc. I don't see that happening with documentation generated by a human.

Another problem here is that although the "uniform interfaces" themselves will be generally easy to describe - "GET returns an Xyz in the HTTP response payload" - the format of the data is certainly not uniform. Is this sufficient documentation for data structures sent in HTTP payloads? It's not for me. Documenting data structures 'by example' like that seems to be the status quo, and is of course woefully inadequate.

Lastly, I'll point out that I think the primary reason for having some kind of schema available is specifically to generate standardized, human-readable documentation. But not the only one. I think there are opportunities to have machine-generated 'client libraries', machine-generated test suites, many-flavored documentation (PDFs, web-based, man pages, context-assisted help in editors, etc), validators, assembly tools for use by non-programmers (think Yahoo! Pipes), etc.

"What the proponents of definition languages seem to miss is that such languages are primarily geared towards generating tedious interface-specific code, which is required only because the underlying system forces you to specialize your interfaces in the first place."

Right. Except for the data, which will be different; and URL template variables. Don't forget that sometimes you'll need to set up cache validators; maybe some special headers. Face it, there's plenty of tedious code here, especially if you're making more than onesy-twosey requests. Especially if you're in a low signal-to-noise ratio language like Java. Tedious code is no fun. Why not framework-ize some of it?
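As a rough illustration of the kind of tedium I mean, here is a small Python sketch that expands URL template variables and attaches a cache validator plus any special headers. The function name `build_get` and the simple `{name}` template syntax are assumptions for this example only.

```python
import urllib.parse
import urllib.request


def build_get(template, variables, etag=None, extra_headers=None):
    """Expand a simple {name} URL template and attach optional headers."""
    # Percent-encode each value so it is safe inside a single path segment.
    quoted = {
        key: urllib.parse.quote(str(value), safe="")
        for key, value in variables.items()
    }
    url = template.format(**quoted)
    request = urllib.request.Request(url, method="GET")
    if etag is not None:
        # Cache validator: lets the server answer 304 Not Modified
        # when the copy we already hold is still current.
        request.add_header("If-None-Match", etag)
    for name, value in (extra_headers or {}).items():
        request.add_header(name, value)
    return request
```

Every resource needs some variation of this, which is exactly the sort of thing a little framework, or code generated from a description, could take off your hands.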

Summary

Even though I have this feeling that schema will be of some value in the REST world, I actually welcome having to have an argument about it. Absolutely, assume we don't need it until we find a reason that we really need it. We haven't yet hit the big time with REST, to know if we'll need it or not. When we see APIs on the order of eBay's, only REAL REST, then we'll know REST will have hit the big time.

For right now though, none of the anti-schema arguments I've ever heard is very compelling to me.

In the meantime, it's also refreshing to see folks experimenting with stuff like WADL. If we end up needing / wanting schema, it would be nice to have some practical experience with a flavor or two, for REST, before trying to settle on some kind of universal standard.

11 comments:

Unknown said...

Patrick: schema != interface definition language. They are not equivalent. My blog post is about interface definition languages, and the word "schema" does not even appear in the post, but your entire criticism of what I wrote is based on twisting it as if it's about schema. You've even used quotes that were specifically directed at interface definition languages and taken them out of context to try to pretend they were written about schema. I fail to see how you can argue against things I did not say.

Patrick Mueller said...

I certainly am not intending to twist your words.

I would certainly agree that there is a lot of confusion, in both terminology, capabilities, and expectations in the IDL / schema arguments for REST. I conflate the two. My concern is meta-data - being able to get a description of my services, and the data they operate on. Which I refer to generically as schema. Bad terminology on my part, I guess - schema is perhaps too closely associated with just data formats. So, my bad; replace my usage of 'schema' with 'meta-data'; am I still twisting your words?

If you think schema (presumably data formats) is important, then I'll claim that that's just half the picture - how I can get the data, how I can update the data is similarly important to be able to describe, for the same reasons describing data is important.

Unknown said...

I just don't think you can conflate data definition and interface definition to make your argument. The former is clearly required -- I would have hoped it was entirely needless to say that, but I guess not -- and is in fact an important part of REST, in the form of media types or MIME types. The latter, however, is not required for REST because the interface is uniform.

Patrick Mueller said...

There are plenty of people who will claim the data doesn't need to be described, or are satisfied with an overly generic description of the data - text/xml, for instance. Or the more extreme view of the only reasonable data for REST is HTML because that's the only format that supports links (add structure w/microformats). So I go defensive when I talk about 'describing services' and include the data as well. Just in case.

The world is an opinionated place. :-)

w/r/t uniform interface - I think you can often logically consider most of the operations against a resource to deal with some more or less fixed representation of the resource - ie, I can do a PUT on something that I GET. Same representation used for both. And DELETE probably doesn't take a payload in either direction. But what about POST, the wildcard verb of HTTP? What about a resource that doesn't support DELETE - is there some way to identify that in my service implementation to signal the intention that it's not supported? Of course I can presumably issue an OPTIONS on the resource (I think that's how you'd do that) to figure out if DELETE was supported at runtime, but perhaps it would be nice to know this at development time. Last example, there is a common pattern of posting to a collection resource, which causes a 'child' to be created as a separate resource - eg, AtomPub. Wouldn't it be nice to be able to identify such a relationship?
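For what it's worth, the runtime OPTIONS check might look something like the following Python sketch. The helper names are made up for this example, and it assumes the server actually answers OPTIONS with an Allow header, which not every server does.

```python
import http.client


def parse_allow(header_value):
    """Turn an Allow header value like 'GET, HEAD, PUT' into a set of methods."""
    if not header_value:
        return set()
    return {method.strip() for method in header_value.split(",") if method.strip()}


def allowed_methods(host, path):
    """Send OPTIONS for path on host and return the methods listed in Allow."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("OPTIONS", path)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return parse_allow(response.getheader("Allow"))
    finally:
        conn.close()
```

With this, `"DELETE" in allowed_methods(host, path)` answers the question, but only at runtime and only with a live server, which is the gap a description available at development time would fill.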

Where this gets really hairy is the last example; typically a POST like that might have its response be a redirect instead of an actual representation. Dealing with descriptions of data flowing over the wire is fairly straightforward. Dealing with indirection of URLs is a bit harder. How do I 'describe' that?

Integral ):( Reporting said...

Patrick:

I like the points that you are making a lot. I think at this point arguing that a uniform interface is enough to interact with a resource is indefensible. Now whether you express this interface with a schema or an IDL does not really matter.

What HTTP gives you is a really cheap:
- (instance level) directory service (no more look-ups), which can be used up to the correlation level
- free middleware including great scalable activation capabilities (using web servers) and great intermediation capabilities (such as caching)

What is not so great is:
- the quality of service offered
- the frameworks, which, as people have pointed out, are a bit under-developed
- asynchronous interactions and events (it'd be interesting to see how Steve mixes Erlang and HTTP, since Erlang is asynchronous in nature and does not have any "resource" concept and HTTP's sweet spots are just the opposite)

I would not be surprised if, in the end, once you fix these issues, you land back on your feet and build something just as complex as WS-*

Cheers,

JJ-

Patrick Mueller said...

JJ,

"I would not be surprised if, in the end, once you fix these issues, you land back on your feet and build something just as complex as WS-*"

Well, if that were to happen, then I guess we might as well have stuck with WS-*, eh?

In other words, I don't believe it for a minute. :-) I think providing more meta-data around REST services can provide a lot of value with a low cost. That's really all I'm looking for in REST (at the moment), and it's purely an opt-in action.

QoS isn't an issue I've heard about, unless you mean general QoS issues with TCP (and HTTP). There are some common REST patterns which can contain QoS issues like lost connections, etc.

w/r/t Erlang and async; given async, you can implement sync, but the opposite is problematic. Erlang excels at multiprocessing via its (largely) green thread model; single-assignment variables and async fit this model pretty well. For another example, see the Twisted framework for Python.

Integral ):( Reporting said...

Patrick:

yes, I definitely think that REST would benefit from more metadata. I like the "assembly". Metadata can be used for other things than "generating code", assembling being one of them.

IMHO a general resource API should at a minimum contain:
- QBEs (multiple URIs point to the same resource, based on the information you have)
- inter-actions (bi-directionality is critical here for "assembling")
- events (express the occurrence of a state)

REST has found a really smart way to deal with the QBE section of the interface. The URI space can be extended infinitely without breaking existing relationships.

My issue is really the bi-directionality and the events.

Events are critical for REST because when you get representations of a resource, you need to be informed in some way when a representation becomes stale.

Now, I really think REST can be extended without breaking its benefits.

HTTP could be fairly easily extended to deal with events since when a resource passes a representation to a client it also knows which events this representation will subscribe to. All it needs is a subscription end-point on the client. That can't be hard to standardize and implement.

In terms of inter-action, there are no problems in "declaring" an inter-action interface (I like to call it a "surface" if you know what I mean) since this is already how the programming model works. Inter-actions exist with or without REST. All we need is a standardization a la href to make it work with software agent clients.

So overall, I don't really understand all these discussions. Everybody wins by establishing a better alignment between REST (the architecture of the web) and the programming model (the architecture of enterprise information systems).

I am not fully qualified to say if the QoS will result in as much work as WS-*, so I won't argue this point.

Cheers,

JJ-

Integral ):( Reporting said...

BTW, I really meant QoS as in security, transaction, reliability...

Patrick Mueller said...

JJ,

cache validators (If-Match, If-Modified-Since headers) can help identifying 'stale' data, kind of. More of an optimization for non-stale data, but also note these validators can be used on PUT to prevent the multi-update problem (two people updating a resource at the same time).
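Here's a rough Python sketch of the conditional PUT idea, using an in-memory stand-in for the server so it runs without a network. The `Store` class and its hash-based ETag scheme are assumptions for illustration only; a real server is free to compute validators however it likes.

```python
import hashlib


class Store:
    """In-memory stand-in for a server holding one representation per path."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def etag(body):
        # Hypothetical ETag scheme: a quoted, truncated hash of the bytes.
        return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

    def get(self, path):
        body = self._bodies[path]
        return body, self.etag(body)

    def put(self, path, body, if_match=None):
        """Apply a (possibly conditional) PUT and return an HTTP-style status."""
        current = self._bodies.get(path)
        if if_match is not None:
            if current is None or self.etag(current) != if_match:
                # 412 Precondition Failed: the resource changed since the
                # client fetched it, so the update is refused.
                return 412
        self._bodies[path] = body
        return 204 if current is not None else 201
```

If two clients GET the same resource and both PUT with the ETag they received in If-Match, the first PUT succeeds and the second gets 412, so the second client has to re-fetch and redo its change rather than silently overwriting the first.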

And folks are looking to Atom to provide some of the additional notification issues.

And then there's comet as well (persistent client)

Of course, none of this is 'realtime', and even comet is a bit of a hack. For something closer to 'realtime' behaviour, look into XMPP.

Unknown said...

Again, my point is that you criticized my posting based on pretending that it talks about data schema, when in reality it talks only about interface definition languages. And I'll stress again that those are two very very different things. For you to say, "Well, I was talking about metadata, and that's the basis I used to criticize your posting" is questionable, because you're then writing about a whole different topic which I did not cover.

I don't mind criticism; in fact I'm quite used to it. I prefer thoughtful objective criticism, since it helps me learn, and of course your posting is indeed thoughtful and objective. But I always prefer criticism that actually targets what I wrote. Putting my name in a blog posting title, and then proceeding to talk about an entirely different topic, even "quoting" me on the topic I didn't write about, hardly seems fair, does it?

Now, as for the questions you ask in your latest comment, sorry to say but they're all indeed answered in real life by human-generated and human-readable documentation. Despite the fact that you poo-poo the idea of such documentation, it has to exist in some form, even something as simple as a README file, otherwise nobody can use your service. Having an IDL does absolutely nothing for you in this regard, because nobody can simply take an IDL or a WSDL, read it, and then know precisely how to invoke the service it describes. Yet you seem to claim this is possible. How?

I think I'll go blog about this.

Patrick Mueller said...

"Pretending"? Pretty harsh. I meant no ill will; certainly, I didn't intend to misuse your quotes, nor do I think I actually did. In re-reading my original blog post (I haven't edited it), I certainly agree, now, that the word "schema" is a rather dumb synonym for me to have used for "meta-data" because of its close association with data (ie, SQL and XML). I don't really distinguish between IDL, annotations, etc; it's all meta-data, and I think I've been referring to it, generically, as schema for a while now. That will change. :-)

However, I think the context of the post makes it clear I'm not talking about just data.

I'll save a response to your final point on documentation for your next blog post :-)