Sunday, August 5, 2012

Why Puppet has its own configuration language

I was at O’Reilly’s Velocity conference back in June, giving a talk on hacking Puppet, and Puppet’s configuration language came up a lot. Most people love the language and find it the simplest way of expressing their configurations, but some are frustrated by how simple it is and wish they had a full Turing-complete language like Ruby for writing their specifications. I thought it would be worthwhile to discuss why Puppet has a custom language, and dive into some of the benefits and costs.

Puppet wasn’t the first configuration management tool to have a custom language, of course: Cfengine was founded 12 years before Puppet and had its own language (and in fact has a new one), and SmartFrog and Quattor also both have one. All of these languages are examples of what the programming world calls a ‘domain-specific language’, or DSL: a language tailored to the needs of a specific problem domain. There are scads of DSLs out there that we use but maybe don’t think of as DSLs. For example, regular expressions in Perl are essentially their own DSL within Perl, Bash is its own DSL for command execution, and HTML is a DSL for formatting text.
So, the idea of using a custom language is largely uncontroversial. Most geeks use them all the time, and rarely think twice about it. You could use a Turing-complete language like Ruby or Perl to format text, like HTML does, but not many of us want to.
I haven’t done any kind of high-level analysis of why people create DSLs (I’m just a sysadmin, not a scientist), but the motivations seem to come down largely to two: compression and simplicity.

Specific languages for specific problems

Compression is about getting a lot of work done in a very small amount of code, relative to what it would take with a general-purpose language like Ruby or Perl (or C). Yes, you can use plain Perl to do search and replace in strings, but regexes in Perl are about 1000 times (I made that number up) shorter than the non-regex code necessary to do the work.
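To make the compression point concrete, here is a rough sketch in Ruby rather than Perl. It collapses every run of whitespace into a single space twice: once with a regex and once by walking the string by hand. The method names are made up for illustration, and the hand-written version is only one of many ways to do it.

```ruby
# Collapse every run of whitespace into a single space, two ways.

# With the regex DSL: the whole job is one expression.
def squeeze_with_regex(text)
  text.gsub(/\s+/, " ")
end

# Without regexes: walk the string character by character and track
# whether we are currently inside a run of whitespace.
WHITESPACE = [" ", "\t", "\n", "\r", "\f", "\v"].freeze

def squeeze_by_hand(text)
  out = +""            # unfrozen, mutable string
  in_space = false
  text.each_char do |ch|
    if WHITESPACE.include?(ch)
      out << " " unless in_space
      in_space = true
    else
      out << ch
      in_space = false
    end
  end
  out
end
```

Both methods return the same string for the same input, so the difference is purely in how much code you have to write and read.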
Simplicity is about making otherwise-complex problems more approachable, and the best way to do so is to reduce the available problem domain. If you look at the ‘make’ tool, it has a fantastically simple DSL that exists just to build a graph of shell scripts to run for building code. If it could do more than that, build systems would quickly become too complex (and they’re complex enough in ‘make’), so it reduces capability down to the absolute minimum and punts everything else to the shell.
The downside of DSLs is that they’re not as powerful as a fully-featured general-purpose language, almost by definition. They’re often not Turing-complete, and they generally have very clear guide rails that restrict what you can do.
So, the trade-off in a DSL is that you can do more work in less code, and it’s generally far simpler and more approachable to the average user, but it’s also less powerful, which means there are some problems it can’t be used to solve. There are some classes of users who will never accept DSLs as a result.
As you can see, this trade-off does a great job of describing both the benefits and the limits of Puppet’s DSL. You don’t have to be an expert, or even a programmer, to use it effectively, and you can specify some very complex configurations in a small amount of code, but it’s almost entirely limited to specifying resources and the relationships between them. If your problem domain is outside of that, then Puppet struggles.
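As a rough illustration of what "resources and the relationships between them" gives a tool to work with, here is a small Ruby sketch. It is not Puppet code and not how Puppet is implemented. It takes three hypothetical resources, each listed with the resources it requires, and derives an order to apply them in using the standard library's TSort. The resource names and the `apply_order` method are assumptions made up for this example.

```ruby
require "tsort"

# Hypothetical resources, each mapped to the resources it requires.
REQUIRES = {
  "Service[ntp]"        => ["File[/etc/ntp.conf]"],
  "File[/etc/ntp.conf]" => ["Package[ntp]"],
  "Package[ntp]"        => [],
}.freeze

# Return resource names ordered so that every resource comes after
# the resources it requires.
def apply_order(requires)
  each_node  = ->(&block) { requires.each_key(&block) }
  each_child = ->(name, &block) { requires.fetch(name).each(&block) }
  TSort.tsort(each_node, each_child)
end
```

Calling `apply_order(REQUIRES)` puts the package first, then the file, then the service. If the relationships contain a cycle, `TSort.tsort` raises `TSort::Cyclic`, which is the kind of error you can only report up front when the relationships are data rather than arbitrary code.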

Declarative resources, not procedural code

One of the benefits of Puppet’s DSL, beyond the simplicity, is that it encourages the mental shift that Puppet requires. To use Puppet effectively, you need to think in resources, not files or commands. If you wrote your configurations in Ruby, you could easily just open files and run commands all the livelong day, but with the DSL, you have to learn to think in resources. As with regexes, this shift is a challenge at first, but it’s powerful once you make it.
Another big benefit of the DSL is that you can do much more analysis of your code than you could with a traditional language. If you want to understand configurations written in a non-domain language, such as Ruby or Perl, your choices are to read all of them or to do static analysis, which is fantastically complicated. With Puppet, we can parse all of the code and easily analyse the parse trees in memory. There are simple APIs for getting you a list of every class, type, module, or anything else you want to know about your configuration.
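Here is a toy sketch in Ruby of why that kind of analysis gets easy once the configuration is parsed into plain data. This is deliberately not Puppet's API: the `Declaration` struct, the sample entries, and the `names_of` and `modules_in` helpers are all hypothetical stand-ins for a parsed configuration.

```ruby
# Toy stand-in for a parsed configuration: each declaration is plain data.
# This is NOT Puppet's API; every name here is hypothetical.
Declaration = Struct.new(:kind, :name, :file)

PARSED = [
  Declaration.new(:class,  "ntp",           "modules/ntp/manifests/init.pp"),
  Declaration.new(:class,  "ntp::service",  "modules/ntp/manifests/service.pp"),
  Declaration.new(:define, "apache::vhost", "modules/apache/manifests/vhost.pp"),
].freeze

# List the names of every declaration of a given kind, sorted.
def names_of(kind, declarations = PARSED)
  declarations.select { |d| d.kind == kind }.map(&:name).sort
end

# List every module that appears, taken from the part of each name
# before the first "::".
def modules_in(declarations = PARSED)
  declarations.map { |d| d.name.split("::").first }.uniq.sort
end
```

Questions like "which classes exist?" or "which modules are in use?" become one-line queries over the data, with no need to execute anything. Answering the same questions about arbitrary Ruby that opens files and runs commands would mean reading or statically analysing that code.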
Lastly, because all work goes through the DSL, we can make a lot of guarantees that would be impossible without it. For instance, if you edited a file directly, or ran a command, using a normal language, Puppet would know nothing about it—thus, no logs, no ‘noop’ mode, and no control. Because of the DSL, we can make promises that every change Puppet ever makes will be logged, and that if you run Puppet in noop mode, no changes at all will be made.
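A minimal sketch of why a single choke point makes those promises possible, written in Ruby. It is an illustration under simplifying assumptions, not Puppet's implementation: system state is modelled as a hash, and the `ToyApplier` class and its methods are invented for this example.

```ruby
# Hypothetical sketch: every change goes through one method, so logging
# and noop can be enforced in exactly one place.
class ToyApplier
  attr_reader :log, :state

  def initialize(state, noop: false)
    @state = state   # current state, e.g. { "Package[ntp]" => "installed" }
    @noop = noop
    @log = []
  end

  # Bring each resource to its desired value, or only report what would
  # change when running in noop mode.
  def apply(desired)
    desired.each do |name, want|
      have = @state[name]
      next if have == want   # already in sync, nothing to do or log

      if @noop
        @log << "noop: #{name} would change from #{have.inspect} to #{want.inspect}"
      else
        @state[name] = want
        @log << "changed: #{name} from #{have.inspect} to #{want.inspect}"
      end
    end
    self
  end
end
```

Because `apply` is the only code path that touches state, a change cannot happen without a log entry, and noop mode cannot modify anything. Code that edits a file or runs a command directly bypasses that path, which is exactly the case the guarantee cannot cover.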
Hopefully this has convinced you that Puppet’s DSL strikes the right balance between simplicity and power, and that in general DSLs are a powerful way to simplify what would otherwise be very large problems. However, on the off chance you’re still skeptical, you can also try out Puppet’s Ruby DSL. In fact, we have a Google Summer of Code project focused on it right now. Our community tends to prefer the native language, but the pure Ruby interface is there (with all its benefits and costs) for those who want it.