Wednesday, July 10, 2013

Stronger, Faster, Better: How Puppet Labs Sped up Puppet Enterprise


 
 
Shared via feedly, originally published on Puppet Labs.

As we prepared for the Puppet Enterprise 3.0 launch, a lot of talk among engineers and internal users centered on how much faster the new release was running.

Puppet Enterprise 3 speeds some operations by 200 percent

Performance testing indicated agent run times had sped up almost 200 percent. On top of the speed improvement, scalability gains allowed a single Puppet Enterprise 3.0 installation to manage double the number of nodes handled by Puppet Enterprise 2.8.

According to Deepak Giridharagopal, Puppet Labs' director of engineering, the improvements come from a number of optimizations and refactorings that have taken place across the product over the past year.

"When you take all of them together, no one individual piece provides that notable a performance improvement," he says. "They're all small, but they're all very additive, so the sum is truly greater than the individual parts."

According to Deepak, the performance and scalability improvements come from three key areas:

  • An upgrade to Puppet, the core technology for Puppet Enterprise

  • An upgrade to the version of Ruby shipped with Puppet Enterprise

  • The introduction of Puppet Enterprise's centralized storage service, PuppetDB

Upgrading Upstream Puppet

As the core technology for Puppet Enterprise 3.0, Puppet 3.2 can take a lot of credit for improvements to performance.

"Our engineers put in a ton of work over the past year to radically speed things up," says Deepak.

"It boils down to a lot of profile-directed optimization. We take the Puppet source code, we run through a bunch of input data, then we time it, we profile it, we try to find where the hotspots are, then we [optimize] the biggest bottleneck that we see."

Those sorts of optimizations can be a battle for inches. Deepak says just optimizing the way data types are used and processed within Puppet's code netted performance benefits.

"We normalized a bunch of data types within the codebase," he notes, "which eliminated the need to do costly data type conversions — converting from a string to a symbol or converting arrays back and forth. Making sure we're using the same types of data structures throughout the codebase reduces a lot of work and helps improve its robustness."

Another key area where data handling improvements have helped Puppet performance involves serialization. Puppet relies on both JSON and YAML, and Puppet Labs engineers worked to optimize serialization to both formats.

"Any data that goes over a network boundary in Puppet tends to be in one of those two data formats," says Deepak, so speedups there have translated to better performance throughout Puppet Enterprise 3.0.

Some improvements to Puppet 3 have also involved dropping dead weight, such as the formerly useful but very heavy ActiveSupport library.

"It was code that wasn't being used, we weren't relying on those features anymore, and it had a performance penalty, so we got rid of that dependency."

Deepak says engineers sometimes also felt as if they were working at odds with Ruby itself.

"We did a lot of work to refactor and eliminate code that allocates objects internal to the Puppet codebase. Object allocation and garbage collection aren't something Ruby's particularly fast at, relative to a lot of other programming languages, so that's something that had a pretty measurable impact on bottom-line speed."

Improving Puppet Code

Puppet Enterprise depends not just on the upstream Puppet codebase, but also on modules written in the Puppet language, so that code came under scrutiny as well.

"There are a number of modules that set up Puppet Enterprise itself, as well as automate configuration of certain key components, like the Puppet agent and MCollective [the orchestration engine for Puppet Enterprise]," says Deepak.

"We went through those and we did an optimization pass to try and find what modules were responsible for the most chatter — talking between clients and servers," he says.

"We were able to collapse down a lot of network communication. Instead of 'every agent talks over the network 15 times 20 times,' we could collapse that down to a single request that batches up a lot of info."

The decrease in chattiness has helped not only with performance, but scale.

"By reducing the number of connections the master needs to handle for file transfers and things like that, the master is more easily able to keep connections free for new agents that need to check in. It helps the overall liveness of Puppet Enterprise," says Deepak.

Upgrading Ruby

Much of Puppet Enterprise is written in Ruby — it ships with a vendored version of the language — so improvements to Ruby contribute to performance improvements in the product.

With an upgrade to Ruby 1.9, Deepak says Puppet Enterprise 3.0 benefits from improvements to garbage collection and other foundational capabilities that Ruby itself made between versions 1.8 and 1.9.

It's also a great example of open source, community-driven development providing real value to enterprise users, since support for Ruby 1.9 in the underlying Puppet core was vetted by open source developers and users before making its way to Puppet Enterprise 3.0.

"Our large community has been really helpful finding issues and bugs, making Puppet work on this more recent version of Ruby," he says.

Introducing PuppetDB

A lot of the improved performance and scalability Puppet Enterprise users have noticed comes from additive, incremental optimizations. For users who have been relying on stored configurations via the storeconfigs configuration option, though, Puppet Enterprise 3.0 introduces a much more dramatic performance increase: this latest update now uses PuppetDB, the centralized storage service for Puppet Enterprise.

PuppetDB is something Deepak's familiar with. In addition to his duties as Puppet Labs' director of engineering, he's also a lead engineer on the PuppetDB project.

"This is a huge improvement," says Deepak. "In the user community, we've seen anywhere from between one and three orders of magnitude improvement in things like compilation time and catalog storage time with the storeconfigs feature enabled."

PuppetDB works within Puppet Enterprise to persist compiled catalogs, facts and reports, and to provide a fast query engine.

"Query times in many cases drop from taking seconds, ten of seconds, or minutes down to taking milliseconds," says Deepak.

"It also provides faster inventory API than the older version of Puppet Enterprise. So now if you're using the console or using the Puppet inventory service to programmatically integrate Puppet with your other systems, your CMDBs, monitoring systems, they're considerably faster as well because that API is also backed by that really optimized PuppetDB."

PuppetDB is still only part of the Puppet Enterprise 3.0 story. According to Deepak, there's a larger story of collaborative development backed by a large, enthusiastic user base with active contributors.

"Our community took all these fixes and performance improvements," he says. "They tried them out, they vetted them, they reported bugs. Now that it's stable enough, I'm really happy to see it in the hands of our commercial customers."

