Microsoft once used its own brand of 'Lego' to optimize Windows



Making software feel snappier when you only have 12 MB RAM

People of a certain age sometimes like to reminisce about how software in the old days was somehow more responsive and more efficient on far less powerful hardware. Part of Microsoft's answer at the time was to take its software binaries and optimize the heck out of them.

Former Microsoft engineer Dave Plummer spilled the beans on the practice, confirming that the company used an internal application called Basic Block Tool (BBT) – known internally as Microsoft Lego – to shuffle the internals of binaries to speed execution.

Plummer's recollections go back to the '90s, when his first NT development system ran on a paltry 12 MB of RAM, but software was relentlessly growing in size. A binary might have 10 MB of code, but the startup path only needed 300 KB of it.

"But if those 300 KB are sprinkled like Parmesan across 10 MB of binary, then the loader and the memory manager have to touch far more pages than the actual executed code would suggest," Plummer said.

And if a trip to disk was needed to page the code in and out, the performance impact could be disastrous.
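To put rough numbers on Plummer's Parmesan, here is a minimal sketch that counts how many memory pages are touched when 300 KB of startup code is scattered across a 10 MB binary versus packed together. The 4 KB page size, the 64-byte block size, and the uniformly random scatter are all assumptions for illustration; they are not figures from Plummer.

```python
import random

PAGE_SIZE = 4096                 # assumption: 4 KB pages
BINARY_SIZE = 10 * 1024 * 1024   # 10 MB of code, as in the article
HOT_BYTES = 300 * 1024           # 300 KB startup path, as in the article
BLOCK_SIZE = 64                  # hypothetical size of one chunk of hot code


def pages_touched(offsets, size, page_size=PAGE_SIZE):
    """Count the distinct pages covered by blocks of `size` bytes at each offset."""
    pages = set()
    for off in offsets:
        first = off // page_size
        last = (off + size - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages)


n_blocks = HOT_BYTES // BLOCK_SIZE
slots = BINARY_SIZE // BLOCK_SIZE

# Scattered layout: hot blocks land in random block-aligned slots of the binary.
rng = random.Random(0)  # fixed seed so the run is repeatable
scattered = [slot * BLOCK_SIZE for slot in rng.sample(range(slots), n_blocks)]

# Packed layout: the same hot blocks placed back to back at the start.
packed = [i * BLOCK_SIZE for i in range(n_blocks)]

scattered_pages = pages_touched(scattered, BLOCK_SIZE)
packed_pages = pages_touched(packed, BLOCK_SIZE)

print(f"scattered: {scattered_pages} pages, packed: {packed_pages} pages")
```

Packed, the 300 KB fits in exactly 75 pages. Scattered under these assumptions, it touches a large majority of the binary's 2,560 pages, and every one of those pages is a candidate for a trip to disk.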

Hence BBT, through which Microsoft ran a binary and came up with something that was functionally the same, but a good deal more performant. The binary was effectively defragmented as related code was lumped together.
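The general idea of lumping related code together can be sketched as a profile-guided reorder: record which blocks actually run, then lay the hot ones out first. This is a toy illustration only, not BBT's actual algorithm; the `Block` type, the hit counts, and the hottest-first sort are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KB pages


@dataclass(frozen=True)
class Block:
    name: str
    size: int   # bytes
    hits: int   # execution count from a hypothetical profiling run


def hot_pages(layout, page_size=PAGE_SIZE):
    """Count distinct pages that contain at least one executed block."""
    pages = set()
    offset = 0
    for block in layout:
        if block.hits > 0:
            first = offset // page_size
            last = (offset + block.size - 1) // page_size
            pages.update(range(first, last + 1))
        offset += block.size
    return len(pages)


def reorder(layout):
    """Place the hottest blocks first; never-executed blocks sink to the end."""
    return sorted(layout, key=lambda block: -block.hits)


# Toy binary: a 512-byte hot block followed by 7.5 KB of cold code, eight times over.
original = []
for i in range(8):
    original.append(Block(f"hot_{i}", 512, hits=100 - i))
    original.append(Block(f"cold_{i}", 7680, hits=0))

optimized = reorder(original)

print(hot_pages(original), "pages before,", hot_pages(optimized), "page after")
```

In this toy case the executed code drops from eight pages to one while the binary's contents stay the same. A real tool also has to fix up every jump and call that refers to the moved code.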

Similar techniques have, of course, persisted even as computational power has increased. BOLT, a post-link optimizer in the LLVM project, can speed up large applications by optimizing the layout of their binaries. Then there was HP's Dynamo, which could optimize code at runtime.

This approach is not without risk. Tinkering with a binary is not for the faint of heart, but Microsoft had an incentive to wring every last bit of performance from systems.

"Windows and Office were large native code products running on constrained machines, and the wins were user-visible," Plummer explained.

"If you could reduce the number of pages touched during boot or shell startup, users felt it. If you could make common application paths fit into fewer memory pages, multitasking got better.

"If you could keep hot code out of the swap file, the whole system felt less like it was dragging a refrigerator through wet cement."

As with Raymond Chen's recent war story regarding binary translation and code rerolling at Microsoft, Redmond's engineers were laser-focused on performance.

Whether that same focus survives in some of today's software is another matter. Plummer thinks his past efforts remain applicable.

"Modern software has the same problem at a different scale," he said. "The binaries are much larger. The services are distributed. The frameworks are deeper. The machines are faster, but the dependency graphs are absurd.

"And we still discover over and over again that locality matters as it always does. So put the hot data together. Put the hot code together. Keep the common path small. Push rare paths away.

"Don't make the CPU fetch a haystack when it only needs the needle." ®
