Wednesday, July 20, 2011

speculative execution

Running some Hadoop benchmarks using teragen/terasort. One of the recommendations I was given was to disable speculative execution. Noticed something rather strange when I forced it off in the config.

Runtime with speculative execution: 18.5 minutes
Runtime without speculative execution: 1 hour

It seems that 2-3 map tasks are taking longer than the rest.

The question now is: why? Each map task is responsible for generating the same % of data, so why would speculative execution make the job run quicker? Does this point to hardware differences (if so, the slow tasks are on different machines; I have not noticed a pattern yet), configuration problems elsewhere, or just random bad luck?
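One way to check for a pattern would be to pull the per-task durations and hosts off the job's task pages and count the slow ones per machine. A rough sketch in Python follows, assuming the durations have already been copied out by hand. The task ids, host names and timings below are made-up placeholders rather than numbers from my cluster, and the "2x the median" cutoff is an arbitrary choice.

```python
from collections import defaultdict
from statistics import median


def find_stragglers(tasks, factor=2.0):
    """Return the tasks whose duration exceeds factor * median duration."""
    cutoff = factor * median(t["seconds"] for t in tasks)
    return [t for t in tasks if t["seconds"] > cutoff]


def stragglers_by_host(tasks, factor=2.0):
    """Count straggler tasks per host, to see whether one machine stands out."""
    counts = defaultdict(int)
    for t in find_stragglers(tasks, factor):
        counts[t["host"]] += 1
    return dict(counts)


# Placeholder data for illustration only (not real measurements).
tasks = [
    {"task": "m_000000", "host": "node1", "seconds": 60},
    {"task": "m_000001", "host": "node2", "seconds": 62},
    {"task": "m_000002", "host": "node3", "seconds": 58},
    {"task": "m_000003", "host": "node1", "seconds": 61},
    {"task": "m_000004", "host": "node2", "seconds": 400},
    {"task": "m_000005", "host": "node3", "seconds": 380},
]

print(stragglers_by_host(tasks))
```

If the same host keeps showing up across several runs, that would point at hardware or per-node configuration. If the slow tasks move around from run to run, bad luck or contention elsewhere looks more likely.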