Eliminate decoding from code cache for trace building

I believe this was PR 307388: "don't build entire mangled trace in parallel w/ unmangled"

Originally we decoded blocks from the code cache for both the execute step of trace building (create_private_copy() used to call decode_fragment()) and the append step (extend_trace()). I believe the goal was to get the already-mangled version and avoid having to re-mangle. Yet we later had to add mangling functionality over traces to handle clients changing code.

Decoding from the cache is complex and imposes a long list of limitations on blocks: a single bitwidth, no code beyond ctis (control transfer instructions), etc. A search of the code base turns up many restrictions imposed purely because we can't figure things out when decoding.

To improve the client interface, we switched create_private_copy() to use build_basic_block_fragment() (as a private copy, to solve #940), so we now have an unmangled instrlist. Yet when we append the block to the trace, we still decode from the cache. The proposal here is to build the trace up from the instrlists we're already re-creating for each block, and to eliminate decoding from the cache completely.
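
As a rough illustration of the proposed append step, here is a minimal toy sketch in C. It is not DynamoRIO code: toy_instr_t, toy_ilist_t and every toy_* function are hypothetical stand-ins for the real instruction and instrlist types, and an int opcode stands in for a full instruction. It only shows the shape of the idea: the trace's list is extended by copying each block's unmangled list, with no decode-from-cache step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; NOT DynamoRIO's real instr_t / instrlist_t. */
typedef struct toy_instr {
    int opcode;               /* placeholder for a full unmangled instruction */
    struct toy_instr *next;
} toy_instr_t;

typedef struct {
    toy_instr_t *first;
    toy_instr_t *last;
    int count;
} toy_ilist_t;

static void toy_ilist_init(toy_ilist_t *l)
{
    l->first = NULL;
    l->last = NULL;
    l->count = 0;
}

/* Append one instruction to the end of a list. */
static void toy_ilist_append(toy_ilist_t *l, int opcode)
{
    toy_instr_t *in = malloc(sizeof(*in));
    assert(in != NULL);
    in->opcode = opcode;
    in->next = NULL;
    if (l->last == NULL)
        l->first = in;
    else
        l->last->next = in;
    l->last = in;
    l->count++;
}

/* Free every instruction in a list and reset it to empty. */
static void toy_ilist_free(toy_ilist_t *l)
{
    toy_instr_t *in = l->first;
    while (in != NULL) {
        toy_instr_t *next = in->next;
        free(in);
        in = next;
    }
    toy_ilist_init(l);
}

/* Hypothetical append step: extend the trace by copying the block's
 * unmangled list, instead of decoding the block back out of the cache.
 * The block's own list is left untouched. */
static void toy_trace_append_block(toy_ilist_t *trace, const toy_ilist_t *block)
{
    const toy_instr_t *in;
    for (in = block->first; in != NULL; in = in->next)
        toy_ilist_append(trace, in->opcode);
}
```

In this sketch a caller would build one list per block as it is executed, call toy_trace_append_block() for each, and mangle the accumulated trace list once at the end. The copy keeps each block's list independent of the trace. Whether the real code should copy or transfer ownership is a design choice this sketch does not settle.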