Add zero-masking vs merge-masking to IR and handle in scatter expansion

The AVX-512 EVEX.z bit ({z} in assembler syntax when set to 1) controls whether destination elements whose mask bit is 0 are zeroed (zero-masking) or left unchanged (merge-masking). Today this is not represented in DR's IR, nor is zero-masking handled in scatter-gather expansion. This issue covers addressing both.
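To make the distinction concrete, here is a minimal scalar model of the two behaviors for a masked 32-bit add. It is only an illustration of the semantics: `masked_add_u32` and its parameters are hypothetical names, not part of DR or of any intrinsic API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only (hypothetical helper, not a DR API).
 * For each element i < count (at most 16, the width of the mask here):
 *   - mask bit i set:   dst[i] = a[i] + b[i]
 *   - mask bit i clear: zero-masking writes 0 to dst[i];
 *                       merge-masking leaves dst[i] untouched.
 */
static void
masked_add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b,
               size_t count, uint16_t mask, bool zero_masking)
{
    for (size_t i = 0; i < count && i < 16; i++) {
        if ((mask >> i) & 1)
            dst[i] = a[i] + b[i];
        else if (zero_masking)
            dst[i] = 0;
        /* else: merge-masking keeps the prior value of dst[i]. */
    }
}
```

With merge-masking the destination is also an implicit source, because its masked-off elements survive; with zero-masking it is not. That difference is what a tool reading the IR would need to be able to see.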

Generally, we want the IR to be an abstraction, mapping ISA encoding details into general concepts. The various x86 prefixes map to operand size differences or opcode differences today. This zero-masking prefix bit doesn't affect just one operand: it affects the whole operation. The precedent there is to split the opcode: for example, we have separate opcodes for OP_rep_movs vs OP_movs. This is similar to the "sub-opcode" numeric values indicating behavior in ARM (see the discussion at https://github.com/DynamoRIO/dynamorio/pull/4386#discussion_r462356954 regarding the opcode philosophy of separate opcodes for separate semantics; see also #4388).

We do expose some x86 encoding prefixes in instr_t.prefixes today, but they generally change the semantics so little that most tools, including taint-tracking tools, can ignore them completely. Part of me wants to get rid of instr_t.prefixes entirely (other than predication), so that tools don't have to worry about this separate set of flags controlling behavior, and split up opcodes where behavior differs. But if we did that for zero-masking, we might end up with a huge number of split opcodes, plus compatibility issues since the existing opcodes have been public for a while. So maybe embracing the prefixes is an OK solution for behavior changes like this that apply to many different opcodes. I would name this something like PREFIX_MASK_ZERO, though, to try to be cross-platform in case another ISA has the same concept, and not put anything about "EVEX" in the name.
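As a rough sketch of what the prefix approach could look like from a tool's side, the following uses a stand-in struct rather than the real instr_t. Everything here is an assumption for illustration: the flag value, `mock_instr_t`, and both helper functions are hypothetical and not taken from DR's headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define PREFIX_MASK_ZERO 0x1u

/* Hypothetical stand-in for the prefixes field of instr_t. */
typedef struct {
    uint32_t prefixes;
} mock_instr_t;

/* True if the instruction uses zero-masking rather than merge-masking. */
static bool
mock_instr_is_zero_masking(const mock_instr_t *instr)
{
    return (instr->prefixes & PREFIX_MASK_ZERO) != 0;
}

/* A taint tool might use the flag like this: under zero-masking the
 * masked-off destination elements are overwritten with 0, so their taint
 * can be cleared; under merge-masking they keep their old value and taint.
 */
static bool
mock_taint_clears_masked_off(const mock_instr_t *instr)
{
    return mock_instr_is_zero_masking(instr);
}
```

A tool that ignores the flag would still work, but it would conservatively treat every masked instruction as merge-masking.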