Add test for sequence parallel
Created by: suchenzang
Right now, sequence parallelism can be enabled via the --sequence-parallel
flag: https://github.com/facebookresearch/metaseq/blob/a6ef598cc7b4dac394ba2eab5d0e75ca27a9e8c0/metaseq/modules/transformer_decoder_layer.py#L210-L223
Now that https://github.com/facebookresearch/metaseq/issues/578 is complete, we should add a test here that checks rough equivalence between the sequence-parallel code path (say, with MP 2) and the existing non-sequence-parallel run included in the unit tests for the 8M model.
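
To make the shape of such a test concrete, here is a minimal, self-contained sketch. It is not metaseq code: every function name below is hypothetical, and a pure-Python layer norm stands in for the model so the example runs without GPUs or a distributed setup. The real test would presumably compare the 8M model run with and without --sequence-parallel instead. The sketch only illustrates the pattern: feed identical inputs through a baseline path and a path that shards along the sequence dimension across 2 "ranks", then compare outputs within a tolerance.

```python
import math
import random


def layer_norm(rows, eps=1e-5):
    """Normalize each token vector (row) independently over its hidden dimension."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def sequence_parallel_layer_norm(rows, world_size):
    """Hypothetical stand-in for the sequence-parallel path: shard along the
    sequence dimension, normalize each shard separately, gather in order."""
    assert len(rows) % world_size == 0, "sequence length must divide evenly"
    shard_len = len(rows) // world_size
    shards = [rows[i * shard_len:(i + 1) * shard_len] for i in range(world_size)]
    gathered = []
    for shard in shards:  # each iteration plays the role of one model-parallel rank
        gathered.extend(layer_norm(shard))
    return gathered


def assert_roughly_equal(expected, actual, rel_tol=1e-6, abs_tol=1e-8):
    """Element-wise comparison with a tolerance, since only rough equivalence is expected."""
    assert len(expected) == len(actual)
    for exp_row, act_row in zip(expected, actual):
        assert len(exp_row) == len(act_row)
        for e, a in zip(exp_row, act_row):
            assert math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol), (e, a)


def test_sequence_parallel_matches_baseline():
    rng = random.Random(0)  # fixed seed so both paths see identical inputs
    seq_len, hidden = 8, 16
    tokens = [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in range(seq_len)]
    baseline = layer_norm(tokens)
    sharded = sequence_parallel_layer_norm(tokens, world_size=2)  # mirrors "MP 2"
    assert_roughly_equal(baseline, sharded)


test_sequence_parallel_matches_baseline()
```

In an actual test the tolerances would likely need to be looser than the ones assumed here, since real sequence-parallel runs change the order of reductions and dropout handling, and the compared quantity would more plausibly be per-step loss than raw activations.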