Integrate lucidrains' RotaryEmbeddings
Created by: suchenzang
And from the PaLM paper:
We use RoPE embeddings (Su et al., 2021) rather than absolute or relative position embeddings, since RoPE embeddings have been shown to have better performance on long sequence lengths.
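
For context, here is a minimal pure-Python sketch of the rotation RoPE applies, following the formulation in Su et al. (2021). It is an illustration only, not the API of lucidrains' library or the code in this PR: the helper names `rotate` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are all assumptions made for the example. It shows the property that makes RoPE behave like a relative position encoding: the query-key score depends only on the distance between the two positions.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector at one position.

    Hypothetical helper: each consecutive pair (vec[2i], vec[2i+1]) is
    rotated by the angle position * base**(-2i/d), with d = len(vec).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary example query and key vectors (not taken from any model).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Both position pairs are 3 apart, so the two scores should agree
# up to floating-point error.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 105), rotate(k, 102))
```

Because each pair is only rotated, vector norms are unchanged, and no learned position table is needed. A real integration would vectorize this over batch, heads, and sequence length.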