metaseq · Issue #614 (Closed)
Issue created Jan 23, 2023 by Administrator@rootOwner

Import a Megatron-LM or HuggingFace OPT/GPT2 model file(s)

Created by: yuhaohaoyu

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

What is the most straightforward way to import a small GPT2 or OPT model produced by Megatron-LM or HuggingFace into Metaseq model checkpoints?

Reason: we found a nice metaseq branch, https://github.com/facebookresearch/metaseq/tree/cuda_graph_incremental_decoding , that has a compelling use case for CUDA Graphs, and we want to verify its speedup against other models.

Where we are stuck: there are nearly ready-to-use tools to convert models hosted on HuggingFace so they can be loaded by the Megatron-LM inference examples. But we found that the hyperparameters stored in the model checkpoint files used by Metaseq differ massively from those in Megatron-LM checkpoint files.

What we are looking for: a tool (automatic or semi-automatic) that bridges the differences between the hyperparameters in Megatron-LM and Metaseq checkpoint files.
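To make the kind of bridging we have in mind concrete, here is a minimal sketch of translating hyperparameter names from one checkpoint's config dict to another's, then filling in defaults for the many extra fields the target expects. The key names in the mapping below are illustrative assumptions, not a verified Megatron-LM-to-Metaseq correspondence — the authoritative field names would have to be taken from both codebases:

```python
# Sketch: bridge a source checkpoint's hyperparameter dict to a target's.
# All key names here are HYPOTHETICAL examples; a real tool would use the
# actual arg names from Megatron-LM's "args" and metaseq's "cfg".
MEGATRON_TO_METASEQ = {
    "num_layers": "decoder_layers",
    "hidden_size": "decoder_embed_dim",
    "num_attention_heads": "decoder_attention_heads",
    "max_position_embeddings": "max_target_positions",
}

def bridge_config(megatron_args: dict, defaults: dict) -> dict:
    """Translate known keys; fall back to defaults for everything else."""
    # The target format expects many more fields, so start from its defaults.
    out = dict(defaults)
    for src, dst in MEGATRON_TO_METASEQ.items():
        if src in megatron_args:
            out[dst] = megatron_args[src]
    return out

# Example: a GPT-2 345M-style config (illustrative values).
megatron_args = {"num_layers": 24, "hidden_size": 1024,
                 "num_attention_heads": 16, "max_position_embeddings": 1024}
cfg = bridge_config(megatron_args, defaults={"activation_fn": "gelu"})
print(cfg["decoder_layers"], cfg["decoder_embed_dim"])  # 24 1024
```

In practice the source dict would come from something like `torch.load(path)["args"]` and the result would be written back into the target checkpoint; the hard part this issue asks about is knowing the full mapping and the right defaults.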

Code

What have you tried?

  1. Used https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py : convert_checkpoint_from_transformers_to_megatron() to convert a HuggingFace GPT2 345M model file to 4-way Megatron checkpoint files. Hacky at times, but we eventually pulled through.
  2. Loaded the converted checkpoint files using an example script from the Megatron-LM repo: examples/run_text_generation_server_345M.sh
  3. Tried to load it via https://github.com/facebookresearch/metaseq/blob/1de510e3b714384d4ebaf9782216e45e361dbaab/metaseq/cli/interactive_hosted.py , and ended up seeing that metaseq expects many more hyperparameters in the checkpoint files than the converted checkpoints contain.
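A quick way to scope the gap hit in step 3 is to diff the set of hyperparameter keys the loader expects against the keys actually present in the converted checkpoint. The sketch below uses placeholder key names; in practice the two sets would come from the loaded checkpoint dicts (e.g. via `torch.load`):

```python
# Sketch: report which hyperparameter keys the target loader wants but the
# converted checkpoint lacks, and which extra keys the checkpoint carries.
# The key names below are PLACEHOLDERS, not metaseq's actual field names.
def diff_keys(expected: set, found: set):
    """Return (missing_from_checkpoint, extra_in_checkpoint), both sorted."""
    return sorted(expected - found), sorted(found - expected)

expected = {"decoder_layers", "decoder_embed_dim",
            "share_decoder_input_output_embed"}
found = {"decoder_layers", "decoder_embed_dim"}
missing, extra = diff_keys(expected, found)
print(missing)  # ['share_decoder_input_output_embed']
```

Turning the "missing" list into sensible default values is essentially the bridging tool the question above asks for.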

What's your environment?

  • metaseq Version (e.g., 1.0 or master): master
  • PyTorch Version (e.g., 1.0): 1.13.0.dev20220926+cu117
  • OS (e.g., Linux): Linux
  • How you installed metaseq (pip, source): pip install -e . (followed https://github.com/facebookresearch/metaseq/blob/main/docs/setup.md)
  • Build command you used (if compiling from source): N/A
  • Python version: 3.8.13
  • CUDA/cuDNN version: 11.7
  • GPU models and configuration: A100-80GB, Driver Version: 515.65.07
  • Any other relevant information: