Import Megatron-LM or HuggingFace OPT/GPT2 model files
Created by: yuhaohaoyu
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
What is the most straightforward way to import a small GPT2 or OPT model produced by Megatron-LM or HuggingFace into Metaseq model checkpoints?
Reason: we found a nice metaseq branch, https://github.com/facebookresearch/metaseq/tree/cuda_graph_incremental_decoding , that makes a good case for applying CUDA Graphs, and we want to verify the speedup against other models.
Where stuck: there are nearly ready-to-use tools to convert models hosted on HuggingFace so they can be loaded by the Megatron-LM inference examples. But we found that the hyperparameters stored in Metaseq checkpoint files differ massively from those in Megatron-LM checkpoint files.
What we're looking for: any tools (automatic or semi-automatic) that bridge the hyperparameter differences between Megatron-LM and Metaseq checkpoint files.
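To make the mismatch concrete, here is a minimal sketch of the kind of diff I've been doing by hand. All key names below are illustrative placeholders, not the actual fields either project stores; in practice the two dicts would come from `torch.load(path, map_location="cpu")` on real checkpoint files.

```python
# Stand-in dicts for the two checkpoint layouts. The key names here are
# illustrative guesses, not the real schema of either project.
megatron_ckpt = {"args": {"num_layers": 24, "hidden_size": 1024}}
metaseq_ckpt = {
    "cfg": {
        "model": {
            "decoder_layers": 24,
            "decoder_embed_dim": 1024,
            "share_decoder_input_output_embed": True,
        }
    }
}

def missing_hparams(megatron, metaseq):
    """Return metaseq model-config keys absent from the Megatron args."""
    meg_args = megatron.get("args", {})
    seq_args = metaseq.get("cfg", {}).get("model", {})
    return sorted(set(seq_args) - set(meg_args))

# Every metaseq key is "missing" here because even overlapping concepts
# (layer count, hidden size) are stored under different names.
print(missing_hparams(megatron_ckpt, metaseq_ckpt))
```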
Code
What have you tried?
- Used https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py (`convert_checkpoint_from_transformers_to_megatron()`) to convert a HuggingFace GPT2 345M model file to 4-way Megatron checkpoint files. Hacky at times, but eventually pulled through.
- Loaded the converted checkpoint files using an example script from the Megatron-LM repo: examples/run_text_generation_server_345M.sh
- Tried to load it via https://github.com/facebookresearch/metaseq/blob/1de510e3b714384d4ebaf9782216e45e361dbaab/metaseq/cli/interactive_hosted.py , and ended up seeing that metaseq expects many more hyperparameters in the checkpoint files.
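Absent an official tool, the workaround I'm imagining looks roughly like the sketch below: re-nest the converted state dict into a metaseq-style layout and inject defaults for whatever hyperparameters the loader complains about. The layout and key names are guesses pieced together from error messages, not a verified mapping; in practice the input would come from `torch.load()` and the result would go back through `torch.save()`.

```python
def wrap_for_metaseq(megatron_ckpt, model_cfg_defaults):
    """Re-nest a Megatron-style checkpoint dict into a metaseq-style one,
    filling in hyperparameters the metaseq loader expects.
    (Key names are hypothetical, not a verified mapping.)"""
    return {
        "model": megatron_ckpt["model"],        # parameter tensors, unchanged
        "cfg": {"model": dict(model_cfg_defaults)},
    }

# Toy input standing in for a torch.load()'ed Megatron checkpoint.
megatron_ckpt = {
    "model": {"embed.weight": "tensor-placeholder"},
    "args": {"num_layers": 24},
}
wrapped = wrap_for_metaseq(
    megatron_ckpt, {"decoder_layers": 24, "decoder_embed_dim": 1024}
)
print(sorted(wrapped))
```

The tedious part is producing `model_cfg_defaults` by hand for every field metaseq's loader wants, which is exactly what I'm hoping an existing tool already automates.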
What's your environment?
- metaseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.13.0.dev20220926+cu117
- OS (e.g., Linux): Linux
- How you installed metaseq (pip, source): pip install -e . (followed https://github.com/facebookresearch/metaseq/blob/main/docs/setup.md)
- Build command you used (if compiling from source): N/A
- Python version: 3.8.13
- CUDA/cuDNN version: 11.7
- GPU models and configuration: A100-80GB, Driver Version: 515.65.07
- Any other relevant information: