model_arch
: model architecture family, e.g. “llama”

model_config_name
: concrete instantiation of the model config, e.g. “v2-7b” for “llama”.

checkpoint_format
: must be “huggingface”

use_huggingface_tokenizer
: can be set to true in order to use the HuggingFace tokenizer instead of the Fireworks implementation. It might be slightly slower, but is helpful if the model was fine-tuned with additional tokens.

vocab_size
: the container knows the model’s vocabulary size based on model_arch and model_config_name. However, if the model was fine-tuned using additional tokens, vocab_size might need to be increased. In this case, specify the value matching HuggingFace’s config.json (see the sketch below).
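
For example, the value to use for vocab_size can be read directly from the fine-tuned checkpoint’s config.json. A minimal sketch in Python, assuming a hypothetical local checkpoint directory named my-finetuned-llama:

```python
# Minimal sketch: read vocab_size from the fine-tuned checkpoint's
# HuggingFace config.json so the same value can be set in the container
# config above. The checkpoint directory name is hypothetical.
import json

with open("my-finetuned-llama/config.json") as f:
    hf_config = json.load(f)

print("vocab_size:", hf_config["vocab_size"])
```

If the printed value differs from the default for your model_arch and model_config_name (for instance after adding special tokens during fine-tuning), set vocab_size explicitly to that value.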