Querying text models
Fireworks.ai offers an OpenAI-compatible REST API for querying text models. There are several ways to interact with it:
- The Fireworks Python client library
- The web console
- LangChain
- Directly invoking the REST API using your favorite tools or language
- The OpenAI Python client
Using the web console
All Fireworks models can be accessed through the web console at https://app.fireworks.ai/. Clicking on a model will take you to the playground where you can enter a prompt along with additional request parameters.
Non-chat models will use the completions API which passes your input directly into the model.
Models with a conversation config are considered chat models (also known as instruct models). By default, chat models will use the chat completions API, which automatically formats your input with the conversation style of the model. Advanced users can revert to the completions API by unchecking the “Use chat template” option.
Using the API
Completions API
Text models generate text based on the provided input prompt. All text models support this basic completions API. Using this API, the model will successively generate new tokens until either the maximum number of output tokens has been reached or the model’s special end-of-sequence (EOS) token has been generated.
Here are some examples of calling the completions API:
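As a minimal sketch, a completions request can be made directly against the REST API with only the Python standard library. The model id and the sampling parameters below are illustrative, not exhaustive; substitute any completions-capable model you have access to:

```python
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/completions"

def build_completion_request(prompt, model, max_tokens=64, temperature=0.7):
    """Assemble the URL, headers, and JSON body for a completions call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "prompt": prompt,          # passed directly into the model, unformatted
        "max_tokens": max_tokens,  # upper bound on generated tokens
        "temperature": temperature,
    }
    return API_URL, headers, body

# Only send the request if an API key is configured.
if os.environ.get("FIREWORKS_API_KEY"):
    url, headers, body = build_completion_request(
        "Once upon a time",
        model="accounts/fireworks/models/llama-v2-7b",  # example model id
    )
    req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["text"])
```

Generation stops when either `max_tokens` is reached or the model emits its EOS token, as described above.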
Chat Completions API
Models with a conversation config have the chat completions API enabled. These models are typically tuned with a specific conversation style for which they perform best. For example, Llama chat models use the following template:
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message_1} [/INST]
Some templates, like llama-chat, support multiple chat messages as well. In general, we recommend using the chat completions API whenever possible to avoid common prompt formatting errors. Even small errors like misplaced whitespace may result in poor model performance.
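To make the formatting concrete, here is a sketch of applying a Llama-style chat template by hand. This is exactly the kind of string assembly the chat completions API performs for you server-side; the helper name is hypothetical:

```python
def format_llama_prompt(system_prompt, user_message):
    """Render one system + user turn in the Llama 2 chat template (a sketch).

    A misplaced space or newline here is the kind of subtle error the
    chat completions API avoids by applying the template for you.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(format_llama_prompt("You are a helpful assistant.", "What is 2+2?"))
```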
Here are some examples of calling the chat completions API:
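As a sketch, a chat completions request mirrors the completions call but sends a list of role-tagged messages instead of a raw prompt; the server applies the model's conversation template. The model id below is illustrative:

```python
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(messages, model, max_tokens=256, temperature=0.7):
    """Assemble the URL, headers, and JSON body for a chat completions call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": messages,  # formatted server-side with the model's template
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return API_URL, headers, body

# Only send the request if an API key is configured.
if os.environ.get("FIREWORKS_API_KEY"):
    url, headers, body = build_chat_request(
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Name three prime numbers."},
        ],
        model="accounts/fireworks/models/llama-v2-7b-chat",  # example model id
    )
    req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```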