Using the web console

All Fireworks models can be accessed through the web console at https://app.fireworks.ai/. Clicking on a model will take you to the playground where you can enter a prompt along with additional request parameters.

Non-chat models use the completions API, which passes your input directly into the model.

Models with a conversation config are considered chat models (also known as instruct models). By default, chat models use the chat completions API, which automatically formats your input with the conversation style of the model. Advanced users can revert to the completions API by unchecking the “Use chat template” option.

Using the API

Completions API

Text models generate text based on the provided input prompt. All text models support this basic completions API. Using this API, the model successively generates new tokens until either the maximum number of output tokens is reached or the model’s special end-of-sequence (EOS) token is generated.

Llama-family models automatically prepend the beginning-of-sequence (BOS) token (<s>) to your prompt input, to be consistent with the original implementation.

Here is an example of calling the completions API using the Python client:

from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")
response = client.completions.create(
  model="accounts/fireworks/models/llama-v2-7b",
  prompt="Say this is a test",
)

print(response.choices[0].text)

Chat Completions API

Models with a conversation config have the chat completions API enabled. These models are typically tuned with a specific conversation style for which they perform best. For example, Llama chat models use the following template:

<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message_1} [/INST]

Some templates, like llama-chat, also support multiple chat messages. In general, we recommend using the chat completions API whenever possible to avoid common prompt formatting errors. Even small errors, like misplaced whitespace, may result in poor model performance.
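To illustrate what the chat completions API does on your behalf, here is a rough sketch of how a llama-chat-style template might turn a list of messages into a raw prompt. This is only an illustration of the template shown above, not the exact server-side implementation; the function name and message representation are our own.

```python
def format_llama_chat(system_prompt, turns):
    """Sketch of llama-chat prompt formatting.

    turns: list of (user_message, assistant_reply) pairs, where the
    assistant_reply of the final, in-progress turn is None.
    """
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0:
            # The system prompt is folded into the first user turn.
            user = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"
        if assistant is not None:
            # Completed turns are closed with the EOS token.
            prompt += f" {assistant} </s>"
    return prompt

print(format_llama_chat("You are a helpful assistant.",
                        [("Say this is a test", None)]))
```

Getting details like the exact whitespace and the placement of `<s>`/`</s>` right by hand is error-prone, which is why the chat completions API is preferred.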

Here is an example of calling the chat completions API using the Python client:
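This mirrors the completions example above but passes a list of messages instead of a raw prompt; the model name is illustrative and assumes a Llama 2 chat model is available.

from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")
response = client.chat.completions.create(
  model="accounts/fireworks/models/llama-v2-7b-chat",
  messages=[{
    "role": "user",
    "content": "Say this is a test",
  }],
)

print(response.choices[0].message.content)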