Releases: mistralai/mistral-common

Patch release v1.4.2

18 Sep 12:46
992f4a0

Image downloads now send a user agent, so URLs whose hosts require one can be fetched. E.g.:

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_model("pixtral")

url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"
url1 = "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
url2 = "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg"


# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url1),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url2),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
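Some image hosts reject or throttle requests that carry no User-Agent header. As a rough illustration of the idea, and not mistral_common's actual download code, the following standard-library sketch attaches a User-Agent to a request before fetching. The helper name `build_image_request` and the agent string `example-client/0.1` are made up for this example.

```python
import urllib.request


def build_image_request(url: str, user_agent: str = "example-client/0.1") -> urllib.request.Request:
    """Return a GET request for `url` that carries an explicit User-Agent header."""
    # Hypothetical helper: the agent string is a placeholder, not the library's.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


url1 = "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
request = build_image_request(url1)

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))

# To actually download the bytes (needs network access):
# with urllib.request.urlopen(request, timeout=10) as response:
#     image_bytes = response.read()
```

The download itself is left commented out so the sketch runs offline; only the header construction is shown.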

Full Changelog: v1.4.1...v1.4.2

Patch release v1.4.1 - Use cv2 resize instead of PIL

17 Sep 08:44
bae45b2

cv2 resize gives significantly better results than PIL when running Pixtral inference, so this patch release resizes images with cv2, as shown in commit bae45b2.

v1.4.0 - Mistral common goes 🖼️

10 Sep 22:44
7b88116

Pixtral is out!

Mistral common has image support! You can now pass images and URLs alongside text into the user message.

pip install --upgrade mistral_common

Images

You can encode images as follows:

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_model("pixtral")

image = Image.new('RGB', (64, 64))

# tokenize images and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Describe this image"),
                    ImageChunk(image=image),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))

Image URLs

You can pass an image URL, which will be downloaded automatically:

url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"

# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url_dog),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url_mountain),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))

ImageData

You can also pass an image encoded as base64:

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="What is this?"),
                    ImageURLChunk(image_url="...
Read more
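The snippet above is cut off, so the exact string the library expects is not shown here. As a hedged sketch of how a base64-encoded image is commonly packaged, the following builds a `data:<mime>;base64,<payload>` URL using only the standard library. The data-URL form is an assumption, and `png_chunk`, `tiny_png`, and `to_data_url` are hypothetical helpers written for this example.

```python
import base64
import struct
import zlib


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC-32."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Build a valid 1x1 black RGB PNG so the example needs no image file."""
    # width=1, height=1, bit depth=8, color type=2 (RGB), default compression/filter/interlace
    header = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    pixel_row = b"\x00" + b"\x00\x00\x00"  # filter byte + one RGB pixel
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", header)
        + png_chunk(b"IDAT", zlib.compress(pixel_row))
        + png_chunk(b"IEND", b"")
    )


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Base64-encode raw image bytes and wrap them in a data URL (assumed format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


data_url = to_data_url(tiny_png())
print(data_url[:22])
```

In practice the bytes would come from an image file opened in binary mode rather than from `tiny_png()`.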

Patch release 1.3.4 - Loosen pydantic requirement

15 Aug 10:11

In this patch release, the pydantic requirement is loosened to <= 3.0.0, as requested in multiple issues.

Tekkenizer

18 Jul 14:01

The new Tekkenizer class is based on OpenAI's tiktoken and supports the new Mistral-Nemo model.

Tekkenizer always makes use of version 3 or higher.

Examples:

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer = MistralTokenizer.from_model("...")

Function calling (just like before)

# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "..."

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
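The `parameters` field above is a JSON Schema object, so it also describes what a well-formed call to `get_current_weather` looks like. The sketch below checks a candidate argument dictionary against the `required`, `type`, and `enum` entries of that schema. It is a simplified illustration, not the validator shipped in mistral_common; `check_arguments` is a hypothetical helper and covers only the keywords used in this example.

```python
from typing import Any

# Same schema as in the Tool definition above.
weather_parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}


def check_arguments(schema: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the call looks valid."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


print(check_arguments(weather_parameters, {"location": "Paris, France", "format": "celsius"}))
print(check_arguments(weather_parameters, {"location": "Paris, France", "format": "kelvin"}))
```

A full JSON Schema validator handles many more keywords; this version only shows how `required` and `enum` constrain the arguments a model may produce.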

Full Changelog: v1.3.0...v1.3.1

Patch release: Fix FIM tokenizer

30 May 10:37

As noticed in https://huggingface.co/mistralai/Codestral-22B-v0.1/discussions/10, the wrong tokenizer was used for FIM. This patch release fixes that so that the following works correctly:

from mistral_common.tokens.tokenizers.base import FIMRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.v3()
tokenized = tokenizer.encode_fim(FIMRequest(prompt="def f(", suffix="return a + b"))
assert tokenized.text == "<s>[SUFFIX]return▁a▁+▁b[PREFIX]▁def▁f("
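The assertion above shows the ordering the FIM prompt uses: the suffix comes first, then the prefix, so the model continues generating from the end of the prefix. The following sketch reproduces only that ordering at the string level. `fim_layout` is a hypothetical helper; the real tokenizer works on token IDs, and the `▁` whitespace markers seen in `tokenized.text` are not reproduced here.

```python
def fim_layout(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix in the suffix-first order shown by the assertion above."""
    # Plain-string illustration only: no tokenization and no whitespace markers.
    return f"<s>[SUFFIX]{suffix}[PREFIX]{prefix}"


layout = fim_layout(prefix="def f(", suffix="return a + b")
print(layout)  # <s>[SUFFIX]return a + b[PREFIX]def f(
```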

mistral-common v1.2.0

30 May 09:36

Fill-in-the-middle (FIM) with [SUFFIX] and [PREFIX] logic is added to allow building code completion workflows such as:

from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.v3()
model = Transformer.from_folder("~/codestral-22B-240529")

prefix = """def add("""
suffix = """    return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

middle = result.split(suffix)[0].strip()
print(middle)
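The last two lines of the workflow cut the decoded output at the suffix and keep what comes before it. As a small offline sketch of that post-processing step, the code below applies the same split to a made-up decoded string and stitches prefix, middle, and suffix back together. The `decoded` value is invented for illustration and is not real model output; `extract_middle` and `assemble` are hypothetical helpers.

```python
def extract_middle(decoded: str, suffix: str) -> str:
    """Keep the generated text that precedes the first occurrence of the suffix."""
    return decoded.split(suffix)[0].strip()


def assemble(prefix: str, middle: str, suffix: str) -> str:
    """Join the three parts into one source string, with the suffix on its own line."""
    return f"{prefix}{middle}\n{suffix}"


prefix = "def add("
suffix = "    return sum"

# Invented stand-in for tokenizer.decode(out_tokens[0]); not real model output.
decoded = "a, b):\n    sum = a + b\n    return sum"

middle = extract_middle(decoded, suffix)
completed = assemble(prefix, middle, suffix)
print(completed)
```

Note that `strip()` also removes leading whitespace from the generated middle, which is harmless here but could matter when the prefix ends partway through an indented line.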

mistral-common v1.1.0

29 May 14:26
19ec537
  • Adds improved function calling validator
  • Adds improved fine-tuning assistant message