
Process Visual Query

Vladimir Mandic edited this page Jun 24, 2024 · 2 revisions

Process/Visual query

The Visual query subsection of the Process tab provides tools for interrogating images with Visual Question Answering (VQA), using Vision Language Models (VLMs).

Currently supported models:

  • Moondream 2
  • GIT Textcaps
  • GIT VQA
    • Base
    • Large
  • BLIP
    • Base
    • Large
  • ViLT Base
  • Pix Textcaps
  • MS Florence 2
    • Base
    • Large