The latest version of xAI’s Grok can process images

xAI, the OpenAI competitor founded by Elon Musk, has introduced the first version of Grok that can process visual information. Grok-1.5V is the company's first-generation multimodal AI model, which can process not only text but also "documents, diagrams, charts, screenshots and photographs." In its announcement, xAI gave a few examples of how these capabilities can be used in the real world. You can, for instance, show Grok a photo of a flow chart and ask it to translate the chart into Python code, get it to write a story based on a drawing, or even have it explain a meme you can't understand. Hey, not everyone can keep up with everything the internet spits out.

The new version comes just a couple of weeks after the company unveiled Grok-1.5. That model was designed to outperform its predecessor at coding and math and to handle longer contexts, letting it draw on data from more sources to better understand certain queries. xAI said its early testers and existing users will soon be able to try Grok-1.5V's capabilities, though it didn't give an exact timeline for the rollout.

In addition to introducing Grok-1.5V, the company has released a benchmark dataset it's calling RealWorldQA. You can use any of RealWorldQA's 700 images to evaluate AI models: each item comes with questions and answers that are easy to verify but that may still stump multimodal models like Grok. xAI claimed its model achieved the highest score when the company tested it on RealWorldQA against competitors such as OpenAI's GPT-4V and Google's Gemini 1.5 Pro.
