Progress Report


๐Ÿ› ๏ธ Devlog Update โ€“ October 10, 2025


๐Ÿ–ผ๏ธ Image Injection Live on Front End

The front-end UI now supports image injection, enabling real-time image input for vision-capable models. This lays the groundwork for upcoming features like local image analysis, tagging, and visual reasoning workflows.
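As a rough illustration of what image injection can look like under the hood, here is a minimal sketch that packs a local image into the message format used by OpenAI-compatible chat endpoints (which llama.cpp's server also exposes). The model name and data-URI details here are assumptions for illustration, not the project's actual configuration.

```python
import base64


def build_vision_payload(image_path: str, prompt: str,
                         model: str = "qwen2.5-omni") -> dict:
    # Read the image and base64-encode it as a data URI, the shape
    # accepted by OpenAI-style chat endpoints for image input.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The resulting dict would then be POSTed as JSON to the server's chat-completions route; the exact endpoint and whether it accepts image parts depends on the llama.cpp build in use.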


๐Ÿ” Backend Research โ€“ Qwen 2.5 Omni + llama.cpp GPU/vision

We're actively exploring llama.cpp builds that support both:

- GPU acceleration (ideally via cuBLAS or Metal)
- Vision input (multimodal / image-token streaming)

The goal is to find a runtime that can run Qwen 2.5 Omni, or a similar model, locally with full image comprehension.


If compatibility doesn't exist yet, fallback options include:

- Testing alternative vision models (e.g., MiniCPM-V or Qwen-VL variants)
- Running an isolated multimodal inference pipeline outside llama.cpp and piping its output in

More soon.
