OpenCLIP (open-source implementation of OpenAI's CLIP)

How CLIP transforms image classification into a text-image similarity matching task. - youtube / github
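To illustrate the idea, here is a minimal sketch of classification as similarity matching, using only the Python standard library. The 3-dimensional embeddings are made up for illustration; in practice they would come from the image and text encoders. The function name `zero_shot_classify` and the `logit_scale` value of 100 are assumptions for this sketch, not part of any library API.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    logit_scale is an assumed temperature; it only sharpens the probabilities.
    """
    img = normalize(image_embedding)
    sims = [dot(img, normalize(t)) for t in text_embeddings]
    probs = softmax([logit_scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy embeddings (hypothetical values, one text embedding per candidate label).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_embedding = [0.8, 0.2, 0.1]

predicted_label, probabilities = zero_shot_classify(image_embedding, text_embeddings, labels)
print(predicted_label)
```

The class labels are ordinary text prompts, so changing the set of classes only requires encoding new prompts, not retraining a classifier head.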

C++

CLIP inference in plain C/C++ with no extra dependencies

Features

Dependency-free and lightweight inference thanks to ggml.
4-bit, 5-bit and 8-bit quantization support.
Supports inference with text-only, vision-only and two-tower model variants. Using a single modality can be useful in certain cases, such as encoders for large multimodal models, or building and/or searching an index for semantic image search.
Dependency-free Python binding that does not rely on any large third-party packages. No need for NumPy, TensorFlow, PyTorch, ONNX etc. In fact, it needs nothing more than the standard Python library.
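As a rough illustration of what low-bit quantization does to model weights, here is a minimal sketch of symmetric block quantization in plain Python. It is a simplified scheme chosen for clarity and is not ggml's actual storage format; the function names `quantize_block` and `dequantize_block` and the sample weights are hypothetical.

```python
def quantize_block(values, bits=4):
    """Map floats to signed integers of the given bit width with one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the integer levels."""
    return [scale * x for x in q]

# Hypothetical weights for one block.
weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.48, -0.02, 0.0]
scale, q = quantize_block(weights, bits=4)
restored = dequantize_block(scale, q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

Each block stores one float scale plus small integers, which is where the memory saving comes from; the price is a rounding error of at most half a quantization step per weight in this scheme.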

Use image-search-build to build the database of images and their embeddings beforehand. Currently, it does not support updating an existing database.

Use image-search to search for indexed images by semantic similarity.
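Conceptually, the two steps amount to storing one embedding per image and then ranking the stored embeddings against a query embedding. The sketch below shows that idea with the Python standard library only. It is not how image-search-build or image-search are implemented and says nothing about their database format; `build_index`, `search`, the file names and the embedding values are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_index(embeddings_by_path):
    """Build step: store a normalized embedding for every image path."""
    return [(path, normalize(vec)) for path, vec in embeddings_by_path.items()]

def search(index, query_embedding, top_k=2):
    """Search step: rank indexed images by cosine similarity to the query."""
    q = normalize(query_embedding)
    scored = [(sum(a * b for a, b in zip(q, vec)), path) for path, vec in index]
    scored.sort(reverse=True)
    return [(path, score) for score, path in scored[:top_k]]

# Made-up image embeddings; a real index would hold encoder outputs.
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}
index = build_index(image_embeddings)

# Made-up embedding standing in for an encoded text query.
results = search(index, [0.8, 0.3, 0.0], top_k=2)
print(results)
```

A linear scan like this is fine for small collections; larger ones typically use an approximate nearest-neighbor index instead.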

$ pip install open_clip_torch