Intermediate | 22 min read | AI 101™

AI Vision Inspector

Build AI systems that analyze, interpret, and understand visual data, from simple image descriptions to production-grade quality inspection systems.

Introduction

Computer vision is one of the most impactful applications of AI. From quality control in manufacturing to medical imaging, AI vision systems can often analyze images faster and more consistently than human inspectors.

In this guide, you will build a complete AI vision inspector that can describe images, detect objects, compare visuals, and perform automated quality checks.

How AI Vision Works

Conceptually, AI vision models process an image in four stages:

1. Image Input: The image is loaded and converted into a format the model can accept, usually a base64-encoded string or a URL.
2. Feature Extraction: The model detects low-level features (edges, shapes, textures) and combines them into higher-level concepts (faces, objects, scenes).
3. Semantic Analysis: The model identifies what the objects are, how they relate to one another, and the overall context of the image.
4. Structured Output: The analysis is returned as a natural-language description, JSON data, or classification labels.
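To make the Feature Extraction step concrete, the sketch below computes an edge map with a Sobel filter, the kind of low-level feature that early layers of a vision network tend to respond to. It is an illustration only: large vision models learn their own filters and do not run this code. The helper name `sobel_edges` is hypothetical, and only NumPy is assumed.

```python
import numpy as np

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Return the gradient magnitude of a 2-D grayscale image (hypothetical helper).

    Uses 3x3 Sobel kernels; the output is 2 pixels smaller in each dimension
    because only positions where the kernel fully overlaps the image are computed.
    """
    gx_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gy_kernel = gx_kernel.T  # vertical-gradient kernel
    img = gray.astype(float)
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Slide both kernels over every valid position (cross-correlation).
    for dy in range(3):
        for dx in range(3):
            window = img[dy:dy + h - 2, dx:dx + w - 2]
            gx += gx_kernel[dy, dx] * window
            gy += gy_kernel[dy, dx] * window
    return np.hypot(gx, gy)

# A synthetic image: dark left half, bright right half (one vertical edge).
image = np.zeros((8, 8))
image[:, 4:] = 255
edges = sobel_edges(image)
print(edges.argmax(axis=1))  # → [2 2 2 2 2 2]
```

Every row peaks at output column 2, which sits on the boundary between input columns 3 and 4: the filter responds where brightness changes and stays at zero in the flat regions.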

💡 Available Vision Models

At the time of writing, GPT-4o (OpenAI) and Claude 3.5 (Anthropic) are among the strongest models for general-purpose vision tasks. For specialized tasks such as medical imaging or industrial inspection, fine-tuned models often outperform general-purpose ones.

Setting Up

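Install the required libraries for working with images, calling the OpenAI API, and loading your API key from a .env file:

```bash
pip install openai pillow opencv-python numpy python-dotenv
```

Then create a client and a helper that encodes image files as base64:

```python
import base64
import os

import openai
from dotenv import load_dotenv

# Load OPENAI_API_KEY from a local .env file into the environment
load_dotenv()
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def encode_image(image_path):
    """Read an image file and return its contents as a base64 string for the API."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```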
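The examples below hard-code `image/jpeg` in the data URL. If you also send PNG files, a small helper can pick the MIME type from the file extension instead. This is a sketch: `image_to_data_url` is a hypothetical name, and it relies on the standard-library `mimetypes` module guessing from the extension rather than inspecting the file contents.

```python
import base64
import mimetypes

def image_to_data_url(image_path: str) -> str:
    """Build a data URL for an image file (hypothetical helper).

    The MIME type is guessed from the file extension only; the bytes are not inspected.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

You would pass the returned string directly as the `"url"` value of an `image_url` content part.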

Image Analysis

The simplest vision task is asking the AI to describe or analyze an image:

```python
def analyze_image(image_path, question="What do you see in this image?"):
    """Ask GPT-4o a question about a single image."""
    base64_image = encode_image(image_path)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            # The data URL assumes a JPEG file
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            # "high" detail lets the model see fine features, at a higher token cost
                            "detail": "high"
                        }
                    }
                ]
            }
        ],
        max_tokens=1000
    )
    return response.choices[0].message.content

# Usage
result = analyze_image("photo.jpg", "Describe this image in detail.")
print(result)
```
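The `"detail": "high"` setting changes how many input tokens an image costs. OpenAI's vision documentation has described the GPT-4o calculation roughly as follows: a low-detail image costs a flat 85 tokens; a high-detail image is scaled to fit within 2048 x 2048, then scaled so its shortest side is 768 px, and costs 85 tokens plus 170 per 512 px tile. The sketch below encodes that rule so you can estimate costs before sending a batch. Treat it as an approximation: the constants differ between models and may change, and it assumes images are never upscaled. `estimate_image_tokens` is a hypothetical helper, not part of the OpenAI SDK.

```python
import math

def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Approximate GPT-4o input tokens for one image (hypothetical helper).

    Constants follow OpenAI's published GPT-4o rule at the time of writing:
    85 base tokens, plus 170 per 512 px tile in high-detail mode.
    """
    if detail == "low":
        return 85
    # Step 1: fit within a 2048 x 2048 square (downscale only).
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px (downscale only).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(estimate_image_tokens(1024, 1024))         # → 765
print(estimate_image_tokens(2048, 4096))         # → 1105
print(estimate_image_tokens(4000, 3000, "low"))  # → 85
```

For a 1024 x 1024 photo, high detail costs roughly nine times as many tokens as low detail, which is worth weighing when you inspect thousands of images.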

Comparing Two Images

Vision models can compare multiple images, which is useful for quality control, before/after analysis, and change detection:

```python
def compare_images(image_path_1, image_path_2, criteria):
    """Compare two images against the given criteria; returns a JSON string."""
    img1 = encode_image(image_path_1)
    img2 = encode_image(image_path_2)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    # Doubled braces {{ }} produce literal braces inside the f-string
                    {"type": "text", "text": f"""Compare these two images.
Criteria: {criteria}
Provide a detailed comparison in JSON format:
{{"similarities": [...], "differences": [...],
 "recommendation": "..."}}"""},
                    # Both images go in the same message, in the order they are compared
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img1}"}},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img2}"}}
                ]
            }
        ],
        # JSON mode requires the word "JSON" to appear in the prompt, as it does here
        response_format={"type": "json_object"},
        max_tokens=1000
    )
    return response.choices[0].message.content
```
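For change detection, it can be worth running a cheap local comparison first and calling the API only when two images actually differ. The sketch below uses Pillow and NumPy to compute the mean absolute grayscale difference between two images. The helper name `images_differ` and the default threshold of 10 (on a 0 to 255 scale) are arbitrary assumptions you would tune for your cameras and lighting.

```python
import numpy as np
from PIL import Image

def images_differ(img_a: Image.Image, img_b: Image.Image, threshold: float = 10.0) -> bool:
    """Return True if the mean absolute grayscale difference exceeds threshold (hypothetical helper)."""
    # Convert both to grayscale and resize B to A's size so the arrays line up.
    a = np.asarray(img_a.convert("L"), dtype=np.float32)
    b = np.asarray(img_b.convert("L").resize(img_a.size), dtype=np.float32)
    return float(np.abs(a - b).mean()) > threshold

# Usage (file paths are placeholders):
# if images_differ(Image.open("before.jpg"), Image.open("after.jpg")):
#     print(compare_images("before.jpg", "after.jpg", "visible damage"))
```

A pixel difference only tells you that something changed; the vision model is still needed to explain what changed and whether it matters.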

Object Detection

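A general-purpose vision model can list the objects in an image and describe roughly where each one is. The locations are coarse regions (such as "top-left" or "center"), not pixel-accurate bounding boxes; for precise boxes, a dedicated object detection model is a better fit.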

```python
def detect_objects(image_path):
    """Detect and describe objects in an image."""
    base64_image = encode_image(image_path)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": """List every object you can 
identify in this image. For each object provide:
- Name
- Approximate location (top-left, center, bottom-right, etc.)
- Color
- Size relative to the image (small, medium, large)
- Condition (if applicable)

Return as JSON: {"objects": [{"name": "", "location": "", 
"color": "", "size": "", "condition": ""}]}"""},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                    }
                ]
            }
        ],
        response_format={"type": "json_object"},
        max_tokens=1500
    )
    return response.choices[0].message.content
```
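JSON mode asks the model for valid JSON, but it does not enforce this exact schema: an object may be missing a field, or the list may contain a stray entry. The sketch below normalizes the output of `detect_objects()` so downstream code can rely on every key being present. It assumes the model follows the prompt's schema most of the time; `parse_objects` is a hypothetical helper.

```python
import json

EXPECTED_KEYS = ("name", "location", "color", "size", "condition")

def parse_objects(raw_json: str) -> list[dict]:
    """Parse detect_objects() output into dicts that all have EXPECTED_KEYS (hypothetical helper)."""
    data = json.loads(raw_json)
    objects = data.get("objects", []) if isinstance(data, dict) else []
    normalized = []
    for obj in objects:
        if not isinstance(obj, dict):
            continue  # skip malformed entries such as bare strings
        # Missing or null fields become empty strings, matching the prompt's template.
        normalized.append(
            {key: "" if obj.get(key) is None else str(obj[key]) for key in EXPECTED_KEYS}
        )
    return normalized

sample = '{"objects": [{"name": "mug", "location": "center", "color": "red"}, "junk"]}'
print(parse_objects(sample))
```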

Quality Inspection

Industrial quality inspection is one of the highest-value applications of AI vision. Here is a reusable quality inspection class:

```python
class QualityInspector:
    """AI-powered visual quality inspection system."""

    def __init__(self, reference_description=""):
        # OpenAI() reads OPENAI_API_KEY from the environment
        self.client = openai.OpenAI()
        self.reference = reference_description
        self.inspection_log = []  # raw JSON report from each inspection

    def inspect(self, image_path, criteria=None):
        """Inspect a product image and return the model's JSON report as a string."""
        base64_image = encode_image(image_path)

        prompt = f"""You are a quality control inspector.
Inspect this product image for defects or issues.

Reference standard: {self.reference}
{"Specific criteria: " + criteria if criteria else ""}

Report in JSON format:
{{"pass": true/false,
 "confidence": 0.0-1.0,
 "defects": [{{"type": "", "severity": "low/medium/high",
   "location": "", "description": ""}}],
 "overall_quality": "excellent/good/acceptable/poor/reject",
 "notes": ""}}"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }}
                ]}
            ],
            # JSON mode asks for valid JSON; it does not enforce this exact schema
            response_format={"type": "json_object"}
        )

        result = response.choices[0].message.content
        self.inspection_log.append(result)
        return result

# Usage
inspector = QualityInspector(
    reference_description="Smooth surface, no scratches, uniform color"
)
result = inspector.inspect("product_photo.jpg")
print(result)
```
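Because `inspect` appends each raw JSON report to `inspection_log`, you can aggregate a batch afterwards. The sketch below assumes each entry follows the schema in the prompt; `summarize_inspections` is a hypothetical helper that counts passes, failures, and defects by severity, and counts anything that is not a JSON object as unparsable.

```python
import json
from collections import Counter

def summarize_inspections(log: list[str]) -> dict:
    """Aggregate QualityInspector.inspection_log entries (hypothetical helper)."""
    passed = failed = unparsable = 0
    severities = Counter()
    for entry in log:
        try:
            report = json.loads(entry)
        except json.JSONDecodeError:
            unparsable += 1
            continue
        if not isinstance(report, dict):
            unparsable += 1
            continue
        # Anything other than an explicit true counts as a failure.
        if report.get("pass") is True:
            passed += 1
        else:
            failed += 1
        for defect in report.get("defects", []):
            if isinstance(defect, dict):
                severities[defect.get("severity", "unknown")] += 1
    return {"passed": passed, "failed": failed, "unparsable": unparsable,
            "defects_by_severity": dict(severities)}

# Usage: print(summarize_inspections(inspector.inspection_log))
```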

Improving Accuracy

For critical quality inspection, include reference images of both acceptable and defective products in the same request as the image under inspection. Set "detail": "high" on each image_url entry so the model can see fine defects (the QualityInspector class above does not set it, so it uses the default). For important decisions, run several inspections of the same image and take a consensus.
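One way to take a consensus is a simple majority vote over repeated reports. The sketch below assumes each report is a dict parsed (for example with `json.loads`) from the JSON schema used by QualityInspector. The function name `consensus_verdict` and the rule that ties count as a failure, the conservative choice for quality control, are assumptions rather than a standard method.

```python
from statistics import mean

def consensus_verdict(reports: list[dict]) -> dict:
    """Combine repeated inspection reports by majority vote (hypothetical helper)."""
    if not reports:
        raise ValueError("need at least one report")
    votes_pass = sum(1 for r in reports if r.get("pass") is True)
    votes_fail = len(reports) - votes_pass
    # Ties are treated as failures: safer to re-check a part than to ship a defect.
    passed = votes_pass > votes_fail
    confidences = [float(r["confidence"]) for r in reports if "confidence" in r]
    return {
        "pass": passed,
        "agreement": max(votes_pass, votes_fail) / len(reports),
        "mean_confidence": mean(confidences) if confidences else None,
    }

runs = [{"pass": True, "confidence": 0.9},
        {"pass": False, "confidence": 0.6},
        {"pass": True, "confidence": 0.8}]
print(consensus_verdict(runs))
```

A low agreement value (for example 2 of 3) is a useful signal to route the part to a human inspector rather than trusting either verdict.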

Real-Time Processing

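For near-real-time applications such as surveillance or live inspection, you can sample video frames at a fixed interval and send only those frames to the model. Each API call takes time, so this approach suits checks every few seconds rather than analysis of every frame: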

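```python
import time

import cv2

def process_video_frames(video_source=0, interval=5):
    """Send one frame to the model every `interval` seconds of wall-clock time.

    video_source: a camera index (0 = default webcam) or a video file path.
    Relies on analyze_image() from the Image Analysis section.
    """
    cap = cv2.VideoCapture(video_source)
    last_analysis = 0

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break  # end of file or camera disconnected

            current_time = time.time()
            if current_time - last_analysis >= interval:
                # Save the frame temporarily so analyze_image() can read it
                cv2.imwrite("temp_frame.jpg", frame)

                # This API call blocks, so the preview window pauses until it returns
                result = analyze_image(
                    "temp_frame.jpg",
                    "Describe what you see. Flag any safety concerns."
                )
                print(f"Analysis: {result}")
                last_analysis = current_time

            cv2.imshow("Vision Inspector", frame)
            # Press "q" in the preview window to stop
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        # Release the camera and close windows even if an API call raises
        cap.release()
        cv2.destroyAllWindows()
```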
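The loop above measures the interval in wall-clock time, which suits a live camera. When `video_source` is a recorded file, frames are decoded as fast as the machine allows (and not at all while an API call is running), so the interval no longer matches seconds of video. One alternative, sketched below, is to pick frames by index from the file's frame rate, which OpenCV reports via `cap.get(cv2.CAP_PROP_FPS)`, with the frame total from `cap.get(cv2.CAP_PROP_FRAME_COUNT)`. The helper name `frames_to_analyze` is hypothetical, and it assumes a constant frame rate.

```python
def frames_to_analyze(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return indices of frames spaced interval_s seconds apart in video time (hypothetical helper)."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))  # number of frames between analyses
    return list(range(0, total_frames, step))

print(frames_to_analyze(total_frames=450, fps=30, interval_s=5))  # → [0, 150, 300]
```

Inside the loop, you would keep a frame counter and call `analyze_image()` only when the counter is one of these indices.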

Summary

You have built a complete AI vision inspection system. Key takeaways:

  • Modern vision models can describe, compare, and analyze images, and can return structured JSON that your code can parse and act on.
  • Object detection and quality inspection are high-value real-world applications.
  • For quality-critical systems, use reference images and multiple inspection passes.
  • Near-real-time monitoring is achievable by analyzing video frames at regular intervals.
Vionis Labs - Intelligent AI Solutions for Every Industry