Updated to shrink big images before sending them.

2025-03-29 12:31:26 -05:00 · d4ea970b3d
parent 11ea971542
commit d4ea970b3d
17 changed files with 431 additions and 174 deletions
--- a/.clinerules
+++ b/.clinerules
@@ -0,0 +1,38 @@
+# PyNamer Project Rules
+
+**Implementation Patterns:**
+1. **Image Processing:**
+   - Always maintain aspect ratio when resizing.
+   - Use LANCZOS resampling for quality downscaling.
+   - Handle transparency conversion when saving as JPEG.
+   - Keep original image files untouched until final rename operation.
+
+2. **Filename Generation:**
+   - Enforce snake_case format.
+   - Remove special characters.
+   - Handle duplicate filenames by appending incrementing numbers.
+
+3. **Error Handling:**
+   - Log detailed errors for debugging.
+   - Fail gracefully with clear user feedback.
+   - Preserve original files on errors.
+
+4. **Configuration:**
+   - Sensible defaults for all configurable parameters.
+   - Environment variables can override sensitive settings (API keys).
+   - Config changes require restart (no hot-reloading).
+
+**User Preferences:**
+- Default to JPEG format for resized images (better compression).
+- Default max dimension of 1024px (balances quality and efficiency).
+- Dry-run mode enabled by flag for safety.
+
+**Known Challenges:**
+- Large images may still consume significant memory during processing.
+- Some LLM models may have different optimal image sizes/formats.
+- Transparency handling requires special consideration when converting formats.
+
+**Workflow Patterns:**
+- Always check file existence and supported formats first.
+- Process images sequentially (no parallel processing yet).
+- Log each major operation step for traceability.
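Reviewer note on the filename rules in `.clinerules` (snake_case, no special characters, incrementing numbers for duplicates): the sketch below is one way those three rules could fit together. `clean_filename` and `unique_path` are hypothetical names chosen for illustration and are not taken from `pynamer.py`; the `"unnamed"` fallback and the `_1`, `_2` suffix style are likewise assumptions.

```python
import re
from pathlib import Path


def clean_filename(suggestion: str) -> str:
    """Reduce free-form text to a snake_case stem (hypothetical helper)."""
    stem = suggestion.strip().lower()
    # Every run of characters outside a-z and 0-9 collapses to one underscore.
    stem = re.sub(r"[^a-z0-9]+", "_", stem)
    # Assumption: fall back to a fixed stem when nothing usable remains.
    return stem.strip("_") or "unnamed"


def unique_path(directory: Path, stem: str, suffix: str) -> Path:
    """Return a path in `directory` that does not exist yet (hypothetical helper).

    Appends _1, _2, ... to the stem until the candidate is free.
    """
    candidate = directory / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Note that this ASCII-only pattern also replaces accented letters with underscores, which may or may not be the intended behavior.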
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,170 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it's recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# operating system files
 .DS_Store
-.venv/
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Image files
+*.png
+*.jpg
+*.jpeg
+*.gif
+*.bmp
+*.tiff
+
+# Log files
+*.log
+
+# Editor directories and files
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+*.bak
+*.tmp
+*.orig
+*.class
+*.jar
+*.war
+*.ear
+*.zip
+*.tar.gz
+*.rar
+
+# Local development files
+*.local
+*.dev
--- a/config.yaml
+++ b/config.yaml
@@ -2,10 +2,10 @@

 # LLM API Configuration
 llm:
-  provider: "openai"  # Provider name (openai, anthropic, etc.)
-  model: "gpt-4o-mini"  # Model name
-  api_key: ""  # Your API key (leave empty to use environment variable)
-  endpoint: ""  # Custom endpoint URL (if using a proxy or alternative service)
+  provider: "openrouter"  # Supported: openai, anthropic, openrouter
+  model: "openrouter/google/gemma-3-27b-it"  # Must be a vision-capable model
+  api_key: ""  # Your API key (or set OPENAI_API_KEY environment variable)
+  endpoint: ""  # Custom endpoint URL if needed
  max_tokens: 100  # Maximum tokens for response
  temperature: 0.7  # Temperature for generation

@@ -17,7 +17,9 @@ image:
    - ".png"
    - ".gif"
    - ".webp"
-  
+  resize_max_dimension: 1024  # Max width/height before resizing
+  resize_format: "JPEG"  # Format for resized images
+
 # Prompt Configuration
 prompt:
  system_message: "You are a helpful assistant that generates concise, descriptive filenames for images. Focus on the main subject, key attributes, and context. Use snake_case format without special characters."
--- a/memory-bank/activeContext.md
+++ b/memory-bank/activeContext.md
@@ -0,0 +1,25 @@
+# Active Context: PyNamer - Image Resizing Implementation
+
+**Current Focus:** Implementing image resizing functionality to normalize image dimensions before sending them to the LLM.
+
+**Decisions Made:**
+- Use the `Pillow` library for image manipulation due to its robustness and ease of use in Python.
+- Add `Pillow` to `requirements.txt`.
+- Introduce configuration options in `config.yaml` under the `image` section:
+    - `resize_max_dimension`: Controls the maximum size (width or height) of the image sent to the LLM. Defaults to 1024.
+    - `resize_format`: Specifies the image format (e.g., 'JPEG', 'PNG') to use after resizing. Defaults to 'JPEG'.
+- Modify the `_encode_image` method (renamed to `_resize_and_encode_image`) to perform resizing:
+    - Open the image using `PIL.Image.open()`.
+    - Check if the image's largest dimension exceeds `resize_max_dimension`.
+    - If it exceeds, calculate new dimensions maintaining aspect ratio and resize using `img.resize()` with `Image.Resampling.LANCZOS`.
+    - Save the (potentially resized) image to an in-memory buffer (`io.BytesIO`) using the configured `resize_format`.
+    - Handle potential transparency issues when saving formats like JPEG by converting the image mode to 'RGB' if necessary.
+    - Base64 encode the bytes from the buffer.
+- Update the `generate_filename` method to call `_resize_and_encode_image` instead of `_encode_image`.
+- Update the `generate_filename` method to dynamically set the `mime_type` in the LLM request based on `resize_format`.
+- Load the new configuration options in `_setup_llm`.
+
+**Next Steps:**
+- Create `memory-bank/progress.md`.
+- Create `.clinerules`.
+- Final review and testing.
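Reviewer note on the resize decision in `activeContext.md`: the aspect-ratio step can be isolated as a pure function, which makes it easy to test without opening an image. This is a minimal sketch, not the code in this commit. `fit_within` is a hypothetical name, and it deliberately differs from the diff in two ways: it rounds instead of truncating, and it clamps each side to at least 1 pixel so that extreme aspect ratios cannot produce a zero dimension.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_dimension: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals max_dimension.

    Sizes already within the limit are returned unchanged, so images are
    never upscaled.
    """
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    scale = max_dimension / longest
    # Clamp to 1 so a very thin image (e.g. 10000x1) keeps a valid size.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 image with a limit of 1024 maps to 1024x768, while an 800x600 image is left as is.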
--- a/memory-bank/productContext.md
+++ b/memory-bank/productContext.md
@@ -0,0 +1,16 @@
+# Product Context: PyNamer
+
+**Problem:** Manually naming large numbers of image files is tedious and time-consuming. Generic filenames (e.g., `IMG_1234.JPG`) lack descriptive value, making it hard to find specific images later.
+
+**Solution:** `pynamer` automates the process of generating descriptive filenames for images by leveraging the image understanding capabilities of multimodal LLMs.
+
+**User Experience:**
+- The user provides one or more image paths via the command line.
+- The tool processes each image, interacts with an LLM (configured via `config.yaml`), and renames the file with a descriptive, clean filename.
+- A dry-run option allows users to preview the changes without modifying files.
+- **Efficiency Enhancement:** By resizing large images before sending them to the LLM, the tool aims to:
+    - Reduce the amount of data transferred.
+    - Potentially lower API costs (as some models charge based on input size/tokens).
+    - Speed up the processing time.
+
+**Target User:** Individuals or teams dealing with many images who need a better way to organize and retrieve them based on content (e.g., photographers, researchers, content creators).
--- a/memory-bank/progress.md
+++ b/memory-bank/progress.md
@@ -0,0 +1,28 @@
+# Progress: PyNamer - Image Resizing Implementation
+
+**Completed:**
+1. Added Pillow dependency to `requirements.txt`.
+2. Updated `pynamer.py` with image resizing functionality:
+    - Renamed `_encode_image` to `_resize_and_encode_image`.
+    - Implemented image resizing logic using Pillow.
+    - Added proper error handling for image processing.
+    - Updated `generate_filename` to use the new method and set correct mime type.
+3. Updated `config.yaml` with new image resizing configuration options:
+    - `resize_max_dimension`
+    - `resize_format`
+4. Created comprehensive memory bank documentation:
+    - `projectbrief.md`
+    - `productContext.md`
+    - `systemPatterns.md`
+    - `techContext.md`
+    - `activeContext.md`
+
+**Remaining:**
+1. Create `.clinerules` file.
+2. Final testing and verification.
+
+**Issues/Notes:**
+- The implementation maintains backward compatibility with existing configurations.
+- The default resize format is set to JPEG for better compression, but this may need adjustment for images with transparency.
+- The LANCZOS resampling filter provides good quality for downscaling.
+- Error handling has been improved to provide better feedback when image processing fails.
--- a/memory-bank/projectbrief.md
+++ b/memory-bank/projectbrief.md
@@ -0,0 +1,18 @@
+# Project Brief: PyNamer
+
+**Goal:** Enhance the `pynamer` tool to improve efficiency and potentially reduce costs by normalizing image sizes before submitting them to a Large Language Model (LLM) for filename generation.
+
+**Core Functionality:**
+- Takes one or more image file paths as input.
+- Reads configuration from `config.yaml`.
+- Resizes images exceeding a configured maximum dimension while maintaining aspect ratio.
+- Encodes the (potentially resized) image to base64.
+- Sends the image data and configured prompts to an LLM (via `litellm`).
+- Receives a descriptive filename suggestion from the LLM.
+- Cleans the suggested filename (snake_case, alphanumeric).
+- Renames the original image file with the new filename.
+- Supports dry-run mode.
+
+**Enhancement:**
+- Added image resizing using the Pillow library before encoding and sending to the LLM.
+- Introduced configuration options (`resize_max_dimension`, `resize_format`) in `config.yaml`.
--- a/memory-bank/systemPatterns.md
+++ b/memory-bank/systemPatterns.md
@@ -0,0 +1,36 @@
+# System Patterns: PyNamer
+
+**Architecture:** Command-Line Interface (CLI) tool.
+
+**Core Components:**
+- **CLI Parser (`argparse`):** Handles command-line arguments (`images`, `config`, `dry-run`, `verbose`).
+- **Configuration Loader (`PyYAML`):** Loads settings from `config.yaml`.
+- **LLM Interaction (`litellm`):** Abstracts communication with various LLM providers. Handles API key and endpoint configuration.
+- **Image Processing (`Pillow`):**
+    - Opens and reads image files.
+    - Resizes images exceeding `resize_max_dimension` while maintaining aspect ratio.
+    - Saves the processed image to a specified format (`resize_format`) in memory.
+- **Encoding (`base64`, `io`):** Encodes the processed image data for transmission via API.
+- **File System Interaction (`os`, `pathlib`):** Checks file existence, extracts paths/extensions, renames files.
+- **Filename Cleaning:** Simple string manipulation to enforce snake_case and remove invalid characters.
+- **Logging (`logging`):** Provides informative output about the process.
+
+**Workflow Pattern:**
+1. Parse CLI arguments.
+2. Initialize `PyNamer` class with the config path.
+3. Load configuration (`_load_config`).
+4. Set up LLM client (`_setup_llm`), including image resize settings.
+5. Iterate through input image paths provided via CLI.
+6. For each image:
+    a. Check existence and supported format (`_is_supported_format`).
+    b. Resize and encode the image (`_resize_and_encode_image`).
+    c. Prepare API request payload (prompts + image data).
+    d. Call LLM via `litellm.completion`.
+    e. Extract and clean the suggested filename.
+    f. Construct the new file path.
+    g. If not dry-run, rename the file, handling potential name collisions (`rename_image`).
+    h. Log/print the outcome.
+
+**Configuration Pattern:**
+- Centralized YAML file (`config.yaml`) for user-configurable settings (LLM details, API keys, prompts, image processing parameters).
+- Environment variables can override API keys/endpoints if not set in the config.
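Reviewer note on the configuration pattern in `systemPatterns.md`: the wording "if not set in the config" suggests the config value wins and the environment is only a fallback. The sketch below shows that reading. `resolve_api_key` and its `environ` parameter are hypothetical and only illustrate the lookup order; the actual lookup in `_setup_llm` is not shown in this diff, and the default variable name is taken from the comment in `config.yaml`.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping,
    env_var: str = "OPENAI_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key from the config, else from the environment.

    An empty or whitespace-only `llm.api_key` counts as "not set".
    `environ` is injectable so the lookup can be tested without touching
    the real process environment.
    """
    environ = os.environ if environ is None else environ
    configured = (config.get("llm") or {}).get("api_key") or ""
    return configured.strip() or environ.get(env_var) or None
```

If the opposite precedence is intended (environment overrides the file, as the `.clinerules` wording could be read), swapping the two operands of the first `or` is enough.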
--- a/memory-bank/techContext.md
+++ b/memory-bank/techContext.md
@@ -0,0 +1,40 @@
+# Tech Context: PyNamer
+
+**Language:** Python 3
+
+**Core Libraries:**
+- `litellm`: For interacting with various LLM APIs (OpenAI, Anthropic, etc.). Handles model routing, API key management, and standardized response format.
+- `PyYAML`: For parsing the `config.yaml` configuration file.
+- `Pillow`: For image manipulation (opening, resizing, saving to buffer).
+- `argparse`: Standard library for parsing command-line arguments.
+- `base64`: Standard library for encoding image data.
+- `io`: Standard library for handling in-memory byte streams (used with Pillow).
+- `os`, `pathlib`: Standard libraries for file system operations.
+- `logging`: Standard library for application logging.
+
+**Dependencies:**
+- Listed in `requirements.txt`.
+- Key dependencies: `litellm`, `pyyaml`, `Pillow`.
+
+**Setup & Execution:**
+1.  **Installation:**
+    ```bash
+    pip install -r requirements.txt 
+    # or potentially: pip install . (if setup.py is configured correctly)
+    ```
+2.  **Configuration:**
+    - Create or modify `config.yaml`.
+    - Set LLM `api_key` in the config or via environment variable (e.g., `OPENAI_API_KEY`).
+    - Adjust `model`, `max_tokens`, `temperature`, `resize_max_dimension`, `resize_format`, and the `prompt` settings as needed.
+3.  **Execution:**
+    ```bash
+    python pynamer.py <image_path_1> [image_path_2 ...] [-c config.yaml] [-d] [-v]
+    ```
+    - `<image_path>`: Path to the image file(s). Handles paths with spaces.
+    - `-c`: Specify a different config file path.
+    - `-d`: Dry run (preview changes).
+    - `-v`: Verbose logging.
+
+**Environment:**
+- Assumes a standard Python environment where dependencies can be installed via pip.
+- Relies on network access to reach the configured LLM API endpoint.
--- a/pynamer.py
+++ b/pynamer.py
@@ -2,14 +2,17 @@

 import argparse
 import base64
+import io
 import os
 import sys
 from pathlib import Path
 import yaml
 from typing import Dict, List, Optional, Union
+
 import litellm
 from litellm import completion
 import logging
+from PIL import Image # Added for image processing

 # Configure logging
 logging.basicConfig(
@@ -65,21 +68,54 @@ class PyNamer:
        self.model = llm_config.get('model', 'gpt-4-vision-preview')
        self.max_tokens = llm_config.get('max_tokens', 100)
        self.temperature = llm_config.get('temperature', 0.7)
+
+        # Image processing settings
+        image_config = self.config.get('image', {})
+        self.resize_max_dimension = image_config.get('resize_max_dimension', 1024) # Default max dimension
+        self.resize_format = image_config.get('resize_format', 'JPEG') # Default format after resize
        
        logger.info(f"LLM setup complete. Using model: {self.model}")
-    
-    def _encode_image(self, image_path: str) -> str:
-        """Encode image to base64 for API submission.
-        
+        logger.info(f"Image resize settings: max_dimension={self.resize_max_dimension}, format={self.resize_format}")
+
+    def _resize_and_encode_image(self, image_path: str) -> str:
+        """Resize image if necessary and encode to base64 for API submission.
+
        Args:
            image_path: Path to the image file
            
        Returns:
            Base64 encoded image string
        """
-        with open(image_path, "rb") as image_file:
-            return base64.b64encode(image_file.read()).decode('utf-8')
-    
+        try:
+            with Image.open(image_path) as img:
+                # Calculate new size maintaining aspect ratio
+                width, height = img.size
+                if max(width, height) > self.resize_max_dimension:
+                    if width > height:
+                        new_width = self.resize_max_dimension
+                        new_height = int(height * (self.resize_max_dimension / width))
+                    else:
+                        new_height = self.resize_max_dimension
+                        new_width = int(width * (self.resize_max_dimension / height))
+                    
+                    logger.debug(f"Resizing image from {width}x{height} to {new_width}x{new_height}")
+                    img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)
+                else:
+                    logger.debug("Image size is within limits, no resize needed.")
+
+                # Save resized image to a bytes buffer
+                buffer = io.BytesIO()
+                # Handle potential transparency issues when saving as JPEG
+                if self.resize_format.upper() == 'JPEG' and img.mode in ('RGBA', 'LA', 'P'):
+                    img = img.convert('RGB')
+                img.save(buffer, format=self.resize_format)
+                img_bytes = buffer.getvalue()
+
+            return base64.b64encode(img_bytes).decode('utf-8')
+        except Exception as e:
+            logger.error(f"Error processing image {image_path}: {e}")
+            raise # Re-raise the exception to be caught by the caller
+
    def _is_supported_format(self, file_path: str) -> bool:
        """Check if the file format is supported.
        
@@ -111,9 +147,12 @@ class PyNamer:
            return None
        
        try:
-            # Encode image
-            base64_image = self._encode_image(image_path)
+            # Resize and encode image
+            base64_image = self._resize_and_encode_image(image_path)
            
+            # Determine the mime type based on the resize format
+            mime_type = f"image/{self.resize_format.lower()}"
+
            # Prepare messages for LLM
            system_message = self.config.get('prompt', {}).get('system_message', '')
            user_message = self.config.get('prompt', {}).get('user_message', '')
@@ -126,7 +165,7 @@ class PyNamer:
                        {"type": "text", "text": user_message},
                        {
                            "type": "image_url",
-                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
+                            "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}
                        }
                    ]
                }
@@ -271,4 +310,4 @@ def main():
            print(f"Failed to process: {image_path}")

 if __name__ == "__main__":
-    main()
+    main()
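Reviewer note on the `mime_type` change in `pynamer.py`: lowercasing `resize_format` gives the right media type for values such as `JPEG` and `PNG`, but any other string is passed through unchecked. One possible alternative, sketched here under stated assumptions, is an explicit lookup that fails early on unknown formats. `build_data_url` and the `MIME_TYPES` table are hypothetical and cover only the formats this project lists as supported.

```python
import base64

# Assumption: only these formats are ever configured as resize_format.
MIME_TYPES = {
    "JPEG": "image/jpeg",
    "PNG": "image/png",
    "GIF": "image/gif",
    "WEBP": "image/webp",
}


def build_data_url(image_bytes: bytes, image_format: str) -> str:
    """Build a base64 data URL for the given image bytes and format name."""
    key = image_format.upper()
    if key not in MIME_TYPES:
        raise ValueError(f"Unsupported resize format: {image_format!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{MIME_TYPES[key]};base64,{encoded}"
```

With this shape, a typo such as `JPG` in `config.yaml` is reported before any request is sent, rather than producing a `data:image/jpg` URL.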
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,2 +1,3 @@
 litellm>=1.10.0
 pyyaml>=6.0
+Pillow>=9.0.0 # Added for image resizing
--- a/setup.py
+++ b/setup.py
@@ -30,6 +30,7 @@ setup(
    install_requires=[
        "litellm>=1.10.0",
        "pyyaml>=6.0",
+        "Pillow>=9.0.0",
    ],
    python_requires=">=3.7",
    entry_points={
--- a/src/pynamer.egg-info/PKG-INFO
+++ b/src/pynamer.egg-info/PKG-INFO
@@ -1,138 +0,0 @@
-Metadata-Version: 2.1
-Name: pynamer
-Version: 0.1.0
-Summary: Generate descriptive filenames for images using LLMs
-Home-page: https://github.com/yourusername/pynamer
-Author: Your Name
-Author-email: your.email@example.com
-Classifier: Development Status :: 3 - Alpha
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.7
-Classifier: Programming Language :: Python :: 3.8
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Requires-Python: >=3.7
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: litellm>=1.10.0
-Requires-Dist: pyyaml>=6.0
-
-# PyNamer
-
-PyNamer is a command-line tool that uses AI vision models to generate descriptive filenames for images. It analyzes the content of images and renames them with meaningful, descriptive filenames in snake_case format.
-
-## Features
-
-- Uses LiteLLM to integrate with various vision-capable LLMs (default: GPT-4 Vision)
-- Configurable via YAML config file
-- Supports multiple image formats (jpg, jpeg, png, gif, webp)
-- Dry-run mode to preview changes without renaming files
-- Handles filename collisions automatically
-
-## Installation
-
-### Option 1: Install from PyPI (recommended)
-
-```bash
-pip install pynamer
-```
-
-### Option 2: Install from source
-
-1. Clone this repository
-2. Install the package in development mode:
-
-```bash
-pip install -e .
-```
-
-### Set up your API key
-
-You need to set up your API key for the vision model:
-
-- Set the appropriate environment variable (e.g., `OPENAI_API_KEY`), or
-- Create a custom config file with your API key
-
-## Configuration
-
-PyNamer comes with a default configuration, but you can create a custom config file to customize:
-
-- LLM provider and model
-- API key and endpoint
-- Supported image formats
-- Prompt templates for filename generation
-
-Example custom config file (config.yaml):
-
-```yaml
-llm:
-  provider: "openai"
-  model: "gpt-4-vision-preview"
-  api_key: "your-api-key-here"
-  max_tokens: 100
-  temperature: 0.7
-```
-
-## Usage
-
-After installation, you can use PyNamer directly from the command line:
-
-Basic usage:
-
-```bash
-pynamer path/to/image.jpg
-```
-
-Process multiple images:
-
-```bash
-pynamer image1.jpg image2.png image3.jpg
-```
-
-Use a different config file:
-
-```bash
-pynamer -c custom_config.yaml image.jpg
-```
-
-Preview changes without renaming (dry run):
-
-```bash
-pynamer -d image.jpg
-```
-
-Enable verbose logging:
-
-```bash
-pynamer -v image.jpg
-```
-
-## Example
-
-Input: `IMG_20230615_123456.jpg` (a photo of a cat sleeping on a window sill)
-
-Output: `orange_cat_sleeping_on_sunny_windowsill.jpg`
-
-## Development
-
-### Building the package
-
-```bash
-pip install build
-python -m build
-```
-
-### Installing in development mode
-
-```bash
-pip install -e .
-```
-
-## Requirements
-
-- Python 3.7+
-- LiteLLM
-- PyYAML
-- Access to a vision-capable LLM API (OpenAI, Anthropic, etc.)
--- a/src/pynamer.egg-info/SOURCES.txt
+++ b/src/pynamer.egg-info/SOURCES.txt
@@ -1,15 +0,0 @@
-LICENSE
-MANIFEST.in
-README.md
-pyproject.toml
-setup.py
-src/pynamer/__init__.py
-src/pynamer/cli.py
-src/pynamer/config.yaml
-src/pynamer/core.py
-src/pynamer.egg-info/PKG-INFO
-src/pynamer.egg-info/SOURCES.txt
-src/pynamer.egg-info/dependency_links.txt
-src/pynamer.egg-info/entry_points.txt
-src/pynamer.egg-info/requires.txt
-src/pynamer.egg-info/top_level.txt
--- a/src/pynamer.egg-info/requires.txt
+++ b/src/pynamer.egg-info/requires.txt
@@ -1,2 +0,0 @@
-litellm>=1.10.0
-pyyaml>=6.0
--- a/src/pynamer/__init__.py
+++ b/src/pynamer/__init__.py
@@ -1,3 +1,3 @@
 """PyNamer - Generate descriptive filenames for images using LLMs."""

-__version__ = "0.1.0"
+__version__ = "0.2.0"
--- a/src/pynamer/core.py
+++ b/src/pynamer/core.py
@@ -1,4 +1,4 @@
-"""Core functionality for PyNamer."""
+"""Core functionality for PyNamer."""

 import argparse
 import base64