Updated to shrink big images before sending them.

2025-03-29 12:31:26 -05:00 · 2025-03-29 12:31:26 -05:00 · d4ea970b3d
parent 11ea971542
commit d4ea970b3d
17 changed files with 431 additions and 174 deletions
--- a/.clinerules
+++ b/.clinerules
@ -0,0 +1,38 @@
 # PyNamer Project Rules
 **Implementation Patterns:**
 1. **Image Processing:**
   - Always maintain aspect ratio when resizing.
   - Use LANCZOS resampling for quality downscaling.
   - Handle transparency conversion when saving as JPEG.
   - Keep original image files untouched until final rename operation.
 2. **Filename Generation:**
   - Enforce snake_case format.
   - Remove special characters.
   - Handle duplicate filenames by appending incrementing numbers.
 3. **Error Handling:**
   - Log detailed errors for debugging.
   - Fail gracefully with clear user feedback.
   - Preserve original files on errors.
 4. **Configuration:**
   - Sensible defaults for all configurable parameters.
   - Environment variables can override sensitive settings (API keys).
   - Config changes require restart (no hot-reloading).
 **User Preferences:**
 - Default to JPEG format for resized images (better compression).
 - Default max dimension of 1024px (balances quality and efficiency).
 - Dry-run mode enabled by flag for safety.
 **Known Challenges:**
 - Large images may still consume significant memory during processing.
 - Some LLM models may have different optimal image sizes/formats.
 - Transparency handling requires special consideration when converting formats.
 **Workflow Patterns:**
 - Always check file existence and supported formats first.
 - Process images sequentially (no parallel processing yet).
 - Log each major operation step for traceability.
--- a/.gitignore
+++ b/.gitignore
@ -1,2 +1,170 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 # C extensions
 *.so
 # Distribution / packaging
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
 .nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
 *.py,cover
 .hypothesis/
 .pytest_cache/
 # Translations
 *.mo
 *.pot
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 # Flask stuff:
 instance/
 .webassets-cache
 # Scrapy stuff:
 .scrapy
 # Sphinx documentation
 docs/_build/
 # PyBuilder
 target/
 # Jupyter Notebook
 .ipynb_checkpoints
 # IPython
 profile_default/
 ipython_config.py
 # pyenv
 .python-version
 # pipenv
 #   According to pypa/pipenv#598, it's recommended to include Pipfile.lock in version control.
 #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 #   install all needed dependencies.
 #Pipfile.lock
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
 # SageMath parsed files
 *.sage.py
 # Environments
 .env
 .venv
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
 # Spyder project settings
 .spyderproject
 .spyproject
 # Rope project settings
 .ropeproject
 # mkdocs documentation
 /site
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
 # Pyre type checker
 .pyre/
 # pytype static type analyzer
 .pytype/
 # operating system files
 .DS_Store
-.venv/
+.DS_Store?
 ._*
 .Spotlight-V100
 .Trashes
 ehthumbs.db
 Thumbs.db
 # Image files
 *.png
 *.jpg
 *.jpeg
 *.gif
 *.bmp
 *.tiff
 # Log files
 *.log
 # Editor directories and files
 .idea/
 .vscode/
 *.swp
 *.swo
 *~
 *.bak
 *.tmp
 *.orig
 *.class
 *.jar
 *.war
 *.ear
 *.zip
 *.tar.gz
 *.rar
 # Local development files
 *.local
 *.dev
--- a/config.yaml
+++ b/config.yaml
@ -2,10 +2,10 @@
 # LLM API Configuration
 llm:
-  provider: "openai"  # Provider name (openai, anthropic, etc.)
+  provider: "openrouter"  # Supported: openai, anthropic, openrouter
-  model: "gpt-4o-mini"  # Model name
+  model: "openrouter/google/gemma-3-27b-it"  # Must be a vision-capable model
-  api_key: ""  # Your API key (leave empty to use environment variable)
+  api_key: ""  # Your API key (or set OPENAI_API_KEY environment variable)
-  endpoint: ""  # Custom endpoint URL (if using a proxy or alternative service)
+  endpoint: ""  # Custom endpoint URL if needed
  max_tokens: 100  # Maximum tokens for response
  temperature: 0.7  # Temperature for generation
@ -17,7 +17,9 @@ image:
    - ".png"
    - ".gif"
    - ".webp"
-  
+  resize_max_dimension: 1024  # Max width/height before resizing
  resize_format: "JPEG"  # Format for resized images
 # Prompt Configuration
 prompt:
  system_message: "You are a helpful assistant that generates concise, descriptive filenames for images. Focus on the main subject, key attributes, and context. Use snake_case format without special characters."
--- a/memory-bank/activeContext.md
+++ b/memory-bank/activeContext.md
@ -0,0 +1,25 @@
 # Active Context: PyNamer - Image Resizing Implementation
 **Current Focus:** Implementing image resizing functionality to normalize image dimensions before sending them to the LLM.
 **Decisions Made:**
 - Use the `Pillow` library for image manipulation due to its robustness and ease of use in Python.
 - Add `Pillow` to `requirements.txt`.
 - Introduce configuration options in `config.yaml` under the `image` section:
    - `resize_max_dimension`: Controls the maximum size (width or height) of the image sent to the LLM. Defaults to 1024.
    - `resize_format`: Specifies the image format (e.g., 'JPEG', 'PNG') to use after resizing. Defaults to 'JPEG'.
 - Modify the `_encode_image` method (renamed to `_resize_and_encode_image`) to perform resizing:
    - Open the image using `PIL.Image.open()`.
    - Check if the image's largest dimension exceeds `resize_max_dimension`.
    - If it exceeds, calculate new dimensions maintaining aspect ratio and resize using `img.resize()` with `Image.Resampling.LANCZOS`.
    - Save the (potentially resized) image to an in-memory buffer (`io.BytesIO`) using the configured `resize_format`.
    - Handle potential transparency issues when saving formats like JPEG by converting the image mode to 'RGB' if necessary.
    - Base64 encode the bytes from the buffer.
 - Update the `generate_filename` method to call `_resize_and_encode_image` instead of `_encode_image`.
 - Update the `generate_filename` method to dynamically set the `mime_type` in the LLM request based on `resize_format`.
 - Load the new configuration options in `_setup_llm`.
 **Next Steps:**
 - Create `memory-bank/progress.md`.
 - Create `.clinerules`.
 - Final review and testing.
--- a/memory-bank/productContext.md
+++ b/memory-bank/productContext.md
@ -0,0 +1,16 @@
 # Product Context: PyNamer
 **Problem:** Manually naming large numbers of image files is tedious and time-consuming. Generic filenames (e.g., `IMG_1234.JPG`) lack descriptive value, making it hard to find specific images later.
 **Solution:** `pynamer` automates the process of generating descriptive filenames for images by leveraging the image understanding capabilities of multimodal LLMs.
 **User Experience:**
 - The user provides one or more image paths via the command line.
 - The tool processes each image, interacts with an LLM (configured via `config.yaml`), and renames the file with a descriptive, clean filename.
 - A dry-run option allows users to preview the changes without modifying files.
 - **Efficiency Enhancement:** By resizing large images before sending them to the LLM, the tool aims to:
    - Reduce the amount of data transferred.
    - Potentially lower API costs (as some models charge based on input size/tokens).
    - Speed up the processing time.
 **Target User:** Individuals or teams dealing with many images who need a better way to organize and retrieve them based on content (e.g., photographers, researchers, content creators).
--- a/memory-bank/progress.md
+++ b/memory-bank/progress.md
@ -0,0 +1,28 @@
 # Progress: PyNamer - Image Resizing Implementation
 **Completed:**
 1. Added Pillow dependency to `requirements.txt`.
 2. Updated `pynamer.py` with image resizing functionality:
    - Renamed `_encode_image` to `_resize_and_encode_image`.
    - Implemented image resizing logic using Pillow.
    - Added proper error handling for image processing.
    - Updated `generate_filename` to use the new method and set correct mime type.
 3. Updated `config.yaml` with new image resizing configuration options:
    - `resize_max_dimension`
    - `resize_format`
 4. Created comprehensive memory bank documentation:
    - `projectbrief.md`
    - `productContext.md`
    - `systemPatterns.md`
    - `techContext.md`
    - `activeContext.md`
 **Remaining:**
 1. Create `.clinerules` file.
 2. Final testing and verification.
 **Issues/Notes:**
 - The implementation maintains backward compatibility with existing configurations.
 - The default resize format is set to JPEG for better compression, but this may need adjustment for images with transparency.
 - The LANCZOS resampling filter provides good quality for downscaling.
 - Error handling has been improved to provide better feedback when image processing fails.
--- a/memory-bank/projectbrief.md
+++ b/memory-bank/projectbrief.md
@ -0,0 +1,18 @@
 # Project Brief: PyNamer
 **Goal:** Enhance the `pynamer` tool to improve efficiency and potentially reduce costs by normalizing image sizes before submitting them to a Large Language Model (LLM) for filename generation.
 **Core Functionality:**
 - Takes one or more image file paths as input.
 - Reads configuration from `config.yaml`.
 - Resizes images exceeding a configured maximum dimension while maintaining aspect ratio.
 - Encodes the (potentially resized) image to base64.
 - Sends the image data and configured prompts to an LLM (via `litellm`).
 - Receives a descriptive filename suggestion from the LLM.
 - Cleans the suggested filename (snake_case, alphanumeric).
 - Renames the original image file with the new filename.
 - Supports dry-run mode.
 **Enhancement:**
 - Added image resizing using the Pillow library before encoding and sending to the LLM.
 - Introduced configuration options (`resize_max_dimension`, `resize_format`) in `config.yaml`.
--- a/memory-bank/systemPatterns.md
+++ b/memory-bank/systemPatterns.md
@ -0,0 +1,36 @@
 # System Patterns: PyNamer
 **Architecture:** Command-Line Interface (CLI) tool.
 **Core Components:**
 - **CLI Parser (`argparse`):** Handles command-line arguments (`images`, `config`, `dry-run`, `verbose`).
 - **Configuration Loader (`PyYAML`):** Loads settings from `config.yaml`.
 - **LLM Interaction (`litellm`):** Abstracts communication with various LLM providers. Handles API key and endpoint configuration.
 - **Image Processing (`Pillow`):**
    - Opens and reads image files.
    - Resizes images exceeding `resize_max_dimension` while maintaining aspect ratio.
    - Saves the processed image to a specified format (`resize_format`) in memory.
 - **Encoding (`base64`, `io`):** Encodes the processed image data for transmission via API.
 - **File System Interaction (`os`, `pathlib`):** Checks file existence, extracts paths/extensions, renames files.
 - **Filename Cleaning:** Simple string manipulation to enforce snake_case and remove invalid characters.
 - **Logging (`logging`):** Provides informative output about the process.
 **Workflow Pattern:**
 1. Parse CLI arguments.
 2. Initialize `PyNamer` class with the config path.
 3. Load configuration (`_load_config`).
 4. Set up LLM client (`_setup_llm`), including image resize settings.
 5. Iterate through input image paths provided via CLI.
 6. For each image:
    a. Check existence and supported format (`_is_supported_format`).
    b. Resize and encode the image (`_resize_and_encode_image`).
    c. Prepare API request payload (prompts + image data).
    d. Call LLM via `litellm.completion`.
    e. Extract and clean the suggested filename.
    f. Construct the new file path.
    g. If not dry-run, rename the file, handling potential name collisions (`rename_image`).
    h. Log/print the outcome.
 **Configuration Pattern:**
 - Centralized YAML file (`config.yaml`) for user-configurable settings (LLM details, API keys, prompts, image processing parameters).
 - Environment variables can override API keys/endpoints if not set in the config.
--- a/memory-bank/techContext.md
+++ b/memory-bank/techContext.md
@ -0,0 +1,40 @@
 # Tech Context: PyNamer
 **Language:** Python 3
 **Core Libraries:**
 - `litellm`: For interacting with various LLM APIs (OpenAI, Anthropic, etc.). Handles model routing, API key management, and standardized response format.
 - `PyYAML`: For parsing the `config.yaml` configuration file.
 - `Pillow`: For image manipulation (opening, resizing, saving to buffer).
 - `argparse`: Standard library for parsing command-line arguments.
 - `base64`: Standard library for encoding image data.
 - `io`: Standard library for handling in-memory byte streams (used with Pillow).
 - `os`, `pathlib`: Standard libraries for file system operations.
 - `logging`: Standard library for application logging.
 **Dependencies:**
 - Listed in `requirements.txt`.
 - Key dependencies: `litellm`, `pyyaml`, `Pillow`.
 **Setup & Execution:**
 1.  **Installation:**
    ```bash
    pip install -r requirements.txt 
    # or potentially: pip install . (if setup.py is configured correctly)
    ```
 2.  **Configuration:**
    - Create or modify `config.yaml`.
    - Set LLM `api_key` in the config or via environment variable (e.g., `OPENAI_API_KEY`).
    - Adjust `model`, `max_tokens`, `temperature`, `resize_max_dimension`, `resize_format`, and `prompts` as needed.
 3.  **Execution:**
    ```bash
    python pynamer.py <image_path_1> [image_path_2 ...] [-c config.yaml] [-d] [-v]
    ```
    - `<image_path>`: Path to the image file(s). Handles paths with spaces.
    - `-c`: Specify a different config file path.
    - `-d`: Dry run (preview changes).
    - `-v`: Verbose logging.
 **Environment:**
 - Assumes a standard Python environment where dependencies can be installed via pip.
 - Relies on network access to reach the configured LLM API endpoint.
--- a/pynamer.py
+++ b/pynamer.py
@ -2,14 +2,17 @@
 import argparse
 import base64
 import io
 import os
 import sys
 from pathlib import Path
 import yaml
 from typing import Dict, List, Optional, Union
 import litellm
 from litellm import completion
 import logging
 from PIL import Image # Added for image processing
 # Configure logging
 logging.basicConfig(
@ -65,21 +68,54 @@ class PyNamer:
        self.model = llm_config.get('model', 'gpt-4-vision-preview')
        self.max_tokens = llm_config.get('max_tokens', 100)
        self.temperature = llm_config.get('temperature', 0.7)
        # Image processing settings
        image_config = self.config.get('image', {})
        self.resize_max_dimension = image_config.get('resize_max_dimension', 1024) # Default max dimension
        self.resize_format = image_config.get('resize_format', 'JPEG') # Default format after resize
        logger.info(f"LLM setup complete. Using model: {self.model}")
-    
+        logger.info(f"Image resize settings: max_dimension={self.resize_max_dimension}, format={self.resize_format}")
-    def _encode_image(self, image_path: str) -> str:
+
-        """Encode image to base64 for API submission.
+    def _resize_and_encode_image(self, image_path: str) -> str:
-        
+        """Resize image if necessary and encode to base64 for API submission.
        Args:
            image_path: Path to the image file
        Returns:
            Base64 encoded image string
        """
-        with open(image_path, "rb") as image_file:
+        try:
-            return base64.b64encode(image_file.read()).decode('utf-8')
+            with Image.open(image_path) as img:
-    
+                # Calculate new size maintaining aspect ratio
                width, height = img.size
                if max(width, height) > self.resize_max_dimension:
                    if width > height:
                        new_width = self.resize_max_dimension
                        new_height = int(height * (self.resize_max_dimension / width))
                    else:
                        new_height = self.resize_max_dimension
                        new_width = int(width * (self.resize_max_dimension / height))
                    logger.debug(f"Resizing image from {width}x{height} to {new_width}x{new_height}")
                    img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)
                else:
                    logger.debug("Image size is within limits, no resize needed.")
                # Save resized image to a bytes buffer
                buffer = io.BytesIO()
                # Handle potential transparency issues when saving as JPEG
                if self.resize_format.upper() == 'JPEG' and img.mode in ('RGBA', 'P'):
                     img = img.convert('RGB')
                img.save(buffer, format=self.resize_format)
                img_bytes = buffer.getvalue()
            return base64.b64encode(img_bytes).decode('utf-8')
        except Exception as e:
            logger.error(f"Error processing image {image_path}: {e}")
            raise # Re-raise the exception to be caught by the caller
    def _is_supported_format(self, file_path: str) -> bool:
        """Check if the file format is supported.
@ -111,9 +147,12 @@ class PyNamer:
            return None
        try:
-            # Encode image
+            # Resize and encode image
-            base64_image = self._encode_image(image_path)
+            base64_image = self._resize_and_encode_image(image_path)
            # Determine the mime type based on the resize format
            mime_type = f"image/{self.resize_format.lower()}"
            # Prepare messages for LLM
            system_message = self.config.get('prompt', {}).get('system_message', '')
            user_message = self.config.get('prompt', {}).get('user_message', '')
@ -126,7 +165,7 @@ class PyNamer:
                        {"type": "text", "text": user_message},
                        {
                            "type": "image_url",
-                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
+                            "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}
                        }
                    ]
                }
@ -271,4 +310,4 @@ def main():
            print(f"Failed to process: {image_path}")
 if __name__ == "__main__":
-    main()
+    main()
--- a/requirements.txt
+++ b/requirements.txt
@ -1,2 +1,3 @@
 litellm>=1.10.0
 pyyaml>=6.0
 Pillow>=9.0.0 # Added for image resizing
--- a/setup.py
+++ b/setup.py
@ -30,6 +30,7 @@ setup(
    install_requires=[
        "litellm>=1.10.0",
        "pyyaml>=6.0",
        "Pillow>=9.0.0",
    ],
    python_requires=">=3.7",
    entry_points={
--- a/src/pynamer.egg-info/PKG-INFO
+++ b/src/pynamer.egg-info/PKG-INFO
@ -1,138 +0,0 @@
 Metadata-Version: 2.1
 Name: pynamer
 Version: 0.1.0
 Summary: Generate descriptive filenames for images using LLMs
 Home-page: https://github.com/yourusername/pynamer
 Author: Your Name
 Author-email: your.email@example.com
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Requires-Python: >=3.7
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: litellm>=1.10.0
 Requires-Dist: pyyaml>=6.0
 # PyNamer
 PyNamer is a command-line tool that uses AI vision models to generate descriptive filenames for images. It analyzes the content of images and renames them with meaningful, descriptive filenames in snake_case format.
 ## Features
 - Uses LiteLLM to integrate with various vision-capable LLMs (default: GPT-4 Vision)
 - Configurable via YAML config file
 - Supports multiple image formats (jpg, jpeg, png, gif, webp)
 - Dry-run mode to preview changes without renaming files
 - Handles filename collisions automatically
 ## Installation
 ### Option 1: Install from PyPI (recommended)
 ```bash
 pip install pynamer
 ```
 ### Option 2: Install from source
 1. Clone this repository
 2. Install the package in development mode:
 ```bash
 pip install -e .
 ```
 ### Set up your API key
 You need to set up your API key for the vision model:
 - Set the appropriate environment variable (e.g., `OPENAI_API_KEY`), or
 - Create a custom config file with your API key
 ## Configuration
 PyNamer comes with a default configuration, but you can create a custom config file to customize:
 - LLM provider and model
 - API key and endpoint
 - Supported image formats
 - Prompt templates for filename generation
 Example custom config file (config.yaml):
 ```yaml
 llm:
  provider: "openai"
  model: "gpt-4-vision-preview"
  api_key: "your-api-key-here"
  max_tokens: 100
  temperature: 0.7
 ```
 ## Usage
 After installation, you can use PyNamer directly from the command line:
 Basic usage:
 ```bash
 pynamer path/to/image.jpg
 ```
 Process multiple images:
 ```bash
 pynamer image1.jpg image2.png image3.jpg
 ```
 Use a different config file:
 ```bash
 pynamer -c custom_config.yaml image.jpg
 ```
 Preview changes without renaming (dry run):
 ```bash
 pynamer -d image.jpg
 ```
 Enable verbose logging:
 ```bash
 pynamer -v image.jpg
 ```
 ## Example
 Input: `IMG_20230615_123456.jpg` (a photo of a cat sleeping on a window sill)
 Output: `orange_cat_sleeping_on_sunny_windowsill.jpg`
 ## Development
 ### Building the package
 ```bash
 pip install build
 python -m build
 ```
 ### Installing in development mode
 ```bash
 pip install -e .
 ```
 ## Requirements
 - Python 3.7+
 - LiteLLM
 - PyYAML
 - Access to a vision-capable LLM API (OpenAI, Anthropic, etc.)
--- a/src/pynamer.egg-info/SOURCES.txt
+++ b/src/pynamer.egg-info/SOURCES.txt
@ -1,15 +0,0 @@
 LICENSE
 MANIFEST.in
 README.md
 pyproject.toml
 setup.py
 src/pynamer/__init__.py
 src/pynamer/cli.py
 src/pynamer/config.yaml
 src/pynamer/core.py
 src/pynamer.egg-info/PKG-INFO
 src/pynamer.egg-info/SOURCES.txt
 src/pynamer.egg-info/dependency_links.txt
 src/pynamer.egg-info/entry_points.txt
 src/pynamer.egg-info/requires.txt
 src/pynamer.egg-info/top_level.txt
--- a/src/pynamer.egg-info/requires.txt
+++ b/src/pynamer.egg-info/requires.txt
@ -1,2 +0,0 @@
 litellm>=1.10.0
 pyyaml>=6.0
--- a/src/pynamer/init.py
+++ b/src/pynamer/init.py
@ -1,3 +1,3 @@
 """PyNamer - Generate descriptive filenames for images using LLMs."""
-__version__ = "0.1.0"
+__version__ = "0.2.0"
--- a/src/pynamer/core.py
+++ b/src/pynamer/core.py
@ -1,4 +1,4 @@
-"""Core functionality for PyNamer."""
+"#""Core functionality for PyNamer."""
 import argparse
 import base64
`@ -1,3 +1,3 @@`
	`"""PyNamer - Generate descriptive filenames for images using LLMs."""`	`"""PyNamer - Generate descriptive filenames for images using LLMs."""`

	`__version__ = "0.1.0"`	`__version__ = "0.2.0"`