Updated to shrink big images before sending them.

This commit is contained in:
Steve White 2025-03-29 12:31:26 -05:00
parent 11ea971542
commit d4ea970b3d
17 changed files with 431 additions and 174 deletions

38
.clinerules Normal file
View File

@ -0,0 +1,38 @@
# PyNamer Project Rules
**Implementation Patterns:**
1. **Image Processing:**
- Always maintain aspect ratio when resizing.
- Use LANCZOS resampling for quality downscaling.
- Handle transparency conversion when saving as JPEG.
- Keep original image files untouched until final rename operation.
2. **Filename Generation:**
- Enforce snake_case format.
- Remove special characters.
- Handle duplicate filenames by appending incrementing numbers.
3. **Error Handling:**
- Log detailed errors for debugging.
- Fail gracefully with clear user feedback.
- Preserve original files on errors.
4. **Configuration:**
- Sensible defaults for all configurable parameters.
- Environment variables can override sensitive settings (API keys).
- Config changes require restart (no hot-reloading).
**User Preferences:**
- Default to JPEG format for resized images (better compression).
- Default max dimension of 1024px (balances quality and efficiency).
- Dry-run mode enabled by flag for safety.
**Known Challenges:**
- Large images may still consume significant memory during processing.
- Some LLM models may have different optimal image sizes/formats.
- Transparency handling requires special consideration when converting formats.
**Workflow Patterns:**
- Always check file existence and supported formats first.
- Process images sequentially (no parallel processing yet).
- Log each major operation step for traceability.

170
.gitignore vendored
View File

@ -1,2 +1,170 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it's recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# operating system files
.DS_Store .DS_Store
.venv/ .DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Image files
*.png
*.jpg
*.jpeg
*.gif
*.bmp
*.tiff
# Log files
*.log
# Editor directories and files
.idea/
.vscode/
*.swp
*.swo
*~
*.bak
*.tmp
*.orig
*.class
*.jar
*.war
*.ear
*.zip
*.tar.gz
*.rar
# Local development files
*.local
*.dev

View File

@ -2,10 +2,10 @@
# LLM API Configuration # LLM API Configuration
llm: llm:
provider: "openai" # Provider name (openai, anthropic, etc.) provider: "openrouter" # Supported: openai, anthropic, openrouter
model: "gpt-4o-mini" # Model name model: "openrouter/google/gemma-3-27b-it" # Must be a vision-capable model
api_key: "" # Your API key (leave empty to use environment variable) api_key: "" # Your API key (or set OPENAI_API_KEY environment variable)
endpoint: "" # Custom endpoint URL (if using a proxy or alternative service) endpoint: "" # Custom endpoint URL if needed
max_tokens: 100 # Maximum tokens for response max_tokens: 100 # Maximum tokens for response
temperature: 0.7 # Temperature for generation temperature: 0.7 # Temperature for generation
@ -17,7 +17,9 @@ image:
- ".png" - ".png"
- ".gif" - ".gif"
- ".webp" - ".webp"
resize_max_dimension: 1024 # Max width/height before resizing
resize_format: "JPEG" # Format for resized images
# Prompt Configuration # Prompt Configuration
prompt: prompt:
system_message: "You are a helpful assistant that generates concise, descriptive filenames for images. Focus on the main subject, key attributes, and context. Use snake_case format without special characters." system_message: "You are a helpful assistant that generates concise, descriptive filenames for images. Focus on the main subject, key attributes, and context. Use snake_case format without special characters."

View File

@ -0,0 +1,25 @@
# Active Context: PyNamer - Image Resizing Implementation
**Current Focus:** Implementing image resizing functionality to normalize image dimensions before sending them to the LLM.
**Decisions Made:**
- Use the `Pillow` library for image manipulation due to its robustness and ease of use in Python.
- Add `Pillow` to `requirements.txt`.
- Introduce configuration options in `config.yaml` under the `image` section:
- `resize_max_dimension`: Controls the maximum size (width or height) of the image sent to the LLM. Defaults to 1024.
- `resize_format`: Specifies the image format (e.g., 'JPEG', 'PNG') to use after resizing. Defaults to 'JPEG'.
- Modify the `_encode_image` method (renamed to `_resize_and_encode_image`) to perform resizing:
- Open the image using `PIL.Image.open()`.
- Check if the image's largest dimension exceeds `resize_max_dimension`.
- If it exceeds, calculate new dimensions maintaining aspect ratio and resize using `img.resize()` with `Image.Resampling.LANCZOS`.
- Save the (potentially resized) image to an in-memory buffer (`io.BytesIO`) using the configured `resize_format`.
- Handle potential transparency issues when saving formats like JPEG by converting the image mode to 'RGB' if necessary.
- Base64 encode the bytes from the buffer.
- Update the `generate_filename` method to call `_resize_and_encode_image` instead of `_encode_image`.
- Update the `generate_filename` method to dynamically set the `mime_type` in the LLM request based on `resize_format`.
- Load the new configuration options in `_setup_llm`.
**Next Steps:**
- Create `memory-bank/progress.md`.
- Create `.clinerules`.
- Final review and testing.

View File

@ -0,0 +1,16 @@
# Product Context: PyNamer
**Problem:** Manually naming large numbers of image files is tedious and time-consuming. Generic filenames (e.g., `IMG_1234.JPG`) lack descriptive value, making it hard to find specific images later.
**Solution:** `pynamer` automates the process of generating descriptive filenames for images by leveraging the image understanding capabilities of multimodal LLMs.
**User Experience:**
- The user provides one or more image paths via the command line.
- The tool processes each image, interacts with an LLM (configured via `config.yaml`), and renames the file with a descriptive, clean filename.
- A dry-run option allows users to preview the changes without modifying files.
- **Efficiency Enhancement:** By resizing large images before sending them to the LLM, the tool aims to:
- Reduce the amount of data transferred.
- Potentially lower API costs (as some models charge based on input size/tokens).
- Speed up the processing time.
**Target User:** Individuals or teams dealing with many images who need a better way to organize and retrieve them based on content (e.g., photographers, researchers, content creators).

28
memory-bank/progress.md Normal file
View File

@ -0,0 +1,28 @@
# Progress: PyNamer - Image Resizing Implementation
**Completed:**
1. Added Pillow dependency to `requirements.txt`.
2. Updated `pynamer.py` with image resizing functionality:
- Renamed `_encode_image` to `_resize_and_encode_image`.
- Implemented image resizing logic using Pillow.
- Added proper error handling for image processing.
- Updated `generate_filename` to use the new method and set correct mime type.
3. Updated `config.yaml` with new image resizing configuration options:
- `resize_max_dimension`
- `resize_format`
4. Created comprehensive memory bank documentation:
- `projectbrief.md`
- `productContext.md`
- `systemPatterns.md`
- `techContext.md`
- `activeContext.md`
**Remaining:**
1. Create `.clinerules` file.
2. Final testing and verification.
**Issues/Notes:**
- The implementation maintains backward compatibility with existing configurations.
- The default resize format is set to JPEG for better compression, but this may need adjustment for images with transparency.
- The LANCZOS resampling filter provides good quality for downscaling.
- Error handling has been improved to provide better feedback when image processing fails.

View File

@ -0,0 +1,18 @@
# Project Brief: PyNamer
**Goal:** Enhance the `pynamer` tool to improve efficiency and potentially reduce costs by normalizing image sizes before submitting them to a Large Language Model (LLM) for filename generation.
**Core Functionality:**
- Takes one or more image file paths as input.
- Reads configuration from `config.yaml`.
- Resizes images exceeding a configured maximum dimension while maintaining aspect ratio.
- Encodes the (potentially resized) image to base64.
- Sends the image data and configured prompts to an LLM (via `litellm`).
- Receives a descriptive filename suggestion from the LLM.
- Cleans the suggested filename (snake_case, alphanumeric).
- Renames the original image file with the new filename.
- Supports dry-run mode.
**Enhancement:**
- Added image resizing using the Pillow library before encoding and sending to the LLM.
- Introduced configuration options (`resize_max_dimension`, `resize_format`) in `config.yaml`.

View File

@ -0,0 +1,36 @@
# System Patterns: PyNamer
**Architecture:** Command-Line Interface (CLI) tool.
**Core Components:**
- **CLI Parser (`argparse`):** Handles command-line arguments (`images`, `config`, `dry-run`, `verbose`).
- **Configuration Loader (`PyYAML`):** Loads settings from `config.yaml`.
- **LLM Interaction (`litellm`):** Abstracts communication with various LLM providers. Handles API key and endpoint configuration.
- **Image Processing (`Pillow`):**
- Opens and reads image files.
- Resizes images exceeding `resize_max_dimension` while maintaining aspect ratio.
- Saves the processed image to a specified format (`resize_format`) in memory.
- **Encoding (`base64`, `io`):** Encodes the processed image data for transmission via API.
- **File System Interaction (`os`, `pathlib`):** Checks file existence, extracts paths/extensions, renames files.
- **Filename Cleaning:** Simple string manipulation to enforce snake_case and remove invalid characters.
- **Logging (`logging`):** Provides informative output about the process.
**Workflow Pattern:**
1. Parse CLI arguments.
2. Initialize `PyNamer` class with the config path.
3. Load configuration (`_load_config`).
4. Set up LLM client (`_setup_llm`), including image resize settings.
5. Iterate through input image paths provided via CLI.
6. For each image:
a. Check existence and supported format (`_is_supported_format`).
b. Resize and encode the image (`_resize_and_encode_image`).
c. Prepare API request payload (prompts + image data).
d. Call LLM via `litellm.completion`.
e. Extract and clean the suggested filename.
f. Construct the new file path.
g. If not dry-run, rename the file, handling potential name collisions (`rename_image`).
h. Log/print the outcome.
**Configuration Pattern:**
- Centralized YAML file (`config.yaml`) for user-configurable settings (LLM details, API keys, prompts, image processing parameters).
- Environment variables can override API keys/endpoints if not set in the config.

View File

@ -0,0 +1,40 @@
# Tech Context: PyNamer
**Language:** Python 3
**Core Libraries:**
- `litellm`: For interacting with various LLM APIs (OpenAI, Anthropic, etc.). Handles model routing, API key management, and standardized response format.
- `PyYAML`: For parsing the `config.yaml` configuration file.
- `Pillow`: For image manipulation (opening, resizing, saving to buffer).
- `argparse`: Standard library for parsing command-line arguments.
- `base64`: Standard library for encoding image data.
- `io`: Standard library for handling in-memory byte streams (used with Pillow).
- `os`, `pathlib`: Standard libraries for file system operations.
- `logging`: Standard library for application logging.
**Dependencies:**
- Listed in `requirements.txt`.
- Key dependencies: `litellm`, `pyyaml`, `Pillow`.
**Setup & Execution:**
1. **Installation:**
```bash
pip install -r requirements.txt
# or potentially: pip install . (if setup.py is configured correctly)
```
2. **Configuration:**
- Create or modify `config.yaml`.
- Set LLM `api_key` in the config or via environment variable (e.g., `OPENAI_API_KEY`).
- Adjust `model`, `max_tokens`, `temperature`, `resize_max_dimension`, `resize_format`, and `prompts` as needed.
3. **Execution:**
```bash
python pynamer.py <image_path_1> [image_path_2 ...] [-c config.yaml] [-d] [-v]
```
- `<image_path>`: Path to the image file(s). Handles paths with spaces.
- `-c`: Specify a different config file path.
- `-d`: Dry run (preview changes).
- `-v`: Verbose logging.
**Environment:**
- Assumes a standard Python environment where dependencies can be installed via pip.
- Relies on network access to reach the configured LLM API endpoint.

View File

@ -2,14 +2,17 @@
import argparse import argparse
import base64 import base64
import io
import os import os
import sys import sys
from pathlib import Path from pathlib import Path
import yaml import yaml
from typing import Dict, List, Optional, Union from typing import Dict, List, Optional, Union
import litellm import litellm
from litellm import completion from litellm import completion
import logging import logging
from PIL import Image # Added for image processing
# Configure logging # Configure logging
logging.basicConfig( logging.basicConfig(
@ -65,21 +68,54 @@ class PyNamer:
self.model = llm_config.get('model', 'gpt-4-vision-preview') self.model = llm_config.get('model', 'gpt-4-vision-preview')
self.max_tokens = llm_config.get('max_tokens', 100) self.max_tokens = llm_config.get('max_tokens', 100)
self.temperature = llm_config.get('temperature', 0.7) self.temperature = llm_config.get('temperature', 0.7)
# Image processing settings
image_config = self.config.get('image', {})
self.resize_max_dimension = image_config.get('resize_max_dimension', 1024) # Default max dimension
self.resize_format = image_config.get('resize_format', 'JPEG') # Default format after resize
logger.info(f"LLM setup complete. Using model: {self.model}") logger.info(f"LLM setup complete. Using model: {self.model}")
logger.info(f"Image resize settings: max_dimension={self.resize_max_dimension}, format={self.resize_format}")
def _encode_image(self, image_path: str) -> str:
"""Encode image to base64 for API submission. def _resize_and_encode_image(self, image_path: str) -> str:
"""Resize image if necessary and encode to base64 for API submission.
Args: Args:
image_path: Path to the image file image_path: Path to the image file
Returns: Returns:
Base64 encoded image string Base64 encoded image string
""" """
with open(image_path, "rb") as image_file: try:
return base64.b64encode(image_file.read()).decode('utf-8') with Image.open(image_path) as img:
# Calculate new size maintaining aspect ratio
width, height = img.size
if max(width, height) > self.resize_max_dimension:
if width > height:
new_width = self.resize_max_dimension
new_height = int(height * (self.resize_max_dimension / width))
else:
new_height = self.resize_max_dimension
new_width = int(width * (self.resize_max_dimension / height))
logger.debug(f"Resizing image from {width}x{height} to {new_width}x{new_height}")
img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)
else:
logger.debug("Image size is within limits, no resize needed.")
# Save resized image to a bytes buffer
buffer = io.BytesIO()
# Handle potential transparency issues when saving as JPEG
if self.resize_format.upper() == 'JPEG' and img.mode in ('RGBA', 'P'):
img = img.convert('RGB')
img.save(buffer, format=self.resize_format)
img_bytes = buffer.getvalue()
return base64.b64encode(img_bytes).decode('utf-8')
except Exception as e:
logger.error(f"Error processing image {image_path}: {e}")
raise # Re-raise the exception to be caught by the caller
def _is_supported_format(self, file_path: str) -> bool: def _is_supported_format(self, file_path: str) -> bool:
"""Check if the file format is supported. """Check if the file format is supported.
@ -111,9 +147,12 @@ class PyNamer:
return None return None
try: try:
# Encode image # Resize and encode image
base64_image = self._encode_image(image_path) base64_image = self._resize_and_encode_image(image_path)
# Determine the mime type based on the resize format
mime_type = f"image/{self.resize_format.lower()}"
# Prepare messages for LLM # Prepare messages for LLM
system_message = self.config.get('prompt', {}).get('system_message', '') system_message = self.config.get('prompt', {}).get('system_message', '')
user_message = self.config.get('prompt', {}).get('user_message', '') user_message = self.config.get('prompt', {}).get('user_message', '')
@ -126,7 +165,7 @@ class PyNamer:
{"type": "text", "text": user_message}, {"type": "text", "text": user_message},
{ {
"type": "image_url", "type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{base64_image}"} "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}
} }
] ]
} }
@ -271,4 +310,4 @@ def main():
print(f"Failed to process: {image_path}") print(f"Failed to process: {image_path}")
if __name__ == "__main__": if __name__ == "__main__":
main() main()

View File

@ -1,2 +1,3 @@
litellm>=1.10.0 litellm>=1.10.0
pyyaml>=6.0 pyyaml>=6.0
Pillow>=9.0.0 # Added for image resizing

View File

@ -30,6 +30,7 @@ setup(
install_requires=[ install_requires=[
"litellm>=1.10.0", "litellm>=1.10.0",
"pyyaml>=6.0", "pyyaml>=6.0",
"Pillow>=9.0.0",
], ],
python_requires=">=3.7", python_requires=">=3.7",
entry_points={ entry_points={

View File

@ -1,138 +0,0 @@
Metadata-Version: 2.1
Name: pynamer
Version: 0.1.0
Summary: Generate descriptive filenames for images using LLMs
Home-page: https://github.com/yourusername/pynamer
Author: Your Name
Author-email: your.email@example.com
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: litellm>=1.10.0
Requires-Dist: pyyaml>=6.0
# PyNamer
PyNamer is a command-line tool that uses AI vision models to generate descriptive filenames for images. It analyzes the content of images and renames them with meaningful, descriptive filenames in snake_case format.
## Features
- Uses LiteLLM to integrate with various vision-capable LLMs (default: GPT-4 Vision)
- Configurable via YAML config file
- Supports multiple image formats (jpg, jpeg, png, gif, webp)
- Dry-run mode to preview changes without renaming files
- Handles filename collisions automatically
## Installation
### Option 1: Install from PyPI (recommended)
```bash
pip install pynamer
```
### Option 2: Install from source
1. Clone this repository
2. Install the package in development mode:
```bash
pip install -e .
```
### Set up your API key
You need to set up your API key for the vision model:
- Set the appropriate environment variable (e.g., `OPENAI_API_KEY`), or
- Create a custom config file with your API key
## Configuration
PyNamer comes with a default configuration, but you can create a custom config file to customize:
- LLM provider and model
- API key and endpoint
- Supported image formats
- Prompt templates for filename generation
Example custom config file (config.yaml):
```yaml
llm:
provider: "openai"
model: "gpt-4-vision-preview"
api_key: "your-api-key-here"
max_tokens: 100
temperature: 0.7
```
## Usage
After installation, you can use PyNamer directly from the command line:
Basic usage:
```bash
pynamer path/to/image.jpg
```
Process multiple images:
```bash
pynamer image1.jpg image2.png image3.jpg
```
Use a different config file:
```bash
pynamer -c custom_config.yaml image.jpg
```
Preview changes without renaming (dry run):
```bash
pynamer -d image.jpg
```
Enable verbose logging:
```bash
pynamer -v image.jpg
```
## Example
Input: `IMG_20230615_123456.jpg` (a photo of a cat sleeping on a window sill)
Output: `orange_cat_sleeping_on_sunny_windowsill.jpg`
## Development
### Building the package
```bash
pip install build
python -m build
```
### Installing in development mode
```bash
pip install -e .
```
## Requirements
- Python 3.7+
- LiteLLM
- PyYAML
- Access to a vision-capable LLM API (OpenAI, Anthropic, etc.)

View File

@ -1,15 +0,0 @@
LICENSE
MANIFEST.in
README.md
pyproject.toml
setup.py
src/pynamer/__init__.py
src/pynamer/cli.py
src/pynamer/config.yaml
src/pynamer/core.py
src/pynamer.egg-info/PKG-INFO
src/pynamer.egg-info/SOURCES.txt
src/pynamer.egg-info/dependency_links.txt
src/pynamer.egg-info/entry_points.txt
src/pynamer.egg-info/requires.txt
src/pynamer.egg-info/top_level.txt

View File

@ -1,2 +0,0 @@
litellm>=1.10.0
pyyaml>=6.0

View File

@ -1,3 +1,3 @@
"""PyNamer - Generate descriptive filenames for images using LLMs.""" """PyNamer - Generate descriptive filenames for images using LLMs."""
__version__ = "0.1.0" __version__ = "0.2.0"

View File

@ -1,4 +1,4 @@
"""Core functionality for PyNamer.""" "#""Core functionality for PyNamer."""
import argparse import argparse
import base64 import base64