pynamer/memory-bank/activeContext.md

# Active Context: PyNamer - Image Resizing Implementation
**Current Focus:** Implementing image resizing functionality to normalize image dimensions before sending them to the LLM.

**Decisions Made:**
- Use the `Pillow` library for image manipulation due to its robustness and ease of use in Python.
- Add `Pillow` to `requirements.txt`.
- Introduce configuration options in `config.yaml` under the `image` section:
  - `resize_max_dimension`: Controls the maximum size (width or height) of the image sent to the LLM. Defaults to 1024.
  - `resize_format`: Specifies the image format (e.g., 'JPEG', 'PNG') to use after resizing. Defaults to 'JPEG'.
- Modify the `_encode_image` method (renamed to `_resize_and_encode_image`) to perform resizing:
  - Open the image using `PIL.Image.open()`.
  - Check whether the image's largest dimension exceeds `resize_max_dimension`.
  - If it does, calculate new dimensions that preserve the aspect ratio and resize using `img.resize()` with `Image.Resampling.LANCZOS`.
  - Before saving to a format without alpha support, such as JPEG, convert the image mode to 'RGB' if necessary to avoid transparency issues.
  - Save the (potentially resized) image to an in-memory buffer (`io.BytesIO`) using the configured `resize_format`.
  - Base64-encode the bytes from the buffer.
- Update the `generate_filename` method to call `_resize_and_encode_image` instead of `_encode_image`.
- Update the `generate_filename` method to dynamically set the `mime_type` in the LLM request based on `resize_format`.
- Load the new configuration options in `_setup_llm`.

**Next Steps:**
- Create `memory-bank/progress.md`.
- Create `.clinerules`.
- Final review and testing.