# Active Context: PyNamer - Image Resizing Implementation

**Current Focus:** Implementing image resizing functionality to normalize image dimensions before sending them to the LLM.

**Decisions Made:**
- Use the `Pillow` library for image manipulation due to its robustness and ease of use in Python.
- Add `Pillow` to `requirements.txt`.
- Introduce configuration options in `config.yaml` under the `image` section:
    - `resize_max_dimension`: Maximum size in pixels of the longer side (width or height) of the image sent to the LLM. Defaults to 1024.
    - `resize_format`: Image format (e.g., 'JPEG', 'PNG') used when saving the image after resizing. Defaults to 'JPEG'.
- Modify the `_encode_image` method (renamed to `_resize_and_encode_image`) to perform resizing:
    - Open the image using `PIL.Image.open()`.
    - Check whether the image's largest dimension exceeds `resize_max_dimension`.
    - If it does, calculate new dimensions that maintain the aspect ratio and resize using `img.resize()` with `Image.Resampling.LANCZOS`.
    - Save the (potentially resized) image to an in-memory buffer (`io.BytesIO`) using the configured `resize_format`.
    - JPEG has no alpha channel, so convert images with transparency to mode 'RGB' before saving in formats like JPEG.
    - Base64 encode the bytes from the buffer.
- Update the `generate_filename` method to call `_resize_and_encode_image` instead of `_encode_image`.
- Update the `generate_filename` method to dynamically set the `mime_type` in the LLM request based on `resize_format`.
- Load the new configuration options in `_setup_llm`.
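As a rough illustration of the configuration and `mime_type` decisions above, the sketch below reads the two options from an already-parsed config mapping and derives a MIME type from the format name. The helper names `load_image_settings` and `mime_type_for` are hypothetical and not part of PyNamer; the note only states that loading happens in `_setup_llm` and that `generate_filename` sets the `mime_type`.

```python
# Defaults taken from the note: 1024 and 'JPEG'.
DEFAULT_MAX_DIMENSION = 1024
DEFAULT_FORMAT = "JPEG"


def load_image_settings(config):
    """Read the `image` section of an already-parsed config mapping.

    Hypothetical helper: `config` is assumed to be the dict produced by
    parsing `config.yaml`.
    """
    image_cfg = config.get("image") or {}
    max_dimension = int(image_cfg.get("resize_max_dimension", DEFAULT_MAX_DIMENSION))
    resize_format = str(image_cfg.get("resize_format", DEFAULT_FORMAT)).upper()
    return max_dimension, resize_format


def mime_type_for(resize_format):
    """Map a format name such as 'JPEG' to a MIME type such as 'image/jpeg'.

    Assumption: lowercasing is enough for common formats (JPEG, PNG, GIF,
    WEBP); other formats may need an explicit lookup table.
    """
    return f"image/{resize_format.lower()}"
```

Deriving the MIME type from the same setting that controls the saved format keeps the request metadata consistent with the encoded bytes.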
**Next Steps:**
- Create `memory-bank/progress.md`.
- Create `.clinerules`.
- Final review and testing.