Active Context: PyNamer - Image Resizing Implementation

Current Focus: Implementing image resizing so that oversized images are scaled down to a bounded size before being sent to the LLM.

Decisions Made:

  • Use the Pillow library for image manipulation due to its robustness and ease of use in Python.
  • Add Pillow to requirements.txt.
  • Introduce configuration options in config.yaml under the image section:
    • resize_max_dimension: The maximum length, in pixels, of the image's longer side (width or height) sent to the LLM. Defaults to 1024.
    • resize_format: The image format (e.g., 'JPEG', 'PNG') used to re-encode the image, whether or not it was resized. Defaults to 'JPEG'.
  • Rename the _encode_image method to _resize_and_encode_image and extend it to resize the image before encoding:
    • Open the image using PIL.Image.open().
    • Check if the image's largest dimension exceeds resize_max_dimension.
    • If it exceeds, calculate new dimensions maintaining aspect ratio and resize using img.resize() with Image.Resampling.LANCZOS.
    • If the target format is JPEG, which supports neither an alpha channel nor palette mode, convert images in modes such as 'RGBA', 'LA', or 'P' to 'RGB' before saving.
    • Save the (potentially resized) image to an in-memory buffer (io.BytesIO) in the configured resize_format.
    • Base64-encode the buffer's bytes.
  • Update the generate_filename method to call _resize_and_encode_image instead of _encode_image.
  • Update the generate_filename method to dynamically set the mime_type in the LLM request based on resize_format.
  • Load the new configuration options in _setup_llm.

Next Steps:

  • Create memory-bank/progress.md.
  • Create .clinerules.
  • Final review and testing.