# Decision Log
## 2025-02-27: Initial Project Setup
### Decision: Use Jina AI APIs for Semantic Search
- **Context**: Need for semantic search capabilities that understand context beyond keywords
- **Options Considered**:
1. Build custom embedding solution
2. Use open-source models locally
3. Use Jina AI's APIs
- **Decision**: Use Jina AI's APIs for embedding generation and similarity computation
- **Rationale**:
- High-quality embeddings with state-of-the-art models
- No need to manage model deployment and infrastructure
- Simple API integration with reasonable pricing
- Support for long texts through segmentation
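As a concrete illustration of the integration this decision implies, here is a minimal sketch of an embedding call. It assumes Jina's OpenAI-style `/v1/embeddings` endpoint and the `jina-embeddings-v3` model name; the exact payload fields should be checked against the current Jina documentation.
```python
import os
import requests

JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Request embeddings for a batch of texts from the Jina API (sketch)."""
    response = requests.post(
        JINA_EMBEDDINGS_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['JINA_API_KEY']}",
        },
        # Model name is an assumption; consult the Jina docs for current models.
        json={"model": "jina-embeddings-v3", "input": texts},
        timeout=30,
    )
    response.raise_for_status()
    return [item["embedding"] for item in response.json()["data"]]
```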
### Decision: Separate Markdown Segmentation from Similarity Computation
- **Context**: Need to handle potentially long markdown documents
- **Options Considered**:
1. Integrate segmentation directly into the similarity module
2. Create a separate module for segmentation
- **Decision**: Create a separate module (markdown_segmenter.py) for document segmentation
- **Rationale**:
- Better separation of concerns
- More modular design allows for independent use of components
- Easier to maintain and extend each component separately
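To show what the segmentation module is responsible for, the sketch below splits a markdown document into heading-delimited chunks. It is a local stand-in only; the real `markdown_segmenter.py` may delegate chunking to Jina's segmentation API.
```python
import re

def segment_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into heading-delimited chunks (illustrative stand-in only)."""
    # Start a new section at each level-1 or level-2 heading, keeping the heading
    # together with the body that follows it.
    sections = re.split(r"(?m)^(?=#{1,2} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Further split oversized sections so each chunk fits the embedding limit.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```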
### Decision: Use Environment Variables for API Keys
- **Context**: Need to securely manage API credentials
- **Options Considered**:
1. Configuration files
2. Environment variables
3. Secret management service
- **Decision**: Use environment variables (JINA_API_KEY)
- **Rationale**:
- Simple to implement
- Standard practice for managing secrets
- Works well across different environments
- Prevents accidental commit of credentials to version control
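A minimal sketch of the key-loading pattern; the fail-fast error message is illustrative.
```python
import os

def get_jina_api_key() -> str:
    """Read the Jina API key from the environment rather than from a tracked file."""
    key = os.environ.get("JINA_API_KEY")
    if not key:
        raise RuntimeError("JINA_API_KEY is not set; export it before running the tools.")
    return key
```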
### Decision: Use Cosine Similarity with Normalized Vectors
- **Context**: Need a metric for comparing semantic similarity between text embeddings
- **Options Considered**:
1. Euclidean distance
2. Cosine similarity
3. Dot product
- **Decision**: Use cosine similarity with normalized vectors
- **Rationale**:
- Standard approach for semantic similarity
- Normalized vectors simplify computation (dot product equals cosine similarity)
- Less sensitive to embedding magnitude, focusing on direction (meaning)
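A short worked sketch of the chosen metric: once the embeddings are L2-normalized, the dot product and cosine similarity coincide, and only the direction of the vectors matters.
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    # For normalized vectors, cosine similarity reduces to a plain dot product.
    return float(np.dot(a_norm, b_norm))

# Direction matters, magnitude does not: a scaled copy still scores 1.0.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
assert abs(cosine_similarity(u, v) - 1.0) < 1e-9
```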
## 2025-02-27: Research System Architecture
### Decision: Implement a Multi-Stage Research Pipeline
- **Context**: Need to define the overall architecture for the intelligent research system
- **Options Considered**:
1. Monolithic application with tightly coupled components
2. Microservices architecture with independent services
3. Pipeline architecture with distinct processing stages
- **Decision**: Implement an 8-stage pipeline architecture
- **Rationale**:
- Clear separation of concerns with each stage having a specific responsibility
- Easier to develop and test individual components
- Flexibility to swap or enhance specific stages without affecting others
- Natural flow of data through the system matches the research process
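A structural sketch of the pipeline pattern this decision refers to. The log does not enumerate the eight stages here, so the two stages below are placeholders, and the real stages carry richer state than a shared dict.
```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Pass the research state through each stage in order."""
    for stage in stages:
        state = stage(state)
    return state

# Placeholder stages standing in for the eight real ones.
def process_query(state: dict) -> dict:
    state["queries"] = [state["question"]]
    return state

def execute_search(state: dict) -> dict:
    state["results"] = [f"results for: {q}" for q in state["queries"]]
    return state

final_state = run_pipeline([process_query, execute_search], {"question": "example question"})
```
Because each stage only consumes and produces the shared state, a stage can be swapped or extended without touching the others, which is the flexibility cited above.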
### Decision: Use Multiple Search Sources
- **Context**: Need to gather comprehensive information from various sources
- **Options Considered**:
1. Use a single search API for simplicity
2. Implement custom web scraping for all sources
3. Use multiple specialized search APIs
- **Decision**: Integrate multiple search sources (Google, Serper, Jina Search, Google Scholar, arXiv)
- **Rationale**:
- Different sources provide different types of information (academic, general, etc.)
- Increases the breadth and diversity of search results
- Specialized APIs like arXiv provide domain-specific information
- Redundancy ensures more comprehensive coverage
### Decision: Use Jina AI for Semantic Processing
- **Context**: Need for advanced semantic understanding in document processing
- **Options Considered**:
1. Use simple keyword matching
2. Implement custom embedding models
3. Use Jina AI's suite of APIs
- **Decision**: Use Jina AI's APIs for embedding generation, similarity computation, and reranking
- **Rationale**:
- High-quality embeddings with state-of-the-art models
- Comprehensive API suite covering multiple needs (embeddings, segmentation, reranking)
- Simple integration with reasonable pricing
- Consistent approach across different semantic processing tasks
## 2025-02-27: Search Execution Architecture
### Decision: Search Execution Architecture
- **Context**: We needed a search execution module that runs queries against multiple search engines and processes the results in a standardized way.
- **Decision**:
1. Create a modular search execution architecture (a structural sketch follows this entry):
- Implement a base handler interface (`BaseSearchHandler`) for all search API handlers
- Create specific handlers for each search engine (Google, Serper, Scholar, arXiv)
- Develop a central `SearchExecutor` class to manage execution across multiple engines
- Implement a `ResultCollector` class for processing and organizing results
2. Use parallel execution for search queries:
- Implement thread-based parallelism using `concurrent.futures`
- Add support for both synchronous and asynchronous execution
- Include timeout management and error handling
3. Standardize search results:
- Define a common result format across all search engines
- Include metadata specific to each search engine in a standardized way
- Implement deduplication and scoring for result ranking
- **Rationale**:
- A modular architecture allows for easy addition of new search engines
- Parallel execution significantly improves search performance
- Standardized result format simplifies downstream processing
- Separation of concerns between execution and result processing
- **Alternatives Considered**:
1. Sequential execution of search queries:
- Simpler implementation
- Much slower performance
- Would not scale well with additional search engines
2. Separate modules for each search engine:
- Would lead to code duplication
- More difficult to maintain
- Less consistent result format
3. Using a third-party search aggregation service:
- Would introduce additional dependencies
- Less control over the search process
- Potential cost implications
- **Impact**:
- Efficient execution of search queries across multiple engines
- Consistent result format for downstream processing
- Flexible architecture that can be extended with new search engines
- Clear separation of concerns between different components
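The sketch referenced above shows one way the pieces fit together: a `BaseSearchHandler` interface, a per-engine handler, a `SearchExecutor` that fans queries out with `concurrent.futures`, and a standardized result shape. Class and method names follow this log; the bodies are illustrative, not the project's actual implementation.
```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_result(source: str, title: str, url: str, snippet: str, **metadata) -> dict:
    """Common result shape; engine-specific fields live under `metadata`."""
    return {"source": source, "title": title, "url": url,
            "snippet": snippet, "metadata": metadata}

class BaseSearchHandler(ABC):
    """Interface that every search engine handler implements."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> list[dict]:
        """Return results in the standardized format produced by `make_result`."""

class ArxivSearchHandler(BaseSearchHandler):
    def search(self, query: str, num_results: int = 10) -> list[dict]:
        # The real handler calls the arXiv API; a placeholder keeps the sketch runnable.
        return [make_result("arxiv", f"Paper about {query}", "https://arxiv.org/", "...")]

class SearchExecutor:
    """Fans a query out to all registered handlers in parallel."""

    def __init__(self, handlers: dict[str, BaseSearchHandler], timeout: float = 30.0):
        self.handlers = handlers
        self.timeout = timeout

    def execute(self, query: str) -> dict[str, list[dict]]:
        results: dict[str, list[dict]] = {}
        with ThreadPoolExecutor(max_workers=len(self.handlers)) as pool:
            futures = {pool.submit(handler.search, query): name
                       for name, handler in self.handlers.items()}
            for future in as_completed(futures, timeout=self.timeout):
                name = futures[future]
                try:
                    results[name] = future.result()
                except Exception:
                    results[name] = []  # one failing engine should not abort the search
        return results

executor = SearchExecutor({"arxiv": ArxivSearchHandler()})
print(executor.execute("retrieval augmented generation"))
```
A `ResultCollector` would then merge, deduplicate, and score the per-engine lists; it is omitted here to keep the sketch short.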
## 2025-02-27: Search Execution Module Refinements
### Decision: Remove Google Search Handler
- **Context**: Both Google and Serper handlers were implemented, but Serper is essentially a front-end for Google search
- **Options Considered**:
1. Keep both handlers for redundancy
2. Remove the Google handler and only use Serper
- **Decision**: Remove the Google search handler
- **Rationale**:
- Redundant functionality as Serper provides the same results
- Simplifies the codebase and reduces maintenance
- Reduces API costs by avoiding duplicate searches
- Serper provides a more reliable and consistent API for Google search
### Decision: Modify LLM Query Enhancement Prompt
- **Context**: The LLM was returning enhanced queries with explanations, which caused issues with search APIs
- **Options Considered**:
1. Post-process the LLM output to extract just the query
2. Modify the prompt to request only the enhanced query
- **Decision**: Modify the LLM prompt to request only the enhanced query without explanations
- **Rationale**:
- More reliable than post-processing, which could be error-prone
- Cleaner implementation that addresses the root cause
- Ensures consistent output format for downstream processing
- Reduces the risk of exceeding API character limits
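A hedged illustration of the kind of prompt change this decision describes; the actual wording in the codebase may differ.
```python
# Previously the model was free to add explanations, which polluted the query string.
# The revised prompt constrains the output to the query text alone.
ENHANCE_QUERY_PROMPT = (
    "Rewrite the following search query to be more specific and comprehensive. "
    "Return ONLY the rewritten query, with no explanation, preamble, or quotation marks.\n\n"
    "Query: {query}"
)
```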
### Decision: Implement Query Truncation
- **Context**: Enhanced queries could exceed the Serper API's 2048 character limit
- **Options Considered**:
1. Limit the LLM's output length
2. Truncate queries before sending to the API
3. Split long queries into multiple searches
- **Decision**: Implement query truncation in the search executor
- **Rationale**:
- Simple and effective solution
- Preserves as much of the enhanced query as possible
- Ensures API requests don't fail due to length constraints
- Can be easily adjusted if API limits change
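A minimal sketch of the truncation guard; the 2048-character figure is the Serper limit noted above.
```python
SERPER_MAX_QUERY_CHARS = 2048  # Serper API limit on query length

def truncate_query(query: str, limit: int = SERPER_MAX_QUERY_CHARS) -> str:
    """Trim an enhanced query so the search request cannot fail on length."""
    if len(query) <= limit:
        return query
    # Cut at the last whole word under the limit rather than mid-word.
    return query[:limit].rsplit(" ", 1)[0]
```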
## 2025-02-27: Testing Strategy for Query Processor
### Context
After integrating Groq and OpenRouter as additional LLM providers, we needed to verify that the query processor module functions correctly with these new providers.
### Decision
1. Create dedicated test scripts to validate the query processor functionality:
- A basic test script for the core processing pipeline
- A comprehensive test script for detailed component testing
2. Use monkey patching to ensure tests consistently use the Groq model (sketched after this list):
- Create a global LLM interface with the Groq model
- Override the `get_llm_interface` function to always return this interface
- This approach allows testing without modifying the core code
3. Test all key functionality of the query processor:
- Query enhancement
- Query classification
- Search query generation
- End-to-end processing pipeline
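A self-contained sketch of the monkey-patching step from point 2 above. The module and class names are stand-ins; only `get_llm_interface` is named in this log.
```python
import types

# Stand-in for the project's module that exposes get_llm_interface.
core = types.ModuleType("core")

class GroqInterface:
    """Placeholder for the project's Groq-backed LLM interface."""
    def __init__(self, model: str):
        self.model = model

def _default_factory(*args, **kwargs):
    raise RuntimeError("the real factory selects a provider from configuration")

core.get_llm_interface = _default_factory

# The tests build one global Groq interface and rebind the factory to return it,
# so every component under test uses Groq without any change to the core code.
groq_interface = GroqInterface(model="groq-test-model")
core.get_llm_interface = lambda *args, **kwargs: groq_interface

assert core.get_llm_interface() is groq_interface
```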
### Rationale
- Dedicated test scripts provide a repeatable way to verify functionality
- Monkey patching allows testing with specific models without changing the core code
- Comprehensive testing ensures all components work correctly with the new providers
- Saving test results to a JSON file provides a reference for future development
### Alternatives Considered
1. Modifying the query processor to accept a model parameter:
- Would require changing the core code
- Could introduce bugs in the production code
2. Using environment variables to control model selection:
- Less precise control over which model is used
- Could interfere with other tests or production use
### Impact
- Verified that the query processor works correctly with Groq models
- Established a testing approach that can be used for other modules
- Created reusable test scripts for future development