# Decision Log
## 2025-02-27: Initial Project Setup
### Decision: Use Jina AI APIs for Semantic Search
- **Context**: Need for semantic search capabilities that understand context beyond keywords
- **Options Considered**:
1. Build custom embedding solution
2. Use open-source models locally
3. Use Jina AI's APIs
- **Decision**: Use Jina AI's APIs for embedding generation and similarity computation
- **Rationale**:
- High-quality embeddings with state-of-the-art models
- No need to manage model deployment and infrastructure
- Simple API integration with reasonable pricing
- Support for long texts through segmentation
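As a concrete illustration of the integration this decision implies, here is a minimal sketch of an embedding call. It assumes Jina's OpenAI-style `/v1/embeddings` endpoint and the `jina-embeddings-v3` model name; the exact payload fields should be checked against the current Jina documentation.
```python
import os
import requests

JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Request embeddings for a batch of texts from the Jina API (sketch)."""
    response = requests.post(
        JINA_EMBEDDINGS_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['JINA_API_KEY']}",
        },
        # Model name is an assumption; consult the Jina docs for current models.
        json={"model": "jina-embeddings-v3", "input": texts},
        timeout=30,
    )
    response.raise_for_status()
    return [item["embedding"] for item in response.json()["data"]]
```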
### Decision: Separate Markdown Segmentation from Similarity Computation
- **Context**: Need to handle potentially long markdown documents
- **Options Considered**:
1. Integrate segmentation directly into the similarity module
2. Create a separate module for segmentation
- **Decision**: Create a separate module (markdown_segmenter.py) for document segmentation
- **Rationale**:
- Better separation of concerns
- More modular design allows for independent use of components
- Easier to maintain and extend each component separately
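To show what the segmentation module is responsible for, the sketch below splits a markdown document into heading-delimited chunks. It is a local stand-in only; the real `markdown_segmenter.py` may delegate chunking to Jina's segmentation API.
```python
import re

def segment_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into heading-delimited chunks (illustrative stand-in only)."""
    # Start a new section at each level-1 or level-2 heading, keeping the heading
    # together with the body that follows it.
    sections = re.split(r"(?m)^(?=#{1,2} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Further split oversized sections so each chunk fits the embedding limit.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```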
### Decision: Use Environment Variables for API Keys
- **Context**: Need to securely manage API credentials
- **Options Considered**:
1. Configuration files
2. Environment variables
3. Secret management service
- **Decision**: Use environment variables (JINA_API_KEY)
- **Rationale**:
- Simple to implement
- Standard practice for managing secrets
- Works well across different environments
- Prevents accidental commit of credentials to version control
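A minimal sketch of the key-loading pattern; the fail-fast error message is illustrative.
```python
import os

def get_jina_api_key() -> str:
    """Read the Jina API key from the environment rather than from a tracked file."""
    key = os.environ.get("JINA_API_KEY")
    if not key:
        raise RuntimeError("JINA_API_KEY is not set; export it before running the tools.")
    return key
```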
### Decision: Use Cosine Similarity with Normalized Vectors
- **Context**: Need a metric for comparing semantic similarity between text embeddings
- **Options Considered**:
1. Euclidean distance
2. Cosine similarity
3. Dot product
- **Decision**: Use cosine similarity with normalized vectors
- **Rationale**:
- Standard approach for semantic similarity
- Normalized vectors simplify computation (dot product equals cosine similarity)
- Less sensitive to embedding magnitude, focusing on direction (meaning)
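A short worked sketch of the chosen metric: once the embeddings are L2-normalized, the dot product and cosine similarity coincide, and only the direction of the vectors matters.
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    # For normalized vectors, cosine similarity reduces to a plain dot product.
    return float(np.dot(a_norm, b_norm))

# Direction matters, magnitude does not: a scaled copy still scores 1.0.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
assert abs(cosine_similarity(u, v) - 1.0) < 1e-9
```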
## 2025-02-27: Research System Architecture
### Decision: Implement a Multi-Stage Research Pipeline
- **Context**: Need to define the overall architecture for the intelligent research system
- **Options Considered**:
1. Monolithic application with tightly coupled components
2. Microservices architecture with independent services
3. Pipeline architecture with distinct processing stages
- **Decision**: Implement an 8-stage pipeline architecture
- **Rationale**:
- Clear separation of concerns with each stage having a specific responsibility
- Easier to develop and test individual components
- Flexibility to swap or enhance specific stages without affecting others
- Natural flow of data through the system matches the research process
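A structural sketch of the pipeline pattern this decision refers to. The log does not enumerate the eight stages here, so the two stages below are placeholders, and the real stages carry richer state than a shared dict.
```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Pass the research state through each stage in order."""
    for stage in stages:
        state = stage(state)
    return state

# Placeholder stages standing in for the eight real ones.
def process_query(state: dict) -> dict:
    state["queries"] = [state["question"]]
    return state

def execute_search(state: dict) -> dict:
    state["results"] = [f"results for: {q}" for q in state["queries"]]
    return state

final_state = run_pipeline([process_query, execute_search], {"question": "example question"})
```
Because each stage only consumes and produces the shared state, a stage can be swapped or extended without touching the others, which is the flexibility cited above.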
### Decision: Use Multiple Search Sources
- **Context**: Need to gather comprehensive information from various sources
- **Options Considered**:
1. Use a single search API for simplicity
2. Implement custom web scraping for all sources
3. Use multiple specialized search APIs
- **Decision**: Integrate multiple search sources (Google, Serper, Jina Search, Google Scholar, arXiv)
- **Rationale**:
- Different sources provide different types of information (academic, general, etc.)
- Increases the breadth and diversity of search results
- Specialized APIs like arXiv provide domain-specific information
- Redundancy ensures more comprehensive coverage
### Decision: Use Jina AI for Semantic Processing
- **Context**: Need for advanced semantic understanding in document processing
- **Options Considered**:
1. Use simple keyword matching
2. Implement custom embedding models
3. Use Jina AI's suite of APIs
- **Decision**: Use Jina AI's APIs for embedding generation, similarity computation, and reranking
- **Rationale**:
- High-quality embeddings with state-of-the-art models
- Comprehensive API suite covering multiple needs (embeddings, segmentation, reranking)
- Simple integration with reasonable pricing
- Consistent approach across different semantic processing tasks
## 2025-02-27: Search Execution Architecture
### Decision: Search Execution Architecture
- **Context**: We needed a search execution module that runs queries against multiple search engines and processes the results in a standardized way.
- **Decision**:
1. Create a modular search execution architecture (a structural sketch follows this entry):
- Implement a base handler interface (`BaseSearchHandler`) for all search API handlers
- Create specific handlers for each search engine (Google, Serper, Scholar, arXiv)
- Develop a central `SearchExecutor` class to manage execution across multiple engines
- Implement a `ResultCollector` class for processing and organizing results
2. Use parallel execution for search queries:
- Implement thread-based parallelism using `concurrent.futures`
- Add support for both synchronous and asynchronous execution
- Include timeout management and error handling
3. Standardize search results:
- Define a common result format across all search engines
- Include metadata specific to each search engine in a standardized way
- Implement deduplication and scoring for result ranking
- **Rationale**:
- A modular architecture allows for easy addition of new search engines
- Parallel execution significantly improves search performance
- Standardized result format simplifies downstream processing
- Separation of concerns between execution and result processing
- **Alternatives Considered**:
1. Sequential execution of search queries:
- Simpler implementation
- Much slower performance
- Would not scale well with additional search engines
2. Separate modules for each search engine:
- Would lead to code duplication
- More difficult to maintain
- Less consistent result format
3. Using a third-party search aggregation service:
- Would introduce additional dependencies
- Less control over the search process
- Potential cost implications
- **Impact**:
- Efficient execution of search queries across multiple engines
- Consistent result format for downstream processing
- Flexible architecture that can be extended with new search engines
- Clear separation of concerns between different components
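The sketch referenced above shows one way the pieces fit together: a `BaseSearchHandler` interface, a per-engine handler, a `SearchExecutor` that fans queries out with `concurrent.futures`, and a standardized result shape. Class and method names follow this log; the bodies are illustrative, not the project's actual implementation.
```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_result(source: str, title: str, url: str, snippet: str, **metadata) -> dict:
    """Common result shape; engine-specific fields live under `metadata`."""
    return {"source": source, "title": title, "url": url,
            "snippet": snippet, "metadata": metadata}

class BaseSearchHandler(ABC):
    """Interface that every search engine handler implements."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> list[dict]:
        """Return results in the standardized format produced by `make_result`."""

class ArxivSearchHandler(BaseSearchHandler):
    def search(self, query: str, num_results: int = 10) -> list[dict]:
        # The real handler calls the arXiv API; a placeholder keeps the sketch runnable.
        return [make_result("arxiv", f"Paper about {query}", "https://arxiv.org/", "...")]

class SearchExecutor:
    """Fans a query out to all registered handlers in parallel."""

    def __init__(self, handlers: dict[str, BaseSearchHandler], timeout: float = 30.0):
        self.handlers = handlers
        self.timeout = timeout

    def execute(self, query: str) -> dict[str, list[dict]]:
        results: dict[str, list[dict]] = {}
        with ThreadPoolExecutor(max_workers=len(self.handlers)) as pool:
            futures = {pool.submit(handler.search, query): name
                       for name, handler in self.handlers.items()}
            for future in as_completed(futures, timeout=self.timeout):
                name = futures[future]
                try:
                    results[name] = future.result()
                except Exception:
                    results[name] = []  # one failing engine should not abort the search
        return results

executor = SearchExecutor({"arxiv": ArxivSearchHandler()})
print(executor.execute("retrieval augmented generation"))
```
A `ResultCollector` would then merge, deduplicate, and score the per-engine lists; it is omitted here to keep the sketch short.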
## 2025-02-27: Search Execution Module Refinements
### Decision: Remove Google Search Handler
- **Context**: Both Google and Serper handlers were implemented, but Serper is essentially a front-end for Google search
- **Options Considered**:
1. Keep both handlers for redundancy
2. Remove the Google handler and only use Serper
- **Decision**: Remove the Google search handler
- **Rationale**:
- Redundant functionality as Serper provides the same results
- Simplifies the codebase and reduces maintenance
- Reduces API costs by avoiding duplicate searches
- Serper provides a more reliable and consistent API for Google search
### Decision: Modify LLM Query Enhancement Prompt
- **Context**: The LLM was returning enhanced queries with explanations, which caused issues with search APIs
- **Options Considered**:
1. Post-process the LLM output to extract just the query
2. Modify the prompt to request only the enhanced query
- **Decision**: Modify the LLM prompt to request only the enhanced query without explanations
- **Rationale**:
- More reliable than post-processing, which could be error-prone
- Cleaner implementation that addresses the root cause
- Ensures consistent output format for downstream processing
- Reduces the risk of exceeding API character limits
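A hedged illustration of the kind of prompt change this decision describes; the actual wording in the codebase may differ.
```python
# Previously the model was free to add explanations, which polluted the query string.
# The revised prompt constrains the output to the query text alone.
ENHANCE_QUERY_PROMPT = (
    "Rewrite the following search query to be more specific and comprehensive. "
    "Return ONLY the rewritten query, with no explanation, preamble, or quotation marks.\n\n"
    "Query: {query}"
)
```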
### Decision: Implement Query Truncation
- **Context**: Enhanced queries could exceed the Serper API's 2048 character limit
- **Options Considered**:
1. Limit the LLM's output length
2. Truncate queries before sending to the API
3. Split long queries into multiple searches
- **Decision**: Implement query truncation in the search executor
- **Rationale**:
- Simple and effective solution
- Preserves as much of the enhanced query as possible
- Ensures API requests don't fail due to length constraints
- Can be easily adjusted if API limits change
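A minimal sketch of the truncation guard; the 2048-character figure is the Serper limit noted above.
```python
SERPER_MAX_QUERY_CHARS = 2048  # Serper API limit on query length

def truncate_query(query: str, limit: int = SERPER_MAX_QUERY_CHARS) -> str:
    """Trim an enhanced query so the search request cannot fail on length."""
    if len(query) <= limit:
        return query
    # Cut at the last whole word under the limit rather than mid-word.
    return query[:limit].rsplit(" ", 1)[0]
```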
## 2025-02-27: Testing Strategy for Query Processor
### Context
After integrating Groq and OpenRouter as additional LLM providers, we needed to verify that the query processor module functions correctly with these new providers.
### Decision
1. Create dedicated test scripts to validate the query processor functionality:
- A basic test script for the core processing pipeline
- A comprehensive test script for detailed component testing
2. Use monkey patching to ensure tests consistently use the Groq model (sketched after this list):
- Create a global LLM interface with the Groq model
- Override the `get_llm_interface` function to always return this interface
- This approach allows testing without modifying the core code
3. Test all key functionality of the query processor:
- Query enhancement
- Query classification
- Search query generation
- End-to-end processing pipeline
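A self-contained sketch of the monkey-patching step from point 2 above. The module and class names are stand-ins; only `get_llm_interface` is named in this log.
```python
import types

# Stand-in for the project's module that exposes get_llm_interface.
core = types.ModuleType("core")

class GroqInterface:
    """Placeholder for the project's Groq-backed LLM interface."""
    def __init__(self, model: str):
        self.model = model

def _default_factory(*args, **kwargs):
    raise RuntimeError("the real factory selects a provider from configuration")

core.get_llm_interface = _default_factory

# The tests build one global Groq interface and rebind the factory to return it,
# so every component under test uses Groq without any change to the core code.
groq_interface = GroqInterface(model="groq-test-model")
core.get_llm_interface = lambda *args, **kwargs: groq_interface

assert core.get_llm_interface() is groq_interface
```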
### Rationale
- Dedicated test scripts provide a repeatable way to verify functionality
- Monkey patching allows testing with specific models without changing the core code
- Comprehensive testing ensures all components work correctly with the new providers
- Saving test results to a JSON file provides a reference for future development
### Alternatives Considered
1. Modifying the query processor to accept a model parameter:
- Would require changing the core code
- Could introduce bugs in the production code
2. Using environment variables to control model selection:
- Less precise control over which model is used
- Could interfere with other tests or production use
### Impact
- Verified that the query processor works correctly with Groq models
- Established a testing approach that can be used for other modules
- Created reusable test scripts for future development