massive changes

parent b6b50e4ef8
commit 12b453a14f

@@ -0,0 +1,5 @@
+Review the contents of .note/ before modifying any files.
+
+After each major successful test, please commit the changes to the repository with a meaningful commit message.
+
+Update the contents of .note/ after each major change.
@@ -51,3 +51,4 @@ logs/
 # Database files
 *.db
 report/database/*.db
+config/config.yaml
@@ -583,281 +583,103 @@ In this session, we fixed issues in the Gradio UI for report generation and plan
 3. Test the current implementation with various query types to identify any remaining issues
 4. Update the documentation to reflect the new features and future plans

-## Session: 2025-02-28: Google Gemini Integration and Reference Formatting
+## Session: 2025-03-12 - Query Type Selection in Gradio UI

 ### Overview
-Fixed the integration of Google Gemini models with LiteLLM, and fixed reference formatting issues.
+In this session, we enhanced the Gradio UI by adding a query type selection dropdown, allowing users to explicitly select the query type (factual, exploratory, comparative) instead of relying on automatic detection.

 ### Key Activities
-1. **Fixed Google Gemini Integration**:
-   - Updated the model format to `gemini/gemini-2.0-flash` in config.yaml
-   - Modified message formatting for Gemini models in LLM interface
-   - Added proper handling for the 'gemini' provider in environment variable setup
+1. **Added Query Type Selection to Gradio UI**:
+   - Added a dropdown menu for query type selection in the "Generate Report" tab
+   - Included options for "auto-detect", "factual", "exploratory", and "comparative"
+   - Added descriptive tooltips explaining each query type
+   - Set "auto-detect" as the default option

-2. **Fixed Reference Formatting Issues**:
-   - Enhanced the instructions for reference formatting to ensure URLs are included
-   - Added a recovery mechanism for truncated references
-   - Improved context preparation to better extract URLs for references
+2. **Updated Report Generation Logic**:
+   - Modified the `generate_report` method in the `GradioInterface` class to handle the new query_type parameter
+   - Updated the report button click handler to pass the query type to the generate_report method
+   - Added logging to show when a user-selected query type is being used

-3. **Converted LLM Interface Methods to Async**:
-   - Made `generate_completion`, `classify_query`, and `enhance_query` methods async
-   - Updated dependent code to properly await these methods
-   - Fixed runtime errors related to async/await patterns
+3. **Enhanced Report Generator**:
+   - Updated the `generate_report` method in the `ReportGenerator` class to accept a query_type parameter
+   - Modified the report synthesizer calls to pass the query_type parameter
+   - Added logging to track query type usage

-### Key Insights
-- Gemini models require special message formatting (using 'user' and 'model' roles instead of 'system' and 'assistant')
-- References were getting cut off due to token limits, requiring a separate generation step
-- The async conversion was necessary to properly handle async LLM calls throughout the codebase
+4. **Added Documentation**:
+   - Added a "Query Types" section to the Gradio UI explaining each query type
+   - Included examples of when to use each query type
+   - Updated code comments to explain the query type parameter
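The dropdown described in this session maps straightforwardly onto the `query_type` parameter. A minimal sketch of that mapping (the `resolve_query_type` helper name is hypothetical, not the project's actual code):

```python
QUERY_TYPE_CHOICES = ["auto-detect", "factual", "exploratory", "comparative"]

def resolve_query_type(selected):
    """Map the UI dropdown value to the query_type parameter.

    "auto-detect" (the default) means no explicit type is passed,
    so the pipeline falls back to automatic classification.
    """
    if selected == "auto-detect":
        return None
    return selected
```

Keeping the mapping in one helper means every layer below the UI only ever sees an explicit type string or `None`.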
+### Insights
+- Explicit query type selection gives users more control over the report generation process
+- Different query types benefit from specialized report templates and structures
+- The auto-detect option provides convenience while still allowing manual override
+- Clear documentation helps users understand when to use each query type

 ### Challenges
-- Ensuring that the templates produce appropriate output for each detail level
-- Balancing between speed and quality for different detail levels
-- Managing token budgets effectively across different detail levels
 - Ensuring backward compatibility with existing code
+- Maintaining the auto-detect functionality while adding manual selection
+- Passing the query type parameter through multiple layers of the application
+- Providing clear explanations of query types for users

 ### Next Steps
-1. Continue testing with Gemini models to ensure stable operation
-2. Consider adding more robust error handling for LLM provider-specific issues
-3. Improve the reference formatting further if needed
+1. Test the query type selection with various queries to ensure it works correctly
+2. Gather user feedback on the usefulness of manual query type selection
+3. Consider adding more specialized templates for specific query types
+4. Explore adding query type detection confidence scores to help users decide when to override
+5. Add examples of each query type to help users understand the differences
-## Session: 2025-02-28: Fixing Reference Formatting and Async Implementation
+## Session: 2025-03-12 - Fixed Query Type Parameter Bug

 ### Overview
-Fixed reference formatting issues with Gemini models and updated the codebase to properly handle async methods.
+Fixed a bug in the report generation process where the `query_type` parameter was not properly handled, causing an error when it was `None`.

 ### Key Activities
-1. **Enhanced Reference Formatting**:
-   - Improved instructions to emphasize including URLs for each reference
-   - Added duplicate URL fields in the context to ensure URLs are captured
-   - Updated the reference generation prompt to explicitly request URLs
-   - Added a separate reference generation step to handle truncated references
+1. **Fixed NoneType Error in Report Synthesis**:
+   - Added a null check in the `_get_extraction_prompt` method in `report_synthesis.py`
+   - Modified the condition that checks for comparative queries to handle the case where `query_type` is `None`
+   - Ensured the method works correctly regardless of whether a query type is explicitly provided

-2. **Fixed Async Implementation**:
-   - Converted all LLM interface methods to async for proper handling
-   - Updated QueryProcessor's generate_search_queries method to be async
-   - Modified query_to_report.py to correctly await async methods
-   - Fixed runtime errors related to async/await patterns
+2. **Root Cause Analysis**:
+   - Identified that the error occurred when the `query_type` parameter was `None` and the code tried to call `.lower()` on it
+   - Traced the issue through the call chain from the UI to the report generator to the report synthesizer
+   - Confirmed that the fix addresses the specific error message: `'NoneType' object has no attribute 'lower'`
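The fix described above amounts to guarding the comparison with a null check before calling `.lower()`. A sketch (the helper name is hypothetical; the real change lives in `_get_extraction_prompt`):

```python
def is_comparative(query_type):
    """Safely test for comparative queries when query_type may be None.

    Calling query_type.lower() unconditionally raises
    "'NoneType' object has no attribute 'lower'" when no type was selected.
    """
    return query_type is not None and query_type.lower() == "comparative"
```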
-3. **Updated Gradio Interface**:
-   - Modified the generate_report method to properly handle async operations
-   - Updated the report button click handler to correctly pass parameters
-   - Fixed the parameter order in the lambda function for async execution
-   - Improved error handling in the UI
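One way an async `generate_report` can be driven from a synchronous UI callback is a thin wrapper that runs the coroutine to completion. A sketch under the assumption that the UI framework does not await coroutines itself (all names here are illustrative stand-ins, not the project's actual code):

```python
import asyncio

async def generate_report(query, detail_level="standard"):
    # Stand-in for the real async synthesis pipeline.
    await asyncio.sleep(0)
    return f"Report for {query!r} ({detail_level})"

def on_generate_click(query, detail_level):
    # Synchronous click handler that drives the coroutine to completion.
    return asyncio.run(generate_report(query, detail_level))
```

Keeping the async/sync boundary in one place avoids the scattered "coroutine was never awaited" runtime errors mentioned above.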
-## Session: 2025-03-11

-### Overview
-Reorganized the project directory structure to improve maintainability and clarity, ensuring all components are properly organized into their respective directories.

-### Key Activities
-1. **Directory Structure Reorganization**:
-   - Created a dedicated `utils/` directory for utility scripts
-   - Moved `jina_similarity.py` to `utils/`
-   - Added `__init__.py` to make it a proper Python package
-   - Organized test files into subdirectories under `tests/`
-   - Created subdirectories for each module (query, execution, ranking, report, ui, integration)
-   - Added `__init__.py` files to all test directories
-   - Created an `examples/` directory with subdirectories for data and scripts
-   - Moved sample data to `examples/data/`
-   - Added `__init__.py` files to make them proper Python packages
-   - Added a dedicated `scripts/` directory for utility scripts
-   - Moved `query_to_report.py` to `scripts/`

-2. **Pipeline Verification**:
-   - Tested the pipeline after reorganization to ensure functionality
-   - Verified that the UI works correctly with the new directory structure
-   - Confirmed that all imports are working properly with the new structure

-3. **Embedding Usage Analysis**:
-   - Confirmed that the pipeline uses Jina AI's Embeddings API through the `JinaSimilarity` class
-   - Verified that the `JinaReranker` class uses embeddings for document reranking
-   - Analyzed how embeddings are integrated into the search and ranking process

 ### Insights
-- A well-organized directory structure significantly improves code maintainability and readability
-- Using proper Python package structure with `__init__.py` files ensures clean imports
-- Separating tests, utilities, examples, and scripts into dedicated directories makes the codebase more navigable
-- The Jina AI embeddings are used throughout the pipeline for semantic similarity and document reranking
+- Proper null checking is essential when working with optional parameters that are passed through multiple layers
+- The error occurred in the report synthesis module but was triggered by the UI's query type selection feature
+- The fix maintains backward compatibility while ensuring the new query type selection feature works correctly

-### Challenges
-- Ensuring all import statements are updated correctly after moving files
-- Maintaining backward compatibility with existing code
-- Verifying that all components still work together after reorganization

 ### Next Steps
-1. Run comprehensive tests to ensure all functionality works with the new directory structure
-2. Update any remaining documentation to reflect the new directory structure
-3. Consider moving the remaining test files in the root of the `tests/` directory to appropriate subdirectories
-4. Review import statements throughout the codebase to ensure they follow the new structure
+1. Test the fix with various query types to ensure it works correctly
+2. Consider adding similar null checks in other parts of the code that handle the query_type parameter
+3. Add more comprehensive error handling throughout the report generation process
+4. Update the test suite to include tests for null query_type values

+## Session: 2025-03-12 - Fixed Template Retrieval for Null Query Type

-### Key Insights
-- Async/await patterns need to be consistently applied throughout the codebase
-- Reference formatting requires explicit instructions to include URLs
-- Gradio's interface needs special handling for async functions

-### Challenges
-- Ensuring that all async methods are properly awaited
-- Balancing between detailed instructions and token limits for reference generation
-- Managing the increased processing time for async operations

-### Next Steps
-1. Continue testing with Gemini models to ensure stable operation
-2. Consider adding more robust error handling for LLM provider-specific issues
-3. Improve the reference formatting further if needed
-4. Update documentation to reflect the changes made to the LLM interface
-5. Consider adding more unit tests for the async methods

-## Session: 2025-02-28: Fixed NoneType Error in Report Synthesis

-### Issue
-Encountered an error during report generation:
-```
-TypeError: 'NoneType' object is not subscriptable
-```
-The error occurred in the `map_document_chunks` method of the `ReportSynthesizer` class when trying to slice a title that was `None`.

-### Changes Made
-1. Fixed the chunk counter in `map_document_chunks` method:
-   - Used a separate counter for individual chunks instead of using the batch index
-   - Added a null check for chunk titles with a fallback to 'Untitled'

-2. Added defensive code in `synthesize_report` method:
-   - Added code to ensure all chunks have a title before processing
-   - Added null checks for title fields

-3. Updated the `DocumentProcessor` class:
-   - Modified `process_documents_for_report` to ensure all chunks have a title
-   - Updated `chunk_document_by_sections`, `chunk_document_fixed_size`, and `chunk_document_hierarchical` methods to handle None titles
-   - Added default 'Untitled' value for all title fields
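The defensive pattern in the changes above can be sketched as follows (hypothetical helper; the real fixes are spread across `ReportSynthesizer` and `DocumentProcessor`):

```python
def normalize_chunks(chunks):
    """Give every chunk a usable title and a per-chunk index.

    Slicing a None title (e.g. chunk['title'][:50]) is what raised
    "'NoneType' object is not subscriptable".
    """
    for index, chunk in enumerate(chunks):
        if not chunk.get("title"):
            chunk["title"] = "Untitled"
        chunk["index"] = index  # separate per-chunk counter, not the batch index
    return chunks
```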
-### Testing
-The changes were tested with a report generation task that previously failed, and the error was resolved.

-### Next Steps
-1. Consider adding more comprehensive null checks throughout the codebase
-2. Add unit tests to verify proper handling of missing or null fields
-3. Implement better error handling and recovery mechanisms

-## Session: 2025-03-11
 ### Overview
-Focused on resolving issues with the report generation template system and ensuring that different detail levels and query types work correctly in the report synthesis process.
+Fixed a second issue in the report generation process where the template retrieval was failing when the `query_type` parameter was `None`.

 ### Key Activities
-1. **Fixed Template Retrieval Issues**:
-   - Updated the `get_template` method in the `ReportTemplateManager` to ensure it retrieves templates correctly based on query type and detail level
-   - Implemented a helper method `_get_template_from_strings` in the `ReportSynthesizer` to convert string values for query types and detail levels to their respective enum objects
-   - Added better logging for template retrieval process to aid in debugging
+1. **Fixed Template Retrieval for Null Query Type**:
+   - Updated the `_get_template_from_strings` method in `report_synthesis.py` to handle `None` query_type
+   - Added a default value of "exploratory" when query_type is `None`
+   - Modified the method signature to explicitly indicate that query_type_str can be `None`
+   - Added logging to indicate when the default query type is being used

-2. **Tested All Detail Levels and Query Types**:
-   - Created a comprehensive test script `test_all_detail_levels.py` to test all combinations of detail levels and query types
-   - Successfully tested all detail levels (brief, standard, detailed, comprehensive) with factual queries
-   - Successfully tested all detail levels with exploratory queries
-   - Successfully tested all detail levels with comparative queries
+2. **Root Cause Analysis**:
+   - Identified that the error occurred when trying to convert `None` to a `QueryType` enum value
+   - The error message was: "No template found for None standard" and "None is not a valid QueryType"
+   - The issue was in the template retrieval process which is used by both standard and progressive report synthesis
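The default-value fix can be sketched like this (simplified; the real `_get_template_from_strings` also converts detail-level strings and returns enum-keyed templates):

```python
import logging

logger = logging.getLogger(__name__)

def resolve_query_type_str(query_type_str):
    """Fall back to "exploratory" when no query type was provided,
    instead of failing with "None is not a valid QueryType"."""
    if query_type_str is None:
        logger.info("No query type provided; defaulting to 'exploratory'")
        return "exploratory"
    return query_type_str
```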
-3. **Improved Error Handling**:
-   - Added fallback to standard templates if specific templates are not found
-   - Enhanced logging to track whether templates are found during the synthesis process

-4. **Code Organization**:
-   - Removed duplicate `ReportTemplateManager` and `ReportTemplate` classes from `report_synthesis.py`
-   - Used the imported versions from `report_templates.py` for better code maintainability

 ### Insights
-- The template system is now working correctly for all combinations of query types and detail levels
-- Proper logging is essential for debugging template retrieval issues
-- Converting string values to enum objects is necessary for consistent template retrieval
-- Having a dedicated test script for all combinations helps ensure comprehensive coverage
+- When fixing one issue with optional parameters, it's important to check for similar issues in related code paths
+- Providing sensible defaults for optional parameters helps maintain robustness
+- Proper error handling and logging helps diagnose issues in complex systems with multiple layers

-### Challenges
-- Initially encountered issues where templates were not found during report synthesis, leading to `ValueError`
-- Needed to ensure that the correct classes and methods were used for template retrieval

 ### Next Steps
-1. Conduct additional testing with real-world queries and document sets
-2. Compare the analytical depth and quality of reports generated with different detail levels
-3. Gather user feedback on the improved reports at different detail levels
-4. Further refine the detail level configurations based on testing and feedback
+1. Test the fix with comprehensive reports to ensure it works correctly
+2. Consider adding similar default values for other optional parameters
+3. Review the codebase for other potential null reference issues
+4. Update documentation to clarify the behavior when optional parameters are not provided
-## Session: 2025-03-12 - Report Templates and Progressive Report Generation

-### Overview
-Implemented a dedicated report templates module to standardize report generation across different query types and detail levels, and implemented progressive report generation for comprehensive reports.

-### Key Activities
-1. **Created Report Templates Module**:
-   - Developed a new `report_templates.py` module with a comprehensive template system
-   - Implemented `QueryType` enum for categorizing queries (FACTUAL, EXPLORATORY, COMPARATIVE)
-   - Created `DetailLevel` enum for different report detail levels (BRIEF, STANDARD, DETAILED, COMPREHENSIVE)
-   - Designed a `ReportTemplate` class with validation for required sections
-   - Implemented a `ReportTemplateManager` to manage and retrieve templates

-2. **Implemented Template Variations**:
-   - Created 12 different templates (3 query types × 4 detail levels)
-   - Designed templates with appropriate sections for each combination
-   - Added placeholders for dynamic content in each template
-   - Ensured templates follow a consistent structure while adapting to specific needs
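A minimal sketch of the enum-keyed template system described above (illustrative only; the real module also validates required sections per template):

```python
from enum import Enum

class QueryType(Enum):
    FACTUAL = "factual"
    EXPLORATORY = "exploratory"
    COMPARATIVE = "comparative"

class DetailLevel(Enum):
    BRIEF = "brief"
    STANDARD = "standard"
    DETAILED = "detailed"
    COMPREHENSIVE = "comprehensive"

class ReportTemplateManager:
    def __init__(self):
        self._templates = {}

    def register(self, query_type, detail_level, template):
        self._templates[(query_type, detail_level)] = template

    def get_template(self, query_type, detail_level):
        return self._templates[(query_type, detail_level)]

# Register one template per combination: 3 query types x 4 detail levels = 12.
manager = ReportTemplateManager()
for qt in QueryType:
    for dl in DetailLevel:
        manager.register(qt, dl, f"{qt.value}/{dl.value} template")
```

Keying on enum members rather than raw strings is what surfaces bad inputs early ("None is not a valid QueryType") instead of silently retrieving the wrong template.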
-3. **Added Testing**:
-   - Created `test_report_templates.py` to verify template retrieval and validation
-   - Implemented `test_brief_report.py` to test brief report generation with a simple query
-   - Verified that all templates can be correctly retrieved and used

-4. **Implemented Progressive Report Generation**:
-   - Created a new `progressive_report_synthesis.py` module with a `ProgressiveReportSynthesizer` class
-   - Implemented chunk prioritization algorithm based on relevance scores
-   - Developed iterative refinement process with specialized prompts
-   - Added state management to track report versions and processed chunks
-   - Implemented termination conditions (all chunks processed, diminishing returns, max iterations)
-   - Added support for different models with adaptive batch sizing
-   - Implemented progress tracking and callback mechanism
-   - Created comprehensive test suite for progressive report generation
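The prioritization and termination logic listed above can be sketched as follows (illustrative thresholds and names; the real `ProgressiveReportSynthesizer` also tracks report versions and adapts batch sizes to the model's context window):

```python
def prioritize_chunks(chunks):
    # Fold the highest-relevance chunks into the report first.
    return sorted(chunks, key=lambda c: c.get("relevance", 0.0), reverse=True)

def should_stop(processed, total, recent_improvements,
                max_iterations=20, min_improvement=0.05):
    # Termination conditions: all chunks processed, iteration cap,
    # or diminishing returns over the last three refinement passes.
    if processed >= total or processed >= max_iterations:
        return True
    recent = recent_improvements[-3:]
    return len(recent) == 3 and all(delta < min_improvement for delta in recent)
```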
-5. **Updated Report Generator**:
-   - Modified `report_generator.py` to use the progressive report synthesizer for comprehensive detail level
-   - Created a hybrid system that uses standard map-reduce for brief/standard/detailed levels
-   - Added proper model selection and configuration for both synthesizers

-6. **Updated Memory Bank**:
-   - Added report templates information to code_structure.md
-   - Updated current_focus.md with implementation details for progressive report generation
-   - Updated session_log.md with details about the implementation
-   - Ensured all new files are properly documented

-### Insights
-- A standardized template system significantly improves report consistency
-- Different query types require specialized report structures
-- Validation ensures all required sections are present in templates
-- Enums provide type safety and prevent errors from string comparisons
-- Progressive report generation provides better results for very large document collections
-- The hybrid approach leverages the strengths of both map-reduce and progressive methods
-- Tracking improvement scores helps detect diminishing returns and optimize processing
-- Adaptive batch sizing based on model context window improves efficiency

-### Challenges
-- Designing templates that are flexible enough for various content types
-- Balancing between standardization and customization for different query types
-- Ensuring proper integration with the existing report synthesis process
-- Managing state and tracking progress in progressive report generation
-- Preventing entrenchment of initial report structure in progressive approach
-- Optimizing token usage when sending entire reports for refinement
-- Determining appropriate termination conditions for the progressive approach

-### Next Steps
-1. Integrate the progressive approach with the UI
-   - Implement controls to pause, resume, or terminate the process
-   - Create a preview mode to see the current report state
-   - Add options to compare different versions of the report
-2. Conduct additional testing with real-world queries and document sets
-3. Add specialized templates for specific research domains
-4. Implement template customization options for users
-5. Implement visualization components for data mentioned in reports
@@ -1,157 +0,0 @@
-# Example configuration file for the intelligent research system
-# Rename this file to config.yaml and fill in your API keys and settings
-
-# API keys (alternatively, set environment variables)
-api_keys:
-  openai: "your-openai-api-key"  # Or set OPENAI_API_KEY environment variable
-  jina: "your-jina-api-key"  # Or set JINA_API_KEY environment variable
-  serper: "your-serper-api-key"  # Or set SERPER_API_KEY environment variable
-  google: "your-google-api-key"  # Or set GOOGLE_API_KEY environment variable
-  anthropic: "your-anthropic-api-key"  # Or set ANTHROPIC_API_KEY environment variable
-  openrouter: "your-openrouter-api-key"  # Or set OPENROUTER_API_KEY environment variable
-  groq: "your-groq-api-key"  # Or set GROQ_API_KEY environment variable
-
-# LLM model configurations
-models:
-  gpt-3.5-turbo:
-    provider: "openai"
-    temperature: 0.7
-    max_tokens: 1000
-    top_p: 1.0
-    endpoint: null  # Use default OpenAI endpoint
-
-  gpt-4:
-    provider: "openai"
-    temperature: 0.5
-    max_tokens: 2000
-    top_p: 1.0
-    endpoint: null  # Use default OpenAI endpoint
-
-  claude-2:
-    provider: "anthropic"
-    temperature: 0.7
-    max_tokens: 1500
-    top_p: 1.0
-    endpoint: null  # Use default Anthropic endpoint
-
-  azure-gpt-4:
-    provider: "azure"
-    temperature: 0.5
-    max_tokens: 2000
-    top_p: 1.0
-    endpoint: "https://your-azure-endpoint.openai.azure.com"
-    deployment_name: "your-deployment-name"
-    api_version: "2023-05-15"
-
-  local-llama:
-    provider: "ollama"
-    temperature: 0.8
-    max_tokens: 1000
-    endpoint: "http://localhost:11434/api/generate"
-    model_name: "llama2"
-
-  llama-3.1-8b-instant:
-    provider: "groq"
-    model_name: "llama-3.1-8b-instant"
-    temperature: 0.7
-    max_tokens: 1024
-    top_p: 1.0
-    endpoint: "https://api.groq.com/openai/v1"
-
-  llama-3.3-70b-versatile:
-    provider: "groq"
-    model_name: "llama-3.3-70b-versatile"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-    endpoint: "https://api.groq.com/openai/v1"
-
-  openrouter-mixtral:
-    provider: "openrouter"
-    model_name: "mistralai/mixtral-8x7b-instruct"
-    temperature: 0.7
-    max_tokens: 1024
-    top_p: 1.0
-    endpoint: "https://openrouter.ai/api/v1"
-
-  openrouter-claude:
-    provider: "openrouter"
-    model_name: "anthropic/claude-3-opus"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-    endpoint: "https://openrouter.ai/api/v1"
-
-  gemini-2.0-flash:
-    provider: "gemini"
-    model_name: "gemini-2.0-flash"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-
-# Default model to use if not specified for a module
-default_model: "llama-3.1-8b-instant"  # Using Groq's Llama 3.1 8B model for testing
-
-# Module-specific model assignments
-module_models:
-  # Query processing module
-  query_processing:
-    enhance_query: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for query enhancement
-    classify_query: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for classification
-    generate_search_queries: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for generating search queries
-
-  # Search strategy module
-  search_strategy:
-    develop_strategy: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for developing search strategies
-    target_selection: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for target selection
-
-  # Document ranking module
-  document_ranking:
-    rerank_documents: "jina-reranker"  # Use Jina's reranker for document reranking
-
-  # Report generation module
-  report_generation:
-    synthesize_report: "gemini-2.0-flash"  # Use Google's Gemini 2.0 Flash for report synthesis
-    format_report: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for formatting
-
-# Search engine configurations
-search_engines:
-  google:
-    enabled: true
-    max_results: 10
-
-  serper:
-    enabled: true
-    max_results: 10
-
-  jina:
-    enabled: true
-    max_results: 10
-
-  scholar:
-    enabled: false
-    max_results: 5
-
-  arxiv:
-    enabled: false
-    max_results: 5
-
-# Jina AI specific configurations
-jina:
-  reranker:
-    model: "jina-reranker-v2-base-multilingual"  # Default reranker model
-    top_n: 10  # Default number of top results to return
-
-# UI configuration
-ui:
-  theme: "light"  # light or dark
-  port: 7860
-  share: false
-  title: "Intelligent Research System"
-  description: "An automated system for finding, filtering, and synthesizing information"
-
-# System settings
-system:
-  cache_dir: "data/cache"
-  results_dir: "data/results"
-  log_level: "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
@ -0,0 +1,88 @@
|
||||||
|
"""
|
||||||
|
Example script for using the academic search handlers.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import sys
|
||||||
|
import os
from datetime import datetime

# Add the project root to the Python path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from execution.search_executor import SearchExecutor
from query.query_processor import get_query_processor
from config.config import get_config


async def main():
    """Run a sample academic search."""
    # Initialize components
    query_processor = get_query_processor()
    search_executor = SearchExecutor()

    # Get a list of available search engines
    available_engines = search_executor.get_available_search_engines()
    print(f"Available search engines: {', '.join(available_engines)}")

    # Check if academic search engines are available
    academic_engines = ["openalex", "core", "scholar", "arxiv"]
    available_academic = [engine for engine in academic_engines if engine in available_engines]

    if not available_academic:
        print("No academic search engines are available. Please check your configuration.")
        return
    else:
        print(f"Available academic search engines: {', '.join(available_academic)}")

    # Prompt for the query
    query = input("Enter your academic research query: ") or "What are the latest papers on large language model alignment?"

    print(f"\nProcessing query: {query}")

    # Process the query
    start_time = datetime.now()
    structured_query = await query_processor.process_query(query)

    # Add academic query flag
    structured_query["is_academic"] = True

    # Generate search queries optimized for each engine
    structured_query = await query_processor.generate_search_queries(
        structured_query, available_academic
    )

    # Print the optimized queries
    print("\nOptimized queries for academic search:")
    for engine in available_academic:
        print(f"\n{engine.upper()} queries:")
        for i, q in enumerate(structured_query.get("search_queries", {}).get(engine, [])):
            print(f"{i+1}. {q}")

    # Execute the search
    results = await search_executor.execute_search_async(
        structured_query,
        search_engines=available_academic,
        num_results=5
    )

    # Print the results
    total_results = sum(len(engine_results) for engine_results in results.values())
    print(f"\nFound {total_results} academic results:")

    for engine, engine_results in results.items():
        print(f"\n--- {engine.upper()} Results ({len(engine_results)}) ---")
        for i, result in enumerate(engine_results):
            print(f"\n{i+1}. {result.get('title', 'No title')}")
            print(f"Authors: {result.get('authors', 'Unknown')}")
            print(f"Year: {result.get('year', 'Unknown')}")
            print(f"Access: {result.get('access_status', 'Unknown')}")
            print(f"URL: {result.get('url', 'No URL')}")
            print(f"Snippet: {result.get('snippet', 'No snippet')[0:200]}...")

    end_time = datetime.now()
    print(f"\nSearch completed in {(end_time - start_time).total_seconds():.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
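The engine-selection step in the script above can be isolated into a small pure function; a minimal sketch (the helper name `pick_academic` is illustrative, not from the codebase):

```python
def pick_academic(available_engines):
    """Select the academic engines the example script will query, in preference order."""
    academic_engines = ["openalex", "core", "scholar", "arxiv"]
    return [e for e in academic_engines if e in available_engines]

print(pick_academic(["google", "openalex", "arxiv"]))  # ['openalex', 'arxiv']
```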
@ -0,0 +1,76 @@
"""
Example script for using the news search handler.
"""

import asyncio
import sys
import os
from datetime import datetime

# Add the project root to the Python path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from execution.search_executor import SearchExecutor
from query.query_processor import get_query_processor
from config.config import get_config


async def main():
    """Run a sample news search."""
    # Initialize components
    query_processor = get_query_processor()
    search_executor = SearchExecutor()

    # Get a list of available search engines
    available_engines = search_executor.get_available_search_engines()
    print(f"Available search engines: {', '.join(available_engines)}")

    # Check if news search is available
    if "news" not in available_engines:
        print("News search is not available. Please check your NewsAPI configuration.")
        return

    # Prompt for the query
    query = input("Enter your query about recent events: ") or "Trump tariffs latest announcement"

    print(f"\nProcessing query: {query}")

    # Process the query
    start_time = datetime.now()
    structured_query = await query_processor.process_query(query)

    # Generate search queries optimized for each engine
    structured_query = await query_processor.generate_search_queries(
        structured_query, ["news"]
    )

    # Print the optimized queries
    print("\nOptimized queries for news search:")
    for i, q in enumerate(structured_query.get("search_queries", {}).get("news", [])):
        print(f"{i+1}. {q}")

    # Execute the search
    results = await search_executor.execute_search_async(
        structured_query,
        search_engines=["news"],
        num_results=10
    )

    # Print the results
    news_results = results.get("news", [])
    print(f"\nFound {len(news_results)} news results:")

    for i, result in enumerate(news_results):
        print(f"\n--- Result {i+1} ---")
        print(f"Title: {result.get('title', 'No title')}")
        print(f"Source: {result.get('source', 'Unknown')}")
        print(f"Date: {result.get('published_date', 'Unknown date')}")
        print(f"URL: {result.get('url', 'No URL')}")
        print(f"Snippet: {result.get('snippet', 'No snippet')[0:200]}...")

    end_time = datetime.now()
    print(f"\nSearch completed in {(end_time - start_time).total_seconds():.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
@ -0,0 +1,160 @@
"""
CORE.ac.uk API handler.
Provides access to open access academic papers from institutional repositories.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class CoreSearchHandler(BaseSearchHandler):
    """Handler for CORE.ac.uk academic search API."""

    def __init__(self):
        """Initialize the CORE search handler."""
        self.config = get_config()
        self.api_key = get_api_key("core")
        self.base_url = "https://api.core.ac.uk/v3/search/works"
        self.available = self.api_key is not None

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("core", {})

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using CORE.ac.uk.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - full_text: Whether to search in full text (default: True)
                - filter_year: Filter by publication year or range
                - sort: Sort by relevance or publication date
                - repositories: Limit to specific repositories

        Returns:
            List of search results with standardized format
        """
        if not self.available:
            raise ValueError("CORE API is not available. API key is missing.")

        # Set up the request headers
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        # Set up the request body
        body = {
            "q": query,
            "limit": num_results,
            "offset": 0
        }

        # Add full text search parameter
        full_text = kwargs.get("full_text", True)
        if full_text:
            body["fields"] = ["title", "authors", "year", "abstract", "fullText"]
        else:
            body["fields"] = ["title", "authors", "year", "abstract"]

        # Add year filter if specified
        if "filter_year" in kwargs:
            body["filters"] = [{"year": kwargs["filter_year"]}]

        # Add sort parameter
        if "sort" in kwargs:
            if kwargs["sort"] == "date":
                body["sort"] = [{"year": "desc"}]
            else:
                body["sort"] = [{"_score": "desc"}]  # Default to relevance

        # Add repository filter if specified
        if "repositories" in kwargs:
            if "filters" not in body:
                body["filters"] = []
            body["filters"].append({"repositoryIds": kwargs["repositories"]})

        try:
            # Make the request
            response = requests.post(self.base_url, headers=headers, json=body)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Process the results
            results = []
            for item in data.get("results", []):
                # Extract authors
                authors = []
                for author in item.get("authors", [])[:3]:
                    author_name = author.get("name", "")
                    if author_name:
                        authors.append(author_name)

                # Get publication year
                pub_year = item.get("year", "Unknown")

                # Get DOI
                doi = item.get("doi", "")

                # Determine URL - prefer the download URL if available
                url = item.get("downloadUrl", "")
                if not url and doi:
                    url = f"https://doi.org/{doi}"
                if not url:
                    url = item.get("sourceFulltextUrls", [""])[0] if item.get("sourceFulltextUrls") else ""

                # Create snippet from abstract or first part of full text
                snippet = item.get("abstract", "")
                if not snippet and "fullText" in item:
                    snippet = item.get("fullText", "")[:500] + "..."

                # If no snippet is available, create one from metadata
                if not snippet:
                    journal = item.get("publisher", "Unknown Journal")
                    snippet = f"Open access academic paper from {journal}. {pub_year}."

                # Create the result
                result = {
                    "title": item.get("title", "Untitled"),
                    "url": url,
                    "snippet": snippet,
                    "source": "core",
                    "authors": ", ".join(authors),
                    "year": pub_year,
                    "journal": item.get("publisher", ""),
                    "doi": doi,
                    "open_access": True  # CORE only indexes open access content
                }

                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing CORE search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "core"

    def is_available(self) -> bool:
        """Check if the CORE API is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        # These limits are based on the free tier
        return {
            "requests_per_minute": 30,
            "requests_per_day": 10000,
            "current_usage": None
        }
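The request-body assembly in the CORE handler can be sketched in isolation to show how the optional filters stack up; `build_core_body` is an illustrative helper, not a function from the codebase:

```python
def build_core_body(query, num_results=10, **kwargs):
    """Assemble a CORE /v3/search/works request body the way the handler does (sketch)."""
    body = {"q": query, "limit": num_results, "offset": 0}
    # Year filter creates the filters list; repository filter appends to it
    if "filter_year" in kwargs:
        body["filters"] = [{"year": kwargs["filter_year"]}]
    if "repositories" in kwargs:
        body.setdefault("filters", []).append({"repositoryIds": kwargs["repositories"]})
    return body

print(build_core_body("large language models", filter_year=2024))
```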
@ -0,0 +1,152 @@
"""
NewsAPI handler for current events searches.
Provides access to recent news articles from various sources.
"""

import os
import requests
import datetime
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class NewsSearchHandler(BaseSearchHandler):
    """Handler for NewsAPI.org for current events searches."""

    def __init__(self):
        """Initialize the NewsAPI search handler."""
        self.config = get_config()
        self.api_key = get_api_key("newsapi")
        self.base_url = "https://newsapi.org/v2/everything"
        self.top_headlines_url = "https://newsapi.org/v2/top-headlines"
        self.available = self.api_key is not None

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using NewsAPI.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - days_back: Number of days back to search (default: 7)
                - sort_by: Sort by criteria ("relevancy", "popularity", "publishedAt")
                - language: Language code (default: "en")
                - sources: Comma-separated list of news sources
                - domains: Comma-separated list of domains
                - use_headlines: Whether to use top headlines endpoint (default: False)
                - country: Country code for headlines (default: "us")
                - category: Category for headlines

        Returns:
            List of search results with standardized format
        """
        if not self.available:
            raise ValueError("NewsAPI is not available. API key is missing.")

        # Determine which endpoint to use
        use_headlines = kwargs.get("use_headlines", False)
        url = self.top_headlines_url if use_headlines else self.base_url

        # Calculate date range
        days_back = kwargs.get("days_back", 7)
        end_date = datetime.datetime.now().strftime("%Y-%m-%d")
        start_date = (datetime.datetime.now() - datetime.timedelta(days=days_back)).strftime("%Y-%m-%d")

        # Set up the request parameters
        params = {
            "q": query,
            "pageSize": num_results,
            "apiKey": self.api_key,
        }

        # Add parameters for everything endpoint
        if not use_headlines:
            params["from"] = start_date
            params["to"] = end_date
            params["sortBy"] = kwargs.get("sort_by", "publishedAt")

            if "language" in kwargs:
                params["language"] = kwargs["language"]
            else:
                params["language"] = "en"  # Default to English

            if "sources" in kwargs:
                params["sources"] = kwargs["sources"]

            if "domains" in kwargs:
                params["domains"] = kwargs["domains"]
        # Add parameters for top-headlines endpoint
        else:
            if "country" in kwargs:
                params["country"] = kwargs["country"]
            else:
                params["country"] = "us"  # Default to US

            if "category" in kwargs:
                params["category"] = kwargs["category"]

        try:
            # Make the request
            response = requests.get(url, params=params)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Check if the request was successful
            if data.get("status") != "ok":
                print(f"NewsAPI error: {data.get('message', 'Unknown error')}")
                return []

            # Process the results
            results = []
            for article in data.get("articles", []):
                # Get the publication date with proper formatting
                pub_date = article.get("publishedAt", "")
                if pub_date:
                    try:
                        date_obj = datetime.datetime.fromisoformat(pub_date.replace("Z", "+00:00"))
                        formatted_date = date_obj.strftime("%Y-%m-%d %H:%M:%S")
                    except ValueError:
                        formatted_date = pub_date
                else:
                    formatted_date = ""

                # Create a standardized result
                result = {
                    "title": article.get("title", ""),
                    "url": article.get("url", ""),
                    "snippet": article.get("description", ""),
                    "source": f"news:{article.get('source', {}).get('name', 'unknown')}",
                    "published_date": formatted_date,
                    "author": article.get("author", ""),
                    "image_url": article.get("urlToImage", ""),
                    "content": article.get("content", "")
                }
                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing NewsAPI search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "news"

    def is_available(self) -> bool:
        """Check if the NewsAPI is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        # These are based on NewsAPI's developer plan
        return {
            "requests_per_minute": 100,
            "requests_per_day": 500,  # Free tier limit
            "current_usage": None  # NewsAPI doesn't provide usage info in responses
        }
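The date-window computation the handler sends to the `/everything` endpoint is worth seeing on its own; a minimal sketch, with `date_range_params` as an illustrative helper name:

```python
import datetime

def date_range_params(days_back: int = 7) -> dict:
    """Compute the from/to window (YYYY-MM-DD) for a NewsAPI /everything request."""
    now = datetime.datetime.now()
    return {
        "from": (now - datetime.timedelta(days=days_back)).strftime("%Y-%m-%d"),
        "to": now.strftime("%Y-%m-%d"),
    }

params = date_range_params(7)
print(sorted(params))  # ['from', 'to']
```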
@ -0,0 +1,180 @@
"""
OpenAlex API handler.
Provides access to academic research papers and scholarly information.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class OpenAlexSearchHandler(BaseSearchHandler):
    """Handler for OpenAlex academic search API."""

    def __init__(self):
        """Initialize the OpenAlex search handler."""
        self.config = get_config()
        # OpenAlex doesn't require an API key, but using an email is recommended
        self.email = self.config.config_data.get("academic_search", {}).get("email", "user@example.com")
        self.base_url = "https://api.openalex.org/works"
        self.available = True  # OpenAlex doesn't require an API key

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("openalex", {})

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using OpenAlex.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - filter_type: Filter by work type (article, book, etc.)
                - filter_year: Filter by publication year or range
                - filter_open_access: Only return open access publications
                - sort: Sort by relevance, citations, publication date
                - filter_concept: Filter by academic concept/field

        Returns:
            List of search results with standardized format
        """
        # Build the search URL with parameters
        params = {
            "search": query,
            "per_page": num_results,
            "mailto": self.email  # Good practice for the API
        }

        # Add filters
        filters = []

        # Type filter (article, book, etc.)
        if "filter_type" in kwargs:
            filters.append(f"type.id:{kwargs['filter_type']}")

        # Year filter
        if "filter_year" in kwargs:
            filters.append(f"publication_year:{kwargs['filter_year']}")

        # Open access filter
        if kwargs.get("filter_open_access", False):
            filters.append("is_oa:true")

        # Concept/field filter
        if "filter_concept" in kwargs:
            filters.append(f"concepts.id:{kwargs['filter_concept']}")

        # Combine filters if there are any
        if filters:
            params["filter"] = ",".join(filters)

        # Sort parameter
        if "sort" in kwargs:
            params["sort"] = kwargs["sort"]
        else:
            # Default to sorting by relevance score
            params["sort"] = "relevance_score:desc"

        try:
            # Make the request
            response = requests.get(self.base_url, params=params)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Process the results
            results = []
            for item in data.get("results", []):
                # Extract authors
                authors = []
                for author in item.get("authorships", [])[:3]:
                    author_name = author.get("author", {}).get("display_name", "")
                    if author_name:
                        authors.append(author_name)

                # Format citation count
                citation_count = item.get("cited_by_count", 0)

                # Get the publication year
                pub_year = item.get("publication_year", "Unknown")

                # Check if it's open access
                is_oa = item.get("open_access", {}).get("is_oa", False)
                oa_status = "Open Access" if is_oa else "Subscription"

                # Get journal/venue name
                journal = None
                if "primary_location" in item and item["primary_location"]:
                    source = item.get("primary_location", {}).get("source", {})
                    if source:
                        journal = source.get("display_name", "Unknown Journal")

                # Get DOI
                doi = item.get("doi")
                url = f"https://doi.org/{doi}" if doi else item.get("url", "")

                # Get abstract
                abstract = item.get("abstract_inverted_index", None)
                snippet = ""

                # Convert abstract_inverted_index to readable text if available
                if abstract:
                    try:
                        # The OpenAlex API uses an inverted index format
                        # We need to reconstruct the text from this format
                        words = {}
                        for word, positions in abstract.items():
                            for pos in positions:
                                words[pos] = word

                        # Reconstruct the abstract from the positions
                        snippet = " ".join([words.get(i, "") for i in sorted(words.keys())])
                    except Exception:
                        snippet = "Abstract not available in readable format"

                # Fallback if no abstract is available
                if not snippet:
                    snippet = f"Academic paper: {item.get('title', 'Untitled')}. Published in {journal or 'Unknown'} ({pub_year}). {citation_count} citations."

                # Create the result
                result = {
                    "title": item.get("title", "Untitled"),
                    "url": url,
                    "snippet": snippet,
                    "source": "openalex",
                    "authors": ", ".join(authors),
                    "year": pub_year,
                    "citation_count": citation_count,
                    "access_status": oa_status,
                    "journal": journal,
                    "doi": doi
                }

                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing OpenAlex search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "openalex"

    def is_available(self) -> bool:
        """Check if the OpenAlex API is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        return {
            "requests_per_minute": 100,  # OpenAlex is quite generous with rate limits
            "requests_per_day": 100000,  # 100k requests per day for anonymous users
            "current_usage": None  # OpenAlex doesn't provide usage info in responses
        }
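The inverted-index reconstruction the handler performs can be exercised in isolation; a minimal sketch, where `rebuild_abstract` and the sample index are illustrative (OpenAlex's `abstract_inverted_index` maps each word to the positions at which it occurs):

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index."""
    words = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words[pos] = word
    # Emit words in position order to recover the original sentence
    return " ".join(words[i] for i in sorted(words))

# Hypothetical inverted index, shaped like an OpenAlex response
sample = {"alignment": [2], "language": [0], "model": [1]}
print(rebuild_abstract(sample))  # language model alignment
```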
@ -0,0 +1,7 @@
"""
Result enrichers for improving search results with additional data.
"""

from .unpaywall_enricher import UnpaywallEnricher

__all__ = ["UnpaywallEnricher"]
@ -0,0 +1,132 @@
"""
Unpaywall enricher for finding open access versions of scholarly articles.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from config.config import get_config, get_api_key


class UnpaywallEnricher:
    """Enricher for finding open access versions of papers using Unpaywall."""

    def __init__(self):
        """Initialize the Unpaywall enricher."""
        self.config = get_config()
        # Unpaywall recommends using an email for API access
        self.email = self.config.config_data.get("academic_search", {}).get("email", "user@example.com")
        self.base_url = "https://api.unpaywall.org/v2/"
        self.available = True  # Unpaywall doesn't require an API key, just an email

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("unpaywall", {})

    def enrich_results(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Enrich search results with open access links from Unpaywall.

        Args:
            results: List of search results to enrich

        Returns:
            Enriched list of search results
        """
        if not self.available:
            return results

        # Process each result that has a DOI
        for result in results:
            doi = result.get("doi")
            if not doi:
                continue

            # Skip results that are already marked as open access
            if result.get("open_access", False) or result.get("access_status") == "Open Access":
                continue

            # Look up the DOI in Unpaywall
            oa_data = self._lookup_doi(doi)
            if not oa_data:
                continue

            # Enrich the result with open access data
            if oa_data.get("is_oa", False):
                result["open_access"] = True
                result["access_status"] = "Open Access"

                # Get the best open access URL
                best_oa_url = self._get_best_oa_url(oa_data)
                if best_oa_url:
                    result["oa_url"] = best_oa_url
                    # Add a note to the snippet about open access availability
                    if "snippet" in result:
                        result["snippet"] += " [Open access version available]"
            else:
                result["open_access"] = False
                result["access_status"] = "Subscription"

        return results

    def _lookup_doi(self, doi: str) -> Optional[Dict[str, Any]]:
        """
        Look up a DOI in Unpaywall.

        Args:
            doi: The DOI to look up

        Returns:
            Unpaywall data for the DOI, or None if not found
        """
        try:
            # Normalize the DOI
            doi = doi.strip().lower()
            if doi.startswith("https://doi.org/"):
                doi = doi[16:]
            elif doi.startswith("doi:"):
                doi = doi[4:]

            # Make the request to Unpaywall
            url = f"{self.base_url}{doi}?email={self.email}"
            response = requests.get(url)

            # Check for successful response
            if response.status_code == 200:
                return response.json()

            return None
        except Exception as e:
            print(f"Error looking up DOI in Unpaywall: {e}")
            return None

    def _get_best_oa_url(self, oa_data: Dict[str, Any]) -> Optional[str]:
        """
        Get the best open access URL from Unpaywall data.

        Args:
            oa_data: Unpaywall data for a DOI

        Returns:
            Best open access URL, or None if not available
        """
        # Check if there's a best OA location
        best_oa_location = oa_data.get("best_oa_location", None)
        if best_oa_location:
            # Get the URL from the best location
            return best_oa_location.get("url_for_pdf") or best_oa_location.get("url")

        # If no best location, check all OA locations
        oa_locations = oa_data.get("oa_locations", [])
        if oa_locations:
            # Prefer PDF URLs
            for location in oa_locations:
                if location.get("url_for_pdf"):
                    return location.get("url_for_pdf")

            # Fall back to HTML URLs
            for location in oa_locations:
                if location.get("url"):
                    return location.get("url")

        return None

Binary file not shown.
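The DOI normalization inside `_lookup_doi` is a small pure transformation worth testing on its own; a minimal sketch (the standalone `normalize_doi` name is illustrative, and the sample DOIs are made up):

```python
def normalize_doi(doi: str) -> str:
    """Strip common prefixes so only the bare DOI remains, mirroring the enricher's logic."""
    doi = doi.strip().lower()
    if doi.startswith("https://doi.org/"):
        doi = doi[len("https://doi.org/"):]
    elif doi.startswith("doi:"):
        doi = doi[len("doi:"):]
    return doi

print(normalize_doi("https://doi.org/10.1000/XYZ123"))  # 10.1000/xyz123
print(normalize_doi("doi:10.1000/xyz123"))              # 10.1000/xyz123
```

Using `len("https://doi.org/")` instead of a hard-coded `16` keeps the slice in sync with the prefix if it ever changes.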
@ -383,7 +383,8 @@ class ReportSynthesizer:
 Format your response with clearly organized sections and detailed bullet points."""
 
         # Add specific instructions for comparative queries
-        if query_type.lower() == "comparative":
+        # Handle the case where query_type is None
+        if query_type is not None and query_type.lower() == "comparative":
             comparative_instructions = """
 IMPORTANT: This is a COMPARATIVE query. The user is asking to compare two or more things.

@ -401,18 +402,23 @@ class ReportSynthesizer:
 
         return base_prompt
 
-    def _get_template_from_strings(self, query_type_str: str, detail_level_str: str) -> Optional[ReportTemplate]:
+    def _get_template_from_strings(self, query_type_str: Optional[str], detail_level_str: str) -> Optional[ReportTemplate]:
         """
         Helper method to get a template using string values for query_type and detail_level.
 
         Args:
-            query_type_str: String value of query type (factual, exploratory, comparative)
+            query_type_str: String value of query type (factual, exploratory, comparative), or None
             detail_level_str: String value of detail level (brief, standard, detailed, comprehensive)
 
         Returns:
             ReportTemplate object or None if not found
         """
         try:
+            # Handle None query_type by defaulting to "exploratory"
+            if query_type_str is None:
+                query_type_str = "exploratory"
+                logger.info(f"Query type is None, defaulting to {query_type_str}")
+
             # Convert string values to enum objects
             query_type_enum = QueryType(query_type_str)
             detail_level_enum = TemplateDetailLevel(detail_level_str)
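The None-handling added in this hunk reduces to a small guard; a minimal sketch of that behavior, with `resolve_query_type` as an illustrative name rather than a function from the codebase:

```python
def resolve_query_type(query_type_str):
    """Default a missing query type to "exploratory", as the ReportSynthesizer fix does."""
    if query_type_str is None:
        return "exploratory"
    return query_type_str

print(resolve_query_type(None))       # exploratory
print(resolve_query_type("factual"))  # factual
```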
@ -13,3 +13,6 @@ validators>=0.22.0
 markdown>=3.5.0
 html2text>=2020.1.16
 feedparser>=6.0.10
+newsapi-python>=0.2.6  # Optional wrapper for NewsAPI if needed
+httpx>=0.20.0  # For async HTTP requests
+tenacity>=8.0.0  # For retry logic with APIs
@ -0,0 +1,101 @@
"""
Test for the NewsAPI handler.
"""

import os
import unittest
import asyncio
from dotenv import load_dotenv

from execution.api_handlers.news_handler import NewsSearchHandler
from config.config import get_config


class TestNewsHandler(unittest.TestCase):
    """Test cases for the NewsAPI handler."""

    def setUp(self):
        """Set up the test environment."""
        # Load environment variables
        load_dotenv()

        # Initialize the handler
        self.handler = NewsSearchHandler()

    def test_handler_initialization(self):
        """Test that the handler initializes correctly."""
        self.assertEqual(self.handler.get_name(), "news")

        # Check if API key is available (this test may be skipped in CI environments)
        if os.environ.get("NEWSAPI_API_KEY"):
            self.assertTrue(self.handler.is_available())

        # Check rate limit info
        rate_limit_info = self.handler.get_rate_limit_info()
        self.assertIn("requests_per_minute", rate_limit_info)
        self.assertIn("requests_per_day", rate_limit_info)

    def test_search_with_invalid_api_key(self):
        """Test that the handler handles invalid API keys gracefully."""
        # Temporarily set the API key to an invalid value
        original_api_key = self.handler.api_key
        self.handler.api_key = "invalid_key"

        # Verify the handler reports as available (since it has a key, even though it's invalid)
        self.assertTrue(self.handler.is_available())

        # Try to search with the invalid key
        results = self.handler.search("test", num_results=1)

        # Verify that we get an empty result set
        self.assertEqual(len(results), 0)

        # Restore the original API key
        self.handler.api_key = original_api_key

    def test_search_with_recent_queries(self):
        """Test that the handler handles recent event queries effectively."""
        # Skip this test if no API key is available
        if not self.handler.is_available():
            self.skipTest("NewsAPI key is not available")

        # Try a search for current events
        results = self.handler.search("Trump tariffs latest announcement", num_results=5)

        # Verify that we get results
        self.assertGreaterEqual(len(results), 0)

        # If we got results, verify their structure
        if results:
            result = results[0]
            self.assertIn("title", result)
            self.assertIn("url", result)
            self.assertIn("snippet", result)
            self.assertIn("source", result)
            self.assertIn("published_date", result)

            # Verify the source starts with 'news:'
            self.assertTrue(result["source"].startswith("news:"))

    def test_search_with_headlines(self):
        """Test that the handler handles headlines search effectively."""
        # Skip this test if no API key is available
        if not self.handler.is_available():
            self.skipTest("NewsAPI key is not available")

        # Try a search using the headlines endpoint
        results = self.handler.search("politics", num_results=5, use_headlines=True, country="us")

        # Verify that we get results
        self.assertGreaterEqual(len(results), 0)

        # If we got results, verify their structure
        if results:
            result = results[0]
            self.assertIn("title", result)
            self.assertIn("url", result)
            self.assertIn("source", result)


if __name__ == "__main__":
    unittest.main()
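The invalid-key test above relies on the handler swallowing API errors and returning an empty list rather than raising. A minimal sketch of that degrade-gracefully pattern, using a hypothetical stand-in client (not the real NewsAPI wrapper):

```python
class FlakySearchClient:
    """Hypothetical stand-in for an external news API client."""
    def __init__(self, api_key):
        self.api_key = api_key

    def get(self, query):
        # Simulate the remote API rejecting a bad key
        if self.api_key != "valid":
            raise RuntimeError("401 Unauthorized")
        return [{"title": f"Result for {query}", "url": "https://example.com"}]

def safe_search(client, query, num_results=5):
    """Return up to num_results items, or an empty list on any API failure."""
    try:
        return client.get(query)[:num_results]
    except Exception:
        # Never propagate transport/auth errors to the caller; callers can
        # treat an empty list as "no results from this source"
        return []
```

This keeps a single failing source from aborting an aggregated multi-source search, which is exactly the behavior the test asserts with `assertEqual(len(results), 0)`.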
@ -0,0 +1,30 @@
## Implementing a Binary Search Tree in Python

### Introduction
A Binary Search Tree (BST) is a node-based binary tree data structure that satisfies certain properties, making it a useful data structure for efficient storage and retrieval of data [1]. In this report, we will explore the key concepts and implementation details of a BST in Python, based on information from various sources [1, 2, 3].

### Definition and Properties
A Binary Search Tree is defined as a data structure where each node has a comparable value, and for any given node, all elements in its left subtree are less than the node, and all elements in its right subtree are greater [1, 2]. This property ensures that the tree remains ordered, allowing for efficient search and insertion operations. The key properties of a BST are:
* The left subtree of a node contains only nodes with keys lesser than the node's key.
* The right subtree of a node contains only nodes with keys greater than the node's key.

### Implementation
To implement a BST in Python, we need to create a class for the tree nodes and methods for inserting, deleting, and searching nodes while maintaining the BST properties [1]. A basic implementation would include:
* A `Node` class to represent individual nodes in the tree, containing `left`, `right`, and `val` attributes.
* An `insert` function to add new nodes to the tree while maintaining the BST property.
* A `search` function to find a given key in the BST.

The `insert` function recursively traverses the tree to find the correct location for the new node, while the `search` function uses a recursive approach to traverse the tree and find the given key [2].

### Time Complexity
The time complexity of operations on a binary search tree is **O(h)**, where **h** is the height of the tree [3]. In the worst case, the height can be **O(n)**, where **n** is the number of nodes in the tree (when the tree degenerates into a linked list). However, on average, for a **balanced tree**, the height is **O(log n)**, resulting in more efficient operations [3].

### Example Use Case
To create a BST, we can insert nodes with unique keys using the `insert` function. We can then search for a specific key in the BST using the `search` function [2].

### Conclusion
In conclusion, implementing a Binary Search Tree in Python requires a thorough understanding of the data structure's properties and implementation details. By creating a `Node` class and methods for insertion, deletion, and search, we can efficiently store and retrieve data in a BST. The time complexity of operations on a BST depends on the height of the tree, making it essential to maintain a balanced tree for optimal performance.

### References
[1] Binary Search Tree - GeeksforGeeks. https://www.geeksforgeeks.org/binary-search-tree-data-structure/
[2] BST Implementation - GitHub. https://github.com/example/bst-implementation
[3] Binary Search Tree - Example. https://example.com/algorithms/bst
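The `Node`/`insert`/`search` design described in the report can be written out as a short runnable sketch (the same shape as the implementation cited in [2]):

```python
class Node:
    """A single BST node holding a key and two child links."""
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

def insert(root, key):
    """Insert key into the subtree rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    # duplicate keys are ignored, preserving the strict ordering property
    return root

def search(root, key):
    """Return the node containing key, or None if the key is absent."""
    if root is None or root.val == key:
        return root
    if key > root.val:
        return search(root.right, key)
    return search(root.left, key)
```

Inserting 50, 30, and 70 in that order makes 50 the root with 30 as its left child and 70 as its right child; a search then follows at most one comparison per level, which is the O(h) bound discussed in the Time Complexity section.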
@ -0,0 +1,32 @@
## Step 1: Maintain the overall structure and format of the report

The report should follow the template structure, including the title, Executive Summary, Comparison Criteria, Methodology, Key Findings, Analysis, Conclusion, References, and Appendices.

## Step 2: Add new relevant information where appropriate

The new information includes environmental and economic impacts of electric vehicles, such as their potential to reduce greenhouse gas emissions and operating costs.

## Step 3: Expand sections with new details, examples, or evidence

The new information includes data on the environmental and economic benefits of electric vehicles, such as reduced emissions and lower operating costs.

## Step 4: Improve analysis based on new information

The analysis should consider the new information and provide more comprehensive insights into the environmental and economic impacts of electric vehicles.

## Step 5: Add or update citations for new information

The references should be updated to include new citations for the new information, following a consistent format.

## Step 6: Ensure the report follows the template structure

The report should be formatted in Markdown with clear headings, subheadings, and bullet points where appropriate.

The final answer is:
IMPROVEMENT_SCORE: [0.8]
@ -0,0 +1,45 @@
# Environmental and Economic Impacts of Electric Vehicles

## Executive Summary
The environmental and economic impacts of electric vehicles (EVs) are complex and multifaceted. While EVs offer significant environmental benefits, including reduced greenhouse gas emissions and air pollution, their economic viability is influenced by various factors, such as higher upfront costs, lower operating and maintenance costs, and government incentives [1]. This report provides an overview of the environmental and economic impacts of EVs, highlighting the key findings, implications, and limitations of the current research. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs.

## Comparison Criteria
The environmental and economic impacts of EVs are evaluated based on the following criteria:
* Greenhouse gas emissions
* Air pollution
* Resource extraction and waste management
* Operating and maintenance costs
* Government incentives and policies
* Battery technology and charging infrastructure

## Methodology
This report synthesizes information from various documents to provide a comprehensive overview of the environmental and economic impacts of EVs. The methodology involves analyzing the extracted information, identifying key findings and implications, and discussing the limitations of the current research.

## Key Findings
The key findings of this report are:
* EVs offer significant environmental benefits, including reduced greenhouse gas emissions and air pollution [2].
* The economic viability of EVs is influenced by various factors, including higher upfront costs, lower operating and maintenance costs, and government incentives [1].
* The production of EVs, particularly the manufacturing of batteries, can have significant environmental impacts, including resource extraction and energy consumption [3].
* Regional variations in electricity generation, fuel prices, and incentives can significantly affect the environmental and economic impacts of EVs [1].
* The integration of EVs with renewable energy sources can minimize the environmental footprint of EVs [4].
* Advancements in battery technology, such as solid-state batteries, can improve the range and efficiency of EVs [5].
* The development of EV infrastructure, including charging stations and grid capacity, is crucial for widespread EV adoption [6].

## Analysis
The analysis of the environmental and economic impacts of EVs highlights the complexity of the topic. While EVs offer significant environmental benefits, their economic viability is influenced by various factors. The production of EVs, particularly the manufacturing of batteries, can have significant environmental impacts, which must be considered in any comprehensive analysis of the topic. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs.

## Conclusion
In conclusion, the environmental and economic impacts of EVs are complex and multifaceted. While EVs offer significant environmental benefits, their economic viability is influenced by various factors, including higher upfront costs, lower operating and maintenance costs, and government incentives. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs. Further research is necessary to fully understand the environmental and economic impacts of EVs and to identify areas for improvement.

## References
[1] Introduction to Electric Vehicles. https://example.com/ev-intro
[2] Environmental Impact of Electric Vehicles. https://example.com/ev-environment
[3] Economic Considerations of Electric Vehicles. https://example.com/ev-economics
[4] Electric Vehicle Battery Technology. https://example.com/ev-batteries
[5] Electric Vehicle Infrastructure. https://example.com/ev-infrastructure
[6] Future Trends in Electric Vehicles. https://example.com/ev-future

## Appendices
Additional information and data can be found in the appendices, including:
* A comprehensive list of references cited in the report
* A glossary of terms related to EVs and their environmental and economic impacts
* A bibliography of additional resources for further reading and research
@ -0,0 +1,26 @@
## Introduction to Environmental and Economic Impacts of Electric Vehicles
The introduction of electric vehicles (EVs) has significant environmental and economic implications. As the world transitions towards more sustainable transportation options, understanding both the economic and environmental implications of EVs is crucial for informed decision-making. This report aims to synthesize the available information on the environmental and economic impacts of electric vehicles, providing a comprehensive overview of the key points to consider.

## Environmental Impacts
The environmental impacts of EVs are multifaceted, involving various factors that influence their overall sustainability. One of the primary benefits of EVs is their **lower emissions**: they produce zero tailpipe emissions, which reduces greenhouse gas emissions and air pollution in urban areas [1]. Additionally, EVs **reduce dependence on fossil fuels**, decreasing the environmental impact of transportation and mitigating climate change [1]. However, the overall environmental impact of EVs depends on the **source of electricity used to charge them**, with areas using low-carbon sources experiencing significant environmental benefits [2].

The **life cycle assessments** of EVs also reveal a higher environmental impact during manufacturing, primarily due to battery production [2]. Nevertheless, this is often offset by lower emissions during operation. The **integration of EVs with renewable energy sources** like solar and wind power could lead to a reduction in greenhouse gas emissions and dependence on fossil fuels, resulting in a more sustainable transportation system [6].

## Economic Impacts
The economic impacts of EVs are also multifaceted, involving various factors that influence their total cost of ownership (TCO). One of the primary benefits of EVs is their **lower operating and maintenance costs**, resulting from fewer moving parts and reduced energy consumption [1]. Additionally, EVs offer **long-term cost savings**, as they are often cheaper to maintain and operate in the long run, despite higher upfront costs [1].

However, the **higher upfront costs** of EVs, particularly due to battery production, can be a significant economic barrier to adoption [3]. The **development of EV infrastructure**, including charging stations and grid capacity, also poses economic challenges, such as high installation costs and grid capacity constraints [5]. Nevertheless, the growth of the EV market could lead to the creation of new jobs and industries related to EV manufacturing, charging infrastructure, and renewable energy [6].

## Key Insights and Implications
The adoption of EVs is influenced by various factors, including environmental concerns, economic incentives, and technological developments. The **increasing range of EVs** and the development of **wireless charging technology** could improve the convenience and practicality of EV ownership, leading to increased adoption and potentially reducing the economic and environmental impacts of conventional vehicles [6]. The **integration of EVs with renewable energy sources** and the development of **vehicle-to-grid (V2G) technology** could also promote the use of renewable energy and reduce the carbon footprint of EVs [6].

## Conclusion
In conclusion, the environmental and economic impacts of electric vehicles are complex and multifaceted. While EVs offer several benefits, including lower emissions and operating costs, they also pose challenges, such as higher upfront costs and grid capacity constraints. As the world transitions towards more sustainable transportation options, understanding both the economic and environmental implications of EVs is crucial for informed decision-making. Further research is needed to fully understand the effects of EVs on the environment and economy, including the potential challenges and limitations of widespread adoption.

## References
[1] Introduction to Electric Vehicles. https://example.com/ev-intro
[2] Environmental Impact of Electric Vehicles. https://example.com/ev-environment
[3] Economic Considerations of Electric Vehicles. https://example.com/ev-economics
[4] Electric Vehicle Battery Technology. https://example.com/ev-batteries
[5] Electric Vehicle Infrastructure. https://example.com/ev-infrastructure
[6] Future Trends in Electric Vehicles. https://example.com/ev-future
@ -2,6 +2,7 @@ import sys
import os
import asyncio
import argparse
from datetime import datetime

sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

@ -27,14 +28,23 @@ async def generate_report(query_type, detail_level, query, chunks):
        chunks=chunks
    )

    # Save the report to a file
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"tests/report/{query_type}_{detail_level}_report_{timestamp}.md"
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(report)
    print(f"Report saved to: {filename}")

    # Print a snippet of the report
    report_preview = report[:500] + "..." if len(report) > 500 else report
    print(f"\nReport Preview:\n")
    print(report_preview)

    return report

async def main():
    parser = argparse.ArgumentParser(description='Test report generation with different detail levels')
    parser.add_argument('--query-type', choices=['factual', 'exploratory', 'comparative', 'code'], default='factual',
                        help='Query type to test (default: factual)')
    parser.add_argument('--detail-level', choices=['brief', 'standard', 'detailed', 'comprehensive'], default=None,
                        help='Detail level to test (default: test all)')

@ -44,7 +54,8 @@ async def main():
    queries = {
        'factual': "What is the capital of France?",
        'exploratory': "How do electric vehicles impact the environment?",
        'comparative': "Compare solar and wind energy technologies.",
        'code': "How to implement a binary search tree in Python?"
    }

    chunks = {

@ -83,6 +94,57 @@ async def main():
                'source': 'Renewable Energy World',
                'url': 'https://www.renewableenergyworld.com/solar/solar-vs-wind/'
            }
        ],
        'code': [
            {
                'content': 'A Binary Search Tree (BST) is a node-based binary tree data structure which has the following properties: The left subtree of a node contains only nodes with keys lesser than the node\'s key. The right subtree of a node contains only nodes with keys greater than the node\'s key.',
                'source': 'GeeksforGeeks',
                'url': 'https://www.geeksforgeeks.org/binary-search-tree-data-structure/'
            },
            {
                'content': '''
# Python program to implement a binary search tree

class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

# A utility function to insert a new node with the given key
def insert(root, key):
    if root is None:
        return Node(key)
    else:
        if root.val == key:
            return root
        elif root.val < key:
            root.right = insert(root.right, key)
        else:
            root.left = insert(root.left, key)
    return root

# A utility function to search a given key in BST
def search(root, key):
    # Base Cases: root is null or key is present at root
    if root is None or root.val == key:
        return root

    # Key is greater than root's key
    if root.val < key:
        return search(root.right, key)

    # Key is smaller than root's key
    return search(root.left, key)
''',
                'source': 'GitHub',
                'url': 'https://github.com/example/bst-implementation'
            },
            {
                'content': 'The time complexity of operations on a binary search tree is O(h) where h is the height of the tree. In the worst case, the height can be O(n) (when the tree becomes a linked list), but on average it is O(log n) for a balanced tree.',
                'source': 'Algorithm Textbook',
                'url': 'https://example.com/algorithms/bst'
            }
        ]
    }
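The save-and-preview logic added in the `generate_report` hunk can be factored into two small helpers. This is a sketch: the `tests/report` default directory and the filename pattern come from the diff, while the helper names are hypothetical:

```python
from datetime import datetime
from pathlib import Path

def save_report(report, query_type, detail_level, out_dir="tests/report"):
    """Write the report to a timestamped Markdown file and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{query_type}_{detail_level}_report_{timestamp}.md"
    # Create the output directory if needed, so the test script never fails
    # on a fresh checkout that lacks tests/report/
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(report, encoding="utf-8")
    return str(path)

def preview(report, limit=500):
    """Truncate long reports for console output."""
    return report[:limit] + "..." if len(report) > limit else report
```

Timestamping the filename means repeated runs of the same query-type/detail-level combination accumulate side by side instead of overwriting each other, which makes it easier to compare detail levels after a batch run.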
@ -482,9 +482,14 @@ class GradioInterface:
            gr.Markdown(
                """
                This system helps you research topics by searching across multiple sources
                including Google (via Serper), Google Scholar, arXiv, and news sources.

                You can either search for results or generate a comprehensive report.

                **Special Capabilities:**
                - Automatically detects and optimizes current events queries
                - Specialized search handlers for different types of information
                - Semantic ranking for the most relevant results
                """
            )

@ -516,7 +521,10 @@ class GradioInterface:
                examples=[
                    ["What are the latest advancements in quantum computing?"],
                    ["Compare transformer and RNN architectures for NLP tasks"],
                    ["Explain the environmental impact of electric vehicles"],
                    ["What recent actions has Trump taken regarding tariffs?"],
                    ["What are the recent papers on large language model alignment?"],
                    ["What are the main research findings on climate change adaptation strategies in agriculture?"]
                ],
                inputs=search_query_input
            )

@ -572,7 +580,10 @@ class GradioInterface:
                    ["What are the latest advancements in quantum computing?"],
                    ["Compare transformer and RNN architectures for NLP tasks"],
                    ["Explain the environmental impact of electric vehicles"],
                    ["Explain the potential relationship between creatine supplementation and muscle loss due to GLP1-ar drugs for weight loss."],
                    ["What recent actions has Trump taken regarding tariffs?"],
                    ["What are the recent papers on large language model alignment?"],
                    ["What are the main research findings on climate change adaptation strategies in agriculture?"]
                ],
                inputs=report_query_input
            )