massive changes

parent b6b50e4ef8
commit 12b453a14f

@@ -0,0 +1,5 @@
+Review the contents of .note/ before modifying any files.
+
+After each major successful test, please commit the changes to the repository with a meaningful commit message.
+
+Update the contents of .note/ after each major change.
@@ -51,3 +51,4 @@ logs/
 # Database files
 *.db
 report/database/*.db
+config/config.yaml
@@ -583,281 +583,103 @@ In this session, we fixed issues in the Gradio UI for report generation and plan
 3. Test the current implementation with various query types to identify any remaining issues
 4. Update the documentation to reflect the new features and future plans

-## Session: 2025-02-28: Google Gemini Integration and Reference Formatting
+## Session: 2025-03-12 - Query Type Selection in Gradio UI

 ### Overview
-Fixed the integration of Google Gemini models with LiteLLM, and fixed reference formatting issues.
+In this session, we enhanced the Gradio UI by adding a query type selection dropdown, allowing users to explicitly select the query type (factual, exploratory, comparative) instead of relying on automatic detection.

 ### Key Activities
-1. **Fixed Google Gemini Integration**:
-   - Updated the model format to `gemini/gemini-2.0-flash` in config.yaml
-   - Modified message formatting for Gemini models in LLM interface
-   - Added proper handling for the 'gemini' provider in environment variable setup
+1. **Added Query Type Selection to Gradio UI**:
+   - Added a dropdown menu for query type selection in the "Generate Report" tab
+   - Included options for "auto-detect", "factual", "exploratory", and "comparative"
+   - Added descriptive tooltips explaining each query type
+   - Set "auto-detect" as the default option

-2. **Fixed Reference Formatting Issues**:
-   - Enhanced the instructions for reference formatting to ensure URLs are included
-   - Added a recovery mechanism for truncated references
-   - Improved context preparation to better extract URLs for references
+2. **Updated Report Generation Logic**:
+   - Modified the `generate_report` method in the `GradioInterface` class to handle the new query_type parameter
+   - Updated the report button click handler to pass the query type to the generate_report method
+   - Added logging to show when a user-selected query type is being used

-3. **Converted LLM Interface Methods to Async**:
-   - Made `generate_completion`, `classify_query`, and `enhance_query` methods async
-   - Updated dependent code to properly await these methods
-   - Fixed runtime errors related to async/await patterns
+3. **Enhanced Report Generator**:
+   - Updated the `generate_report` method in the `ReportGenerator` class to accept a query_type parameter
+   - Modified the report synthesizer calls to pass the query_type parameter
+   - Added logging to track query type usage

-### Key Insights
-- Gemini models require special message formatting (using 'user' and 'model' roles instead of 'system' and 'assistant')
-- References were getting cut off due to token limits, requiring a separate generation step
-- The async conversion was necessary to properly handle async LLM calls throughout the codebase
+4. **Added Documentation**:
+   - Added a "Query Types" section to the Gradio UI explaining each query type
+   - Included examples of when to use each query type
+   - Updated code comments to explain the query type parameter
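The dropdown described in this session maps straightforwardly onto the `query_type` parameter. A minimal sketch of that mapping (the `resolve_query_type` helper name is hypothetical, not the project's actual code):

```python
QUERY_TYPE_CHOICES = ["auto-detect", "factual", "exploratory", "comparative"]

def resolve_query_type(selected):
    """Map the UI dropdown value to the query_type parameter.

    "auto-detect" (the default) means no explicit type is passed,
    so the pipeline falls back to automatic classification.
    """
    if selected == "auto-detect":
        return None
    return selected
```

Keeping the mapping in one helper means every layer below the UI only ever sees an explicit type string or `None`.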
+### Insights
+- Explicit query type selection gives users more control over the report generation process
+- Different query types benefit from specialized report templates and structures
+- The auto-detect option provides convenience while still allowing manual override
+- Clear documentation helps users understand when to use each query type

 ### Challenges
-- Ensuring that the templates produce appropriate output for each detail level
-- Balancing between speed and quality for different detail levels
-- Managing token budgets effectively across different detail levels
 - Ensuring backward compatibility with existing code
+- Maintaining the auto-detect functionality while adding manual selection
+- Passing the query type parameter through multiple layers of the application
+- Providing clear explanations of query types for users

 ### Next Steps
-1. Continue testing with Gemini models to ensure stable operation
-2. Consider adding more robust error handling for LLM provider-specific issues
-3. Improve the reference formatting further if needed
+1. Test the query type selection with various queries to ensure it works correctly
+2. Gather user feedback on the usefulness of manual query type selection
+3. Consider adding more specialized templates for specific query types
+4. Explore adding query type detection confidence scores to help users decide when to override
+5. Add examples of each query type to help users understand the differences
-## Session: 2025-02-28: Fixing Reference Formatting and Async Implementation
+## Session: 2025-03-12 - Fixed Query Type Parameter Bug

 ### Overview
-Fixed reference formatting issues with Gemini models and updated the codebase to properly handle async methods.
+Fixed a bug in the report generation process where the `query_type` parameter was not properly handled, causing an error when it was `None`.

 ### Key Activities
-1. **Enhanced Reference Formatting**:
-   - Improved instructions to emphasize including URLs for each reference
-   - Added duplicate URL fields in the context to ensure URLs are captured
-   - Updated the reference generation prompt to explicitly request URLs
-   - Added a separate reference generation step to handle truncated references
+1. **Fixed NoneType Error in Report Synthesis**:
+   - Added a null check in the `_get_extraction_prompt` method in `report_synthesis.py`
+   - Modified the condition that checks for comparative queries to handle the case where `query_type` is `None`
+   - Ensured the method works correctly regardless of whether a query type is explicitly provided

-2. **Fixed Async Implementation**:
-   - Converted all LLM interface methods to async for proper handling
-   - Updated QueryProcessor's generate_search_queries method to be async
-   - Modified query_to_report.py to correctly await async methods
-   - Fixed runtime errors related to async/await patterns
+2. **Root Cause Analysis**:
+   - Identified that the error occurred when the `query_type` parameter was `None` and the code tried to call `.lower()` on it
+   - Traced the issue through the call chain from the UI to the report generator to the report synthesizer
+   - Confirmed that the fix addresses the specific error message: `'NoneType' object has no attribute 'lower'`
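The fix described above amounts to guarding the comparison with a null check before calling `.lower()`. A sketch (the helper name is hypothetical; the real change lives in `_get_extraction_prompt`):

```python
def is_comparative(query_type):
    """Safely test for comparative queries when query_type may be None.

    Calling query_type.lower() unconditionally raises
    "'NoneType' object has no attribute 'lower'" when no type was selected.
    """
    return query_type is not None and query_type.lower() == "comparative"
```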
-3. **Updated Gradio Interface**:
-   - Modified the generate_report method to properly handle async operations
-   - Updated the report button click handler to correctly pass parameters
-   - Fixed the parameter order in the lambda function for async execution
-   - Improved error handling in the UI
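One way an async `generate_report` can be driven from a synchronous UI callback is a thin wrapper that runs the coroutine to completion. A sketch under the assumption that the UI framework does not await coroutines itself (all names here are illustrative stand-ins, not the project's actual code):

```python
import asyncio

async def generate_report(query, detail_level="standard"):
    # Stand-in for the real async synthesis pipeline.
    await asyncio.sleep(0)
    return f"Report for {query!r} ({detail_level})"

def on_generate_click(query, detail_level):
    # Synchronous click handler that drives the coroutine to completion.
    return asyncio.run(generate_report(query, detail_level))
```

Keeping the async/sync boundary in one place avoids the scattered "coroutine was never awaited" runtime errors mentioned above.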
-## Session: 2025-03-11

-### Overview
-Reorganized the project directory structure to improve maintainability and clarity, ensuring all components are properly organized into their respective directories.

-### Key Activities
-1. **Directory Structure Reorganization**:
-   - Created a dedicated `utils/` directory for utility scripts
-   - Moved `jina_similarity.py` to `utils/`
-   - Added `__init__.py` to make it a proper Python package
-   - Organized test files into subdirectories under `tests/`
-   - Created subdirectories for each module (query, execution, ranking, report, ui, integration)
-   - Added `__init__.py` files to all test directories
-   - Created an `examples/` directory with subdirectories for data and scripts
-   - Moved sample data to `examples/data/`
-   - Added `__init__.py` files to make them proper Python packages
-   - Added a dedicated `scripts/` directory for utility scripts
-   - Moved `query_to_report.py` to `scripts/`

-2. **Pipeline Verification**:
-   - Tested the pipeline after reorganization to ensure functionality
-   - Verified that the UI works correctly with the new directory structure
-   - Confirmed that all imports are working properly with the new structure

-3. **Embedding Usage Analysis**:
-   - Confirmed that the pipeline uses Jina AI's Embeddings API through the `JinaSimilarity` class
-   - Verified that the `JinaReranker` class uses embeddings for document reranking
-   - Analyzed how embeddings are integrated into the search and ranking process

 ### Insights
-- A well-organized directory structure significantly improves code maintainability and readability
-- Using proper Python package structure with `__init__.py` files ensures clean imports
-- Separating tests, utilities, examples, and scripts into dedicated directories makes the codebase more navigable
-- The Jina AI embeddings are used throughout the pipeline for semantic similarity and document reranking
+- Proper null checking is essential when working with optional parameters that are passed through multiple layers
+- The error occurred in the report synthesis module but was triggered by the UI's query type selection feature
+- The fix maintains backward compatibility while ensuring the new query type selection feature works correctly

-### Challenges
-- Ensuring all import statements are updated correctly after moving files
-- Maintaining backward compatibility with existing code
-- Verifying that all components still work together after reorganization

 ### Next Steps
-1. Run comprehensive tests to ensure all functionality works with the new directory structure
-2. Update any remaining documentation to reflect the new directory structure
-3. Consider moving the remaining test files in the root of the `tests/` directory to appropriate subdirectories
-4. Review import statements throughout the codebase to ensure they follow the new structure
+1. Test the fix with various query types to ensure it works correctly
+2. Consider adding similar null checks in other parts of the code that handle the query_type parameter
+3. Add more comprehensive error handling throughout the report generation process
+4. Update the test suite to include tests for null query_type values

+## Session: 2025-03-12 - Fixed Template Retrieval for Null Query Type

-### Key Insights
-- Async/await patterns need to be consistently applied throughout the codebase
-- Reference formatting requires explicit instructions to include URLs
-- Gradio's interface needs special handling for async functions

-### Challenges
-- Ensuring that all async methods are properly awaited
-- Balancing between detailed instructions and token limits for reference generation
-- Managing the increased processing time for async operations

-### Next Steps
-1. Continue testing with Gemini models to ensure stable operation
-2. Consider adding more robust error handling for LLM provider-specific issues
-3. Improve the reference formatting further if needed
-4. Update documentation to reflect the changes made to the LLM interface
-5. Consider adding more unit tests for the async methods

-## Session: 2025-02-28: Fixed NoneType Error in Report Synthesis

-### Issue
-Encountered an error during report generation:
-```
-TypeError: 'NoneType' object is not subscriptable
-```
-The error occurred in the `map_document_chunks` method of the `ReportSynthesizer` class when trying to slice a title that was `None`.

-### Changes Made
-1. Fixed the chunk counter in `map_document_chunks` method:
-   - Used a separate counter for individual chunks instead of using the batch index
-   - Added a null check for chunk titles with a fallback to 'Untitled'

-2. Added defensive code in `synthesize_report` method:
-   - Added code to ensure all chunks have a title before processing
-   - Added null checks for title fields

-3. Updated the `DocumentProcessor` class:
-   - Modified `process_documents_for_report` to ensure all chunks have a title
-   - Updated `chunk_document_by_sections`, `chunk_document_fixed_size`, and `chunk_document_hierarchical` methods to handle None titles
-   - Added default 'Untitled' value for all title fields
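The defensive pattern in the changes above can be sketched as follows (hypothetical helper; the real fixes are spread across `ReportSynthesizer` and `DocumentProcessor`):

```python
def normalize_chunks(chunks):
    """Give every chunk a usable title and a per-chunk index.

    Slicing a None title (e.g. chunk['title'][:50]) is what raised
    "'NoneType' object is not subscriptable".
    """
    for index, chunk in enumerate(chunks):
        if not chunk.get("title"):
            chunk["title"] = "Untitled"
        chunk["index"] = index  # separate per-chunk counter, not the batch index
    return chunks
```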
-### Testing
-The changes were tested with a report generation task that previously failed, and the error was resolved.

-### Next Steps
-1. Consider adding more comprehensive null checks throughout the codebase
-2. Add unit tests to verify proper handling of missing or null fields
-3. Implement better error handling and recovery mechanisms

-## Session: 2025-03-11
 ### Overview
-Focused on resolving issues with the report generation template system and ensuring that different detail levels and query types work correctly in the report synthesis process.
+Fixed a second issue in the report generation process where the template retrieval was failing when the `query_type` parameter was `None`.

 ### Key Activities
-1. **Fixed Template Retrieval Issues**:
-   - Updated the `get_template` method in the `ReportTemplateManager` to ensure it retrieves templates correctly based on query type and detail level
-   - Implemented a helper method `_get_template_from_strings` in the `ReportSynthesizer` to convert string values for query types and detail levels to their respective enum objects
-   - Added better logging for template retrieval process to aid in debugging
+1. **Fixed Template Retrieval for Null Query Type**:
+   - Updated the `_get_template_from_strings` method in `report_synthesis.py` to handle `None` query_type
+   - Added a default value of "exploratory" when query_type is `None`
+   - Modified the method signature to explicitly indicate that query_type_str can be `None`
+   - Added logging to indicate when the default query type is being used

-2. **Tested All Detail Levels and Query Types**:
-   - Created a comprehensive test script `test_all_detail_levels.py` to test all combinations of detail levels and query types
-   - Successfully tested all detail levels (brief, standard, detailed, comprehensive) with factual queries
-   - Successfully tested all detail levels with exploratory queries
-   - Successfully tested all detail levels with comparative queries
+2. **Root Cause Analysis**:
+   - Identified that the error occurred when trying to convert `None` to a `QueryType` enum value
+   - The error message was: "No template found for None standard" and "None is not a valid QueryType"
+   - The issue was in the template retrieval process which is used by both standard and progressive report synthesis
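The default-value fix can be sketched like this (simplified; the real `_get_template_from_strings` also converts detail-level strings and returns enum-keyed templates):

```python
import logging

logger = logging.getLogger(__name__)

def resolve_query_type_str(query_type_str):
    """Fall back to "exploratory" when no query type was provided,
    instead of failing with "None is not a valid QueryType"."""
    if query_type_str is None:
        logger.info("No query type provided; defaulting to 'exploratory'")
        return "exploratory"
    return query_type_str
```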
-3. **Improved Error Handling**:
-   - Added fallback to standard templates if specific templates are not found
-   - Enhanced logging to track whether templates are found during the synthesis process

-4. **Code Organization**:
-   - Removed duplicate `ReportTemplateManager` and `ReportTemplate` classes from `report_synthesis.py`
-   - Used the imported versions from `report_templates.py` for better code maintainability

 ### Insights
-- The template system is now working correctly for all combinations of query types and detail levels
-- Proper logging is essential for debugging template retrieval issues
-- Converting string values to enum objects is necessary for consistent template retrieval
-- Having a dedicated test script for all combinations helps ensure comprehensive coverage
+- When fixing one issue with optional parameters, it's important to check for similar issues in related code paths
+- Providing sensible defaults for optional parameters helps maintain robustness
+- Proper error handling and logging helps diagnose issues in complex systems with multiple layers

-### Challenges
-- Initially encountered issues where templates were not found during report synthesis, leading to `ValueError`
-- Needed to ensure that the correct classes and methods were used for template retrieval

 ### Next Steps
-1. Conduct additional testing with real-world queries and document sets
-2. Compare the analytical depth and quality of reports generated with different detail levels
-3. Gather user feedback on the improved reports at different detail levels
-4. Further refine the detail level configurations based on testing and feedback
+1. Test the fix with comprehensive reports to ensure it works correctly
+2. Consider adding similar default values for other optional parameters
+3. Review the codebase for other potential null reference issues
+4. Update documentation to clarify the behavior when optional parameters are not provided
-## Session: 2025-03-12 - Report Templates and Progressive Report Generation

-### Overview
-Implemented a dedicated report templates module to standardize report generation across different query types and detail levels, and implemented progressive report generation for comprehensive reports.

-### Key Activities
-1. **Created Report Templates Module**:
-   - Developed a new `report_templates.py` module with a comprehensive template system
-   - Implemented `QueryType` enum for categorizing queries (FACTUAL, EXPLORATORY, COMPARATIVE)
-   - Created `DetailLevel` enum for different report detail levels (BRIEF, STANDARD, DETAILED, COMPREHENSIVE)
-   - Designed a `ReportTemplate` class with validation for required sections
-   - Implemented a `ReportTemplateManager` to manage and retrieve templates

-2. **Implemented Template Variations**:
-   - Created 12 different templates (3 query types × 4 detail levels)
-   - Designed templates with appropriate sections for each combination
-   - Added placeholders for dynamic content in each template
-   - Ensured templates follow a consistent structure while adapting to specific needs
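A minimal sketch of the enum-keyed template system described above (illustrative only; the real module also validates required sections per template):

```python
from enum import Enum

class QueryType(Enum):
    FACTUAL = "factual"
    EXPLORATORY = "exploratory"
    COMPARATIVE = "comparative"

class DetailLevel(Enum):
    BRIEF = "brief"
    STANDARD = "standard"
    DETAILED = "detailed"
    COMPREHENSIVE = "comprehensive"

class ReportTemplateManager:
    def __init__(self):
        self._templates = {}

    def register(self, query_type, detail_level, template):
        self._templates[(query_type, detail_level)] = template

    def get_template(self, query_type, detail_level):
        return self._templates[(query_type, detail_level)]

# Register one template per combination: 3 query types x 4 detail levels = 12.
manager = ReportTemplateManager()
for qt in QueryType:
    for dl in DetailLevel:
        manager.register(qt, dl, f"{qt.value}/{dl.value} template")
```

Keying on enum members rather than raw strings is what surfaces bad inputs early ("None is not a valid QueryType") instead of silently retrieving the wrong template.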
-3. **Added Testing**:
-   - Created `test_report_templates.py` to verify template retrieval and validation
-   - Implemented `test_brief_report.py` to test brief report generation with a simple query
-   - Verified that all templates can be correctly retrieved and used

-4. **Implemented Progressive Report Generation**:
-   - Created a new `progressive_report_synthesis.py` module with a `ProgressiveReportSynthesizer` class
-   - Implemented chunk prioritization algorithm based on relevance scores
-   - Developed iterative refinement process with specialized prompts
-   - Added state management to track report versions and processed chunks
-   - Implemented termination conditions (all chunks processed, diminishing returns, max iterations)
-   - Added support for different models with adaptive batch sizing
-   - Implemented progress tracking and callback mechanism
-   - Created comprehensive test suite for progressive report generation
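The prioritization and termination logic listed above can be sketched as follows (illustrative thresholds and names; the real `ProgressiveReportSynthesizer` also tracks report versions and adapts batch sizes to the model's context window):

```python
def prioritize_chunks(chunks):
    # Fold the highest-relevance chunks into the report first.
    return sorted(chunks, key=lambda c: c.get("relevance", 0.0), reverse=True)

def should_stop(processed, total, recent_improvements,
                max_iterations=20, min_improvement=0.05):
    # Termination conditions: all chunks processed, iteration cap,
    # or diminishing returns over the last three refinement passes.
    if processed >= total or processed >= max_iterations:
        return True
    recent = recent_improvements[-3:]
    return len(recent) == 3 and all(delta < min_improvement for delta in recent)
```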
-5. **Updated Report Generator**:
-   - Modified `report_generator.py` to use the progressive report synthesizer for comprehensive detail level
-   - Created a hybrid system that uses standard map-reduce for brief/standard/detailed levels
-   - Added proper model selection and configuration for both synthesizers

-6. **Updated Memory Bank**:
-   - Added report templates information to code_structure.md
-   - Updated current_focus.md with implementation details for progressive report generation
-   - Updated session_log.md with details about the implementation
-   - Ensured all new files are properly documented

-### Insights
-- A standardized template system significantly improves report consistency
-- Different query types require specialized report structures
-- Validation ensures all required sections are present in templates
-- Enums provide type safety and prevent errors from string comparisons
-- Progressive report generation provides better results for very large document collections
-- The hybrid approach leverages the strengths of both map-reduce and progressive methods
-- Tracking improvement scores helps detect diminishing returns and optimize processing
-- Adaptive batch sizing based on model context window improves efficiency

-### Challenges
-- Designing templates that are flexible enough for various content types
-- Balancing between standardization and customization for different query types
-- Ensuring proper integration with the existing report synthesis process
-- Managing state and tracking progress in progressive report generation
-- Preventing entrenchment of initial report structure in progressive approach
-- Optimizing token usage when sending entire reports for refinement
-- Determining appropriate termination conditions for the progressive approach

-### Next Steps
-1. Integrate the progressive approach with the UI
-   - Implement controls to pause, resume, or terminate the process
-   - Create a preview mode to see the current report state
-   - Add options to compare different versions of the report
-2. Conduct additional testing with real-world queries and document sets
-3. Add specialized templates for specific research domains
-4. Implement template customization options for users
-5. Implement visualization components for data mentioned in reports
@@ -1,157 +0,0 @@
-# Example configuration file for the intelligent research system
-# Rename this file to config.yaml and fill in your API keys and settings
-
-# API keys (alternatively, set environment variables)
-api_keys:
-  openai: "your-openai-api-key"  # Or set OPENAI_API_KEY environment variable
-  jina: "your-jina-api-key"  # Or set JINA_API_KEY environment variable
-  serper: "your-serper-api-key"  # Or set SERPER_API_KEY environment variable
-  google: "your-google-api-key"  # Or set GOOGLE_API_KEY environment variable
-  anthropic: "your-anthropic-api-key"  # Or set ANTHROPIC_API_KEY environment variable
-  openrouter: "your-openrouter-api-key"  # Or set OPENROUTER_API_KEY environment variable
-  groq: "your-groq-api-key"  # Or set GROQ_API_KEY environment variable
-
-# LLM model configurations
-models:
-  gpt-3.5-turbo:
-    provider: "openai"
-    temperature: 0.7
-    max_tokens: 1000
-    top_p: 1.0
-    endpoint: null  # Use default OpenAI endpoint
-
-  gpt-4:
-    provider: "openai"
-    temperature: 0.5
-    max_tokens: 2000
-    top_p: 1.0
-    endpoint: null  # Use default OpenAI endpoint
-
-  claude-2:
-    provider: "anthropic"
-    temperature: 0.7
-    max_tokens: 1500
-    top_p: 1.0
-    endpoint: null  # Use default Anthropic endpoint
-
-  azure-gpt-4:
-    provider: "azure"
-    temperature: 0.5
-    max_tokens: 2000
-    top_p: 1.0
-    endpoint: "https://your-azure-endpoint.openai.azure.com"
-    deployment_name: "your-deployment-name"
-    api_version: "2023-05-15"
-
-  local-llama:
-    provider: "ollama"
-    temperature: 0.8
-    max_tokens: 1000
-    endpoint: "http://localhost:11434/api/generate"
-    model_name: "llama2"
-
-  llama-3.1-8b-instant:
-    provider: "groq"
-    model_name: "llama-3.1-8b-instant"
-    temperature: 0.7
-    max_tokens: 1024
-    top_p: 1.0
-    endpoint: "https://api.groq.com/openai/v1"
-
-  llama-3.3-70b-versatile:
-    provider: "groq"
-    model_name: "llama-3.3-70b-versatile"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-    endpoint: "https://api.groq.com/openai/v1"
-
-  openrouter-mixtral:
-    provider: "openrouter"
-    model_name: "mistralai/mixtral-8x7b-instruct"
-    temperature: 0.7
-    max_tokens: 1024
-    top_p: 1.0
-    endpoint: "https://openrouter.ai/api/v1"
-
-  openrouter-claude:
-    provider: "openrouter"
-    model_name: "anthropic/claude-3-opus"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-    endpoint: "https://openrouter.ai/api/v1"
-
-  gemini-2.0-flash:
-    provider: "gemini"
-    model_name: "gemini-2.0-flash"
-    temperature: 0.5
-    max_tokens: 2048
-    top_p: 1.0
-
-# Default model to use if not specified for a module
-default_model: "llama-3.1-8b-instant"  # Using Groq's Llama 3.1 8B model for testing
-
-# Module-specific model assignments
-module_models:
-  # Query processing module
-  query_processing:
-    enhance_query: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for query enhancement
-    classify_query: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for classification
-    generate_search_queries: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for generating search queries
-
-  # Search strategy module
-  search_strategy:
-    develop_strategy: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for developing search strategies
-    target_selection: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for target selection
-
-  # Document ranking module
-  document_ranking:
-    rerank_documents: "jina-reranker"  # Use Jina's reranker for document reranking
-
-  # Report generation module
-  report_generation:
-    synthesize_report: "gemini-2.0-flash"  # Use Google's Gemini 2.0 Flash for report synthesis
-    format_report: "llama-3.1-8b-instant"  # Use Groq's Llama 3.1 8B for formatting
-
-# Search engine configurations
-search_engines:
-  google:
-    enabled: true
-    max_results: 10
-
-  serper:
-    enabled: true
-    max_results: 10
-
-  jina:
-    enabled: true
-    max_results: 10
-
-  scholar:
-    enabled: false
-    max_results: 5
-
-  arxiv:
-    enabled: false
-    max_results: 5
-
-# Jina AI specific configurations
-jina:
-  reranker:
-    model: "jina-reranker-v2-base-multilingual"  # Default reranker model
-    top_n: 10  # Default number of top results to return
-
-# UI configuration
-ui:
-  theme: "light"  # light or dark
-  port: 7860
-  share: false
-  title: "Intelligent Research System"
-  description: "An automated system for finding, filtering, and synthesizing information"
-
-# System settings
-system:
-  cache_dir: "data/cache"
-  results_dir: "data/results"
-  log_level: "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
@ -0,0 +1,88 @@
|
||||||
|
"""
|
||||||
|
Example script for using the academic search handlers.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import sys
|
||||||
|
import os
from datetime import datetime

# Add the project root to the Python path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from execution.search_executor import SearchExecutor
from query.query_processor import get_query_processor
from config.config import get_config


async def main():
    """Run a sample academic search."""
    # Initialize components
    query_processor = get_query_processor()
    search_executor = SearchExecutor()

    # Get a list of available search engines
    available_engines = search_executor.get_available_search_engines()
    print(f"Available search engines: {', '.join(available_engines)}")

    # Check if academic search engines are available
    academic_engines = ["openalex", "core", "scholar", "arxiv"]
    available_academic = [engine for engine in academic_engines if engine in available_engines]

    if not available_academic:
        print("No academic search engines are available. Please check your configuration.")
        return
    else:
        print(f"Available academic search engines: {', '.join(available_academic)}")

    # Prompt for the query
    query = input("Enter your academic research query: ") or "What are the latest papers on large language model alignment?"

    print(f"\nProcessing query: {query}")

    # Process the query
    start_time = datetime.now()
    structured_query = await query_processor.process_query(query)

    # Add academic query flag
    structured_query["is_academic"] = True

    # Generate search queries optimized for each engine
    structured_query = await query_processor.generate_search_queries(
        structured_query, available_academic
    )

    # Print the optimized queries
    print("\nOptimized queries for academic search:")
    for engine in available_academic:
        print(f"\n{engine.upper()} queries:")
        for i, q in enumerate(structured_query.get("search_queries", {}).get(engine, [])):
            print(f"{i+1}. {q}")

    # Execute the search
    results = await search_executor.execute_search_async(
        structured_query,
        search_engines=available_academic,
        num_results=5
    )

    # Print the results
    total_results = sum(len(engine_results) for engine_results in results.values())
    print(f"\nFound {total_results} academic results:")

    for engine, engine_results in results.items():
        print(f"\n--- {engine.upper()} Results ({len(engine_results)}) ---")
        for i, result in enumerate(engine_results):
            print(f"\n{i+1}. {result.get('title', 'No title')}")
            print(f"Authors: {result.get('authors', 'Unknown')}")
            print(f"Year: {result.get('year', 'Unknown')}")
            print(f"Access: {result.get('access_status', 'Unknown')}")
            print(f"URL: {result.get('url', 'No URL')}")
            print(f"Snippet: {result.get('snippet', 'No snippet')[0:200]}...")

    end_time = datetime.now()
    print(f"\nSearch completed in {(end_time - start_time).total_seconds():.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
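The engine-selection step in the script above can be isolated into a small pure function; a minimal sketch (the helper name `pick_academic` is illustrative, not from the codebase):

```python
def pick_academic(available_engines):
    """Select the academic engines the example script will query, in preference order."""
    academic_engines = ["openalex", "core", "scholar", "arxiv"]
    return [e for e in academic_engines if e in available_engines]

print(pick_academic(["google", "openalex", "arxiv"]))  # ['openalex', 'arxiv']
```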
@ -0,0 +1,76 @@
"""
Example script for using the news search handler.
"""

import asyncio
import sys
import os
from datetime import datetime

# Add the project root to the Python path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from execution.search_executor import SearchExecutor
from query.query_processor import get_query_processor
from config.config import get_config


async def main():
    """Run a sample news search."""
    # Initialize components
    query_processor = get_query_processor()
    search_executor = SearchExecutor()

    # Get a list of available search engines
    available_engines = search_executor.get_available_search_engines()
    print(f"Available search engines: {', '.join(available_engines)}")

    # Check if news search is available
    if "news" not in available_engines:
        print("News search is not available. Please check your NewsAPI configuration.")
        return

    # Prompt for the query
    query = input("Enter your query about recent events: ") or "Trump tariffs latest announcement"

    print(f"\nProcessing query: {query}")

    # Process the query
    start_time = datetime.now()
    structured_query = await query_processor.process_query(query)

    # Generate search queries optimized for each engine
    structured_query = await query_processor.generate_search_queries(
        structured_query, ["news"]
    )

    # Print the optimized queries
    print("\nOptimized queries for news search:")
    for i, q in enumerate(structured_query.get("search_queries", {}).get("news", [])):
        print(f"{i+1}. {q}")

    # Execute the search
    results = await search_executor.execute_search_async(
        structured_query,
        search_engines=["news"],
        num_results=10
    )

    # Print the results
    news_results = results.get("news", [])
    print(f"\nFound {len(news_results)} news results:")

    for i, result in enumerate(news_results):
        print(f"\n--- Result {i+1} ---")
        print(f"Title: {result.get('title', 'No title')}")
        print(f"Source: {result.get('source', 'Unknown')}")
        print(f"Date: {result.get('published_date', 'Unknown date')}")
        print(f"URL: {result.get('url', 'No URL')}")
        print(f"Snippet: {result.get('snippet', 'No snippet')[0:200]}...")

    end_time = datetime.now()
    print(f"\nSearch completed in {(end_time - start_time).total_seconds():.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
@ -0,0 +1,160 @@
"""
CORE.ac.uk API handler.
Provides access to open access academic papers from institutional repositories.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class CoreSearchHandler(BaseSearchHandler):
    """Handler for CORE.ac.uk academic search API."""

    def __init__(self):
        """Initialize the CORE search handler."""
        self.config = get_config()
        self.api_key = get_api_key("core")
        self.base_url = "https://api.core.ac.uk/v3/search/works"
        self.available = self.api_key is not None

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("core", {})

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using CORE.ac.uk.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - full_text: Whether to search in full text (default: True)
                - filter_year: Filter by publication year or range
                - sort: Sort by relevance or publication date
                - repositories: Limit to specific repositories

        Returns:
            List of search results with standardized format
        """
        if not self.available:
            raise ValueError("CORE API is not available. API key is missing.")

        # Set up the request headers
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        # Set up the request body
        body = {
            "q": query,
            "limit": num_results,
            "offset": 0
        }

        # Add full text search parameter
        full_text = kwargs.get("full_text", True)
        if full_text:
            body["fields"] = ["title", "authors", "year", "abstract", "fullText"]
        else:
            body["fields"] = ["title", "authors", "year", "abstract"]

        # Add year filter if specified
        if "filter_year" in kwargs:
            body["filters"] = [{"year": kwargs["filter_year"]}]

        # Add sort parameter
        if "sort" in kwargs:
            if kwargs["sort"] == "date":
                body["sort"] = [{"year": "desc"}]
            else:
                body["sort"] = [{"_score": "desc"}]  # Default to relevance

        # Add repository filter if specified
        if "repositories" in kwargs:
            if "filters" not in body:
                body["filters"] = []
            body["filters"].append({"repositoryIds": kwargs["repositories"]})

        try:
            # Make the request
            response = requests.post(self.base_url, headers=headers, json=body)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Process the results
            results = []
            for item in data.get("results", []):
                # Extract authors
                authors = []
                for author in item.get("authors", [])[:3]:
                    author_name = author.get("name", "")
                    if author_name:
                        authors.append(author_name)

                # Get publication year
                pub_year = item.get("year", "Unknown")

                # Get DOI
                doi = item.get("doi", "")

                # Determine URL - prefer the download URL if available
                url = item.get("downloadUrl", "")
                if not url and doi:
                    url = f"https://doi.org/{doi}"
                if not url:
                    url = item.get("sourceFulltextUrls", [""])[0] if item.get("sourceFulltextUrls") else ""

                # Create snippet from abstract or first part of full text
                snippet = item.get("abstract", "")
                if not snippet and "fullText" in item:
                    snippet = item.get("fullText", "")[:500] + "..."

                # If no snippet is available, create one from metadata
                if not snippet:
                    journal = item.get("publisher", "Unknown Journal")
                    snippet = f"Open access academic paper from {journal}. {pub_year}."

                # Create the result
                result = {
                    "title": item.get("title", "Untitled"),
                    "url": url,
                    "snippet": snippet,
                    "source": "core",
                    "authors": ", ".join(authors),
                    "year": pub_year,
                    "journal": item.get("publisher", ""),
                    "doi": doi,
                    "open_access": True  # CORE only indexes open access content
                }

                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing CORE search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "core"

    def is_available(self) -> bool:
        """Check if the CORE API is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        # These limits are based on the free tier
        return {
            "requests_per_minute": 30,
            "requests_per_day": 10000,
            "current_usage": None
        }
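The request-body assembly in the CORE handler can be sketched in isolation to show how the optional filters stack up; `build_core_body` is an illustrative helper, not a function from the codebase:

```python
def build_core_body(query, num_results=10, **kwargs):
    """Assemble a CORE /v3/search/works request body the way the handler does (sketch)."""
    body = {"q": query, "limit": num_results, "offset": 0}
    # Year filter creates the filters list; repository filter appends to it
    if "filter_year" in kwargs:
        body["filters"] = [{"year": kwargs["filter_year"]}]
    if "repositories" in kwargs:
        body.setdefault("filters", []).append({"repositoryIds": kwargs["repositories"]})
    return body

print(build_core_body("large language models", filter_year=2024))
```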
@ -0,0 +1,152 @@
"""
NewsAPI handler for current events searches.
Provides access to recent news articles from various sources.
"""

import os
import requests
import datetime
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class NewsSearchHandler(BaseSearchHandler):
    """Handler for NewsAPI.org for current events searches."""

    def __init__(self):
        """Initialize the NewsAPI search handler."""
        self.config = get_config()
        self.api_key = get_api_key("newsapi")
        self.base_url = "https://newsapi.org/v2/everything"
        self.top_headlines_url = "https://newsapi.org/v2/top-headlines"
        self.available = self.api_key is not None

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using NewsAPI.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - days_back: Number of days back to search (default: 7)
                - sort_by: Sort by criteria ("relevancy", "popularity", "publishedAt")
                - language: Language code (default: "en")
                - sources: Comma-separated list of news sources
                - domains: Comma-separated list of domains
                - use_headlines: Whether to use top headlines endpoint (default: False)
                - country: Country code for headlines (default: "us")
                - category: Category for headlines

        Returns:
            List of search results with standardized format
        """
        if not self.available:
            raise ValueError("NewsAPI is not available. API key is missing.")

        # Determine which endpoint to use
        use_headlines = kwargs.get("use_headlines", False)
        url = self.top_headlines_url if use_headlines else self.base_url

        # Calculate date range
        days_back = kwargs.get("days_back", 7)
        end_date = datetime.datetime.now().strftime("%Y-%m-%d")
        start_date = (datetime.datetime.now() - datetime.timedelta(days=days_back)).strftime("%Y-%m-%d")

        # Set up the request parameters
        params = {
            "q": query,
            "pageSize": num_results,
            "apiKey": self.api_key,
        }

        # Add parameters for everything endpoint
        if not use_headlines:
            params["from"] = start_date
            params["to"] = end_date
            params["sortBy"] = kwargs.get("sort_by", "publishedAt")

            if "language" in kwargs:
                params["language"] = kwargs["language"]
            else:
                params["language"] = "en"  # Default to English

            if "sources" in kwargs:
                params["sources"] = kwargs["sources"]

            if "domains" in kwargs:
                params["domains"] = kwargs["domains"]
        # Add parameters for top-headlines endpoint
        else:
            if "country" in kwargs:
                params["country"] = kwargs["country"]
            else:
                params["country"] = "us"  # Default to US

            if "category" in kwargs:
                params["category"] = kwargs["category"]

        try:
            # Make the request
            response = requests.get(url, params=params)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Check if the request was successful
            if data.get("status") != "ok":
                print(f"NewsAPI error: {data.get('message', 'Unknown error')}")
                return []

            # Process the results
            results = []
            for article in data.get("articles", []):
                # Get the publication date with proper formatting
                pub_date = article.get("publishedAt", "")
                if pub_date:
                    try:
                        date_obj = datetime.datetime.fromisoformat(pub_date.replace("Z", "+00:00"))
                        formatted_date = date_obj.strftime("%Y-%m-%d %H:%M:%S")
                    except ValueError:
                        formatted_date = pub_date
                else:
                    formatted_date = ""

                # Create a standardized result
                result = {
                    "title": article.get("title", ""),
                    "url": article.get("url", ""),
                    "snippet": article.get("description", ""),
                    "source": f"news:{article.get('source', {}).get('name', 'unknown')}",
                    "published_date": formatted_date,
                    "author": article.get("author", ""),
                    "image_url": article.get("urlToImage", ""),
                    "content": article.get("content", "")
                }
                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing NewsAPI search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "news"

    def is_available(self) -> bool:
        """Check if the NewsAPI is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        # These are based on NewsAPI's developer plan
        return {
            "requests_per_minute": 100,
            "requests_per_day": 500,  # Free tier limit
            "current_usage": None  # NewsAPI doesn't provide usage info in responses
        }
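The date-window computation the handler sends to the `/everything` endpoint is worth seeing on its own; a minimal sketch, with `date_range_params` as an illustrative helper name:

```python
import datetime

def date_range_params(days_back: int = 7) -> dict:
    """Compute the from/to window (YYYY-MM-DD) for a NewsAPI /everything request."""
    now = datetime.datetime.now()
    return {
        "from": (now - datetime.timedelta(days=days_back)).strftime("%Y-%m-%d"),
        "to": now.strftime("%Y-%m-%d"),
    }

params = date_range_params(7)
print(sorted(params))  # ['from', 'to']
```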
@ -0,0 +1,180 @@
"""
OpenAlex API handler.
Provides access to academic research papers and scholarly information.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from .base_handler import BaseSearchHandler
from config.config import get_config, get_api_key


class OpenAlexSearchHandler(BaseSearchHandler):
    """Handler for OpenAlex academic search API."""

    def __init__(self):
        """Initialize the OpenAlex search handler."""
        self.config = get_config()
        # OpenAlex doesn't require an API key, but using an email is recommended
        self.email = self.config.config_data.get("academic_search", {}).get("email", "user@example.com")
        self.base_url = "https://api.openalex.org/works"
        self.available = True  # OpenAlex doesn't require an API key

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("openalex", {})

    def search(self, query: str, num_results: int = 10, **kwargs) -> List[Dict[str, Any]]:
        """
        Execute a search query using OpenAlex.

        Args:
            query: The search query to execute
            num_results: Number of results to return
            **kwargs: Additional search parameters:
                - filter_type: Filter by work type (article, book, etc.)
                - filter_year: Filter by publication year or range
                - filter_open_access: Only return open access publications
                - sort: Sort by relevance, citations, publication date
                - filter_concept: Filter by academic concept/field

        Returns:
            List of search results with standardized format
        """
        # Build the search URL with parameters
        params = {
            "search": query,
            "per_page": num_results,
            "mailto": self.email  # Good practice for the API
        }

        # Add filters
        filters = []

        # Type filter (article, book, etc.)
        if "filter_type" in kwargs:
            filters.append(f"type.id:{kwargs['filter_type']}")

        # Year filter
        if "filter_year" in kwargs:
            filters.append(f"publication_year:{kwargs['filter_year']}")

        # Open access filter
        if kwargs.get("filter_open_access", False):
            filters.append("is_oa:true")

        # Concept/field filter
        if "filter_concept" in kwargs:
            filters.append(f"concepts.id:{kwargs['filter_concept']}")

        # Combine filters if there are any
        if filters:
            params["filter"] = ",".join(filters)

        # Sort parameter
        if "sort" in kwargs:
            params["sort"] = kwargs["sort"]
        else:
            # Default to sorting by relevance score
            params["sort"] = "relevance_score:desc"

        try:
            # Make the request
            response = requests.get(self.base_url, params=params)
            response.raise_for_status()

            # Parse the response
            data = response.json()

            # Process the results
            results = []
            for item in data.get("results", []):
                # Extract authors
                authors = []
                for author in item.get("authorships", [])[:3]:
                    author_name = author.get("author", {}).get("display_name", "")
                    if author_name:
                        authors.append(author_name)

                # Format citation count
                citation_count = item.get("cited_by_count", 0)

                # Get the publication year
                pub_year = item.get("publication_year", "Unknown")

                # Check if it's open access
                is_oa = item.get("open_access", {}).get("is_oa", False)
                oa_status = "Open Access" if is_oa else "Subscription"

                # Get journal/venue name
                journal = None
                if "primary_location" in item and item["primary_location"]:
                    source = item.get("primary_location", {}).get("source", {})
                    if source:
                        journal = source.get("display_name", "Unknown Journal")

                # Get DOI
                doi = item.get("doi")
                url = f"https://doi.org/{doi}" if doi else item.get("url", "")

                # Get abstract
                abstract = item.get("abstract_inverted_index", None)
                snippet = ""

                # Convert abstract_inverted_index to readable text if available
                if abstract:
                    try:
                        # The OpenAlex API uses an inverted index format
                        # We need to reconstruct the text from this format
                        words = {}
                        for word, positions in abstract.items():
                            for pos in positions:
                                words[pos] = word

                        # Reconstruct the abstract from the positions
                        snippet = " ".join([words.get(i, "") for i in sorted(words.keys())])
                    except Exception:
                        snippet = "Abstract not available in readable format"

                # Fallback if no abstract is available
                if not snippet:
                    snippet = f"Academic paper: {item.get('title', 'Untitled')}. Published in {journal or 'Unknown'} ({pub_year}). {citation_count} citations."

                # Create the result
                result = {
                    "title": item.get("title", "Untitled"),
                    "url": url,
                    "snippet": snippet,
                    "source": "openalex",
                    "authors": ", ".join(authors),
                    "year": pub_year,
                    "citation_count": citation_count,
                    "access_status": oa_status,
                    "journal": journal,
                    "doi": doi
                }

                results.append(result)

            return results

        except requests.exceptions.RequestException as e:
            print(f"Error executing OpenAlex search: {e}")
            return []

    def get_name(self) -> str:
        """Get the name of the search handler."""
        return "openalex"

    def is_available(self) -> bool:
        """Check if the OpenAlex API is available."""
        return self.available

    def get_rate_limit_info(self) -> Dict[str, Any]:
        """Get information about the API's rate limits."""
        return {
            "requests_per_minute": 100,  # OpenAlex is quite generous with rate limits
            "requests_per_day": 100000,  # 100k requests per day for anonymous users
            "current_usage": None  # OpenAlex doesn't provide usage info in responses
        }
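The inverted-index reconstruction the handler performs can be exercised in isolation; a minimal sketch, where `rebuild_abstract` and the sample index are illustrative (OpenAlex's `abstract_inverted_index` maps each word to the positions at which it occurs):

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index."""
    words = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words[pos] = word
    # Emit words in position order to recover the original sentence
    return " ".join(words[i] for i in sorted(words))

# Hypothetical inverted index, shaped like an OpenAlex response
sample = {"alignment": [2], "language": [0], "model": [1]}
print(rebuild_abstract(sample))  # language model alignment
```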
@ -0,0 +1,7 @@
"""
Result enrichers for improving search results with additional data.
"""

from .unpaywall_enricher import UnpaywallEnricher

__all__ = ["UnpaywallEnricher"]
@ -0,0 +1,132 @@
"""
Unpaywall enricher for finding open access versions of scholarly articles.
"""

import os
import requests
from typing import Dict, List, Any, Optional

from config.config import get_config, get_api_key


class UnpaywallEnricher:
    """Enricher for finding open access versions of papers using Unpaywall."""

    def __init__(self):
        """Initialize the Unpaywall enricher."""
        self.config = get_config()
        # Unpaywall recommends using an email for API access
        self.email = self.config.config_data.get("academic_search", {}).get("email", "user@example.com")
        self.base_url = "https://api.unpaywall.org/v2/"
        self.available = True  # Unpaywall doesn't require an API key, just an email

        # Get any custom settings from config
        self.academic_config = self.config.config_data.get("academic_search", {}).get("unpaywall", {})

    def enrich_results(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Enrich search results with open access links from Unpaywall.

        Args:
            results: List of search results to enrich

        Returns:
            Enriched list of search results
        """
        if not self.available:
            return results

        # Process each result that has a DOI
        for result in results:
            doi = result.get("doi")
            if not doi:
                continue

            # Skip results that are already marked as open access
            if result.get("open_access", False) or result.get("access_status") == "Open Access":
                continue

            # Look up the DOI in Unpaywall
            oa_data = self._lookup_doi(doi)
            if not oa_data:
                continue

            # Enrich the result with open access data
            if oa_data.get("is_oa", False):
                result["open_access"] = True
                result["access_status"] = "Open Access"

                # Get the best open access URL
                best_oa_url = self._get_best_oa_url(oa_data)
                if best_oa_url:
                    result["oa_url"] = best_oa_url
                    # Add a note to the snippet about open access availability
                    if "snippet" in result:
                        result["snippet"] += " [Open access version available]"
            else:
                result["open_access"] = False
                result["access_status"] = "Subscription"

        return results

    def _lookup_doi(self, doi: str) -> Optional[Dict[str, Any]]:
        """
        Look up a DOI in Unpaywall.

        Args:
            doi: The DOI to look up

        Returns:
            Unpaywall data for the DOI, or None if not found
        """
        try:
            # Normalize the DOI
            doi = doi.strip().lower()
            if doi.startswith("https://doi.org/"):
                doi = doi[16:]
            elif doi.startswith("doi:"):
                doi = doi[4:]

            # Make the request to Unpaywall
            url = f"{self.base_url}{doi}?email={self.email}"
            response = requests.get(url)

            # Check for successful response
            if response.status_code == 200:
                return response.json()

            return None
        except Exception as e:
            print(f"Error looking up DOI in Unpaywall: {e}")
            return None

    def _get_best_oa_url(self, oa_data: Dict[str, Any]) -> Optional[str]:
        """
        Get the best open access URL from Unpaywall data.

        Args:
            oa_data: Unpaywall data for a DOI

        Returns:
            Best open access URL, or None if not available
        """
        # Check if there's a best OA location
        best_oa_location = oa_data.get("best_oa_location", None)
        if best_oa_location:
            # Get the URL from the best location
            return best_oa_location.get("url_for_pdf") or best_oa_location.get("url")

        # If no best location, check all OA locations
        oa_locations = oa_data.get("oa_locations", [])
        if oa_locations:
            # Prefer PDF URLs
            for location in oa_locations:
                if location.get("url_for_pdf"):
                    return location.get("url_for_pdf")

            # Fall back to HTML URLs
            for location in oa_locations:
                if location.get("url"):
                    return location.get("url")

        return None

Binary file not shown.
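The DOI normalization inside `_lookup_doi` is a small pure transformation worth testing on its own; a minimal sketch (the standalone `normalize_doi` name is illustrative, and the sample DOIs are made up):

```python
def normalize_doi(doi: str) -> str:
    """Strip common prefixes so only the bare DOI remains, mirroring the enricher's logic."""
    doi = doi.strip().lower()
    if doi.startswith("https://doi.org/"):
        doi = doi[len("https://doi.org/"):]
    elif doi.startswith("doi:"):
        doi = doi[len("doi:"):]
    return doi

print(normalize_doi("https://doi.org/10.1000/XYZ123"))  # 10.1000/xyz123
print(normalize_doi("doi:10.1000/xyz123"))              # 10.1000/xyz123
```

Using `len("https://doi.org/")` instead of a hard-coded `16` keeps the slice in sync with the prefix if it ever changes.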
@ -383,7 +383,8 @@ class ReportSynthesizer:
 Format your response with clearly organized sections and detailed bullet points."""
 
         # Add specific instructions for comparative queries
-        if query_type.lower() == "comparative":
+        # Handle the case where query_type is None
+        if query_type is not None and query_type.lower() == "comparative":
             comparative_instructions = """
 IMPORTANT: This is a COMPARATIVE query. The user is asking to compare two or more things.

@ -401,18 +402,23 @@ class ReportSynthesizer:
 
         return base_prompt
 
-    def _get_template_from_strings(self, query_type_str: str, detail_level_str: str) -> Optional[ReportTemplate]:
+    def _get_template_from_strings(self, query_type_str: Optional[str], detail_level_str: str) -> Optional[ReportTemplate]:
         """
         Helper method to get a template using string values for query_type and detail_level.
 
         Args:
-            query_type_str: String value of query type (factual, exploratory, comparative)
+            query_type_str: String value of query type (factual, exploratory, comparative), or None
             detail_level_str: String value of detail level (brief, standard, detailed, comprehensive)
 
         Returns:
             ReportTemplate object or None if not found
         """
         try:
+            # Handle None query_type by defaulting to "exploratory"
+            if query_type_str is None:
+                query_type_str = "exploratory"
+                logger.info(f"Query type is None, defaulting to {query_type_str}")
+
             # Convert string values to enum objects
             query_type_enum = QueryType(query_type_str)
             detail_level_enum = TemplateDetailLevel(detail_level_str)
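The None-handling added in this hunk reduces to a small guard; a minimal sketch of that behavior, with `resolve_query_type` as an illustrative name rather than a function from the codebase:

```python
def resolve_query_type(query_type_str):
    """Default a missing query type to "exploratory", as the ReportSynthesizer fix does."""
    if query_type_str is None:
        return "exploratory"
    return query_type_str

print(resolve_query_type(None))       # exploratory
print(resolve_query_type("factual"))  # factual
```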
@ -13,3 +13,6 @@ validators>=0.22.0
 markdown>=3.5.0
 html2text>=2020.1.16
 feedparser>=6.0.10
+newsapi-python>=0.2.6  # Optional wrapper for NewsAPI if needed
+httpx>=0.20.0  # For async HTTP requests
+tenacity>=8.0.0  # For retry logic with APIs
@ -0,0 +1,101 @@
"""
Test for the NewsAPI handler.
"""

import os
import unittest
import asyncio
from dotenv import load_dotenv

from execution.api_handlers.news_handler import NewsSearchHandler
from config.config import get_config


class TestNewsHandler(unittest.TestCase):
    """Test cases for the NewsAPI handler."""

    def setUp(self):
        """Set up the test environment."""
        # Load environment variables
        load_dotenv()

        # Initialize the handler
        self.handler = NewsSearchHandler()

    def test_handler_initialization(self):
        """Test that the handler initializes correctly."""
        self.assertEqual(self.handler.get_name(), "news")

        # Check if API key is available (this test may be skipped in CI environments)
        if os.environ.get("NEWSAPI_API_KEY"):
            self.assertTrue(self.handler.is_available())

        # Check rate limit info
        rate_limit_info = self.handler.get_rate_limit_info()
        self.assertIn("requests_per_minute", rate_limit_info)
        self.assertIn("requests_per_day", rate_limit_info)

    def test_search_with_invalid_api_key(self):
        """Test that the handler handles invalid API keys gracefully."""
        # Temporarily set the API key to an invalid value
        original_api_key = self.handler.api_key
        self.handler.api_key = "invalid_key"

        # Verify the handler reports as available (since it has a key, even though it's invalid)
        self.assertTrue(self.handler.is_available())

        # Try to search with the invalid key
        results = self.handler.search("test", num_results=1)

        # Verify that we get an empty result set
        self.assertEqual(len(results), 0)

        # Restore the original API key
        self.handler.api_key = original_api_key

    def test_search_with_recent_queries(self):
        """Test that the handler handles recent event queries effectively."""
        # Skip this test if no API key is available
        if not self.handler.is_available():
            self.skipTest("NewsAPI key is not available")

        # Try a search for current events
        results = self.handler.search("Trump tariffs latest announcement", num_results=5)

        # Verify that we get results
        self.assertGreaterEqual(len(results), 0)

        # If we got results, verify their structure
        if results:
            result = results[0]
            self.assertIn("title", result)
            self.assertIn("url", result)
            self.assertIn("snippet", result)
            self.assertIn("source", result)
            self.assertIn("published_date", result)

            # Verify the source starts with 'news:'
            self.assertTrue(result["source"].startswith("news:"))

    def test_search_with_headlines(self):
        """Test that the handler handles headlines search effectively."""
        # Skip this test if no API key is available
        if not self.handler.is_available():
            self.skipTest("NewsAPI key is not available")

        # Try a search using the headlines endpoint
        results = self.handler.search("politics", num_results=5, use_headlines=True, country="us")

        # Verify that we get results
        self.assertGreaterEqual(len(results), 0)

        # If we got results, verify their structure
        if results:
            result = results[0]
            self.assertIn("title", result)
            self.assertIn("url", result)
            self.assertIn("source", result)


if __name__ == "__main__":
    unittest.main()
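The invalid-key test above relies on the handler swallowing API errors and returning an empty list rather than raising. A minimal sketch of that degrade-gracefully pattern, using a hypothetical stand-in client (not the real NewsAPI wrapper):

```python
class FlakySearchClient:
    """Hypothetical stand-in for an external news API client."""
    def __init__(self, api_key):
        self.api_key = api_key

    def get(self, query):
        # Simulate the remote API rejecting a bad key
        if self.api_key != "valid":
            raise RuntimeError("401 Unauthorized")
        return [{"title": f"Result for {query}", "url": "https://example.com"}]

def safe_search(client, query, num_results=5):
    """Return up to num_results items, or an empty list on any API failure."""
    try:
        return client.get(query)[:num_results]
    except Exception:
        # Never propagate transport/auth errors to the caller; callers can
        # treat an empty list as "no results from this source"
        return []
```

This keeps a single failing source from aborting an aggregated multi-source search, which is exactly the behavior the test asserts with `assertEqual(len(results), 0)`.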
@ -0,0 +1,30 @@
## Implementing a Binary Search Tree in Python

### Introduction
A Binary Search Tree (BST) is a node-based binary tree data structure that satisfies certain properties, making it a useful data structure for efficient storage and retrieval of data [1]. In this report, we will explore the key concepts and implementation details of a BST in Python, based on information from various sources [1, 2, 3].

### Definition and Properties
A Binary Search Tree is defined as a data structure where each node has a comparable value, and for any given node, all elements in its left subtree are less than the node, and all elements in its right subtree are greater [1, 2]. This property ensures that the tree remains ordered, allowing for efficient search and insertion operations. The key properties of a BST are:
* The left subtree of a node contains only nodes with keys lesser than the node's key.
* The right subtree of a node contains only nodes with keys greater than the node's key.

### Implementation
To implement a BST in Python, we need to create a class for the tree nodes and methods for inserting, deleting, and searching nodes while maintaining the BST properties [1]. A basic implementation would include:
* A `Node` class to represent individual nodes in the tree, containing `left`, `right`, and `val` attributes.
* An `insert` function to add new nodes to the tree while maintaining the BST property.
* A `search` function to find a given key in the BST.

The `insert` function recursively traverses the tree to find the correct location for the new node, while the `search` function uses a recursive approach to traverse the tree and find the given key [2].

### Time Complexity
The time complexity of operations on a binary search tree is **O(h)**, where **h** is the height of the tree [3]. In the worst case, the height can be **O(n)**, where **n** is the number of nodes in the tree (when the tree degenerates into a linked list). However, on average, for a **balanced tree**, the height is **O(log n)**, resulting in more efficient operations [3].

### Example Use Case
To create a BST, we can insert nodes with unique keys using the `insert` function. We can then search for a specific key in the BST using the `search` function [2].

### Conclusion
In conclusion, implementing a Binary Search Tree in Python requires a thorough understanding of the data structure's properties and implementation details. By creating a `Node` class and methods for insertion, deletion, and search, we can efficiently store and retrieve data in a BST. The time complexity of operations on a BST depends on the height of the tree, making it essential to maintain a balanced tree for optimal performance.

### References
[1] Binary Search Tree - GeeksforGeeks. https://www.geeksforgeeks.org/binary-search-tree-data-structure/
[2] BST Implementation - GitHub. https://github.com/example/bst-implementation
[3] Binary Search Tree - Example. https://example.com/algorithms/bst
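The `Node`/`insert`/`search` design described in the report can be written out as a short runnable sketch (the same shape as the implementation cited in [2]):

```python
class Node:
    """A single BST node holding a key and two child links."""
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

def insert(root, key):
    """Insert key into the subtree rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    # duplicate keys are ignored, preserving the strict ordering property
    return root

def search(root, key):
    """Return the node containing key, or None if the key is absent."""
    if root is None or root.val == key:
        return root
    if key > root.val:
        return search(root.right, key)
    return search(root.left, key)
```

Inserting 50, 30, and 70 in that order makes 50 the root with 30 as its left child and 70 as its right child; a search then follows at most one comparison per level, which is the O(h) bound discussed in the Time Complexity section.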
@ -0,0 +1,32 @@
## Step 1: Maintain the overall structure and format of the report

The report should follow the template structure, including the title, Executive Summary, Comparison Criteria, Methodology, Key Findings, Analysis, Conclusion, References, and Appendices.

## Step 2: Add new relevant information where appropriate

The new information includes environmental and economic impacts of electric vehicles, such as their potential to reduce greenhouse gas emissions and operating costs.

## Step 3: Expand sections with new details, examples, or evidence

The new information includes data on the environmental and economic benefits of electric vehicles, such as reduced emissions and lower operating costs.

## Step 4: Improve analysis based on new information

The analysis should consider the new information and provide more comprehensive insights into the environmental and economic impacts of electric vehicles.

## Step 5: Add or update citations for new information

The references should be updated to include new citations for the new information, following a consistent format.

## Step 6: Ensure the report follows the template structure

The report should be formatted in Markdown with clear headings, subheadings, and bullet points where appropriate.

The final answer is:
IMPROVEMENT_SCORE: [0.8]
@ -0,0 +1,45 @@
# Environmental and Economic Impacts of Electric Vehicles

## Executive Summary
The environmental and economic impacts of electric vehicles (EVs) are complex and multifaceted. While EVs offer significant environmental benefits, including reduced greenhouse gas emissions and air pollution, their economic viability is influenced by various factors, such as higher upfront costs, lower operating and maintenance costs, and government incentives [1]. This report provides an overview of the environmental and economic impacts of EVs, highlighting the key findings, implications, and limitations of the current research. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs.

## Comparison Criteria
The environmental and economic impacts of EVs are evaluated based on the following criteria:
* Greenhouse gas emissions
* Air pollution
* Resource extraction and waste management
* Operating and maintenance costs
* Government incentives and policies
* Battery technology and charging infrastructure

## Methodology
This report synthesizes information from various documents to provide a comprehensive overview of the environmental and economic impacts of EVs. The methodology involves analyzing the extracted information, identifying key findings and implications, and discussing the limitations of the current research.

## Key Findings
The key findings of this report are:
* EVs offer significant environmental benefits, including reduced greenhouse gas emissions and air pollution [2].
* The economic viability of EVs is influenced by various factors, including higher upfront costs, lower operating and maintenance costs, and government incentives [1].
* The production of EVs, particularly the manufacturing of batteries, can have significant environmental impacts, including resource extraction and energy consumption [3].
* Regional variations in electricity generation, fuel prices, and incentives can significantly affect the environmental and economic impacts of EVs [1].
* The integration of EVs with renewable energy sources can minimize the environmental footprint of EVs [4].
* Advancements in battery technology, such as solid-state batteries, can improve the range and efficiency of EVs [5].
* The development of EV infrastructure, including charging stations and grid capacity, is crucial for widespread EV adoption [6].

## Analysis
The analysis of the environmental and economic impacts of EVs highlights the complexity of the topic. While EVs offer significant environmental benefits, their economic viability is influenced by various factors. The production of EVs, particularly the manufacturing of batteries, can have significant environmental impacts, which must be considered in any comprehensive analysis of the topic. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs.

## Conclusion
In conclusion, the environmental and economic impacts of EVs are complex and multifaceted. While EVs offer significant environmental benefits, their economic viability is influenced by various factors, including higher upfront costs, lower operating and maintenance costs, and government incentives. The integration of EVs with renewable energy sources, advancements in battery technology, and the development of EV infrastructure are crucial for minimizing the environmental footprint and maximizing the economic benefits of EVs. Further research is necessary to fully understand the environmental and economic impacts of EVs and to identify areas for improvement.

## References
[1] Introduction to Electric Vehicles. https://example.com/ev-intro
[2] Environmental Impact of Electric Vehicles. https://example.com/ev-environment
[3] Economic Considerations of Electric Vehicles. https://example.com/ev-economics
[4] Electric Vehicle Battery Technology. https://example.com/ev-batteries
[5] Electric Vehicle Infrastructure. https://example.com/ev-infrastructure
[6] Future Trends in Electric Vehicles. https://example.com/ev-future

## Appendices
Additional information and data can be found in the appendices, including:
* A comprehensive list of references cited in the report
* A glossary of terms related to EVs and their environmental and economic impacts
* A bibliography of additional resources for further reading and research
@ -0,0 +1,26 @@
## Introduction to Environmental and Economic Impacts of Electric Vehicles
The introduction of electric vehicles (EVs) has significant environmental and economic implications. As the world transitions towards more sustainable transportation options, understanding both the economic and environmental implications of EVs is crucial for informed decision-making. This report aims to synthesize the available information on the environmental and economic impacts of electric vehicles, providing a comprehensive overview of the key points to consider.

## Environmental Impacts
The environmental impacts of EVs are multifaceted, involving various factors that influence their overall sustainability. One of the primary benefits of EVs is their **lower emissions**: they produce zero tailpipe emissions, which reduces greenhouse gas emissions and air pollution in urban areas [1]. Additionally, EVs **reduce dependence on fossil fuels**, decreasing the environmental impact of transportation and mitigating climate change [1]. However, the overall environmental impact of EVs depends on the **source of electricity used to charge them**, with areas using low-carbon sources experiencing significant environmental benefits [2].

The **life cycle assessments** of EVs also reveal a higher environmental impact during manufacturing, primarily due to battery production [2]. Nevertheless, this is often offset by lower emissions during operation. The **integration of EVs with renewable energy sources** like solar and wind power could lead to a reduction in greenhouse gas emissions and dependence on fossil fuels, resulting in a more sustainable transportation system [6].

## Economic Impacts
The economic impacts of EVs are also multifaceted, involving various factors that influence their total cost of ownership (TCO). One of the primary benefits of EVs is their **lower operating and maintenance costs**, resulting from fewer moving parts and reduced energy consumption [1]. Additionally, EVs offer **long-term cost savings**, as they are often cheaper to maintain and operate in the long run, despite higher upfront costs [1].

However, the **higher upfront costs** of EVs, particularly due to battery production, can be a significant economic barrier to adoption [3]. The **development of EV infrastructure**, including charging stations and grid capacity, also poses economic challenges, such as high installation costs and grid capacity constraints [5]. Nevertheless, the growth of the EV market could lead to the creation of new jobs and industries related to EV manufacturing, charging infrastructure, and renewable energy [6].

## Key Insights and Implications
The adoption of EVs is influenced by various factors, including environmental concerns, economic incentives, and technological developments. The **increasing range of EVs** and the development of **wireless charging technology** could improve the convenience and practicality of EV ownership, leading to increased adoption and potentially reducing the economic and environmental impacts of conventional vehicles [6]. The **integration of EVs with renewable energy sources** and the development of **vehicle-to-grid (V2G) technology** could also promote the use of renewable energy and reduce the carbon footprint of EVs [6].

## Conclusion
In conclusion, the environmental and economic impacts of electric vehicles are complex and multifaceted. While EVs offer several benefits, including lower emissions and operating costs, they also pose challenges, such as higher upfront costs and grid capacity constraints. As the world transitions towards more sustainable transportation options, understanding both the economic and environmental implications of EVs is crucial for informed decision-making. Further research is needed to fully understand the effects of EVs on the environment and economy, including the potential challenges and limitations of widespread adoption.

## References
[1] Introduction to Electric Vehicles. https://example.com/ev-intro
[2] Environmental Impact of Electric Vehicles. https://example.com/ev-environment
[3] Economic Considerations of Electric Vehicles. https://example.com/ev-economics
[4] Electric Vehicle Battery Technology. https://example.com/ev-batteries
[5] Electric Vehicle Infrastructure. https://example.com/ev-infrastructure
[6] Future Trends in Electric Vehicles. https://example.com/ev-future
@ -2,6 +2,7 @@ import sys
import os
import asyncio
import argparse
from datetime import datetime

sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

@ -27,14 +28,23 @@ async def generate_report(query_type, detail_level, query, chunks):
        chunks=chunks
    )

    # Save the report to a file
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"tests/report/{query_type}_{detail_level}_report_{timestamp}.md"
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(report)
    print(f"Report saved to: {filename}")

    # Print a snippet of the report
    report_preview = report[:500] + "..." if len(report) > 500 else report
    print(f"\nReport Preview:\n")
    print(report_preview)

    return report

async def main():
    parser = argparse.ArgumentParser(description='Test report generation with different detail levels')
    parser.add_argument('--query-type', choices=['factual', 'exploratory', 'comparative', 'code'], default='factual',
                        help='Query type to test (default: factual)')
    parser.add_argument('--detail-level', choices=['brief', 'standard', 'detailed', 'comprehensive'], default=None,
                        help='Detail level to test (default: test all)')

@ -44,7 +54,8 @@ async def main():
    queries = {
        'factual': "What is the capital of France?",
        'exploratory': "How do electric vehicles impact the environment?",
        'comparative': "Compare solar and wind energy technologies.",
        'code': "How to implement a binary search tree in Python?"
    }

    chunks = {

@ -83,6 +94,57 @@ async def main():
                'source': 'Renewable Energy World',
                'url': 'https://www.renewableenergyworld.com/solar/solar-vs-wind/'
            }
        ],
        'code': [
            {
                'content': 'A Binary Search Tree (BST) is a node-based binary tree data structure which has the following properties: The left subtree of a node contains only nodes with keys lesser than the node\'s key. The right subtree of a node contains only nodes with keys greater than the node\'s key.',
                'source': 'GeeksforGeeks',
                'url': 'https://www.geeksforgeeks.org/binary-search-tree-data-structure/'
            },
            {
                'content': '''
# Python program to implement a binary search tree

class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

# A utility function to insert a new node with the given key
def insert(root, key):
    if root is None:
        return Node(key)
    else:
        if root.val == key:
            return root
        elif root.val < key:
            root.right = insert(root.right, key)
        else:
            root.left = insert(root.left, key)
    return root

# A utility function to search a given key in BST
def search(root, key):
    # Base Cases: root is null or key is present at root
    if root is None or root.val == key:
        return root

    # Key is greater than root's key
    if root.val < key:
        return search(root.right, key)

    # Key is smaller than root's key
    return search(root.left, key)
''',
                'source': 'GitHub',
                'url': 'https://github.com/example/bst-implementation'
            },
            {
                'content': 'The time complexity of operations on a binary search tree is O(h) where h is the height of the tree. In the worst case, the height can be O(n) (when the tree becomes a linked list), but on average it is O(log n) for a balanced tree.',
                'source': 'Algorithm Textbook',
                'url': 'https://example.com/algorithms/bst'
            }
        ]
    }
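The save-and-preview logic added in the `generate_report` hunk can be factored into two small helpers. This is a sketch: the `tests/report` default directory and the filename pattern come from the diff, while the helper names are hypothetical:

```python
from datetime import datetime
from pathlib import Path

def save_report(report, query_type, detail_level, out_dir="tests/report"):
    """Write the report to a timestamped Markdown file and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{query_type}_{detail_level}_report_{timestamp}.md"
    # Create the output directory if needed, so the test script never fails
    # on a fresh checkout that lacks tests/report/
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(report, encoding="utf-8")
    return str(path)

def preview(report, limit=500):
    """Truncate long reports for console output."""
    return report[:limit] + "..." if len(report) > limit else report
```

Timestamping the filename means repeated runs of the same query-type/detail-level combination accumulate side by side instead of overwriting each other, which makes it easier to compare detail levels after a batch run.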
@ -482,9 +482,14 @@ class GradioInterface:
            gr.Markdown(
                """
                This system helps you research topics by searching across multiple sources
                including Google (via Serper), Google Scholar, arXiv, and news sources.

                You can either search for results or generate a comprehensive report.

                **Special Capabilities:**
                - Automatically detects and optimizes current events queries
                - Specialized search handlers for different types of information
                - Semantic ranking for the most relevant results
                """
            )

@ -516,7 +521,10 @@ class GradioInterface:
                examples=[
                    ["What are the latest advancements in quantum computing?"],
                    ["Compare transformer and RNN architectures for NLP tasks"],
                    ["Explain the environmental impact of electric vehicles"],
                    ["What recent actions has Trump taken regarding tariffs?"],
                    ["What are the recent papers on large language model alignment?"],
                    ["What are the main research findings on climate change adaptation strategies in agriculture?"]
                ],
                inputs=search_query_input
            )

@ -572,7 +580,10 @@ class GradioInterface:
                    ["What are the latest advancements in quantum computing?"],
                    ["Compare transformer and RNN architectures for NLP tasks"],
                    ["Explain the environmental impact of electric vehicles"],
                    ["Explain the potential relationship between creatine supplementation and muscle loss due to GLP1-ar drugs for weight loss."],
                    ["What recent actions has Trump taken regarding tariffs?"],
                    ["What are the recent papers on large language model alignment?"],
                    ["What are the main research findings on climate change adaptation strategies in agriculture?"]
                ],
                inputs=report_query_input
            )