Fully functional end-to-end test of research with gemini flash
This commit is contained in:
parent
4753e567ad
commit
7f440286bc
|
@ -34,12 +34,12 @@ ENV/
|
|||
.DS_Store
|
||||
|
||||
# Project specific
|
||||
config/config.yaml
|
||||
*.json
|
||||
!config/config.yaml.example
|
||||
.env
|
||||
.env.*
|
||||
!.env.example
|
||||
report_*.md
|
||||
|
||||
# Logs
|
||||
logs/
|
||||
|
|
|
@ -140,3 +140,26 @@ The `ranking` module provides functionality for reranking and prioritizing docum
|
|||
- `FilterManager`: Class for filtering documents
|
||||
- `filter_by_date(documents, start_date, end_date)`: Filters by date
|
||||
- `filter_by_source(documents, sources)`: Filters by source
|
||||
|
||||
## Recent Updates
|
||||
|
||||
### 2025-02-28: Async Implementation and Reference Formatting
|
||||
|
||||
1. **LLM Interface Updates**:
|
||||
- Converted key methods to async:
|
||||
- `generate_completion`
|
||||
- `classify_query`
|
||||
- `enhance_query`
|
||||
- `generate_search_queries`
|
||||
- Added special handling for Gemini models
|
||||
- Improved reference formatting instructions
|
||||
|
||||
2. **Query Processor Updates**:
|
||||
- Updated `process_query` to be async
|
||||
- Made `generate_search_queries` async
|
||||
- Fixed async/await patterns throughout
|
||||
|
||||
3. **Gradio Interface Updates**:
|
||||
- Modified `generate_report` to handle async operations
|
||||
- Updated report button click handler
|
||||
- Improved error handling
|
||||
|
|
|
@ -1,39 +1,52 @@
|
|||
# Current Focus: Report Generation Module Implementation (Phase 4)
|
||||
# Current Focus: Google Gemini Integration, Reference Formatting, and NoneType Error Fixes
|
||||
|
||||
## Latest Update (2025-02-28)
|
||||
## Active Work
|
||||
|
||||
We have successfully implemented Phases 1, 2, and 3 of the Report Generation module, and are now making progress on Phase 4: Advanced Features. We have completed the implementation of customizable report detail levels, allowing users to select different levels of detail for generated reports, and have enhanced the analytical depth of detailed and comprehensive reports.
|
||||
### Google Gemini Integration
|
||||
- ✅ Fixed the integration of Google Gemini models with LiteLLM
|
||||
- ✅ Updated message formatting for Gemini models
|
||||
- ✅ Added proper handling for the 'gemini' provider in environment variables
|
||||
- ✅ Fixed reference formatting issues with Gemini models
|
||||
- ✅ Converted LLM interface methods to async to fix runtime errors
|
||||
|
||||
### Recent Progress
|
||||
### Gradio UI Updates
|
||||
- ✅ Updated the Gradio interface to handle async methods
|
||||
- ✅ Fixed parameter ordering in the report generation function
|
||||
- ✅ Improved error handling in the UI
|
||||
|
||||
1. **Enhanced Report Detail Levels**:
|
||||
- Enhanced the template modifiers for DETAILED and COMPREHENSIVE detail levels to focus more on analytical depth, evidence density, and perspective diversity rather than just adding additional sections
|
||||
- Improved the document chunk processing to extract more meaningful information from each chunk for detailed and comprehensive reports
|
||||
- Added detail-level-specific extraction prompts that guide the LLM to extract different types of information based on the selected detail level
|
||||
- Modified the map-reduce approach to pass detail level parameters throughout the process
|
||||
### Bug Fixes
|
||||
- ✅ Fixed NoneType error in report synthesis when chunk titles are None
|
||||
- ✅ Added defensive null checks throughout document processing and report synthesis
|
||||
- ✅ Improved chunk counter in map_document_chunks method
|
||||
|
||||
2. **Customizable Report Detail Levels Implementation**:
|
||||
- Created a `ReportDetailLevelManager` class in `report_detail_levels.py` that defines four detail levels:
|
||||
- Brief: Concise summary with key findings (uses llama-3.1-8b-instant model)
|
||||
- Standard: Balanced report with analysis and conclusions (uses llama-3.1-8b-instant model)
|
||||
- Detailed: Comprehensive report with in-depth analysis (uses llama-3.3-70b-versatile model)
|
||||
- Comprehensive: Exhaustive report with all available information (uses llama-3.3-70b-versatile model)
|
||||
- Each detail level has specific configuration parameters:
|
||||
- Number of search results per engine
|
||||
- Token budget for report generation
|
||||
- Chunk size and overlap size for document processing
|
||||
- Recommended LLM model
|
||||
- Updated the report synthesis module to use different templates based on detail level
|
||||
- Modified the report generator to automatically configure parameters based on the selected detail level
|
||||
- Updated the query_to_report.py script to accept a detail_level parameter
|
||||
- Created test scripts to demonstrate the different detail levels
|
||||
## Recent Changes
|
||||
|
||||
3. **Gradio UI Enhancements**:
|
||||
- Updated the Gradio interface to include report generation with detail levels
|
||||
- Added custom model selection for report generation
|
||||
- Implemented processing of thinking tags in the model output
|
||||
- Fixed method names and improved query processing for search execution
|
||||
- Enhanced error handling for report generation
|
||||
### Reference Formatting Improvements
|
||||
- Enhanced the instructions for reference formatting to ensure URLs are included
|
||||
- Added a recovery mechanism for truncated references
|
||||
- Improved context preparation to better extract URLs for references
|
||||
- Added duplicate URL fields in the context to emphasize their importance
|
||||
|
||||
### Async LLM Interface
|
||||
- Made `generate_completion`, `classify_query`, `enhance_query`, and `generate_search_queries` methods async
|
||||
- Updated dependent code to properly await these methods
|
||||
- Fixed runtime errors related to async/await patterns in the QueryProcessor
|
||||
|
||||
### Error Handling Improvements
|
||||
- Added null checks for chunk titles in report synthesis
|
||||
- Improved chunk counter in map_document_chunks method
|
||||
- Added defensive code to ensure all chunks have titles
|
||||
- Updated document processor to handle None titles with default values
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Continue testing with Gemini models to ensure stable operation
|
||||
2. Consider adding more robust error handling for LLM provider-specific issues
|
||||
3. Improve the reference formatting further if needed
|
||||
4. Update documentation to reflect the changes made to the LLM interface
|
||||
5. Consider adding more unit tests for the async methods
|
||||
6. Add more comprehensive null checks throughout the codebase
|
||||
7. Implement better error handling and recovery mechanisms
|
||||
|
||||
### Future Enhancements
|
||||
|
||||
|
|
|
@ -403,3 +403,39 @@ Implemented and tested successfully with both sample data and real URLs.
|
|||
- Standard report (balanced approach)
|
||||
- Comprehensive analysis (more results, larger token budget)
|
||||
- Technical deep-dive (specialized prompts, larger context)
|
||||
|
||||
## 2025-02-28: Async Implementation and Reference Formatting
|
||||
|
||||
### Decision: Convert LLM Interface Methods to Async
|
||||
|
||||
**Context**: The codebase was experiencing runtime errors related to coroutine handling, particularly with the LLM interface methods.
|
||||
|
||||
**Decision**: Convert all LLM interface methods to async and update dependent code to properly await these methods.
|
||||
|
||||
**Rationale**:
|
||||
- LLM API calls are I/O-bound operations that benefit from async handling
|
||||
- Consistent async/await patterns throughout the codebase improve reliability
|
||||
- Proper async implementation prevents runtime errors related to coroutine handling
|
||||
|
||||
**Implementation**:
|
||||
- Converted `generate_completion`, `classify_query`, `enhance_query`, and `generate_search_queries` methods to async
|
||||
- Updated QueryProcessor methods to be async
|
||||
- Modified query_to_report.py to correctly await async methods
|
||||
- Updated the Gradio interface to handle async operations
|
||||
|
||||
### Decision: Enhance Reference Formatting Instructions
|
||||
|
||||
**Context**: References in generated reports were missing URLs and sometimes using generic placeholders like "Document 1".
|
||||
|
||||
**Decision**: Enhance the reference formatting instructions to emphasize including URLs and improve context preparation.
|
||||
|
||||
**Rationale**:
|
||||
- Proper references with URLs are essential for academic and professional reports
|
||||
- Clear instructions to the LLM improve the quality of generated references
|
||||
- Duplicate URL fields in the context ensure URLs are captured
|
||||
|
||||
**Implementation**:
|
||||
- Improved instructions to emphasize including URLs for each reference
|
||||
- Added duplicate URL fields in the context to ensure URLs are captured
|
||||
- Updated the reference generation prompt to explicitly request URLs
|
||||
- Added a separate reference generation step to handle truncated references
|
||||
|
|
|
@ -420,11 +420,26 @@ Successfully tested the end-to-end query to report pipeline with a specific quer
|
|||
- Managing the processing of a large number of document chunks efficiently
|
||||
|
||||
### Next Steps
|
||||
1. Implement customizable report detail levels
|
||||
2. Add support for alternative models with larger context windows
|
||||
3. Develop progressive report generation for very large research tasks
|
||||
4. Create visualization components for data mentioned in reports
|
||||
5. Add interactive elements to the generated reports
|
||||
1. **Implement Customizable Report Detail Levels**:
|
||||
- Develop a system to allow users to select different levels of detail for generated reports
|
||||
- Integrate the customizable detail levels into the report generator
|
||||
- Test the new feature with various query types
|
||||
|
||||
2. **Add Support for Alternative Models**:
|
||||
- Research and implement support for alternative models with larger context windows
|
||||
- Test the new models with the report generation pipeline
|
||||
|
||||
3. **Develop Progressive Report Generation**:
|
||||
- Design and implement a system for progressive report generation
|
||||
- Test the new feature with very large research tasks
|
||||
|
||||
4. **Create Visualization Components**:
|
||||
- Develop visualization components for data mentioned in reports
|
||||
- Integrate the visualization components into the report generator
|
||||
|
||||
5. **Add Interactive Elements**:
|
||||
- Develop interactive elements for the generated reports
|
||||
- Integrate the interactive elements into the report generator
|
||||
|
||||
## Session: 2025-02-28
|
||||
|
||||
|
@ -567,3 +582,113 @@ In this session, we fixed issues in the Gradio UI for report generation and plan
|
|||
2. Begin work on the multiple query variation generation feature
|
||||
3. Test the current implementation with various query types to identify any remaining issues
|
||||
4. Update the documentation to reflect the new features and future plans
|
||||
|
||||
## Session: 2025-02-28: Google Gemini Integration and Reference Formatting
|
||||
|
||||
### Overview
|
||||
Fixed the integration of Google Gemini models with LiteLLM, and fixed reference formatting issues.
|
||||
|
||||
### Key Activities
|
||||
1. **Fixed Google Gemini Integration**:
|
||||
- Updated the model format to `gemini/gemini-2.0-flash` in config.yaml
|
||||
- Modified message formatting for Gemini models in LLM interface
|
||||
- Added proper handling for the 'gemini' provider in environment variable setup
|
||||
|
||||
2. **Fixed Reference Formatting Issues**:
|
||||
- Enhanced the instructions for reference formatting to ensure URLs are included
|
||||
- Added a recovery mechanism for truncated references
|
||||
- Improved context preparation to better extract URLs for references
|
||||
|
||||
3. **Converted LLM Interface Methods to Async**:
|
||||
- Made `generate_completion`, `classify_query`, and `enhance_query` methods async
|
||||
- Updated dependent code to properly await these methods
|
||||
- Fixed runtime errors related to async/await patterns
|
||||
|
||||
### Key Insights
|
||||
- Gemini models require special message formatting (using 'user' and 'model' roles instead of 'system' and 'assistant')
|
||||
- References were getting cut off due to token limits, requiring a separate generation step
|
||||
- The async conversion was necessary to properly handle async LLM calls throughout the codebase
|
||||
|
||||
### Challenges
|
||||
- Ensuring that the templates produce appropriate output for each detail level
|
||||
- Balancing between speed and quality for different detail levels
|
||||
- Managing token budgets effectively across different detail levels
|
||||
- Ensuring backward compatibility with existing code
|
||||
|
||||
### Next Steps
|
||||
1. Continue testing with Gemini models to ensure stable operation
|
||||
2. Consider adding more robust error handling for LLM provider-specific issues
|
||||
3. Improve the reference formatting further if needed
|
||||
|
||||
## Session: 2025-02-28: Fixing Reference Formatting and Async Implementation
|
||||
|
||||
### Overview
|
||||
Fixed reference formatting issues with Gemini models and updated the codebase to properly handle async methods.
|
||||
|
||||
### Key Activities
|
||||
1. **Enhanced Reference Formatting**:
|
||||
- Improved instructions to emphasize including URLs for each reference
|
||||
- Added duplicate URL fields in the context to ensure URLs are captured
|
||||
- Updated the reference generation prompt to explicitly request URLs
|
||||
- Added a separate reference generation step to handle truncated references
|
||||
|
||||
2. **Fixed Async Implementation**:
|
||||
- Converted all LLM interface methods to async for proper handling
|
||||
- Updated QueryProcessor's generate_search_queries method to be async
|
||||
- Modified query_to_report.py to correctly await async methods
|
||||
- Fixed runtime errors related to async/await patterns
|
||||
|
||||
3. **Updated Gradio Interface**:
|
||||
- Modified the generate_report method to properly handle async operations
|
||||
- Updated the report button click handler to correctly pass parameters
|
||||
- Fixed the parameter order in the lambda function for async execution
|
||||
- Improved error handling in the UI
|
||||
|
||||
### Key Insights
|
||||
- Async/await patterns need to be consistently applied throughout the codebase
|
||||
- Reference formatting requires explicit instructions to include URLs
|
||||
- Gradio's interface needs special handling for async functions
|
||||
|
||||
### Challenges
|
||||
- Ensuring that all async methods are properly awaited
|
||||
- Balancing between detailed instructions and token limits for reference generation
|
||||
- Managing the increased processing time for async operations
|
||||
|
||||
### Next Steps
|
||||
1. Continue testing with Gemini models to ensure stable operation
|
||||
2. Consider adding more robust error handling for LLM provider-specific issues
|
||||
3. Improve the reference formatting further if needed
|
||||
4. Update documentation to reflect the changes made to the LLM interface
|
||||
5. Consider adding more unit tests for the async methods
|
||||
|
||||
## Session: 2025-02-28: Fixed NoneType Error in Report Synthesis
|
||||
|
||||
### Issue
|
||||
Encountered an error during report generation:
|
||||
```
|
||||
TypeError: 'NoneType' object is not subscriptable
|
||||
```
|
||||
|
||||
The error occurred in the `map_document_chunks` method of the `ReportSynthesizer` class when trying to slice a title that was `None`.
|
||||
|
||||
### Changes Made
|
||||
1. Fixed the chunk counter in `map_document_chunks` method:
|
||||
- Used a separate counter for individual chunks instead of using the batch index
|
||||
- Added a null check for chunk titles with a fallback to 'Untitled'
|
||||
|
||||
2. Added defensive code in `synthesize_report` method:
|
||||
- Added code to ensure all chunks have a title before processing
|
||||
- Added null checks for title fields
|
||||
|
||||
3. Updated the `DocumentProcessor` class:
|
||||
- Modified `process_documents_for_report` to ensure all chunks have a title
|
||||
- Updated `chunk_document_by_sections`, `chunk_document_fixed_size`, and `chunk_document_hierarchical` methods to handle None titles
|
||||
- Added default 'Untitled' value for all title fields
|
||||
|
||||
### Testing
|
||||
The changes were tested with a report generation task that previously failed, and the error was resolved.
|
||||
|
||||
### Next Steps
|
||||
1. Consider adding more comprehensive null checks throughout the codebase
|
||||
2. Add unit tests to verify proper handling of missing or null fields
|
||||
3. Implement better error handling and recovery mechanisms
|
||||
|
|
|
@ -111,7 +111,7 @@ module_models:
|
|||
|
||||
# Report generation module
|
||||
report_generation:
|
||||
synthesize_report: "llama-3.3-70b-versatile" # Use Groq's Llama 3.3 70B for report synthesis
|
||||
synthesize_report: "gemini-2.0-flash" # Use Google's Gemini 2.0 Flash for report synthesis
|
||||
format_report: "llama-3.1-8b-instant" # Use Groq's Llama 3.1 8B for formatting
|
||||
|
||||
# Search engine configurations
|
||||
|
|
|
@ -1,42 +0,0 @@
|
|||
The Environmental and Economic Impact of Electric Vehicles Compared to Traditional Vehicles
|
||||
=====================================================================================
|
||||
|
||||
### Introduction
|
||||
|
||||
The transportation sector is one of the largest contributors to greenhouse gas emissions and air pollution, with traditional internal combustion engine vehicles being a significant source of these emissions [1]. Electric vehicles (EVs) have emerged as a promising alternative to traditional vehicles, with the potential to reduce greenhouse gas emissions and air pollution. This report provides an overview of the environmental and economic impact of EVs compared to traditional vehicles, based on information from various sources.
|
||||
|
||||
### Environmental Impact
|
||||
|
||||
EVs produce zero tailpipe emissions, reducing greenhouse gas emissions and air pollution in urban areas [2]. According to the Alternative Fuels Data Center, EVs can reduce emissions by 70% compared to traditional gasoline-powered vehicles [3]. However, the production of EVs generates more emissions than traditional vehicles, primarily due to the extraction and processing of raw materials for battery production [4]. The overall environmental impact of EVs depends on the source of electricity used to charge them, with renewable energy sources resulting in lower emissions [5].
|
||||
|
||||
### Economic Impact
|
||||
|
||||
The economic impact of EVs is significant, with potential benefits including reduced fuel costs and lower maintenance costs [6]. According to the Alternative Fuels Data Center, the cost of charging an EV can be as low as $3 to $5 per 100 miles, while driving a traditional vehicle can cost around $12 to $15 per 100 miles [7]. The cost of EVs is decreasing over time, making them more competitive with traditional vehicles [8]. Governments and companies are investing in EV infrastructure, including charging stations, to support the adoption of EVs [9].
|
||||
|
||||
### Comparison with Traditional Vehicles
|
||||
|
||||
EVs have several advantages over traditional vehicles, including lower operating costs and reduced emissions [10]. However, traditional vehicles have a lower upfront cost and a more established infrastructure [11]. The choice between EVs and traditional vehicles depends on various factors, including driving habits, budget, and access to charging infrastructure [12].
|
||||
|
||||
### Conclusion
|
||||
|
||||
In conclusion, EVs have a lower environmental impact than traditional vehicles, with the potential to reduce greenhouse gas emissions and air pollution [13]. The economic impact of EVs is significant, with potential benefits including reduced fuel costs and lower maintenance costs [14]. As the cost of EVs decreases and infrastructure improves, EVs are becoming a more viable alternative to traditional vehicles [15].
|
||||
|
||||
### References
|
||||
|
||||
[1] Document 1: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[2] Document 3: The Environmental Impact of Battery Production for EVs
|
||||
[3] Document 80: Alternative Fuels Data Center: Emissions from Electric Vehicles
|
||||
[4] Document 21: [2104.14287v1] Electric cars, assessment of green nature vis a vis conventional fuel driven cars
|
||||
[5] Document 33: The Environmental Impact of Battery Production for EVs
|
||||
[6] Document 15: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[7] Document 81: Alternative Fuels Data Center: Emissions from Electric Vehicles
|
||||
[8] Document 16: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[9] Document 36: The Environmental Impact of Battery Production for EVs
|
||||
[10] Document 13: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[11] Document 17: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[12] Document 18: Electric Cars | Environmental Pros and Cons | Workiva Carbon
|
||||
[13] Document 32: The Environmental Impact of Battery Production for EVs
|
||||
[14] Document 41: The Environmental Impact of Battery Production for EVs
|
||||
[15] Document 52: [1710.01359v2] Multi-Period Coordinated Management of Electric Vehicles in Zonal Power Markets: A Convex Relaxation Approach
|
||||
|
||||
Note: The references provided are based on the document numbers and sources mentioned in the query. The actual references may vary depending on the specific sources and documents used.
|
|
@ -1,58 +0,0 @@
|
|||
**Environmental and Economic Impact of Electric Vehicles Compared to Traditional Vehicles**
|
||||
|
||||
**Executive Summary**
|
||||
|
||||
Electric vehicles (EVs) have gained popularity in recent years due to their environmental benefits and economic advantages. This report aims to provide a comprehensive overview of the environmental and economic impact of EVs compared to traditional internal combustion engine vehicles (ICEVs). Our analysis reveals that EVs produce zero tailpipe emissions, reducing greenhouse gas emissions and air pollution in urban areas. Additionally, EVs have a lower well-to-wheel emissions profile compared to ICEVs, with studies suggesting a reduction of up to 70%. The economic benefits of EVs include lower operating costs and reduced dependence on imported oil.
|
||||
|
||||
**Environmental Impact**
|
||||
|
||||
Electric vehicles produce zero tailpipe emissions, reducing greenhouse gas emissions and air pollution in urban areas. According to a study by the Union of Concerned Scientists, EVs can reduce CO2 emissions by 70-80% compared to traditional internal combustion engine vehicles [1]. A study by the National Renewable Energy Laboratory found that EVs can reduce well-to-wheel emissions by 40-60% compared to traditional vehicles [2].
|
||||
|
||||
However, the production of EVs generates more emissions than ICEVs, mainly due to the energy required for battery manufacturing. A study by the European Commission found that the overall carbon footprint of EVs is lower over their entire life cycle (25-50 years) [3]. Recycling of EV batteries can reduce energy consumption and greenhouse gas emissions from EV production [4].
|
||||
|
||||
**Economic Impact**
|
||||
|
||||
The cost of EVs is decreasing, making them more competitive with traditional vehicles. A study by the International Energy Agency (IEA) found that the cost of EVs can be up to 30% lower than traditional vehicles over a 10-year period [5]. EVs can also reduce fuel costs by up to 75% compared to traditional vehicles, as electricity is generally cheaper than gasoline [6]. Additionally, EVs can reduce maintenance costs due to fewer moving parts and no oil changes required [7].
|
||||
|
||||
**Key Statistics**
|
||||
|
||||
* EVs produce around 35-50 g CO2e/km, while ICEVs produce around 110-130 g CO2e/km [8].
|
||||
* EVs can save around €500-€1,000 per year in fuel costs compared to ICEVs [9].
|
||||
* The global EV market is expected to reach 14.5 million units by 2027 [10].
|
||||
|
||||
**Conclusion**
|
||||
|
||||
Electric vehicles have a significantly lower environmental impact compared to traditional vehicles. However, their economic impact depends on various factors, including the source of energy used to charge them. The adoption of EVs can lead to a reduction in greenhouse gas emissions and air pollution, improving public health and the environment. Governments and companies can play a key role in promoting the adoption of EVs and reducing their environmental impact.
|
||||
|
||||
**References**
|
||||
|
||||
[1] Union of Concerned Scientists. (2020). Electric Vehicles: The Benefits and Challenges.
|
||||
|
||||
[2] National Renewable Energy Laboratory. (2020). Well-to-Wheel Emissions of Electric Vehicles.
|
||||
|
||||
[3] European Commission. (2020). The European Green Deal.
|
||||
|
||||
[4] Dunn, J.B.; Gaines, L.; Sullivan, J.; Wang, M.Q. Impact of recycling on cradle-to-gate energy consumption and greenhouse gas emissions of automotive lithium-ion batteries. Environ. Sci. Technol. **2012**, 46, 12704–12710.
|
||||
|
||||
[5] International Energy Agency. (2020). Global EV Outlook 2020.
|
||||
|
||||
[6] Messagie, M. (2017). Energy Savings from Electric Vehicles.
|
||||
|
||||
[7] Held, M., & Schücking, M. (2019). Utilization effects on battery electric vehicle life-cycle assessment: A case-driven analysis of two commercial mobility applications. Transportation Research Part D: Transport and Environment, 75, 87–105.
|
||||
|
||||
[8] Bauer, A.; Hache, E.; Ternel, C.; Beauchet, S. Comparative environmental life cycle assessment of several powertrain types for cars and buses in France for two driving cycles: “Worldwide harmonized light vehicle test procedure” cycle and urban cycle. Int. J. Life Cycle Assess. **2020**, 25, 1545–1565.
|
||||
|
||||
[9] Messagie, M. (2017). Energy Savings from Electric Vehicles.
|
||||
|
||||
[10] International Energy Agency. (2020). Global EV Outlook 2020.
|
||||
|
||||
**Definitions**
|
||||
|
||||
* Well-to-wheel emissions rate: the total emissions associated with the production, transportation, and combustion of a vehicle's fuel.
|
||||
* Zero-emission vehicle: a vehicle that produces no tailpipe emissions.
|
||||
|
||||
**Important Details**
|
||||
|
||||
* The environmental and economic impact of EVs will continue to evolve as technology improves and more renewable energy sources are integrated into the energy mix.
|
||||
* Governments and companies can play a key role in promoting the adoption of EVs and reducing their environmental impact.
|
||||
* Continuing research and development is needed to improve the efficiency and sustainability of EVs and their batteries.
|
|
@ -1,48 +0,0 @@
|
|||
The Environmental and Economic Impact of Electric Vehicles Compared to Traditional Vehicles
|
||||
====================================================================================
|
||||
|
||||
### Introduction
|
||||
|
||||
The transportation sector is one of the largest contributors to greenhouse gas emissions and air pollution, with traditional internal combustion engine vehicles being a significant source of these emissions [1]. Electric vehicles (EVs) have emerged as a promising alternative, offering a potential reduction in emissions and operating costs. This report provides an overview of the environmental and economic impact of EVs compared to traditional vehicles, based on information from various sources.
|
||||
|
||||
### Environmental Impact
|
||||
|
||||
EVs produce zero tailpipe emissions, reducing greenhouse gas emissions and air pollution in urban areas [2]. According to a study by the Union of Concerned Scientists, EVs can reduce well-to-wheel emissions by 50-70% compared to traditional gasoline-powered vehicles [3]. However, the production of EVs generates more emissions than traditional vehicles, mainly due to the manufacturing of batteries [4]. The overall environmental impact of EVs depends on the source of the electricity used to charge them, with renewable energy sources resulting in lower emissions [5].
|
||||
|
||||
### Economic Impact
|
||||
|
||||
EVs can offer significant economic benefits, including lower operating costs and reduced maintenance needs [6]. A study by the National Renewable Energy Laboratory found that widespread adoption of EVs could reduce energy costs by up to 78% by 2050 [7]. The cost of EVs is decreasing, with many models becoming competitive with traditional vehicles in terms of price [8]. Governments and companies are investing heavily in EV infrastructure, including charging stations and battery technology, creating new job opportunities and stimulating economic growth [9].
|
||||
|
||||
### Comparison to Traditional Vehicles
|
||||
|
||||
Traditional vehicles contribute to air pollution and greenhouse gas emissions, with the transportation sector accounting for around 15% of global emissions [10]. In contrast, EVs offer a cleaner alternative, with the potential to reduce emissions and improve air quality [11]. However, the higher upfront cost of EVs can be a barrier to adoption, although prices are decreasing as technology improves [12].
|
||||
|
||||
### Conclusion
|
||||
|
||||
In conclusion, EVs offer a promising alternative to traditional vehicles, with the potential to reduce emissions and operating costs. While the production of EVs generates more emissions than traditional vehicles, the overall environmental impact of EVs depends on the source of the electricity used to charge them. As the demand for EVs increases, economies of scale are expected to reduce production costs, making them more competitive with traditional vehicles. Governments and companies are investing heavily in EV infrastructure, creating new job opportunities and stimulating economic growth.
|
||||
|
||||
### References
|
||||
|
||||
[1] Carbon Brief. (n.d.). Factcheck: How electric vehicles help to tackle climate change. Retrieved from <https://www.carbonbrief.org/factcheck-how-electric-vehicles-help-to-tackle-climate-change/>
|
||||
|
||||
[2] Alternative Fuels Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from <https://afdc.energy.gov/fuels/electricity-benefits>
|
||||
|
||||
[3] Union of Concerned Scientists. (n.d.). Electric Vehicles: A Review of the Current State of the Art. Retrieved from <https://www.ucsusa.org/resources/electric-vehicles>
|
||||
|
||||
[4] National Renewable Energy Laboratory. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from <https://www.nrel.gov/transportation/electric-vehicle-benefits-considerations.html>
|
||||
|
||||
[5] International Energy Agency. (n.d.). Electric Vehicles. Retrieved from <https://www.iea.org/topics/electric-vehicles/>
|
||||
|
||||
[6] BloombergNEF. (n.d.). Electric Vehicle Outlook. Retrieved from <https://about.bnef.com/electric-vehicle-outlook/>
|
||||
|
||||
[7] National Renewable Energy Laboratory. (n.d.). Widespread Adoption of Electric Vehicles Could Reduce Emissions by 78% by 2050. Retrieved from <https://www.nrel.gov/news/press/2020/widespread-adoption-electric-vehicles-could-reduce-emissions-78-2050.html>
|
||||
|
||||
[8] International Energy Agency. (n.d.). Global EV Outlook. Retrieved from <https://www.iea.org/topics/electric-vehicles/global-ev-outlook/>
|
||||
|
||||
[9] BloombergNEF. (n.d.). Electric Vehicle Outlook. Retrieved from <https://about.bnef.com/electric-vehicle-outlook/>
|
||||
|
||||
[10] United Nations. (n.d.). Sustainable Development Goals. Retrieved from <https://www.un.org/sustainabledevelopment/sustainable-development-goals/>
|
||||
|
||||
[11] World Health Organization. (n.d.). Air Pollution. Retrieved from <https://www.who.int/news-room/q-and-a/detail/air-pollution>
|
||||
|
||||
[12] International Energy Agency. (n.d.). Global EV Outlook. Retrieved from <https://www.iea.org/topics/electric-vehicles/global-ev-outlook/>
|
|
@ -1,56 +0,0 @@
|
|||
## The Environmental and Economic Impact of Electric Vehicles Compared to Traditional Vehicles
|
||||
|
||||
The environmental and economic impact of electric vehicles (EVs) compared to traditional vehicles is a complex topic that has been extensively studied in recent years. According to the Alternative Fuels Data Center [1], electric vehicles produce zero tailpiece emissions, reducing greenhouse gas emissions and air pollution in urban areas. Additionally, EVs can reduce well-to-wheel emissions by 50-70% compared to traditional gasoline-powered vehicles [2]. However, the production of EVs can have a higher environmental impact than traditional vehicles due to the extraction and processing of raw materials for battery production [3].
|
||||
|
||||
In terms of economic impact, EVs can reduce operating costs for consumers, with lower fuel and maintenance costs [4]. Electric vehicles have fewer moving parts and do not require oil changes, resulting in lower maintenance costs [5]. Additionally, governments and companies are investing in electric vehicle infrastructure, such as charging stations, to support the growth of the electric vehicle market [6].
|
||||
|
||||
A study by Lectron EV [7] found that electric vehicles can save owners around $700-$1,000 per year in fuel costs. Another study by the Alternative Fuels Data Center [8] found that EVs can reduce energy consumption by 60-70% compared to traditional vehicles. However, the higher upfront cost of EVs can be a barrier to adoption [9].
|
||||
|
||||
The cost of electric vehicles is decreasing over time, with some models becoming competitive with traditional vehicles in terms of price [10]. Governments and companies are also offering incentives for the adoption of EVs, such as tax credits and subsidies [11]. For example, the federal government aims to ban sales of new gasoline-powered cars by 2035 to achieve zero emissions by 12].
|
||||
|
||||
A study by the Alternative Fuels Data Center [12] found that widespread adoption of electric vehicles could reduce greenhouse gas emissions from the transportation sector by 78% by 2050. Another study by Lectron EV [13] found that electric vehicles can reduce emissions by 50-70% compared to traditional vehicles.
|
||||
|
||||
In addition, a study by the Alternative Fuels Data Center [14] found that electric vehicles can reduce energy consumption by 60-70% compared to traditional vehicles. Another study by the Alternative Fuel Data Center [15] found that electric vehicles can reduce greenhouse gas emissions by 50-70% compared to traditional vehicles.
|
||||
|
||||
Overall, the environmental and economic impact of electric vehicles compared to traditional vehicles is a complex topic that has been extensively studied in recent years. While EVs have many benefits, such as reduced greenhouse gas emissions and lower operating costs, they also have some drawbacks, such as higher upfront costs and limited charging infrastructure. However, as technology continues to improve and infrastructure develops, electric vehicles are likely to become an increasingly important part of the transportation sector.
|
||||
|
||||
## References
|
||||
|
||||
1. Alternative Fuels Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
2. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-cars-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
3. Alternative Fuels Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
4. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-cars-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
5. Alternative Fuels Data Center. (n.d.). Electric Vehicle Benefits and Considerings. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
6. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-cars-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOoQ2lg-Nd825hobCL
|
||||
7. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-ccars-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
8. Alternative Fuels Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
9. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
10. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-car-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
11. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
12. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
13. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-car-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
14. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
15. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
|
||||
Note: The references cited in this report are based on the provided sources. However, the references may not be accurate or up-to-the-date sources. It is recommended to consult the most recent and accurate sources for more information on the topic.
|
||||
|
||||
## Limitations and Limitations of the Report
|
||||
|
||||
The report is based on the provided sources and may not be comprehensive or up-to-the-date. The report is limited to the information available in the provided sources and may not reflect the most recent research or data on the topic. Additionally, the report is based on general information and may not be specific to specific regions or contexts. It is recommended to consult more specific and recent sources for more accurate and comprehensive information on the topic.
|
||||
|
||||
## Future Research Directions
|
||||
|
||||
Future research should focus on more specific and detailed analysis of the environmental and economic impact of electric vehicles compared to traditional vehicles. The research should consider specific regions and contexts and should be based on the most recent and accurate data and research on the topic. Additionally, the research should consider the limitations and limitations of the current report and should aim to provide more comprehensive and accurate information on the topic.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The environmental and economic impact of electric vehicles compared to traditional vehicles is a complex topic that has been extensively studied in recent years. While EVs have many benefits, such as reduced greenhouse gas emissions and lower operating costs, they also have some drawbacks, such as higher upfront costs and limited charging infrastructure. However, as technology continues to improve and infrastructure develops, electric vehicles are likely to become an increasingly important part of the transportation sector. Further research is needed to provide more comprehensive and accurate information on the topic.
|
||||
|
||||
## References
|
||||
|
||||
1. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
2. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-car-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
3. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
4. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-car-better-for-the-experiment?srsltid=AfmBOoowe8Ooeg0BrFoJmZqIcqIcqIqI6RX3pOQ2lg-Nd825hobCL
|
||||
5. Alternative Fuils Data Center. (n.d.). Electric Vehicle Benefits and Considerations. Retrieved from https://afcd.energy.gov/folds/electricity-benefits
|
||||
6. Lectron EV. (n.d.). Are Electric Cars Better for the Environment? Retrieved from https://ev-lectron.com/blogs/blog/are-electric-car-better-for-the-experiment?srsltid=AfmBOoowe
|
|
@ -1,46 +0,0 @@
|
|||
## Introduction
|
||||
The environmental and economic impact of electric vehicles (EVs) compared to traditional vehicles is a complex topic that has been extensively studied in recent years. This report aims to provide a comprehensive analysis of the environmental and economic impact of EVs compared to traditional vehicles, based on the information provided in the provided sources.
|
||||
|
||||
## Environmental Impact
|
||||
The environmental impact of EVs is significantly lower than that of traditional vehicles, primarily due to the reduction of greenhouse gas emissions and air pollution in urban areas. According to a study by the National Renewable Energy Laboratory (NREL), EVs can reduce greenhouse gas emissions from the transportation sector by 78% by 2050 [1]. Another study by the International Council on Clean Transportation found that EVs can reduce operating costs by 50-70% compared to traditional vehicles [2]. However, the production of EVs can have a higher environmental impact due to the extraction and processing of raw materials for battery production [3].
|
||||
|
||||
## Economic Impact
|
||||
The economic impact of EVs is complex and depends on various factors, including the cost of production, maintenance, and operation. According to a study by the International Energy Agency (IEC), the cost of producing EVs is decreasing as technology improves and economies of scale are achieved [4]. Another study by the National Academy of Sciences found that EVs can reduce operating costs for consumers, as electricity is generally cheaper than gasoline [5]. However, the high upfront costs of EVs can be a barrier to adoption, although government incentives and subsidies can help to offset these costs [6].
|
||||
|
||||
## Contextual Factors
|
||||
The environmental and economic impact of EVs varies depending on the region, with areas with access to renewable energy sources and well-developed infrastructure likely to benefit more from EV adoption [7]. Government policies and incentives can also play a significant role in promoting the adoption of EVs and reducing their environmental impact [8]. The International Organization for Standardization (ISO) has developed standards for EV charging infrastructure, which can help to promote the adoption of EVs [9].
|
||||
|
||||
## Different Perspectives
|
||||
Different stakeholders have different perspectives on the environmental and economic impact of EVs. Some argue that EVs are a crucial step towards reducing greenhouse gas emissions and mitigating climate change [10]. Others argue that the high upfront costs of EVs can be a barrier to adoption and that the environmental benefits of EVs are not as significant as often claimed [11].
|
||||
|
||||
## Quantitative Data
|
||||
A study by the National Renewable Energy Laboratory found that widespread adoption of EVs in the United States could reduce greenhouse gas emissions from the transportation sector by 78% by 2050 [1]. Another study by the International Council on Clean Transportation found that EVs can reduce operating costs by 50-70% compared to traditional vehicles [2]. A study by the International Energy Agency found that the cost of producing EVs is decreasing as technology improves and economies of scale are achieved [4].
|
||||
|
||||
## Nuances and Edge Cases
|
||||
The environmental and economic impact of EVs can vary depending on various factors, including the source of the electricity used to charge EVs, the type of EV, and the location of the EV [12]. The production of EVs can have a higher environmental impact due to the extraction and processing of raw materials for battery production [3]. The recycling of EV batteries can help to reduce waste and reduce the environmental impact of EV production [13].
|
||||
|
||||
## Conclusion
|
||||
The environmental and economic impact of electric vehicles compared to traditional vehicles is complex and depends on various factors, including the source of the electricity used to charge EVs, the type of EV, and the location of the EV. While EVs can reduce greenhouse gas emissions and air pollution in urban areas, the production of EVs can have a higher environmental impact due to the extraction and processing of raw materials for battery production. The cost of producing EVs is decreasing as technology improves and economies of scale are achieved, but the high upfront costs of EVs can be a barrier to adoption. Government policies and incentives can play a significant role in promoting the adoption of EVs and reducing their environmental impact.
|
||||
|
||||
## Recommendations
|
||||
To promote the adoption of EVs and reduce their environmental impact, governments and industry stakeholders should work together to develop and implement policies and incentives that support the adoption of EVs, such as tax credits, subsidies, and infrastructure development [14]. Additionally, manufacturers should prioritize the production of EVs with lower environmental impact, such as those with recycled materials and reduced energy consumption [15]. Consumers should be educated about the benefits and limitations of EVs and encouraged to adopt EVs as a sustainable transportation option [16].
|
||||
|
||||
## References
|
||||
1. National Renewable Energy Laboratory. (2020). "Electric Vehicles: A Guide to the Benefits and Challenges of Electric Vehicles." Retrieved from https://www.nrel.org/ electric vehicles
|
||||
2. International Council on Clean Transportation. (2020). "Electric Vehicles: A Guide to the Benefits and Challenges of Electric vehicles." Retrieved from https://www.icct.org/ electric vehicles
|
||||
3. International Energy Agency. (2020). "Electric Vehicles: A Guide to the Benefits and Challenges of Electric Vehicles." Retrieved from https://www.iea.org/ electric vehicles
|
||||
4. National Academy of Sciences. (2020). "Electric Vehicles: A Guide to the benefits and challenges of Electric Vehicles." Retrieved from https://www.nationalacademies.org/ electric vehicles
|
||||
5. International Organization for Standardization. (2020). "Electric Vehicles: A Guide to the benefits and challenges of Electric Vehicles." Retrieved from https://www.iso.org/ electric vehicles
|
||||
6. United Nations Framework on Climate Change. (2020). "Electric Vehicles: A Guide to the benefits and challenges of Electric Vehicles." Retrieved from https://www.un.org/ electric vehicles
|
||||
7. World Health Organization. (2020). "Air Pollution." Retrieved from https://www.who.org/ air pollution
|
||||
8. Environmental Protection Agency. (2020). "Air Pollution." Retrieved from https://www.epa.gov/ air pollution
|
||||
9. National Institute of Environmental Health Sciences. (2020). "Air Pollution." Retrieved from https://www.niehs.nih.gov/ air pollution
|
||||
10. Harvard University. (2020). "The Benefits and Challenges of Electric Vehicles." Retrieved from https://h ttps://www.harvard.edu/ electric vehicles
|
||||
11. University of California. (2020). "The Benefits and Challenges of Electric Vehicles." Retrieved from https://www.ucl uis. edu/ electric vehicles
|
||||
12. Massachusetts Institute of Technology. (2020). "The Benefits and Challenges of Electric Vehicles." Retrieved from https://www.mit.edu/ electric vehicles
|
||||
13. Stanford University. (2020). "The Benefits and Challenges of Electric vehicles." Retrieved from https://www.stanford. edu/ electric vehicles
|
||||
14. University of Michigan. (2020). "The Benefits and Challenges of Electric vehicles." Retrieved from https://www. umich. org/ electric vehicles
|
||||
15. University of California, Berkeley. (2020). "The Benefits and Challenges of Electric vehicles." Retrieved from https://www. berkeley. edu/ electric vehicles
|
||||
16. Harvard Business School. (2020). "The Benefits and Challenges of Electric Vehicles." Retrieved from https://h ttps://h ttps://www. hbs. org/ electric vehicles
|
||||
|
||||
Note: The references provided are a selection of sources used in the report and are not exhaustive. The report is based on the information provided in the provided sources, and the references are cited accordingly.
|
Binary file not shown.
|
@ -126,6 +126,11 @@ class DocumentProcessor:
|
|||
if not content.strip():
|
||||
return []
|
||||
|
||||
# Ensure document has a title
|
||||
document_title = document.get('title')
|
||||
if document_title is None:
|
||||
document_title = 'Untitled'
|
||||
|
||||
# Find all headers in the content
|
||||
header_pattern = re.compile(r'^(#{1,6})\s+(.+)$', re.MULTILINE)
|
||||
headers = list(header_pattern.finditer(content))
|
||||
|
@ -154,7 +159,7 @@ class DocumentProcessor:
|
|||
chunks.append({
|
||||
'document_id': document.get('id'),
|
||||
'url': document.get('url'),
|
||||
'title': document.get('title'),
|
||||
'title': document_title,
|
||||
'content': section_content,
|
||||
'token_count': section_tokens,
|
||||
'chunk_type': 'section',
|
||||
|
@ -175,7 +180,7 @@ class DocumentProcessor:
|
|||
chunks.append({
|
||||
'document_id': document.get('id'),
|
||||
'url': document.get('url'),
|
||||
'title': document.get('title'),
|
||||
'title': document_title,
|
||||
'content': chunk_content,
|
||||
'token_count': chunk_tokens,
|
||||
'chunk_type': 'section_part',
|
||||
|
@ -188,7 +193,7 @@ class DocumentProcessor:
|
|||
|
||||
return chunks
|
||||
|
||||
def chunk_document_fixed_size(self, document: Dict[str, Any],
|
||||
def chunk_document_fixed_size(self, document: Dict[str, Any],
|
||||
max_chunk_tokens: int = 1000,
|
||||
overlap_tokens: int = 100) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
|
@ -208,28 +213,88 @@ class DocumentProcessor:
|
|||
if not content.strip():
|
||||
return []
|
||||
|
||||
# Split content into fixed-size chunks
|
||||
content_chunks = self._split_text_fixed_size(content, max_chunk_tokens, overlap_tokens)
|
||||
# Ensure document has a title
|
||||
document_title = document.get('title')
|
||||
if document_title is None:
|
||||
document_title = 'Untitled'
|
||||
|
||||
chunks = []
|
||||
# Split the content into fixed-size chunks
|
||||
chunk_contents = self._split_text_fixed_size(content, max_chunk_tokens, overlap_tokens)
|
||||
|
||||
# Create chunk objects
|
||||
for i, chunk_content in enumerate(content_chunks):
|
||||
chunks = []
|
||||
for i, chunk_content in enumerate(chunk_contents):
|
||||
chunk_tokens = self._count_tokens(chunk_content)
|
||||
chunks.append({
|
||||
'document_id': document.get('id'),
|
||||
'url': document.get('url'),
|
||||
'title': document.get('title'),
|
||||
'title': document_title,
|
||||
'content': chunk_content,
|
||||
'token_count': chunk_tokens,
|
||||
'chunk_type': 'fixed',
|
||||
'chunk_index': i + 1,
|
||||
'total_chunks': len(content_chunks),
|
||||
'priority_score': document.get('priority_score', 0.0)
|
||||
'chunk_index': i,
|
||||
'total_chunks': len(chunk_contents),
|
||||
'priority_score': document.get('priority_score', 0.0) * (1.0 - (i * 0.05)) # Slightly reduce priority for later chunks
|
||||
})
|
||||
|
||||
return chunks
|
||||
|
||||
def chunk_document_hierarchical(self, document: Dict[str, Any],
|
||||
max_chunk_tokens: int = 1000,
|
||||
overlap_tokens: int = 100) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Chunk a very large document using a hierarchical approach.
|
||||
|
||||
This method first chunks the document by sections, then further chunks
|
||||
large sections into smaller pieces.
|
||||
|
||||
Args:
|
||||
document: Document to chunk
|
||||
max_chunk_tokens: Maximum number of tokens per chunk
|
||||
overlap_tokens: Number of tokens to overlap between chunks
|
||||
|
||||
Returns:
|
||||
List of document chunks
|
||||
"""
|
||||
# First, chunk by sections
|
||||
section_chunks = self.chunk_document_by_sections(document, max_chunk_tokens, overlap_tokens)
|
||||
|
||||
# If the document is small enough, return section chunks
|
||||
if sum(chunk.get('token_count', 0) for chunk in section_chunks) <= max_chunk_tokens * 3:
|
||||
return section_chunks
|
||||
|
||||
# Otherwise, create a summary chunk and keep the most important sections
|
||||
content = document.get('content', '')
|
||||
title = document.get('title', 'Untitled')
|
||||
|
||||
# Extract first paragraph as summary
|
||||
first_para_match = re.search(r'^(.*?)\n\n', content, re.DOTALL)
|
||||
summary = first_para_match.group(1) if first_para_match else content[:500]
|
||||
|
||||
# Create summary chunk
|
||||
summary_chunk = {
|
||||
'document_id': document.get('id'),
|
||||
'url': document.get('url'),
|
||||
'title': title,
|
||||
'content': f"# {title}\n\n{summary}\n\n(This is a summary of a large document)",
|
||||
'token_count': self._count_tokens(f"# {title}\n\n{summary}\n\n(This is a summary of a large document)"),
|
||||
'chunk_type': 'summary',
|
||||
'priority_score': document.get('priority_score', 0.0) * 1.2 # Boost summary priority
|
||||
}
|
||||
|
||||
# Sort section chunks by priority (section level and position)
|
||||
def section_priority(chunk):
|
||||
# Prioritize by section level (lower is more important)
|
||||
level_score = 6 - chunk.get('section_level', 3)
|
||||
# Prioritize earlier sections
|
||||
position_score = 1.0 / (1.0 + chunk.get('chunk_index', 0) + chunk.get('section_part', 0))
|
||||
return level_score * position_score
|
||||
|
||||
sorted_sections = sorted(section_chunks, key=section_priority, reverse=True)
|
||||
|
||||
# Return summary chunk and top sections
|
||||
return [summary_chunk] + sorted_sections
|
||||
|
||||
def _split_text_fixed_size(self, text: str,
|
||||
max_chunk_tokens: int = 1000,
|
||||
overlap_tokens: int = 100) -> List[str]:
|
||||
|
@ -272,62 +337,6 @@ class DocumentProcessor:
|
|||
|
||||
return chunks
|
||||
|
||||
def chunk_document_hierarchical(self, document: Dict[str, Any],
|
||||
max_chunk_tokens: int = 1000,
|
||||
overlap_tokens: int = 100) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Chunk a very large document using a hierarchical approach.
|
||||
|
||||
This method first chunks the document by sections, then further chunks
|
||||
large sections into smaller pieces.
|
||||
|
||||
Args:
|
||||
document: Document to chunk
|
||||
max_chunk_tokens: Maximum number of tokens per chunk
|
||||
overlap_tokens: Number of tokens to overlap between chunks
|
||||
|
||||
Returns:
|
||||
List of document chunks
|
||||
"""
|
||||
# First, chunk by sections
|
||||
section_chunks = self.chunk_document_by_sections(document, max_chunk_tokens, overlap_tokens)
|
||||
|
||||
# If the document is small enough, return section chunks
|
||||
if sum(chunk.get('token_count', 0) for chunk in section_chunks) <= max_chunk_tokens * 3:
|
||||
return section_chunks
|
||||
|
||||
# Otherwise, create a summary chunk and keep the most important sections
|
||||
content = document.get('content', '')
|
||||
title = document.get('title', '')
|
||||
|
||||
# Extract first paragraph as summary
|
||||
first_para_match = re.search(r'^(.*?)\n\n', content, re.DOTALL)
|
||||
summary = first_para_match.group(1) if first_para_match else content[:500]
|
||||
|
||||
# Create summary chunk
|
||||
summary_chunk = {
|
||||
'document_id': document.get('id'),
|
||||
'url': document.get('url'),
|
||||
'title': title,
|
||||
'content': f"# {title}\n\n{summary}\n\n(This is a summary of a large document)",
|
||||
'token_count': self._count_tokens(f"# {title}\n\n{summary}\n\n(This is a summary of a large document)"),
|
||||
'chunk_type': 'summary',
|
||||
'priority_score': document.get('priority_score', 0.0) * 1.2 # Boost summary priority
|
||||
}
|
||||
|
||||
# Sort section chunks by priority (section level and position)
|
||||
def section_priority(chunk):
|
||||
# Prioritize by section level (lower is more important)
|
||||
level_score = 6 - chunk.get('section_level', 3)
|
||||
# Prioritize earlier sections
|
||||
position_score = 1.0 / (1.0 + chunk.get('chunk_index', 0) + chunk.get('section_part', 0))
|
||||
return level_score * position_score
|
||||
|
||||
sorted_sections = sorted(section_chunks, key=section_priority, reverse=True)
|
||||
|
||||
# Return summary chunk and top sections
|
||||
return [summary_chunk] + sorted_sections
|
||||
|
||||
def select_chunks_for_context(self, chunks: List[Dict[str, Any]],
|
||||
token_budget: int,
|
||||
min_chunks_per_doc: int = 1) -> List[Dict[str, Any]]:
|
||||
|
@ -442,6 +451,10 @@ class DocumentProcessor:
|
|||
# Chunk documents
|
||||
all_chunks = []
|
||||
for doc in prioritized_docs:
|
||||
# Ensure document has a title
|
||||
if doc.get('title') is None:
|
||||
doc['title'] = 'Untitled'
|
||||
|
||||
# Choose chunking strategy based on document size
|
||||
token_count = doc.get('token_count', 0)
|
||||
|
||||
|
@ -456,13 +469,18 @@ class DocumentProcessor:
|
|||
chunks = [{
|
||||
'document_id': doc.get('id'),
|
||||
'url': doc.get('url'),
|
||||
'title': doc.get('title'),
|
||||
'title': doc.get('title', 'Untitled'),
|
||||
'content': doc.get('content', ''),
|
||||
'token_count': token_count,
|
||||
'chunk_type': 'full',
|
||||
'priority_score': doc.get('priority_score', 0.0)
|
||||
}]
|
||||
|
||||
# Ensure all chunks have a title
|
||||
for chunk in chunks:
|
||||
if chunk.get('title') is None:
|
||||
chunk['title'] = 'Untitled'
|
||||
|
||||
all_chunks.extend(chunks)
|
||||
|
||||
# Select chunks based on token budget
|
||||
|
|
|
@ -254,9 +254,10 @@ class ReportSynthesizer:
|
|||
|
||||
# Process this batch
|
||||
batch_results = []
|
||||
for chunk in batch:
|
||||
for j, chunk in enumerate(batch):
|
||||
chunk_title = chunk.get('title', 'Untitled')
|
||||
logger.info(f"Processing chunk {i+1}/{total_chunks}: {chunk_title[:50]}...")
|
||||
chunk_index = i + j + 1
|
||||
logger.info(f"Processing chunk {chunk_index}/{total_chunks}: {chunk_title[:50] if chunk_title else 'Untitled'}...")
|
||||
|
||||
# Create a prompt for extracting key information from the chunk
|
||||
messages = [
|
||||
|
@ -281,9 +282,9 @@ class ReportSynthesizer:
|
|||
processed_chunk['extracted_info'] = extracted_info
|
||||
batch_results.append(processed_chunk)
|
||||
|
||||
logger.info(f"Completed chunk {i+1}/{total_chunks} ({(i+1)/total_chunks*100:.1f}% complete)")
|
||||
logger.info(f"Completed chunk {chunk_index}/{total_chunks} ({chunk_index/total_chunks*100:.1f}% complete)")
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing chunk {i+1}/{total_chunks}: {str(e)}")
|
||||
logger.error(f"Error processing chunk {chunk_index}/{total_chunks}: {str(e)}")
|
||||
# Add a placeholder for the failed chunk to maintain document order
|
||||
processed_chunk = chunk.copy()
|
||||
processed_chunk['extracted_info'] = f"Error extracting information: {str(e)}"
|
||||
|
@ -518,6 +519,11 @@ class ReportSynthesizer:
|
|||
batch = chunks[i:i+batch_size]
|
||||
logger.info(f"Processing batch {i//batch_size + 1}/{(len(chunks) + batch_size - 1)//batch_size} with {len(batch)} chunks")
|
||||
|
||||
# Ensure all chunks have a title, even if it's 'Untitled'
|
||||
for chunk in batch:
|
||||
if chunk.get('title') is None:
|
||||
chunk['title'] = 'Untitled'
|
||||
|
||||
# Process this batch
|
||||
batch_results = await self.map_document_chunks(batch, query, detail_level)
|
||||
processed_chunks.extend(batch_results)
|
||||
|
|
|
@ -0,0 +1,59 @@
|
|||
#!/usr/bin/env python
|
||||
"""
|
||||
Test script to check if search functionality is working
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import asyncio
|
||||
from pathlib import Path
|
||||
|
||||
# Add parent directory to path
|
||||
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
|
||||
|
||||
from execution.search_executor import SearchExecutor
|
||||
from query.query_processor import QueryProcessor
|
||||
|
||||
async def test_search():
|
||||
"""Test search functionality"""
|
||||
query = "Research and explain in detail the potential effects of creatine supplementation on muscle mass and strength"
|
||||
|
||||
# Initialize components
|
||||
query_processor = QueryProcessor()
|
||||
search_executor = SearchExecutor()
|
||||
|
||||
# Print available search engines
|
||||
available_engines = search_executor.get_available_search_engines()
|
||||
print(f"Available search engines: {available_engines}")
|
||||
|
||||
# Process the query
|
||||
structured_query = query_processor.process_query(query)
|
||||
print(f"Structured query: {json.dumps(structured_query, indent=2)}")
|
||||
|
||||
# Generate search queries for different engines
|
||||
structured_query = query_processor.generate_search_queries(
|
||||
structured_query,
|
||||
search_executor.get_available_search_engines()
|
||||
)
|
||||
print(f"Search queries: {json.dumps(structured_query.get('search_queries', {}), indent=2)}")
|
||||
|
||||
# Execute search
|
||||
search_results = search_executor.execute_search(
|
||||
structured_query,
|
||||
num_results=5
|
||||
)
|
||||
|
||||
# Print results
|
||||
for engine, results in search_results.items():
|
||||
print(f"\nResults from {engine}: {len(results)}")
|
||||
for i, result in enumerate(results[:3]): # Show first 3 results
|
||||
print(f" {i+1}. {result.get('title')} - {result.get('url')}")
|
||||
|
||||
# Return total number of results
|
||||
total_results = sum(len(results) for results in search_results.values())
|
||||
return total_results
|
||||
|
||||
if __name__ == "__main__":
|
||||
total_results = asyncio.run(test_search())
|
||||
print(f"\nTotal results: {total_results}")
|
Loading…
Reference in New Issue