Update memory bank to reflect project directory reorganization
parent 7f440286bc
commit 7744249d65
|
@@ -11,7 +11,6 @@ sim-search/
|
||||||
├── query/
|
├── query/
|
||||||
│ ├── __init__.py
|
│ ├── __init__.py
|
||||||
│ ├── query_processor.py # Module for processing user queries
|
│ ├── query_processor.py # Module for processing user queries
|
||||||
│ ├── query_classifier.py # Module for classifying query types
|
|
||||||
│ └── llm_interface.py # Module for interacting with LLM providers
|
│ └── llm_interface.py # Module for interacting with LLM providers
|
||||||
├── execution/
|
├── execution/
|
||||||
│ ├── __init__.py
|
│ ├── __init__.py
|
||||||
|
@@ -22,15 +21,68 @@ sim-search/
|
||||||
│ ├── base_handler.py # Base class for search handlers
|
│ ├── base_handler.py # Base class for search handlers
|
||||||
│ ├── serper_handler.py # Handler for Serper API (Google search)
|
│ ├── serper_handler.py # Handler for Serper API (Google search)
|
||||||
│ ├── scholar_handler.py # Handler for Google Scholar via Serper
|
│ ├── scholar_handler.py # Handler for Google Scholar via Serper
|
||||||
|
│ ├── google_handler.py # Handler for Google search
|
||||||
│ └── arxiv_handler.py # Handler for arXiv API
|
│ └── arxiv_handler.py # Handler for arXiv API
|
||||||
├── ranking/
|
├── ranking/
|
||||||
│ ├── __init__.py
|
│ ├── __init__.py
|
||||||
│ ├── jina_reranker.py # Module for reranking documents using Jina AI
|
│ └── jina_reranker.py # Module for reranking documents using Jina AI
|
||||||
│ └── filter_manager.py # Module for filtering documents
|
├── report/
|
||||||
├── test_search_execution.py # Test script for search execution
|
│ ├── __init__.py
|
||||||
├── test_all_handlers.py # Test script for all search handlers
|
│ ├── report_generator.py # Module for generating reports
|
||||||
├── requirements.txt # Project dependencies
|
│ ├── report_synthesis.py # Module for synthesizing reports
|
||||||
└── search_execution_test_results.json # Test results
|
│ ├── document_processor.py # Module for processing documents
|
||||||
|
│ ├── document_scraper.py # Module for scraping documents
|
||||||
|
│ ├── report_detail_levels.py # Module for managing report detail levels
|
||||||
|
│ └── database/ # Database for storing reports
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ └── db_manager.py # Module for managing the database
|
||||||
|
├── ui/
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ └── gradio_interface.py # Gradio-based web interface
|
||||||
|
├── utils/
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ ├── jina_similarity.py # Module for computing text similarity
|
||||||
|
│ └── markdown_segmenter.py # Module for segmenting markdown documents
|
||||||
|
├── scripts/
|
||||||
|
│ └── query_to_report.py # Script for generating reports from queries
|
||||||
|
├── tests/
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ ├── query/ # Tests for query module
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ ├── test_query_processor.py
|
||||||
|
│ │ ├── test_query_processor_comprehensive.py
|
||||||
|
│ │ └── test_llm_interface.py
|
||||||
|
│ ├── execution/ # Tests for execution module
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ ├── test_search.py
|
||||||
|
│ │ ├── test_search_execution.py
|
||||||
|
│ │ └── test_all_handlers.py
|
||||||
|
│ ├── ranking/ # Tests for ranking module
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ ├── test_reranker.py
|
||||||
|
│ │ ├── test_similarity.py
|
||||||
|
│ │ └── test_simple_reranker.py
|
||||||
|
│ ├── report/ # Tests for report module
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ ├── test_custom_model.py
|
||||||
|
│ │ └── test_detail_levels.py
|
||||||
|
│ ├── ui/ # Tests for UI module
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ └── test_ui_search.py
|
||||||
|
│ ├── integration/ # Integration tests
|
||||||
|
│ │ ├── __init__.py
|
||||||
|
│ │ ├── test_ev_query.py
|
||||||
|
│ │ └── test_query_to_report.py
|
||||||
|
│ ├── test_document_processor.py
|
||||||
|
│ ├── test_document_scraper.py
|
||||||
|
│ └── test_report_synthesis.py
|
||||||
|
├── examples/
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ ├── data/ # Example data files
|
||||||
|
│ └── scripts/ # Example scripts
|
||||||
|
│ └── __init__.py
|
||||||
|
├── run_ui.py # Script to run the UI
|
||||||
|
└── requirements.txt # Project dependencies
|
||||||
```
|
```
|
||||||
|
|
||||||
## Module Details
|
## Module Details
|
||||||
|
|
|
@@ -1,52 +1,55 @@
|
||||||
# Current Focus: Google Gemini Integration, Reference Formatting, and NoneType Error Fixes
|
# Current Focus: Project Directory Reorganization, Testing, and Embedding Usage
|
||||||
|
|
||||||
## Active Work
|
## Active Work
|
||||||
|
|
||||||
### Google Gemini Integration
|
### Project Directory Reorganization
|
||||||
- ✅ Fixed the integration of Google Gemini models with LiteLLM
|
- ✅ Reorganized project directory structure for better maintainability
|
||||||
- ✅ Updated message formatting for Gemini models
|
- ✅ Moved utility scripts to the `utils/` directory
|
||||||
- ✅ Added proper handling for the 'gemini' provider in environment variables
|
- ✅ Organized test files into subdirectories under `tests/`
|
||||||
- ✅ Fixed reference formatting issues with Gemini models
|
- ✅ Moved sample data to the `examples/data/` directory
|
||||||
- ✅ Converted LLM interface methods to async to fix runtime errors
|
- ✅ Created proper `__init__.py` files for all packages
|
||||||
|
- ✅ Verified pipeline functionality after reorganization
|
||||||
|
|
||||||
### Gradio UI Updates
|
### Embedding Usage Analysis
|
||||||
- ✅ Updated the Gradio interface to handle async methods
|
- ✅ Confirmed that the pipeline uses Jina AI's Embeddings API through the `JinaSimilarity` class
|
||||||
- ✅ Fixed parameter ordering in the report generation function
|
- ✅ Verified that the `JinaReranker` class uses embeddings for document reranking
|
||||||
- ✅ Improved error handling in the UI
|
- ✅ Analyzed how embeddings are integrated into the search and ranking process
|
||||||
|
|
||||||
### Bug Fixes
|
### Pipeline Testing
|
||||||
- ✅ Fixed NoneType error in report synthesis when chunk titles are None
|
- ✅ Tested the pipeline after reorganization to ensure functionality
|
||||||
- ✅ Added defensive null checks throughout document processing and report synthesis
|
- ✅ Verified that the UI works correctly with the new directory structure
|
||||||
- ✅ Improved chunk counter in map_document_chunks method
|
- ✅ Confirmed that all imports are working properly with the new structure
|
||||||
|
|
||||||
## Recent Changes
|
## Recent Changes
|
||||||
|
|
||||||
### Reference Formatting Improvements
|
### Directory Structure Reorganization
|
||||||
- Enhanced the instructions for reference formatting to ensure URLs are included
|
- Created a dedicated `utils/` directory for utility scripts
|
||||||
- Added a recovery mechanism for truncated references
|
- Moved `jina_similarity.py` to `utils/`
|
||||||
- Improved context preparation to better extract URLs for references
|
- Added `__init__.py` to make it a proper Python package
|
||||||
- Added duplicate URL fields in the context to emphasize their importance
|
- Organized test files into subdirectories under `tests/`
|
||||||
|
- Created subdirectories for each module (query, execution, ranking, report, ui, integration)
|
||||||
|
- Added `__init__.py` files to all test directories
|
||||||
|
- Created an `examples/` directory with subdirectories for data and scripts
|
||||||
|
- Moved sample data to `examples/data/`
|
||||||
|
- Added `__init__.py` files to make them proper Python packages
|
||||||
|
- Added a dedicated `scripts/` directory for utility scripts
|
||||||
|
- Moved `query_to_report.py` to `scripts/`
|
||||||
|
|
||||||
### Async LLM Interface
|
### Pipeline Verification
|
||||||
- Made `generate_completion`, `classify_query`, `enhance_query`, and `generate_search_queries` methods async
|
- Verified that the pipeline functions correctly after reorganization
|
||||||
- Updated dependent code to properly await these methods
|
- Confirmed that the `JinaSimilarity` class in `utils/jina_similarity.py` is properly used for embeddings
|
||||||
- Fixed runtime errors related to async/await patterns in the QueryProcessor
|
- Tested the reranking functionality with the `JinaReranker` class
|
||||||
|
- Checked that the report generation process works with the new structure
|
||||||
### Error Handling Improvements
|
|
||||||
- Added null checks for chunk titles in report synthesis
|
|
||||||
- Improved chunk counter in map_document_chunks method
|
|
||||||
- Added defensive code to ensure all chunks have titles
|
|
||||||
- Updated document processor to handle None titles with default values
|
|
||||||
|
|
||||||
## Next Steps
|
## Next Steps
|
||||||
|
|
||||||
1. Continue testing with Gemini models to ensure stable operation
|
1. Run comprehensive tests to ensure all functionality works with the new directory structure
|
||||||
2. Consider adding more robust error handling for LLM provider-specific issues
|
2. Update any remaining documentation to reflect the new directory structure
|
||||||
3. Improve the reference formatting further if needed
|
3. Consider moving the remaining test files in the root of the `tests/` directory to appropriate subdirectories
|
||||||
4. Update documentation to reflect the changes made to the LLM interface
|
4. Review import statements throughout the codebase to ensure they follow the new structure
|
||||||
5. Consider adding more unit tests for the async methods
|
5. Add more comprehensive documentation about the directory structure
|
||||||
6. Add more comprehensive null checks throughout the codebase
|
6. Consider creating a development guide for new contributors
|
||||||
7. Implement better error handling and recovery mechanisms
|
7. Implement automated tests to verify the directory structure remains consistent
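One way the automated structure check in step 7 might look is sketched below. This is a minimal sketch, not the project's actual test: the function name `missing_init_files` and the `EXPECTED_PACKAGES` list are hypothetical, and the package names are simply copied from the directory tree in this memory bank.

```python
from pathlib import Path

# Hypothetical list, copied from the directory tree above; adjust as the layout changes.
EXPECTED_PACKAGES = ["query", "execution", "ranking", "report", "ui", "utils", "tests"]


def missing_init_files(root, packages=EXPECTED_PACKAGES):
    """Return the expected package directories that are absent or lack __init__.py."""
    root = Path(root)
    # A directory counts as a proper package only if it contains an __init__.py file.
    return [name for name in packages if not (root / name / "__init__.py").is_file()]
```

A test could then assert that `missing_init_files(".")` returns an empty list when run from the project root, so that a package added without its `__init__.py` fails the check.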
|
||||||
|
|
||||||
### Future Enhancements
|
### Future Enhancements
|
||||||
|
|
||||||
|
|
|
@@ -644,6 +644,60 @@ Fixed reference formatting issues with Gemini models and updated the codebase to
|
||||||
- Fixed the parameter order in the lambda function for async execution
|
- Fixed the parameter order in the lambda function for async execution
|
||||||
- Improved error handling in the UI
|
- Improved error handling in the UI
|
||||||
|
|
||||||
|
## Session: 2025-03-11
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
Reorganized the project directory structure to improve maintainability and clarity, ensuring all components are properly organized into their respective directories.
|
||||||
|
|
||||||
|
### Key Activities
|
||||||
|
|
||||||
|
1. **Directory Structure Reorganization**:
|
||||||
|
|
||||||
|
- Created a dedicated `utils/` directory for utility scripts
|
||||||
|
- Moved `jina_similarity.py` to `utils/`
|
||||||
|
- Added `__init__.py` to make it a proper Python package
|
||||||
|
- Organized test files into subdirectories under `tests/`
|
||||||
|
- Created subdirectories for each module (query, execution, ranking, report, ui, integration)
|
||||||
|
- Added `__init__.py` files to all test directories
|
||||||
|
- Created an `examples/` directory with subdirectories for data and scripts
|
||||||
|
- Moved sample data to `examples/data/`
|
||||||
|
- Added `__init__.py` files to make them proper Python packages
|
||||||
|
- Added a dedicated `scripts/` directory for utility scripts
|
||||||
|
- Moved `query_to_report.py` to `scripts/`
|
||||||
|
|
||||||
|
2. **Pipeline Verification**:
|
||||||
|
|
||||||
|
- Tested the pipeline after reorganization to ensure functionality
|
||||||
|
- Verified that the UI works correctly with the new directory structure
|
||||||
|
- Confirmed that all imports are working properly with the new structure
|
||||||
|
|
||||||
|
3. **Embedding Usage Analysis**:
|
||||||
|
|
||||||
|
- Confirmed that the pipeline uses Jina AI's Embeddings API through the `JinaSimilarity` class
|
||||||
|
- Verified that the `JinaReranker` class uses embeddings for document reranking
|
||||||
|
- Analyzed how embeddings are integrated into the search and ranking process
|
||||||
|
|
||||||
|
### Insights
|
||||||
|
|
||||||
|
- A well-organized directory structure significantly improves code maintainability and readability
|
||||||
|
- Using proper Python package structure with `__init__.py` files ensures clean imports
|
||||||
|
- Separating tests, utilities, examples, and scripts into dedicated directories makes the codebase more navigable
|
||||||
|
- The Jina AI embeddings are used throughout the pipeline for semantic similarity and document reranking
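To illustrate the last insight, the sketch below shows how embedding vectors can be used to rank documents against a query. It is an assumption that cosine similarity is the comparison used; the notes do not say how `JinaSimilarity` or `JinaReranker` score documents, and the function names here are hypothetical. No Jina API call is made; the vectors are plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar document first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a real pipeline the vectors would come from an embeddings service rather than being written by hand; only the ranking step is shown here.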
|
||||||
|
|
||||||
|
### Challenges
|
||||||
|
|
||||||
|
- Ensuring all import statements are updated correctly after moving files
|
||||||
|
- Maintaining backward compatibility with existing code
|
||||||
|
- Verifying that all components still work together after reorganization
|
||||||
|
|
||||||
|
### Next Steps
|
||||||
|
|
||||||
|
1. Run comprehensive tests to ensure all functionality works with the new directory structure
|
||||||
|
2. Update any remaining documentation to reflect the new directory structure
|
||||||
|
3. Consider moving the remaining test files in the root of the `tests/` directory to appropriate subdirectories
|
||||||
|
4. Review import statements throughout the codebase to ensure they follow the new structure
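The import review in step 4 could be partly automated. The sketch below uses the standard `ast` module to flag imports that still refer to a moved module by its bare name. It assumes, hypothetically, that a module such as `jina_similarity` was previously importable at the top level; the notes only say it was moved to `utils/`. The function name `find_top_level_imports` is also hypothetical.

```python
import ast


def find_top_level_imports(source, module_names):
    """Return sorted (line, module) pairs for absolute imports whose first
    dotted segment is one of module_names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            if name.split(".")[0] in module_names:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Running this over each `.py` file with the set of moved module names would list the lines that may still need updating, for example `import jina_similarity` where `from utils.jina_similarity import ...` is now expected.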
|
||||||
|
|
||||||
### Key Insights
|
### Key Insights
|
||||||
- Async/await patterns need to be consistently applied throughout the codebase
|
- Async/await patterns need to be consistently applied throughout the codebase
|
||||||
- Reference formatting requires explicit instructions to include URLs
|
- Reference formatting requires explicit instructions to include URLs
|
||||||
|
|