# Code Structure
## Current Project Organization
```
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py # Configuration management
│   └── config.yaml # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py # Module for processing user queries
│   ├── query_classifier.py # Module for classifying query types
│   └── llm_interface.py # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py # Module for executing search queries
│   ├── result_collector.py # Module for collecting search results
│   └── api_handlers/ # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py # Base class for search handlers
│       ├── serper_handler.py # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       └── arxiv_handler.py # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   ├── jina_reranker.py # Module for reranking documents using Jina AI
│   └── filter_manager.py # Module for filtering documents
├── test_search_execution.py # Test script for search execution
├── test_all_handlers.py # Test script for all search handlers
├── requirements.txt # Project dependencies
└── search_execution_test_results.json # Test results
```
## Module Details
### Config Module
The `config` module manages configuration settings for the entire system, including API keys, model selections, and other parameters.
#### Files
- `__init__.py`: Package initialization file
- `config.py`: Configuration management class
- `config.yaml`: YAML configuration file with settings for different components
#### Classes
- `Config`: Singleton class for loading and accessing configuration settings
  - `load_config(config_path)`: Loads configuration from a YAML file
  - `get(key, default=None)`: Gets a configuration value by key
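
A minimal sketch of how a singleton `Config` with these two methods might look. The `_instance` and `_data` attributes and the flat key lookup are assumptions made for illustration, not the project's actual implementation. PyYAML's `yaml.safe_load` is assumed as the YAML parser.

```python
from typing import Any, Optional


class Config:
    """Process-wide configuration holder (singleton sketch)."""

    # Hypothetical attribute holding the single shared instance.
    _instance: Optional["Config"] = None

    def __new__(cls) -> "Config":
        # Create the instance on first use, then always return the same one.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._data = {}
        return cls._instance

    def load_config(self, config_path: str) -> None:
        """Load settings from a YAML file into the shared instance."""
        # Assumption: PyYAML is the parser. It is imported here so that
        # lookups still work in environments where it is not installed.
        import yaml

        with open(config_path, "r", encoding="utf-8") as handle:
            # safe_load returns None for an empty file, so fall back to {}.
            self._data = yaml.safe_load(handle) or {}

    def get(self, key: str, default: Any = None) -> Any:
        """Return the value stored under `key`, or `default` if it is absent."""
        return self._data.get(key, default)
```

Because every call to `Config()` returns the same object, any module can read settings after a single `load_config` call at startup.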
### Query Module
The `query` module handles the processing and enhancement of user queries, including classification and optimization for search.
#### Files
- `__init__.py`: Package initialization file
- `query_processor.py`: Main module for processing user queries
- `query_classifier.py`: Module for classifying query types
- `llm_interface.py`: Interface for interacting with LLM providers
#### Classes
- `QueryProcessor`: Main class for processing user queries
  - `process_query(query)`: Processes a user query and returns enhanced results
  - `classify_query(query)`: Classifies a query by type and intent
  - `generate_search_queries(query, classification)`: Generates optimized search queries
- `QueryClassifier`: Class for classifying queries
  - `classify(query)`: Classifies a query by type, intent, and entities
- `LLMInterface`: Interface for interacting with LLM providers
  - `get_completion(prompt, model=None)`: Gets a completion from an LLM
  - `enhance_query(query)`: Enhances a query with additional context
  - `classify_query(query)`: Uses an LLM to classify a query
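
The flow through `QueryProcessor` can be sketched roughly as follows. `StubLLMInterface` is a hypothetical stand-in that returns canned values instead of calling a provider, and the shape of the returned dictionary (its keys and the `"type"` field) is an assumption, not the module's documented output.

```python
from typing import Any, Dict, List


class StubLLMInterface:
    """Hypothetical stand-in for LLMInterface; makes no network calls."""

    def enhance_query(self, query: str) -> str:
        # A real implementation would ask an LLM to add context.
        return query.strip()

    def classify_query(self, query: str) -> Dict[str, str]:
        # Toy rule standing in for an LLM-based classification.
        query_type = "question" if query.rstrip().endswith("?") else "keyword"
        return {"type": query_type}


class QueryProcessor:
    """Sketch of the enhance -> classify -> generate pipeline."""

    def __init__(self, llm: StubLLMInterface) -> None:
        self.llm = llm

    def classify_query(self, query: str) -> Dict[str, str]:
        return self.llm.classify_query(query)

    def generate_search_queries(
        self, query: str, classification: Dict[str, str]
    ) -> List[str]:
        queries = [query]
        if classification["type"] == "question":
            # Also search for the question without its trailing "?".
            queries.append(query.rstrip("?").strip())
        return queries

    def process_query(self, query: str) -> Dict[str, Any]:
        enhanced = self.llm.enhance_query(query)
        classification = self.classify_query(enhanced)
        return {
            "original_query": query,
            "enhanced_query": enhanced,
            "classification": classification,
            "search_queries": self.generate_search_queries(enhanced, classification),
        }
```

Passing the LLM interface into the constructor keeps the processor testable without API keys, since a stub like the one above can replace the real provider.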
### Execution Module
The `execution` module handles the execution of search queries across multiple search engines and the collection of results.
#### Files
- `__init__.py`: Package initialization file
- `search_executor.py`: Module for executing search queries
- `result_collector.py`: Module for collecting and processing search results
- `api_handlers/`: Directory containing handlers for different search APIs
  - `__init__.py`: Package initialization file
  - `base_handler.py`: Base class for search handlers
  - `serper_handler.py`: Handler for Serper API (Google search)
  - `scholar_handler.py`: Handler for Google Scholar via Serper
  - `arxiv_handler.py`: Handler for arXiv API
#### Classes
- `SearchExecutor`: Class for executing search queries
  - `execute_search(query_data)`: Executes a search across multiple engines
  - `_execute_search_async(query, engines)`: Executes a search asynchronously
  - `_execute_search_sync(query, engines)`: Executes a search synchronously
- `ResultCollector`: Class for collecting and processing search results
  - `process_results(search_results)`: Processes search results from multiple engines
  - `deduplicate_results(results)`: Deduplicates results based on URL
  - `save_results(results, file_path)`: Saves results to a file
- `BaseSearchHandler`: Base class for search handlers
  - `search(query, num_results)`: Abstract method for searching
  - `_process_response(response)`: Processes the API response
- `SerperSearchHandler`: Handler for Serper API
  - `search(query, num_results)`: Searches using Serper API
  - `_process_response(response)`: Processes the Serper API response
- `ScholarSearchHandler`: Handler for Google Scholar via Serper
  - `search(query, num_results)`: Searches Google Scholar
  - `_process_response(response)`: Processes the Scholar API response
- `ArxivSearchHandler`: Handler for arXiv API
  - `search(query, num_results)`: Searches arXiv
  - `_process_response(response)`: Processes the arXiv API response
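
The handler abstraction, the asynchronous fan-out across engines, and URL-based deduplication might fit together as in the sketch below. `StaticHandler` is a hypothetical handler that serves canned results, the free functions stand in for the corresponding methods, and the assumption that every result is a dictionary with a `"url"` key is illustrative. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List

Result = Dict[str, str]


class BaseSearchHandler(ABC):
    """Common interface that every engine-specific handler implements."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[Result]:
        """Return up to `num_results` results for `query`."""


class StaticHandler(BaseSearchHandler):
    """Hypothetical handler that returns canned results instead of calling an API."""

    def __init__(self, results: List[Result]) -> None:
        self._results = results

    def search(self, query: str, num_results: int = 10) -> List[Result]:
        return self._results[:num_results]


async def execute_search_async(
    query: str, handlers: Dict[str, BaseSearchHandler], num_results: int = 10
) -> Dict[str, List[Result]]:
    """Query all engines concurrently and map engine name to its results."""
    names = list(handlers)
    # search() is blocking, so each call runs in a worker thread.
    tasks = [
        asyncio.to_thread(handlers[name].search, query, num_results)
        for name in names
    ]
    batches = await asyncio.gather(*tasks)
    return dict(zip(names, batches))


def deduplicate_results(results: List[Result]) -> List[Result]:
    """Keep the first result seen for each URL, preserving order."""
    seen = set()
    unique = []
    for item in results:
        url = item["url"]  # Assumption: every result carries a "url" key.
        if url in seen:
            continue
        seen.add(url)
        unique.append(item)
    return unique
```

Keeping `search` as the only abstract method means a new engine can be added by writing one subclass, without touching the executor or the collector.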
### Ranking Module
The `ranking` module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
#### Files
- `__init__.py`: Package initialization file
- `jina_reranker.py`: Module for reranking documents using Jina AI
- `filter_manager.py`: Module for filtering documents
#### Classes
- `JinaReranker`: Class for reranking documents
  - `rerank(documents, query)`: Reranks documents based on relevance to query
  - `_prepare_inputs(documents, query)`: Prepares inputs for the reranker
- `FilterManager`: Class for filtering documents
  - `filter_by_date(documents, start_date, end_date)`: Filters by date
  - `filter_by_source(documents, sources)`: Filters by source
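
As an illustration of the two `FilterManager` methods, here is a minimal sketch. It assumes each document is a dictionary with a `"date"` key holding a `datetime.date` and a `"source"` key holding a string, and that the date range is inclusive at both ends. None of these details are confirmed by the module itself.

```python
from datetime import date
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


class FilterManager:
    """Sketch of date and source filtering over a list of documents."""

    def filter_by_date(
        self, documents: List[Document], start_date: date, end_date: date
    ) -> List[Document]:
        # Keep documents dated within [start_date, end_date], inclusive.
        return [
            doc for doc in documents
            if start_date <= doc["date"] <= end_date
        ]

    def filter_by_source(
        self, documents: List[Document], sources: Iterable[str]
    ) -> List[Document]:
        # Build a set once so each membership check is constant time.
        allowed = set(sources)
        return [doc for doc in documents if doc["source"] in allowed]
```

Both methods return new lists and leave the input untouched, so filters can be chained in any order.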