# Code Structure

## Current Project Organization

```
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py              # Configuration management
│   └── config.yaml            # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py     # Module for processing user queries
│   ├── query_classifier.py    # Module for classifying query types
│   └── llm_interface.py       # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py     # Module for executing search queries
│   ├── result_collector.py    # Module for collecting search results
│   └── api_handlers/          # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py    # Base class for search handlers
│       ├── serper_handler.py  # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       └── arxiv_handler.py   # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   ├── jina_reranker.py       # Module for reranking documents using Jina AI
│   └── filter_manager.py      # Module for filtering documents
├── test_search_execution.py   # Test script for search execution
├── test_all_handlers.py       # Test script for all search handlers
├── requirements.txt           # Project dependencies
└── search_execution_test_results.json  # Test results
```

## Module Details

### Config Module

The `config` module manages configuration settings for the entire system, including API keys, model selections, and other parameters.

#### Files

- `__init__.py`: Package initialization file
- `config.py`: Configuration management class
- `config.yaml`: YAML configuration file with settings for different components

#### Classes

- `Config`: Singleton class for loading and accessing configuration settings
  - `load_config(config_path)`: Loads configuration from a YAML file
  - `get(key, default=None)`: Gets a configuration value by key

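
The singleton pattern behind `Config` can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the dotted key lookup in `get`, the lock, and the `_settings` attribute are assumptions, and YAML parsing is assumed to go through PyYAML's `yaml.safe_load`.

```python
import threading
from typing import Any, Optional


class Config:
    """Singleton holding settings loaded from a YAML file (illustrative sketch)."""

    _instance: Optional["Config"] = None
    _lock = threading.Lock()

    def __new__(cls) -> "Config":
        # Reuse the one shared instance; create it on first use.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._settings = {}
        return cls._instance

    def load_config(self, config_path: str) -> None:
        """Load settings from a YAML file, replacing any previous settings."""
        import yaml  # PyYAML (assumed dependency), imported lazily

        with open(config_path, "r", encoding="utf-8") as fh:
            # An empty file parses to None, so fall back to an empty dict.
            self._settings = yaml.safe_load(fh) or {}

    def get(self, key: str, default: Any = None) -> Any:
        """Return the value at a dotted key path, or `default` if absent.

        Dotted paths such as "jina.api_key" are an assumption made for
        this sketch; the real class may only support flat keys.
        """
        node: Any = self._settings
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

Because `Config()` always returns the same object, a module that calls `Config().load_config("config/config.yaml")` once at startup makes the settings visible to every later `Config().get(...)` call.
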
### Query Module

The `query` module handles the processing and enhancement of user queries, including classification and optimization for search.

#### Files

- `__init__.py`: Package initialization file
- `query_processor.py`: Main module for processing user queries
- `query_classifier.py`: Module for classifying query types
- `llm_interface.py`: Interface for interacting with LLM providers

#### Classes

- `QueryProcessor`: Main class for processing user queries
  - `process_query(query)`: Processes a user query and returns enhanced results
  - `classify_query(query)`: Classifies a query by type and intent
  - `generate_search_queries(query, classification)`: Generates optimized search queries

- `QueryClassifier`: Class for classifying queries
  - `classify(query)`: Classifies a query by type, intent, and entities

- `LLMInterface`: Interface for interacting with LLM providers
  - `get_completion(prompt, model=None)`: Gets a completion from an LLM
  - `enhance_query(query)`: Enhances a query with additional context
  - `classify_query(query)`: Uses an LLM to classify a query

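
The way the three `QueryProcessor` methods might chain together can be sketched as below. This is a hedged illustration only: the keyword heuristic stands in for the LLM-based classification, and the result keys, the `ACADEMIC_HINTS` tuple, and the engine names are all assumptions rather than the project's actual data shapes.

```python
from typing import Any, Dict


class QueryProcessor:
    """Illustrative pipeline: classify a query, then derive per-engine queries."""

    # Hypothetical keyword hints; the real classifier is described as LLM-backed.
    ACADEMIC_HINTS = ("paper", "study", "research", "arxiv")

    def classify_query(self, query: str) -> Dict[str, str]:
        """Label the query 'academic' if it contains any hint word."""
        words = query.lower().split()
        is_academic = any(hint in words for hint in self.ACADEMIC_HINTS)
        return {"type": "academic" if is_academic else "general"}

    def generate_search_queries(
        self, query: str, classification: Dict[str, str]
    ) -> Dict[str, str]:
        """Map engine name to query string; academic queries fan out further."""
        queries = {"serper": query}
        if classification["type"] == "academic":
            queries["scholar"] = query
            queries["arxiv"] = query
        return queries

    def process_query(self, query: str) -> Dict[str, Any]:
        """Normalize whitespace, classify, and build the per-engine queries."""
        cleaned = " ".join(query.split())
        classification = self.classify_query(cleaned)
        return {
            "original_query": query,
            "classification": classification,
            "search_queries": self.generate_search_queries(cleaned, classification),
        }
```

Calling `QueryProcessor().process_query("recent research on transformers")` would, under these assumptions, route the query to all three engines, while a query with no hint words goes to `serper` only.
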
### Execution Module

The `execution` module handles the execution of search queries across multiple search engines and the collection of results.

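
Searching several engines at once is a natural fit for `asyncio`. The sketch below shows one possible shape for the concurrent fan-out, assuming each engine is exposed as an async callable. The function name, the `timeout` parameter, and the two stand-in engines are hypothetical and are not taken from the project's code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# An engine is modeled as an async function: query -> list of result dicts.
SearchFn = Callable[[str], Awaitable[List[dict]]]


async def execute_search_async(
    query: str, engines: Dict[str, SearchFn], timeout: float = 10.0
) -> Dict[str, List[dict]]:
    """Query every engine concurrently; a failing engine yields an empty list."""

    async def run_one(search: SearchFn) -> List[dict]:
        try:
            return await asyncio.wait_for(search(query), timeout=timeout)
        except Exception:
            # Timeouts and engine errors are isolated so that the
            # remaining engines still contribute their results.
            return []

    results = await asyncio.gather(*(run_one(fn) for fn in engines.values()))
    return dict(zip(engines.keys(), results))


# Stand-in engines for demonstration only (no network access).
async def fake_web(query: str) -> List[dict]:
    return [{"title": f"Result for {query}", "url": "https://example.com/1"}]


async def fake_broken(query: str) -> List[dict]:
    raise RuntimeError("engine unavailable")
```

With these stand-ins, `asyncio.run(execute_search_async("llm agents", {"web": fake_web, "broken": fake_broken}))` returns the `fake_web` results under `"web"` and an empty list under `"broken"`.
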
#### Files

- `__init__.py`: Package initialization file
- `search_executor.py`: Module for executing search queries
- `result_collector.py`: Module for collecting and processing search results
- `api_handlers/`: Directory containing handlers for different search APIs
  - `__init__.py`: Package initialization file
  - `base_handler.py`: Base class for search handlers
  - `serper_handler.py`: Handler for Serper API (Google search)
  - `scholar_handler.py`: Handler for Google Scholar via Serper
  - `arxiv_handler.py`: Handler for arXiv API

#### Classes

- `SearchExecutor`: Class for executing search queries
  - `execute_search(query_data)`: Executes a search across multiple engines
  - `_execute_search_async(query, engines)`: Executes a search asynchronously
  - `_execute_search_sync(query, engines)`: Executes a search synchronously

- `ResultCollector`: Class for collecting and processing search results
  - `process_results(search_results)`: Processes search results from multiple engines
  - `deduplicate_results(results)`: Deduplicates results based on URL
  - `save_results(results, file_path)`: Saves results to a file

- `BaseSearchHandler`: Base class for search handlers
  - `search(query, num_results)`: Abstract method for searching
  - `_process_response(response)`: Processes the API response

- `SerperSearchHandler`: Handler for Serper API
  - `search(query, num_results)`: Searches using Serper API
  - `_process_response(response)`: Processes the Serper API response

- `ScholarSearchHandler`: Handler for Google Scholar via Serper
  - `search(query, num_results)`: Searches Google Scholar
  - `_process_response(response)`: Processes the Scholar API response

- `ArxivSearchHandler`: Handler for arXiv API
  - `search(query, num_results)`: Searches arXiv
  - `_process_response(response)`: Processes the arXiv API response

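
The handler hierarchy and URL-based deduplication described above can be sketched as follows. This is an illustration under stated assumptions: `StaticSearchHandler` is a hypothetical stand-in that serves canned data instead of calling a real API, the normalized `title`/`url`/`source` fields are assumed, and treating a trailing slash as insignificant is a choice made for this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Result = Dict[str, Any]


class BaseSearchHandler(ABC):
    """Common interface: every handler returns results in one normalized shape."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[Result]:
        """Run the search and return normalized results."""

    @abstractmethod
    def _process_response(self, response: Dict[str, Any]) -> List[Result]:
        """Convert a raw API payload into normalized results."""


class StaticSearchHandler(BaseSearchHandler):
    """Hypothetical handler that serves canned data instead of calling an API."""

    def __init__(self, canned: Dict[str, Any]) -> None:
        self._canned = canned

    def search(self, query: str, num_results: int = 10) -> List[Result]:
        return self._process_response(self._canned)[:num_results]

    def _process_response(self, response: Dict[str, Any]) -> List[Result]:
        return [
            {"title": item.get("title", ""), "url": item.get("link", ""), "source": "static"}
            for item in response.get("items", [])
        ]


def deduplicate_results(results: List[Result]) -> List[Result]:
    """Keep the first result per URL, ignoring a trailing slash."""
    seen = set()
    unique = []
    for result in results:
        key = result.get("url", "").rstrip("/")
        if key and key in seen:
            continue  # a result with this URL was already kept
        if key:
            seen.add(key)
        unique.append(result)  # results without a URL are always kept
    return unique
```

Keeping `_process_response` separate from `search` means each concrete handler only has to translate its own API payload, while code such as `deduplicate_results` can rely on a single result shape.
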
### Ranking Module

The `ranking` module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.

#### Files

- `__init__.py`: Package initialization file
- `jina_reranker.py`: Module for reranking documents using Jina AI
- `filter_manager.py`: Module for filtering documents

#### Classes

- `JinaReranker`: Class for reranking documents
  - `rerank(documents, query)`: Reranks documents based on relevance to query
  - `_prepare_inputs(documents, query)`: Prepares inputs for the reranker

- `FilterManager`: Class for filtering documents
  - `filter_by_date(documents, start_date, end_date)`: Filters by date
  - `filter_by_source(documents, sources)`: Filters by source

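
The two `FilterManager` methods could look roughly like the sketch below. Several details are assumptions made for illustration: documents are taken to be dicts with an ISO-formatted `date` string and a `source` string, both date bounds are optional, undated or unparseable documents are dropped by the date filter, and source matching is case-insensitive.

```python
from datetime import date
from typing import Any, Dict, Iterable, List, Optional

Document = Dict[str, Any]


class FilterManager:
    """Illustrative document filters; field names are assumptions."""

    def filter_by_date(
        self,
        documents: Iterable[Document],
        start_date: Optional[date] = None,
        end_date: Optional[date] = None,
    ) -> List[Document]:
        """Keep documents whose 'date' lies within [start_date, end_date]."""
        kept = []
        for doc in documents:
            try:
                published = date.fromisoformat(doc.get("date"))
            except (TypeError, ValueError):
                continue  # missing or malformed date: drop the document
            if start_date is not None and published < start_date:
                continue
            if end_date is not None and published > end_date:
                continue
            kept.append(doc)
        return kept

    def filter_by_source(
        self, documents: Iterable[Document], sources: Iterable[str]
    ) -> List[Document]:
        """Keep documents whose 'source' is in `sources` (case-insensitive)."""
        allowed = {source.lower() for source in sources}
        return [
            doc for doc in documents
            if str(doc.get("source", "")).lower() in allowed
        ]
```

Both methods return new lists and leave the input untouched, so they can be chained, for example filtering by source first and then by date.
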