# Code Structure

## Current Project Organization

```
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py                 # Configuration management
│   └── config.yaml               # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py        # Module for processing user queries
│   ├── query_classifier.py       # Module for classifying query types
│   └── llm_interface.py          # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py        # Module for executing search queries
│   ├── result_collector.py       # Module for collecting search results
│   └── api_handlers/             # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py       # Base class for search handlers
│       ├── serper_handler.py     # Handler for Serper API (Google search)
│       ├── scholar_handler.py    # Handler for Google Scholar via Serper
│       └── arxiv_handler.py      # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   ├── jina_reranker.py          # Module for reranking documents using Jina AI
│   └── filter_manager.py         # Module for filtering documents
├── test_search_execution.py      # Test script for search execution
├── test_all_handlers.py          # Test script for all search handlers
├── requirements.txt              # Project dependencies
└── search_execution_test_results.json  # Test results
```

## Module Details

### Config Module

The `config` module manages configuration settings for the entire system, including API keys, model selections, and other parameters.

#### Files

- `__init__.py`: Package initialization file
- `config.py`: Configuration management class
- `config.yaml`: YAML configuration file with settings for different components

#### Classes

- `Config`: Singleton class for loading and accessing configuration settings
  - `load_config(config_path)`: Loads configuration from a YAML file
  - `get(key, default=None)`: Gets a configuration value by key

### Query Module

The `query` module handles the processing and enhancement of user queries, including classification and optimization for search.
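As a rough illustration of the flow this module describes, the sketch below chains the three `QueryProcessor` steps listed under Classes: normalize the query, classify it, then generate per-engine search queries. It is not the project's implementation. `QueryProcessorSketch`, `Classification`, and `keyword_classifier` are hypothetical names; the keyword heuristic stands in for the LLM-backed `QueryClassifier`, and the engine keys and returned dictionary shape are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Classification:
    # Hypothetical result shape; the real QueryClassifier may also return entities.
    query_type: str
    intent: str


def keyword_classifier(query: str) -> Classification:
    """Keyword stand-in for the LLM-backed classifier (assumption, not the project's logic)."""
    academic_terms = {"paper", "papers", "study", "arxiv", "research"}
    if academic_terms & set(query.lower().split()):
        return Classification(query_type="academic", intent="research")
    return Classification(query_type="general", intent="informational")


class QueryProcessorSketch:
    """Hypothetical outline of QueryProcessor: normalize, classify, then fan out per engine."""

    def __init__(self, classifier: Callable[[str], Classification] = keyword_classifier):
        self.classifier = classifier

    def classify_query(self, query: str) -> Classification:
        return self.classifier(query)

    def generate_search_queries(self, query: str, classification: Classification) -> Dict[str, str]:
        # Engine keys mirror the handler names in execution/api_handlers/ (assumed mapping).
        queries = {"serper": query}
        if classification.query_type == "academic":
            queries["scholar"] = query
            queries["arxiv"] = query
        return queries

    def process_query(self, query: str) -> dict:
        cleaned = " ".join(query.split())  # collapse runs of whitespace
        classification = self.classify_query(cleaned)
        return {
            "original_query": query,
            "query": cleaned,
            "classification": classification,
            "search_queries": self.generate_search_queries(cleaned, classification),
        }
```

Because the classifier is injected, a wrapper around an LLM call could replace `keyword_classifier` without changing the rest of the pipeline.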
#### Files

- `__init__.py`: Package initialization file
- `query_processor.py`: Main module for processing user queries
- `query_classifier.py`: Module for classifying query types
- `llm_interface.py`: Interface for interacting with LLM providers

#### Classes

- `QueryProcessor`: Main class for processing user queries
  - `process_query(query)`: Processes a user query and returns enhanced results
  - `classify_query(query)`: Classifies a query by type and intent
  - `generate_search_queries(query, classification)`: Generates optimized search queries
- `QueryClassifier`: Class for classifying queries
  - `classify(query)`: Classifies a query by type, intent, and entities
- `LLMInterface`: Interface for interacting with LLM providers
  - `get_completion(prompt, model=None)`: Gets a completion from an LLM
  - `enhance_query(query)`: Enhances a query with additional context
  - `classify_query(query)`: Uses an LLM to classify a query

### Execution Module

The `execution` module handles the execution of search queries across multiple search engines and the collection of results.
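The sketch below shows one way the pieces listed under Classes could fit together: handlers share a `search(query, num_results)` interface, every engine is queried concurrently, and the combined results are deduplicated by URL. It is a hypothetical outline, not the project's code. `StaticHandler` is an offline stand-in for the Serper, Scholar, and arXiv handlers; the result dictionaries with a `url` key and the added `engine` field are assumptions. A thread pool is used here because it works with blocking HTTP clients; the project's `_execute_search_async` may use `asyncio` instead.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class BaseSearchHandler(ABC):
    """Mirrors the listed interface: search(query, num_results) returns a list of result dicts."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[dict]:
        raise NotImplementedError


class StaticHandler(BaseSearchHandler):
    """Hypothetical offline handler standing in for the Serper, Scholar, and arXiv handlers."""

    def __init__(self, results: List[dict]):
        self._results = results

    def search(self, query: str, num_results: int = 10) -> List[dict]:
        return self._results[:num_results]


def execute_search(query: str, handlers: Dict[str, BaseSearchHandler],
                   num_results: int = 10) -> Dict[str, List[dict]]:
    """Query every engine concurrently; an engine that raises contributes an empty list."""

    def run(name: str) -> Tuple[str, List[dict]]:
        try:
            return name, handlers[name].search(query, num_results)
        except Exception:
            return name, []

    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        # pool.map yields results in input order, so engine order is preserved.
        return dict(pool.map(run, handlers))


def deduplicate_results(search_results: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-engine lists, keeping the first result seen for each URL."""
    seen = set()
    merged = []
    for engine, results in search_results.items():
        for result in results:
            url = result.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append({**result, "engine": engine})
    return merged
```

Catching exceptions per engine keeps one failing API from discarding the results of the others, which matters when several external services are queried at once.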
#### Files

- `__init__.py`: Package initialization file
- `search_executor.py`: Module for executing search queries
- `result_collector.py`: Module for collecting and processing search results
- `api_handlers/`: Directory containing handlers for different search APIs
  - `__init__.py`: Package initialization file
  - `base_handler.py`: Base class for search handlers
  - `serper_handler.py`: Handler for Serper API (Google search)
  - `scholar_handler.py`: Handler for Google Scholar via Serper
  - `arxiv_handler.py`: Handler for arXiv API

#### Classes

- `SearchExecutor`: Class for executing search queries
  - `execute_search(query_data)`: Executes a search across multiple engines
  - `_execute_search_async(query, engines)`: Executes a search asynchronously
  - `_execute_search_sync(query, engines)`: Executes a search synchronously
- `ResultCollector`: Class for collecting and processing search results
  - `process_results(search_results)`: Processes search results from multiple engines
  - `deduplicate_results(results)`: Deduplicates results based on URL
  - `save_results(results, file_path)`: Saves results to a file
- `BaseSearchHandler`: Base class for search handlers
  - `search(query, num_results)`: Abstract method for searching
  - `_process_response(response)`: Processes the API response
- `SerperSearchHandler`: Handler for Serper API
  - `search(query, num_results)`: Searches using Serper API
  - `_process_response(response)`: Processes the Serper API response
- `ScholarSearchHandler`: Handler for Google Scholar via Serper
  - `search(query, num_results)`: Searches Google Scholar
  - `_process_response(response)`: Processes the Scholar API response
- `ArxivSearchHandler`: Handler for arXiv API
  - `search(query, num_results)`: Searches arXiv
  - `_process_response(response)`: Processes the arXiv API response

### Ranking Module

The `ranking` module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
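Reranking, as described here, amounts to scoring each document against the query and sorting by that score. The sketch below follows the `rerank(documents, query)` and `_prepare_inputs(documents, query)` split listed under Classes but does not call Jina AI: `overlap_scorer` is a toy term-overlap score swapped in so the example runs offline, and `RerankerSketch`, the `top_n` parameter, the `title`/`snippet` keys, and the `relevance_score` field are all hypothetical. A real backend would replace the scorer with a call to the Jina reranking model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Scorer = Callable[[str, Sequence[str]], List[float]]


def overlap_scorer(query: str, texts: Sequence[str]) -> List[float]:
    """Toy relevance score standing in for the Jina AI model: share of query terms present."""
    terms = set(query.lower().split())
    if not terms:
        return [0.0] * len(texts)
    return [len(terms & set(text.lower().split())) / len(terms) for text in texts]


class RerankerSketch:
    """Hypothetical outline of JinaReranker with a pluggable scoring backend."""

    def __init__(self, scorer: Scorer = overlap_scorer):
        self.scorer = scorer

    def _prepare_inputs(self, documents: Sequence[dict], query: str) -> Tuple[str, List[str]]:
        # Join title and snippet into one text per document ('title'/'snippet' keys are assumed).
        texts = [f"{doc.get('title', '')} {doc.get('snippet', '')}".strip() for doc in documents]
        return query, texts

    def rerank(self, documents: Sequence[dict], query: str,
               top_n: Optional[int] = None) -> List[dict]:
        query, texts = self._prepare_inputs(documents, query)
        scores = self.scorer(query, texts)
        # Highest score first; sorted() is stable, so ties keep their original order.
        order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
        ranked = [{**documents[i], "relevance_score": scores[i]} for i in order]
        return ranked if top_n is None else ranked[:top_n]
```

Keeping the scorer separate from input preparation and sorting means the ranking logic can be tested without network access.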
#### Files

- `__init__.py`: Package initialization file
- `jina_reranker.py`: Module for reranking documents using Jina AI
- `filter_manager.py`: Module for filtering documents

#### Classes

- `JinaReranker`: Class for reranking documents
  - `rerank(documents, query)`: Reranks documents based on relevance to the query
  - `_prepare_inputs(documents, query)`: Prepares inputs for the reranker
- `FilterManager`: Class for filtering documents
  - `filter_by_date(documents, start_date, end_date)`: Filters documents by date
  - `filter_by_source(documents, sources)`: Filters documents by source
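The two `FilterManager` methods above can be sketched as plain list filters. This is a minimal illustration under stated assumptions, not the project's code: `FilterManagerSketch` is a hypothetical name, each document is assumed to be a dict with an ISO-8601 `date` and a `source` field, date bounds are treated as inclusive with `None` meaning "no bound", undated documents are dropped by the date filter, and source matching is case-insensitive.

```python
from datetime import date, datetime
from typing import Iterable, List, Optional


class FilterManagerSketch:
    """Hypothetical filters; assumes each document dict has an ISO-8601 'date' and a 'source'."""

    @staticmethod
    def _parse_date(value) -> Optional[date]:
        if isinstance(value, datetime):
            return value.date()
        if isinstance(value, date):
            return value
        try:
            # Keep only the YYYY-MM-DD prefix so timestamps like '2024-03-15T10:00:00Z' parse.
            return date.fromisoformat(str(value)[:10])
        except ValueError:
            return None

    def filter_by_date(self, documents: Iterable[dict], start_date: Optional[date] = None,
                       end_date: Optional[date] = None) -> List[dict]:
        kept = []
        for doc in documents:
            published = self._parse_date(doc.get("date"))
            if published is None:
                continue  # an undated document cannot be placed in the range, so it is dropped
            if start_date is not None and published < start_date:
                continue
            if end_date is not None and published > end_date:
                continue
            kept.append(doc)
        return kept

    def filter_by_source(self, documents: Iterable[dict], sources: Iterable[str]) -> List[dict]:
        allowed = {source.lower() for source in sources}  # case-insensitive match
        return [doc for doc in documents if str(doc.get("source", "")).lower() in allowed]
```

Whether undated documents should be dropped or kept is a policy choice; dropping them is the conservative option when a user explicitly asks for a date range.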