# Code Structure

## Current Project Organization

```
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py              # Configuration management
│   └── config.yaml            # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py     # Module for processing user queries
│   └── llm_interface.py       # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py     # Module for executing search queries
│   ├── result_collector.py    # Module for collecting search results
│   └── api_handlers/          # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py    # Base class for search handlers
│       ├── serper_handler.py  # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       ├── google_handler.py  # Handler for Google search
│       └── arxiv_handler.py   # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   └── jina_reranker.py       # Module for reranking documents using Jina AI
├── report/
│   ├── __init__.py
│   ├── report_generator.py    # Module for generating reports
│   ├── report_synthesis.py    # Module for synthesizing reports
│   ├── document_processor.py  # Module for processing documents
│   ├── document_scraper.py    # Module for scraping documents
│   ├── report_detail_levels.py # Module for managing report detail levels
│   └── database/              # Database for storing reports
│       ├── __init__.py
│       └── db_manager.py      # Module for managing the database
├── ui/
│   ├── __init__.py
│   └── gradio_interface.py    # Gradio-based web interface
├── utils/
│   ├── __init__.py
│   ├── jina_similarity.py     # Module for computing text similarity
│   └── markdown_segmenter.py  # Module for segmenting markdown documents
├── scripts/
│   └── query_to_report.py     # Script for generating reports from queries
├── tests/
│   ├── __init__.py
│   ├── query/                 # Tests for query module
│   │   ├── __init__.py
│   │   ├── test_query_processor.py
│   │   ├── test_query_processor_comprehensive.py
│   │   └── test_llm_interface.py
│   ├── execution/             # Tests for execution module
│   │   ├── __init__.py
│   │   ├── test_search.py
│   │   ├── test_search_execution.py
│   │   └── test_all_handlers.py
│   ├── ranking/               # Tests for ranking module
│   │   ├── __init__.py
│   │   ├── test_reranker.py
│   │   ├── test_similarity.py
│   │   └── test_simple_reranker.py
│   ├── report/                # Tests for report module
│   │   ├── __init__.py
│   │   ├── test_custom_model.py
│   │   └── test_detail_levels.py
│   ├── ui/                    # Tests for UI module
│   │   ├── __init__.py
│   │   └── test_ui_search.py
│   ├── integration/           # Integration tests
│   │   ├── __init__.py
│   │   ├── test_ev_query.py
│   │   └── test_query_to_report.py
│   ├── test_document_processor.py
│   ├── test_document_scraper.py
│   └── test_report_synthesis.py
├── examples/
│   ├── __init__.py
│   ├── data/                  # Example data files
│   └── scripts/               # Example scripts
│       └── __init__.py
├── run_ui.py                  # Script to run the UI
└── requirements.txt           # Project dependencies
```

## Module Details

### Config Module

The `config` module manages configuration settings for the entire system, including API keys, model selections, and other parameters.

### Files

- `__init__.py`: Package initialization file
- `config.py`: Configuration management class
- `config.yaml`: YAML configuration file with settings for different components

### Classes

- `Config`: Singleton class for loading and accessing configuration settings
  - `load_config(config_path)`: Loads configuration from a YAML file
  - `get(key, default=None)`: Gets a configuration value by key

### Query Module

The `query` module handles the processing and enhancement of user queries, including classification and optimization for search.

### Files

- `__init__.py`: Package initialization file
- `query_processor.py`: Main module for processing user queries
- `query_classifier.py`: Module for classifying query types
- `llm_interface.py`: Interface for interacting with LLM providers

### Classes

- `QueryProcessor`: Main class for processing user queries
  - `process_query(query)`: Processes a user query and returns enhanced results
  - `classify_query(query)`: Classifies a query by type and intent
  - `generate_search_queries(query, classification)`: Generates optimized search queries
- `QueryClassifier`: Class for classifying queries
  - `classify(query)`: Classifies a query by type, intent, and entities
- `LLMInterface`: Interface for interacting with LLM providers
  - `get_completion(prompt, model=None)`: Gets a completion from an LLM
  - `enhance_query(query)`: Enhances a query with additional context
  - `classify_query(query)`: Uses an LLM to classify a query

### Execution Module

The `execution` module handles the execution of search queries across multiple search engines and the collection of results.

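To make this division of labor concrete, the following is a minimal sketch, not the project's actual code, of how an abstract handler, a fan-out over several engines, and URL-based deduplication could fit together. `BaseSearchHandler`, `execute_search`, and `deduplicate_results` borrow names documented for this module; the `StaticHandler` stand-in, the `collect` helper, the result dictionary shape (`title`, `url`, `source`), and all function bodies are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseSearchHandler(ABC):
    """Illustrative base class: each engine implements search()."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> list[dict]:
        """Return a list of result dicts for the query."""


class StaticHandler(BaseSearchHandler):
    """Hypothetical stand-in for a real API handler; returns canned results."""

    def __init__(self, canned: list[dict]):
        self._canned = canned

    def search(self, query: str, num_results: int = 10) -> list[dict]:
        return self._canned[:num_results]


def execute_search(query: str, handlers: dict[str, BaseSearchHandler],
                   num_results: int = 10) -> dict[str, list[dict]]:
    """Run the query against every engine and key the results by engine name."""
    return {name: handler.search(query, num_results)
            for name, handler in handlers.items()}


def deduplicate_results(results: list[dict]) -> list[dict]:
    """Keep the first result seen for each URL, preserving input order."""
    seen = set()
    unique = []
    for item in results:
        url = item.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(item)
    return unique


def collect(per_engine: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-engine results, tag each with its engine, and deduplicate."""
    flat = [dict(item, source=engine)
            for engine, items in per_engine.items()
            for item in items]
    return deduplicate_results(flat)
```

Calling `collect(execute_search("some query", handlers))` with a dictionary of handlers yields one flat list in which a URL returned by several engines appears only once, attributed to the first engine that returned it. A real handler would replace `StaticHandler.search` with an API request and response parsing.
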
### Files

- `__init__.py`: Package initialization file
- `search_executor.py`: Module for executing search queries
- `result_collector.py`: Module for collecting and processing search results
- `api_handlers/`: Directory containing handlers for different search APIs
  - `__init__.py`: Package initialization file
  - `base_handler.py`: Base class for search handlers
  - `serper_handler.py`: Handler for Serper API (Google search)
  - `scholar_handler.py`: Handler for Google Scholar via Serper
  - `arxiv_handler.py`: Handler for arXiv API

### Classes

- `SearchExecutor`: Class for executing search queries
  - `execute_search(query_data)`: Executes a search across multiple engines
  - `_execute_search_async(query, engines)`: Executes a search asynchronously
  - `_execute_search_sync(query, engines)`: Executes a search synchronously
- `ResultCollector`: Class for collecting and processing search results
  - `process_results(search_results)`: Processes search results from multiple engines
  - `deduplicate_results(results)`: Deduplicates results based on URL
  - `save_results(results, file_path)`: Saves results to a file
- `BaseSearchHandler`: Base class for search handlers
  - `search(query, num_results)`: Abstract method for searching
  - `_process_response(response)`: Processes the API response
- `SerperSearchHandler`: Handler for Serper API
  - `search(query, num_results)`: Searches using Serper API
  - `_process_response(response)`: Processes the Serper API response
- `ScholarSearchHandler`: Handler for Google Scholar via Serper
  - `search(query, num_results)`: Searches Google Scholar
  - `_process_response(response)`: Processes the Scholar API response
- `ArxivSearchHandler`: Handler for arXiv API
  - `search(query, num_results)`: Searches arXiv
  - `_process_response(response)`: Processes the arXiv API response

### Ranking Module

The `ranking` module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.

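As a rough illustration of what reranking by relevance means, the sketch below scores each document against the query and sorts by that score. The scoring function is a simple token-overlap stand-in, not the Jina AI reranker this module uses. The `rerank(documents, query)` argument order follows the method documented for `JinaReranker`, while `overlap_score`, the `top_n` parameter, and the tuple return shape are assumptions.

```python
def overlap_score(document: str, query: str) -> float:
    """Toy relevance score: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(documents, query, top_n=None):
    """Return (index, score, document) tuples sorted by descending score."""
    scored = [(i, overlap_score(doc, query), doc)
              for i, doc in enumerate(documents)]
    # Python's sort is stable, so documents with equal scores keep input order.
    scored.sort(key=lambda entry: entry[1], reverse=True)
    return scored if top_n is None else scored[:top_n]
```

Returning the original index alongside the score lets a caller map reranked entries back to their source records. Swapping `overlap_score` for a call to a hosted reranking model would leave the sorting logic unchanged.
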
### Files

- `__init__.py`: Package initialization file
- `jina_reranker.py`: Module for reranking documents using Jina AI
- `filter_manager.py`: Module for filtering documents

### Classes

- `JinaReranker`: Class for reranking documents
  - `rerank(documents, query)`: Reranks documents based on relevance to query
  - `_prepare_inputs(documents, query)`: Prepares inputs for the reranker
- `FilterManager`: Class for filtering documents
  - `filter_by_date(documents, start_date, end_date)`: Filters by date
  - `filter_by_source(documents, sources)`: Filters by source

## Recent Updates

### 2025-02-28: Async Implementation and Reference Formatting

1. **LLM Interface Updates**:
   - Converted key methods to async:
     - `generate_completion`
     - `classify_query`
     - `enhance_query`
     - `generate_search_queries`
   - Added special handling for Gemini models
   - Improved reference formatting instructions

2. **Query Processor Updates**:
   - Updated `process_query` to be async
   - Made `generate_search_queries` async
   - Fixed async/await patterns throughout

3. **Gradio Interface Updates**:
   - Modified `generate_report` to handle async operations
   - Updated report button click handler
   - Improved error handling
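
The async conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `FakeLLMInterface` is a hypothetical stand-in that makes no network calls, the returned dictionary keys are invented, running `enhance_query` and `classify_query` concurrently assumes the two calls are independent, and `run_pipeline_sync` is a hypothetical synchronous caller of the kind a UI callback might need.

```python
import asyncio


class FakeLLMInterface:
    """Hypothetical stand-in for the LLM interface; no network calls are made."""

    async def enhance_query(self, query: str) -> str:
        await asyncio.sleep(0)  # placeholder for an awaited API request
        return f"{query} (enhanced)"

    async def classify_query(self, query: str) -> dict:
        await asyncio.sleep(0)
        return {"type": "factual"}

    async def generate_search_queries(self, query: str, classification: dict) -> list:
        await asyncio.sleep(0)
        return [query, f"{query} {classification['type']}"]


class QueryProcessor:
    def __init__(self, llm):
        self.llm = llm

    async def process_query(self, query: str) -> dict:
        # Assumed independent here, so both calls are awaited concurrently.
        enhanced, classification = await asyncio.gather(
            self.llm.enhance_query(query),
            self.llm.classify_query(query),
        )
        # This step needs both earlier results, so it is awaited afterwards.
        search_queries = await self.llm.generate_search_queries(
            enhanced, classification
        )
        return {
            "original_query": query,
            "enhanced_query": enhanced,
            "classification": classification,
            "search_queries": search_queries,
        }


def run_pipeline_sync(query: str) -> dict:
    """Hypothetical synchronous entry point that drives the coroutine to completion."""
    return asyncio.run(QueryProcessor(FakeLLMInterface()).process_query(query))
```

`asyncio.run` starts a new event loop, so a wrapper like `run_pipeline_sync` only works when no loop is already running in the calling thread. Code that is itself async should `await processor.process_query(query)` directly instead.
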