Code Structure
Current Project Organization
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py # Configuration management
│   └── config.yaml # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py # Module for processing user queries
│   └── llm_interface.py # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py # Module for executing search queries
│   ├── result_collector.py # Module for collecting search results
│   └── api_handlers/ # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py # Base class for search handlers
│       ├── serper_handler.py # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       ├── google_handler.py # Handler for Google search
│       └── arxiv_handler.py # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   └── jina_reranker.py # Module for reranking documents using Jina AI
├── report/
│   ├── __init__.py
│   ├── report_generator.py # Module for generating reports
│   ├── report_synthesis.py # Module for synthesizing reports
│   ├── document_processor.py # Module for processing documents
│   ├── document_scraper.py # Module for scraping documents
│   ├── report_detail_levels.py # Module for managing report detail levels
│   └── database/ # Database for storing reports
│       ├── __init__.py
│       └── db_manager.py # Module for managing the database
├── ui/
│   ├── __init__.py
│   └── gradio_interface.py # Gradio-based web interface
├── utils/
│   ├── __init__.py
│   ├── jina_similarity.py # Module for computing text similarity
│   └── markdown_segmenter.py # Module for segmenting markdown documents
├── scripts/
│   └── query_to_report.py # Script for generating reports from queries
├── tests/
│   ├── __init__.py
│   ├── query/ # Tests for query module
│   │   ├── __init__.py
│   │   ├── test_query_processor.py
│   │   ├── test_query_processor_comprehensive.py
│   │   └── test_llm_interface.py
│   ├── execution/ # Tests for execution module
│   │   ├── __init__.py
│   │   ├── test_search.py
│   │   ├── test_search_execution.py
│   │   └── test_all_handlers.py
│   ├── ranking/ # Tests for ranking module
│   │   ├── __init__.py
│   │   ├── test_reranker.py
│   │   ├── test_similarity.py
│   │   └── test_simple_reranker.py
│   ├── report/ # Tests for report module
│   │   ├── __init__.py
│   │   ├── test_custom_model.py
│   │   └── test_detail_levels.py
│   ├── ui/ # Tests for UI module
│   │   ├── __init__.py
│   │   └── test_ui_search.py
│   ├── integration/ # Integration tests
│   │   ├── __init__.py
│   │   ├── test_ev_query.py
│   │   └── test_query_to_report.py
│   ├── test_document_processor.py
│   ├── test_document_scraper.py
│   └── test_report_synthesis.py
├── examples/
│   ├── __init__.py
│   ├── data/ # Example data files
│   └── scripts/ # Example scripts
│       └── __init__.py
├── run_ui.py # Script to run the UI
└── requirements.txt # Project dependencies
Module Details
Config Module
The config module manages configuration settings for the entire system, including API keys, model selections, and other parameters.
Files
- __init__.py: Package initialization file
- config.py: Configuration management class
- config.yaml: YAML configuration file with settings for different components
Classes
- Config: Singleton class for loading and accessing configuration settings
  - load_config(config_path): Loads configuration from a YAML file
  - get(key, default=None): Gets a configuration value by key
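As a rough illustration of the singleton pattern described above, the following sketch shows one way such a class could look. It assumes PyYAML for parsing and a dotted-key lookup (for example "search.max_results", a hypothetical key); neither detail is confirmed by config.py itself.

```python
class Config:
    """Minimal singleton sketch: every Config() call returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._settings = {}
        return cls._instance

    def load_config(self, config_path):
        # Assumption: PyYAML is installed; imported lazily so the class
        # can be created without it.
        import yaml

        with open(config_path, "r", encoding="utf-8") as handle:
            self._settings = yaml.safe_load(handle) or {}

    def get(self, key, default=None):
        # Assumption: nested settings are addressed with dots, e.g. "a.b".
        node = self._settings
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

Because the instance is shared, a module that calls Config().get(...) sees whatever another module loaded earlier with load_config.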
Query Module
The query module handles the processing and enhancement of user queries, including classification and optimization for search.
Files
- __init__.py: Package initialization file
- query_processor.py: Main module for processing user queries
- query_classifier.py: Module for classifying query types
- llm_interface.py: Interface for interacting with LLM providers
Classes
- QueryProcessor: Main class for processing user queries
  - process_query(query): Processes a user query and returns enhanced results
  - classify_query(query): Classifies a query by type and intent
  - generate_search_queries(query, classification): Generates optimized search queries
- QueryClassifier: Class for classifying queries
  - classify(query): Classifies a query by type, intent, and entities
- LLMInterface: Interface for interacting with LLM providers
  - get_completion(prompt, model=None): Gets a completion from an LLM
  - enhance_query(query): Enhances a query with additional context
  - classify_query(query): Uses an LLM to classify a query
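To make the relationship between these classes more concrete, here is a hedged sketch of how process_query might chain enhancement, classification, and search-query generation. StubLLMInterface, the returned dictionary keys, and the engine names are all hypothetical placeholders; the real classes call an LLM provider and may return different structures.

```python
import asyncio


class StubLLMInterface:
    """Hypothetical stand-in for LLMInterface that returns canned values."""

    async def enhance_query(self, query):
        return f"{query} (with added context)"

    async def classify_query(self, query):
        return {"type": "factual", "intent": "research"}


class QueryProcessor:
    def __init__(self, llm_interface):
        self.llm = llm_interface

    async def classify_query(self, query):
        return await self.llm.classify_query(query)

    async def generate_search_queries(self, query, classification):
        # Hypothetical rule: one query string per engine, tagged with the type.
        engines = ("serper", "arxiv")
        return {name: f"{classification['type']}: {query}" for name in engines}

    async def process_query(self, query):
        enhanced = await self.llm.enhance_query(query)
        classification = await self.classify_query(query)
        search_queries = await self.generate_search_queries(enhanced, classification)
        return {
            "original_query": query,
            "enhanced_query": enhanced,
            "classification": classification,
            "search_queries": search_queries,
        }


if __name__ == "__main__":
    processor = QueryProcessor(StubLLMInterface())
    print(asyncio.run(processor.process_query("solid-state batteries")))
```

Passing the LLM interface into the constructor keeps the processor testable without network access, which is why the sketch uses a stub.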
Execution Module
The execution module handles the execution of search queries across multiple search engines and the collection of results.
Files
- __init__.py: Package initialization file
- search_executor.py: Module for executing search queries
- result_collector.py: Module for collecting and processing search results
- api_handlers/: Directory containing handlers for different search APIs
  - __init__.py: Package initialization file
  - base_handler.py: Base class for search handlers
  - serper_handler.py: Handler for Serper API (Google search)
  - scholar_handler.py: Handler for Google Scholar via Serper
  - arxiv_handler.py: Handler for arXiv API
Classes
- SearchExecutor: Class for executing search queries
  - execute_search(query_data): Executes a search across multiple engines
  - _execute_search_async(query, engines): Executes a search asynchronously
  - _execute_search_sync(query, engines): Executes a search synchronously
- ResultCollector: Class for collecting and processing search results
  - process_results(search_results): Processes search results from multiple engines
  - deduplicate_results(results): Deduplicates results based on URL
  - save_results(results, file_path): Saves results to a file
- BaseSearchHandler: Base class for search handlers
  - search(query, num_results): Abstract method for searching
  - _process_response(response): Processes the API response
- SerperSearchHandler: Handler for Serper API
  - search(query, num_results): Searches using Serper API
  - _process_response(response): Processes the Serper API response
- ScholarSearchHandler: Handler for Google Scholar via Serper
  - search(query, num_results): Searches Google Scholar
  - _process_response(response): Processes the Scholar API response
- ArxivSearchHandler: Handler for arXiv API
  - search(query, num_results): Searches arXiv
  - _process_response(response): Processes the arXiv API response
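The handler hierarchy and URL-based deduplication listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: InMemorySearchHandler is a hypothetical subclass that searches a fixed list instead of calling a real API, results are assumed to be dictionaries with "title", "url", and "source" keys, and the trailing-slash normalization is a simplification.

```python
from abc import ABC, abstractmethod


class BaseSearchHandler(ABC):
    """Common interface: each engine implements search and response parsing."""

    @abstractmethod
    def search(self, query, num_results):
        """Return a list of normalized result dictionaries."""

    @abstractmethod
    def _process_response(self, response):
        """Convert an engine-specific response into normalized results."""


class InMemorySearchHandler(BaseSearchHandler):
    """Hypothetical handler over a fixed record list (no network calls)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, query, num_results):
        matches = [r for r in self.records if query.lower() in r["title"].lower()]
        return self._process_response(matches[:num_results])

    def _process_response(self, response):
        return [
            {"title": r["title"], "url": r["url"], "source": self.name}
            for r in response
        ]


def deduplicate_results(results):
    """Keep the first result seen for each URL, ignoring a trailing slash."""
    seen = set()
    unique = []
    for result in results:
        key = result["url"].rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

With this shape, an executor can loop over any mix of handlers, concatenate their output, and pass the combined list through deduplicate_results without knowing which engine produced each entry.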
Ranking Module
The ranking module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
Files
- __init__.py: Package initialization file
- jina_reranker.py: Module for reranking documents using Jina AI
- filter_manager.py: Module for filtering documents
Classes
- JinaReranker: Class for reranking documents
  - rerank(documents, query): Reranks documents based on relevance to query
  - _prepare_inputs(documents, query): Prepares inputs for the reranker
- FilterManager: Class for filtering documents
  - filter_by_date(documents, start_date, end_date): Filters by date
  - filter_by_source(documents, sources): Filters by source
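The two FilterManager methods could be sketched along these lines. The document shape is an assumption: each document is taken to be a dictionary with a "date" value of type datetime.date and a "source" string, and documents missing either field are dropped by the corresponding filter.

```python
from datetime import date


class FilterManager:
    def filter_by_date(self, documents, start_date, end_date):
        # Inclusive range; documents without a date are excluded.
        return [
            doc for doc in documents
            if doc.get("date") is not None and start_date <= doc["date"] <= end_date
        ]

    def filter_by_source(self, documents, sources):
        allowed = set(sources)
        return [doc for doc in documents if doc.get("source") in allowed]


if __name__ == "__main__":
    docs = [
        {"title": "A", "date": date(2024, 5, 1), "source": "arxiv"},
        {"title": "B", "date": date(2022, 1, 15), "source": "serper"},
    ]
    manager = FilterManager()
    recent = manager.filter_by_date(docs, date(2024, 1, 1), date(2024, 12, 31))
    print([doc["title"] for doc in recent])
```

Both filters return new lists and leave the input untouched, so they can be chained in either order before or after reranking.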
Recent Updates
2025-02-28: Async Implementation and Reference Formatting
- LLM Interface Updates:
  - Converted key methods to async:
    - generate_completion
    - classify_query
    - enhance_query
    - generate_search_queries
  - Added special handling for Gemini models
  - Improved reference formatting instructions
- Query Processor Updates:
  - Updated process_query to be async
  - Made generate_search_queries async
  - Fixed async/await patterns throughout
- Gradio Interface Updates:
  - Modified generate_report to handle async operations
  - Updated report button click handler
  - Improved error handling
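One common way to call async methods from a synchronous entry point such as a button click handler is shown below. This is a sketch under assumptions, not the actual gradio_interface.py code: the generate_completion body is a stub, build_report is a hypothetical helper, and asyncio.run only works when no event loop is already running in the calling thread.

```python
import asyncio


async def generate_completion(prompt):
    """Stub for an async LLM call; a real version would await a provider request."""
    await asyncio.sleep(0)
    return f"completion for: {prompt}"


async def build_report(query):
    # Independent LLM calls can run concurrently with asyncio.gather.
    summary, references = await asyncio.gather(
        generate_completion(f"summarize {query}"),
        generate_completion(f"list references for {query}"),
    )
    return f"{summary}\n{references}"


def generate_report(query):
    """Synchronous wrapper that drives the async pipeline to completion."""
    try:
        return asyncio.run(build_report(query))
    except Exception as exc:  # surface failures as text instead of crashing the UI
        return f"Error generating report: {exc}"


if __name__ == "__main__":
    print(generate_report("electric vehicles"))
```

asyncio.gather returns results in the order the awaitables were passed, so the summary and references can be unpacked positionally.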