Code Structure
Current Project Organization
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py  # Configuration management
│   └── config.yaml  # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py  # Module for processing user queries
│   ├── query_classifier.py  # Module for classifying query types
│   └── llm_interface.py  # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py  # Module for executing search queries
│   ├── result_collector.py  # Module for collecting search results
│   └── api_handlers/  # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py  # Base class for search handlers
│       ├── serper_handler.py  # Handler for Serper API (Google search)
│       ├── scholar_handler.py  # Handler for Google Scholar via Serper
│       └── arxiv_handler.py  # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   ├── jina_reranker.py  # Module for reranking documents using Jina AI
│   └── filter_manager.py  # Module for filtering documents
├── test_search_execution.py  # Test script for search execution
├── test_all_handlers.py  # Test script for all search handlers
├── requirements.txt  # Project dependencies
└── search_execution_test_results.json  # Test results
Module Details
Config Module
The config module manages configuration settings for the entire system, including API keys, model selections, and other parameters.
Files
- __init__.py: Package initialization file
- config.py: Configuration management class
- config.yaml: YAML configuration file with settings for different components
Classes
- Config: Singleton class for loading and accessing configuration settings
  - load_config(config_path): Loads configuration from a YAML file
  - get(key, default=None): Gets a configuration value by key
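The singleton behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of config.py: the `_data` attribute, the dotted-key lookup in `get`, and the use of PyYAML's `yaml.safe_load` are all assumptions made for the example.

```python
from typing import Any, Optional


class Config:
    """Illustrative singleton; the real config.py may be structured differently."""

    _instance: Optional["Config"] = None

    def __new__(cls) -> "Config":
        # Reuse one shared instance so every module sees the same settings.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._data = {}
        return cls._instance

    def load_config(self, config_path: str) -> None:
        # Assumption: PyYAML is installed. It is imported lazily so the
        # class can be defined even where the package is missing.
        import yaml

        with open(config_path, "r", encoding="utf-8") as fh:
            self._data = yaml.safe_load(fh) or {}

    def get(self, key: str, default: Any = None) -> Any:
        # Assumption: dotted keys such as "models.default" walk nested mappings.
        node: Any = self._data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

With this shape, any module can call `Config().get("some.key", fallback)` after `load_config` has run once, without passing a configuration object around.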
Query Module
The query module handles the processing and enhancement of user queries, including classification and optimization for search.
Files
- __init__.py: Package initialization file
- query_processor.py: Main module for processing user queries
- query_classifier.py: Module for classifying query types
- llm_interface.py: Interface for interacting with LLM providers
Classes
- QueryProcessor: Main class for processing user queries
  - process_query(query): Processes a user query and returns enhanced results
  - classify_query(query): Classifies a query by type and intent
  - generate_search_queries(query, classification): Generates optimized search queries
- QueryClassifier: Class for classifying queries
  - classify(query): Classifies a query by type, intent, and entities
- LLMInterface: Interface for interacting with LLM providers
  - get_completion(prompt, model=None): Gets a completion from an LLM
  - enhance_query(query): Enhances a query with additional context
  - classify_query(query): Uses an LLM to classify a query
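One way the three QueryProcessor methods might fit together is sketched below. The method names come from the list above. Everything else is hypothetical: the keyword rule stands in for the LLM-backed classification, and the result keys (`original_query`, `classification`, `search_queries`) and the `site:arxiv.org` hint are invented for illustration.

```python
from typing import Any, Dict, List


class QueryClassifier:
    """Stand-in classifier: a keyword rule replaces the LLM call here."""

    def classify(self, query: str) -> Dict[str, Any]:
        # Hypothetical rule: research-sounding queries count as "academic".
        academic_terms = ("paper", "study", "arxiv")
        is_academic = any(term in query.lower() for term in academic_terms)
        return {"type": "academic" if is_academic else "general"}


class QueryProcessor:
    def __init__(self, classifier: QueryClassifier) -> None:
        self.classifier = classifier

    def classify_query(self, query: str) -> Dict[str, Any]:
        return self.classifier.classify(query)

    def generate_search_queries(
        self, query: str, classification: Dict[str, Any]
    ) -> List[str]:
        # Hypothetical optimization: add a site hint for academic queries.
        queries = [query]
        if classification["type"] == "academic":
            queries.append(f"{query} site:arxiv.org")
        return queries

    def process_query(self, query: str) -> Dict[str, Any]:
        # Classify first, then derive search queries from the classification.
        classification = self.classify_query(query)
        return {
            "original_query": query,
            "classification": classification,
            "search_queries": self.generate_search_queries(query, classification),
        }
```

Passing the classifier into the constructor keeps the sketch testable without an LLM provider; whether the real module wires its dependencies this way is not stated in this document.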
Execution Module
The execution module handles the execution of search queries across multiple search engines and the collection of results.
Files
- __init__.py: Package initialization file
- search_executor.py: Module for executing search queries
- result_collector.py: Module for collecting and processing search results
- api_handlers/: Directory containing handlers for different search APIs
  - __init__.py: Package initialization file
  - base_handler.py: Base class for search handlers
  - serper_handler.py: Handler for Serper API (Google search)
  - scholar_handler.py: Handler for Google Scholar via Serper
  - arxiv_handler.py: Handler for arXiv API
Classes
- SearchExecutor: Class for executing search queries
  - execute_search(query_data): Executes a search across multiple engines
  - _execute_search_async(query, engines): Executes a search asynchronously
  - _execute_search_sync(query, engines): Executes a search synchronously
- ResultCollector: Class for collecting and processing search results
  - process_results(search_results): Processes search results from multiple engines
  - deduplicate_results(results): Deduplicates results based on URL
  - save_results(results, file_path): Saves results to a file
- BaseSearchHandler: Base class for search handlers
  - search(query, num_results): Abstract method for searching
  - _process_response(response): Processes the API response
- SerperSearchHandler: Handler for Serper API
  - search(query, num_results): Searches using Serper API
  - _process_response(response): Processes the Serper API response
- ScholarSearchHandler: Handler for Google Scholar via Serper
  - search(query, num_results): Searches Google Scholar
  - _process_response(response): Processes the Scholar API response
- ArxivSearchHandler: Handler for arXiv API
  - search(query, num_results): Searches arXiv
  - _process_response(response): Processes the arXiv API response
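The handler abstraction, asynchronous fan-out, and URL-based deduplication listed above could look roughly like this. It is a sketch under assumptions: `StaticHandler` is a hypothetical stand-in that returns canned results instead of calling Serper or arXiv, the free functions are simplified versions of the class methods, and result dictionaries are assumed to carry a `url` key.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Result = Dict[str, Any]


class BaseSearchHandler(ABC):
    @abstractmethod
    def search(self, query: str, num_results: int) -> List[Result]:
        """Return a list of result dicts for the query."""


class StaticHandler(BaseSearchHandler):
    """Hypothetical handler that returns canned results (no network calls)."""

    def __init__(self, results: List[Result]) -> None:
        self._results = results

    def search(self, query: str, num_results: int) -> List[Result]:
        return self._results[:num_results]


async def execute_search_async(
    query: str, handlers: Dict[str, BaseSearchHandler], num_results: int = 10
) -> Dict[str, List[Result]]:
    # Run each blocking handler in a worker thread and wait for all of them.
    names = list(handlers)
    tasks = [
        asyncio.to_thread(handlers[name].search, query, num_results)
        for name in names
    ]
    results = await asyncio.gather(*tasks)
    return dict(zip(names, results))


def deduplicate_results(results: List[Result]) -> List[Result]:
    # Keep the first result seen for each URL, preserving order.
    # Results without a URL cannot be compared, so they are always kept.
    seen = set()
    unique = []
    for item in results:
        url = item.get("url")
        if url is not None and url in seen:
            continue
        seen.add(url)
        unique.append(item)
    return unique
```

A caller would run `asyncio.run(execute_search_async(query, handlers))`, flatten the per-engine lists, and pass the combined list to `deduplicate_results`. `asyncio.to_thread` requires Python 3.9 or later.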
Ranking Module
The ranking module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
Files
- __init__.py: Package initialization file
- jina_reranker.py: Module for reranking documents using Jina AI
- filter_manager.py: Module for filtering documents
Classes
- JinaReranker: Class for reranking documents
  - rerank(documents, query): Reranks documents based on relevance to query
  - _prepare_inputs(documents, query): Prepares inputs for the reranker
- FilterManager: Class for filtering documents
  - filter_by_date(documents, start_date, end_date): Filters by date
  - filter_by_source(documents, sources): Filters by source
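The two FilterManager methods can be illustrated with a small sketch. The document shape is an assumption: each document is taken to be a dict with an ISO-formatted `date` string (YYYY-MM-DD) and a `source` string, neither of which is specified in this document. Dropping undated documents is also a choice made only for this example.

```python
from datetime import date
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


class FilterManager:
    def filter_by_date(
        self, documents: Iterable[Document], start_date: date, end_date: date
    ) -> List[Document]:
        # Assumption: each document carries an ISO "date" string (YYYY-MM-DD).
        kept = []
        for doc in documents:
            raw = doc.get("date")
            if raw is None:
                continue  # undated documents are dropped in this sketch
            if start_date <= date.fromisoformat(raw) <= end_date:
                kept.append(doc)
        return kept

    def filter_by_source(
        self, documents: Iterable[Document], sources: Iterable[str]
    ) -> List[Document]:
        # Assumption: each document carries a "source" string such as "arxiv".
        allowed = set(sources)
        return [doc for doc in documents if doc.get("source") in allowed]
```

Both filters return new lists and leave the input untouched, so they can be chained in either order before or after reranking.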