Code Structure

Current Project Organization

sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py              # Configuration management
│   └── config.yaml            # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py     # Module for processing user queries
│   └── llm_interface.py       # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py     # Module for executing search queries
│   ├── result_collector.py    # Module for collecting search results
│   └── api_handlers/          # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py    # Base class for search handlers
│       ├── serper_handler.py  # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       ├── google_handler.py  # Handler for Google search
│       └── arxiv_handler.py   # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   └── jina_reranker.py       # Module for reranking documents using Jina AI
├── report/
│   ├── __init__.py
│   ├── report_generator.py    # Module for generating reports
│   ├── report_synthesis.py    # Module for synthesizing reports
│   ├── document_processor.py  # Module for processing documents
│   ├── document_scraper.py    # Module for scraping documents
│   ├── report_detail_levels.py # Module for managing report detail levels
│   └── database/              # Database for storing reports
│       ├── __init__.py
│       └── db_manager.py      # Module for managing the database
├── ui/
│   ├── __init__.py
│   └── gradio_interface.py    # Gradio-based web interface
├── utils/
│   ├── __init__.py
│   ├── jina_similarity.py     # Module for computing text similarity
│   └── markdown_segmenter.py  # Module for segmenting markdown documents
├── scripts/
│   └── query_to_report.py     # Script for generating reports from queries
├── tests/
│   ├── __init__.py
│   ├── query/                 # Tests for query module
│   │   ├── __init__.py
│   │   ├── test_query_processor.py
│   │   ├── test_query_processor_comprehensive.py
│   │   └── test_llm_interface.py
│   ├── execution/             # Tests for execution module
│   │   ├── __init__.py
│   │   ├── test_search.py
│   │   ├── test_search_execution.py
│   │   └── test_all_handlers.py
│   ├── ranking/               # Tests for ranking module
│   │   ├── __init__.py
│   │   ├── test_reranker.py
│   │   ├── test_similarity.py
│   │   └── test_simple_reranker.py
│   ├── report/                # Tests for report module
│   │   ├── __init__.py
│   │   ├── test_custom_model.py
│   │   └── test_detail_levels.py
│   ├── ui/                    # Tests for UI module
│   │   ├── __init__.py
│   │   └── test_ui_search.py
│   ├── integration/           # Integration tests
│   │   ├── __init__.py
│   │   ├── test_ev_query.py
│   │   └── test_query_to_report.py
│   ├── test_document_processor.py
│   ├── test_document_scraper.py
│   └── test_report_synthesis.py
├── examples/
│   ├── __init__.py
│   ├── data/                  # Example data files
│   └── scripts/               # Example scripts
│       └── __init__.py
├── run_ui.py                  # Script to run the UI
└── requirements.txt           # Project dependencies

Module Details

Config Module

The config module manages configuration settings for the entire system, including API keys, model selections, and other parameters.

Files

__init__.py: Package initialization file
config.py: Configuration management class
config.yaml: YAML configuration file with settings for different components

Classes

Config: Singleton class for loading and accessing configuration settings
- load_config(config_path): Loads configuration from a YAML file
- get(key, default=None): Gets a configuration value by key
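
A minimal sketch of a singleton Config exposing load_config and get. It assumes the YAML file maps top-level keys to values and that PyYAML supplies the parser. The real config.py may resolve keys differently, for example with nested or dotted lookups.

```python
from pathlib import Path
from typing import Any, Optional


class Config:
    """Process-wide configuration holder (singleton sketch, not the project's actual class)."""

    _instance: Optional["Config"] = None

    def __new__(cls) -> "Config":
        # Hand back the same instance every time so all modules share one set of settings.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._settings = {}
        return cls._instance

    def load_config(self, config_path: str) -> None:
        """Replace the current settings with the contents of a YAML file (requires PyYAML)."""
        import yaml  # imported lazily so the class is usable before any file is loaded

        with Path(config_path).open(encoding="utf-8") as fh:
            self._settings = yaml.safe_load(fh) or {}

    def get(self, key: str, default: Any = None) -> Any:
        """Return the value stored under a top-level key, or default if it is absent."""
        return self._settings.get(key, default)
```

Typical use is to call Config().load_config("config/config.yaml") once at startup. After that, any module can call Config().get("some_key", fallback). Here "some_key" stands for whatever key config.yaml actually defines.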

Query Module

The query module handles the processing and enhancement of user queries, including classification and optimization for search.

Files

__init__.py: Package initialization file
query_processor.py: Main module for processing user queries
query_classifier.py: Module for classifying query types
llm_interface.py: Interface for interacting with LLM providers

Classes

QueryProcessor: Main class for processing user queries
- process_query(query): Processes a user query and returns enhanced results
- classify_query(query): Classifies a query by type and intent
- generate_search_queries(query, classification): Generates optimized search queries
QueryClassifier: Class for classifying queries
- classify(query): Classifies a query by type, intent, and entities
LLMInterface: Interface for interacting with LLM providers
- get_completion(prompt, model=None): Gets a completion from an LLM
- enhance_query(query): Enhances a query with additional context
- classify_query(query): Uses an LLM to classify a query
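
The sketch below shows how QueryProcessor might drive an async LLM interface, following the async conversion listed under Recent Updates. StubLLMInterface is hypothetical and returns canned answers instead of calling a provider. The keyword-based classification, the engine routing and the keys of the returned dict are also illustrative assumptions.

```python
import asyncio
from typing import Any, Dict, List


class StubLLMInterface:
    """Hypothetical stand-in for LLMInterface; no provider is called."""

    async def classify_query(self, query: str) -> Dict[str, Any]:
        # Toy keyword rule; the real interface asks an LLM to classify the query.
        kind = "academic" if "paper" in query.lower() else "general"
        return {"type": kind, "intent": "research"}

    async def enhance_query(self, query: str) -> str:
        return f"{query} overview"


class QueryProcessor:
    def __init__(self, llm: StubLLMInterface) -> None:
        self.llm = llm

    async def classify_query(self, query: str) -> Dict[str, Any]:
        return await self.llm.classify_query(query)

    async def generate_search_queries(
        self, query: str, classification: Dict[str, Any]
    ) -> Dict[str, List[str]]:
        # Illustrative routing: academic queries also go to Scholar and arXiv.
        engines = ["serper"]
        if classification["type"] == "academic":
            engines += ["scholar", "arxiv"]
        return {engine: [query] for engine in engines}

    async def process_query(self, query: str) -> Dict[str, Any]:
        # Classification and enhancement are independent, so await them concurrently.
        classification, enhanced = await asyncio.gather(
            self.classify_query(query), self.llm.enhance_query(query)
        )
        search_queries = await self.generate_search_queries(enhanced, classification)
        return {
            "original_query": query,
            "enhanced_query": enhanced,
            "classification": classification,
            "search_queries": search_queries,
        }
```

Because process_query is a coroutine, synchronous callers need a wrapper such as asyncio.run(processor.process_query("...")).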

Execution Module

The execution module handles the execution of search queries across multiple search engines and the collection of results.

Files

__init__.py: Package initialization file
search_executor.py: Module for executing search queries
result_collector.py: Module for collecting and processing search results
api_handlers/: Directory containing handlers for different search APIs
- __init__.py: Package initialization file
- base_handler.py: Base class for search handlers
- serper_handler.py: Handler for Serper API (Google search)
- scholar_handler.py: Handler for Google Scholar via Serper
- google_handler.py: Handler for Google search
- arxiv_handler.py: Handler for arXiv API
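
The api_handlers/ layout suggests a base class that fixes the interface and one subclass per API. The sketch below uses an abstract base class for that. FakeSerperHandler is hypothetical and serves a canned payload instead of making an HTTP request. The "organic"/"title"/"link" fields are an assumption about Serper's response shape, and the normalized result keys are illustrative.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchHandler(ABC):
    """Common interface every API handler implements."""

    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[Dict[str, Any]]:
        """Run query against one API and return normalized results."""

    @abstractmethod
    def _process_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Convert the API's raw payload into a list of result dicts."""


class FakeSerperHandler(BaseSearchHandler):
    """Hypothetical handler that returns a canned payload instead of calling Serper."""

    def __init__(self, canned_response: Dict[str, Any]) -> None:
        self.canned_response = canned_response

    def search(self, query: str, num_results: int = 10) -> List[Dict[str, Any]]:
        # A real handler would send the HTTP request for query here.
        return self._process_response(self.canned_response)[:num_results]

    def _process_response(self, response: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Map API-specific field names onto one shared result shape.
        return [
            {"title": item.get("title", ""), "url": item.get("link", ""), "source": "serper"}
            for item in response.get("organic", [])
        ]
```

Because BaseSearchHandler declares both methods abstract, a handler that forgets to implement either one fails at instantiation rather than partway through a search.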

Classes

SearchExecutor: Class for executing search queries
- execute_search(query_data): Executes a search across multiple engines
- _execute_search_async(query, engines): Executes a search asynchronously
- _execute_search_sync(query, engines): Executes a search synchronously
ResultCollector: Class for collecting and processing search results
- process_results(search_results): Processes search results from multiple engines
- deduplicate_results(results): Deduplicates results based on URL
- save_results(results, file_path): Saves results to a file
BaseSearchHandler: Base class for search handlers
- search(query, num_results): Abstract method for searching
- _process_response(response): Processes the API response
SerperSearchHandler: Handler for Serper API
- search(query, num_results): Searches using Serper API
- _process_response(response): Processes the Serper API response
ScholarSearchHandler: Handler for Google Scholar via Serper
- search(query, num_results): Searches Google Scholar
- _process_response(response): Processes the Scholar API response
ArxivSearchHandler: Handler for arXiv API
- search(query, num_results): Searches arXiv
- _process_response(response): Processes the arXiv API response
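
Here is one way SearchExecutor could fan a query out to several engines at once and ResultCollector could merge the results by URL. The sketch assumes the handlers are blocking, so each runs in a worker thread via asyncio.to_thread (Python 3.9+). EchoHandler and the "query"/"engines" keys of query_data are hypothetical.

```python
import asyncio
from typing import Any, Dict, List


class EchoHandler:
    """Hypothetical synchronous handler standing in for a real API handler."""

    def __init__(self, name: str, urls: List[str]) -> None:
        self.name = name
        self.urls = urls

    def search(self, query: str, num_results: int = 10) -> List[Dict[str, Any]]:
        return [{"url": u, "source": self.name, "query": query} for u in self.urls[:num_results]]


class SearchExecutor:
    def __init__(self, handlers: Dict[str, EchoHandler]) -> None:
        self.handlers = handlers

    def execute_search(self, query_data: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
        # Synchronous entry point; assumes query_data carries "query" and "engines".
        return asyncio.run(self._execute_search_async(query_data["query"], query_data["engines"]))

    async def _execute_search_async(self, query: str, engines: List[str]) -> Dict[str, List[Dict[str, Any]]]:
        # Run each blocking handler in a worker thread and await them together.
        tasks = [asyncio.to_thread(self.handlers[e].search, query) for e in engines]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Drop engines that raised instead of failing the whole search.
        return {e: r for e, r in zip(engines, results) if not isinstance(r, BaseException)}


class ResultCollector:
    def process_results(self, search_results: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
        # Flatten per-engine lists in engine order, then remove duplicates.
        flat = [r for engine_results in search_results.values() for r in engine_results]
        return self.deduplicate_results(flat)

    def deduplicate_results(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Keep the first result seen for each URL, preserving order.
        seen = set()
        unique = []
        for result in results:
            url = result.get("url")
            if url not in seen:
                seen.add(url)
                unique.append(result)
        return unique
```

This deduplication compares URLs exactly. Trailing slashes or tracking parameters would need normalizing first if the real collector is meant to treat those as duplicates.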

Ranking Module

The ranking module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
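
Reranking through a hosted model comes down to sending the query plus candidate documents and reordering them by the returned scores. The sketch below covers only those two steps and leaves out the HTTP call. The request fields (model, query, documents, top_n), the response shape (results with index and relevance_score), the endpoint https://api.jina.ai/v1/rerank and the default model name are all assumptions about Jina's rerank API. Check them against Jina's documentation rather than treating them as what jina_reranker.py sends.

```python
from typing import Any, Dict, List


def prepare_rerank_payload(
    query: str,
    documents: List[str],
    top_n: int,
    model: str = "jina-reranker-v2-base-multilingual",  # assumed model name
) -> Dict[str, Any]:
    """Build a request body in the shape the rerank endpoint is assumed to accept."""
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def apply_rerank_response(documents: List[str], response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reorder documents by the scores in a rerank response, highest first."""
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [
        {"index": r["index"], "document": documents[r["index"]], "score": r["relevance_score"]}
        for r in ranked
    ]
```

Sorting on the client keeps the result correct even if the service returns results in a different order.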

Files

__init__.py: Package initialization file
jina_reranker.py: Module for reranking documents using Jina AI
filter_manager.py: Module for filtering documents

Classes

JinaReranker: Class for reranking documents
- rerank(documents, query): Reranks documents based on relevance to query
- _prepare_inputs(documents, query): Prepares inputs for the reranker
FilterManager: Class for filtering documents
- filter_by_date(documents, start_date, end_date): Filters by date
- filter_by_source(documents, sources): Filters by source
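
A minimal sketch of the two FilterManager methods. It assumes each document is a dict with an optional "date" (a datetime.date) and a "source" key. Date bounds are inclusive, and either bound may be omitted. Undated documents are dropped by the date filter, which is a design choice the real module may make differently.

```python
from datetime import date
from typing import Any, Dict, Iterable, List, Optional


class FilterManager:
    """Sketch only; the document dict layout is assumed, not taken from the project."""

    def filter_by_date(
        self,
        documents: List[Dict[str, Any]],
        start_date: Optional[date] = None,
        end_date: Optional[date] = None,
    ) -> List[Dict[str, Any]]:
        kept = []
        for doc in documents:
            doc_date = doc.get("date")
            if doc_date is None:
                continue  # an undated document cannot be placed inside the window
            if start_date is not None and doc_date < start_date:
                continue
            if end_date is not None and doc_date > end_date:
                continue
            kept.append(doc)
        return kept

    def filter_by_source(self, documents: List[Dict[str, Any]], sources: Iterable[str]) -> List[Dict[str, Any]]:
        # Set lookup keeps this linear in the number of documents.
        allowed = set(sources)
        return [doc for doc in documents if doc.get("source") in allowed]
```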

Recent Updates

2025-02-28: Async Implementation and Reference Formatting

LLM Interface Updates:
- Converted key methods to async:
  - generate_completion
  - classify_query
  - enhance_query
  - generate_search_queries
- Added special handling for Gemini models
- Improved reference formatting instructions
Query Processor Updates:
- Updated process_query to be async
- Made generate_search_queries async
- Fixed async/await patterns throughout
Gradio Interface Updates:
- Modified generate_report to handle async operations
- Updated report button click handler
- Improved error handling
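
The sketch below shows one way a synchronous UI handler can drive an async report function and turn failures into a message instead of an exception. Here generate_report is a hypothetical stand-in for the real async pipeline, and the detail-level names are illustrative. asyncio.run only works in a thread with no event loop already running. Gradio also accepts async def functions as event handlers, which removes the need for the wrapper.

```python
import asyncio


async def generate_report(query: str, detail_level: str = "standard") -> str:
    """Hypothetical stand-in for the async report pipeline (query, search, synthesis)."""
    await asyncio.sleep(0)  # placeholder for awaited LLM and search calls
    if not query.strip():
        raise ValueError("query must not be empty")
    return f"# Report ({detail_level})\n\nFindings for: {query}"


def on_report_click(query: str, detail_level: str) -> str:
    """Synchronous click handler that runs the coroutine and reports errors as text."""
    try:
        return asyncio.run(generate_report(query, detail_level))
    except Exception as exc:  # show the error in the UI rather than crashing the handler
        return f"Error generating report: {exc}"
```

In a Gradio Blocks layout this would be wired as report_button.click(fn=on_report_click, inputs=[query_box, detail_dropdown], outputs=report_output). The component names there are hypothetical.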
