# Code Structure

## Current Project Organization

```
sim-search/
├── config/
│   ├── __init__.py
│   ├── config.py              # Configuration management
│   └── config.yaml            # Configuration file
├── query/
│   ├── __init__.py
│   ├── query_processor.py     # Module for processing user queries
│   └── llm_interface.py       # Module for interacting with LLM providers
├── execution/
│   ├── __init__.py
│   ├── search_executor.py     # Module for executing search queries
│   ├── result_collector.py    # Module for collecting search results
│   └── api_handlers/          # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py    # Base class for search handlers
│       ├── serper_handler.py  # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       ├── google_handler.py  # Handler for Google search
│       └── arxiv_handler.py   # Handler for arXiv API
├── ranking/
│   ├── __init__.py
│   └── jina_reranker.py       # Module for reranking documents using Jina AI
├── report/
│   ├── __init__.py
│   ├── report_generator.py    # Module for generating reports
│   ├── report_synthesis.py    # Module for synthesizing reports
│   ├── document_processor.py  # Module for processing documents
│   ├── document_scraper.py    # Module for scraping documents
│   ├── report_detail_levels.py # Module for managing report detail levels
│   └── database/              # Database for storing reports
│       ├── __init__.py
│       └── db_manager.py      # Module for managing the database
├── ui/
│   ├── __init__.py
│   └── gradio_interface.py    # Gradio-based web interface
├── utils/
│   ├── __init__.py
│   ├── jina_similarity.py     # Module for computing text similarity
│   └── markdown_segmenter.py  # Module for segmenting markdown documents
├── scripts/
│   └── query_to_report.py     # Script for generating reports from queries
├── tests/
│   ├── __init__.py
│   ├── query/                 # Tests for query module
│   │   ├── __init__.py
│   │   ├── test_query_processor.py
│   │   ├── test_query_processor_comprehensive.py
│   │   └── test_llm_interface.py
│   ├── execution/             # Tests for execution module
│   │   ├── __init__.py
│   │   ├── test_search.py
│   │   ├── test_search_execution.py
│   │   └── test_all_handlers.py
│   ├── ranking/               # Tests for ranking module
│   │   ├── __init__.py
│   │   ├── test_reranker.py
│   │   ├── test_similarity.py
│   │   └── test_simple_reranker.py
│   ├── report/                # Tests for report module
│   │   ├── __init__.py
│   │   ├── test_custom_model.py
│   │   └── test_detail_levels.py
│   ├── ui/                    # Tests for UI module
│   │   ├── __init__.py
│   │   └── test_ui_search.py
│   ├── integration/           # Integration tests
│   │   ├── __init__.py
│   │   ├── test_ev_query.py
│   │   └── test_query_to_report.py
│   ├── test_document_processor.py
│   ├── test_document_scraper.py
│   └── test_report_synthesis.py
├── examples/
│   ├── __init__.py
│   ├── data/                  # Example data files
│   └── scripts/               # Example scripts
│       └── __init__.py
├── run_ui.py                  # Script to run the UI
└── requirements.txt           # Project dependencies
```

## Module Details

### Config Module

The `config` module manages configuration settings for the entire system, including API keys, model selections, and other parameters.

#### Files

- `__init__.py`: Package initialization file
- `config.py`: Configuration management class
- `config.yaml`: YAML configuration file with settings for different components

#### Classes

- `Config`: Singleton class for loading and accessing configuration settings
  - `load_config(config_path)`: Loads configuration from a YAML file
  - `get(key, default=None)`: Gets a configuration value by key
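
The singleton behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual `config.py`: the `_settings` attribute, the flat key lookup, and the use of PyYAML for parsing are all assumptions.

```python
from typing import Any, Optional


class Config:
    """Process-wide configuration holder (singleton sketch)."""

    _instance: Optional["Config"] = None

    def __new__(cls) -> "Config":
        # Reuse the single instance so every caller sees the same settings.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._settings = {}  # hypothetical storage attribute
        return cls._instance

    def load_config(self, config_path: str) -> None:
        # Assumption: PyYAML is the parser. It is imported lazily so the
        # rest of the sketch runs even when PyYAML is not installed.
        import yaml

        with open(config_path, "r", encoding="utf-8") as fh:
            self._settings = yaml.safe_load(fh) or {}

    def get(self, key: str, default: Any = None) -> Any:
        # Assumption: flat, top-level keys only (no dotted paths).
        return self._settings.get(key, default)
```

Because `Config()` always returns the same object, a module that calls `Config().get("some_key")` sees whatever an earlier `load_config` call loaded, without the configuration being passed around explicitly.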

### Query Module

The `query` module handles the processing and enhancement of user queries, including classification and optimization for search.

#### Files

- `__init__.py`: Package initialization file
- `query_processor.py`: Main module for processing user queries
- `query_classifier.py`: Module for classifying query types
- `llm_interface.py`: Interface for interacting with LLM providers

#### Classes

- `QueryProcessor`: Main class for processing user queries
  - `process_query(query)`: Processes a user query and returns enhanced results
  - `classify_query(query)`: Classifies a query by type and intent
  - `generate_search_queries(query, classification)`: Generates optimized search queries

- `QueryClassifier`: Class for classifying queries
  - `classify(query)`: Classifies a query by type, intent, and entities

- `LLMInterface`: Interface for interacting with LLM providers
  - `get_completion(prompt, model=None)`: Gets a completion from an LLM
  - `enhance_query(query)`: Enhances a query with additional context
  - `classify_query(query)`: Uses an LLM to classify a query
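
To illustrate how these classes might fit together, here is a hedged sketch of the `process_query` flow. `StubLLMInterface`, the shape of the classification dictionary, the engine selection rule, and the keys of the returned dictionary are all hypothetical; the methods are written as coroutines to match the async conversion noted under Recent Updates.

```python
import asyncio
from typing import Any, Dict


class StubLLMInterface:
    """Stand-in for LLMInterface: returns canned values, calls no provider."""

    async def enhance_query(self, query: str) -> str:
        return f"{query} (overview, recent developments)"

    async def classify_query(self, query: str) -> Dict[str, Any]:
        # Hypothetical classification shape.
        qtype = "academic" if "paper" in query.lower() else "general"
        return {"type": qtype, "intent": "research"}


class QueryProcessor:
    def __init__(self, llm: StubLLMInterface) -> None:
        self.llm = llm

    async def classify_query(self, query: str) -> Dict[str, Any]:
        return await self.llm.classify_query(query)

    async def generate_search_queries(
        self, query: str, classification: Dict[str, Any]
    ) -> Dict[str, str]:
        # Hypothetical rule: academic queries go to scholarly engines.
        if classification["type"] == "academic":
            engines = ["scholar", "arxiv"]
        else:
            engines = ["serper"]
        return {engine: query for engine in engines}

    async def process_query(self, query: str) -> Dict[str, Any]:
        # Enhance, classify, then derive per-engine search queries.
        enhanced = await self.llm.enhance_query(query)
        classification = await self.classify_query(query)
        search_queries = await self.generate_search_queries(enhanced, classification)
        return {
            "original_query": query,
            "enhanced_query": enhanced,
            "classification": classification,
            "search_queries": search_queries,
        }


processor = QueryProcessor(StubLLMInterface())
result = asyncio.run(processor.process_query("papers on solid-state batteries"))
```

Injecting the LLM interface through the constructor keeps the processor testable without network access, which is the only reason the stub exists here.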

### Execution Module

The `execution` module handles the execution of search queries across multiple search engines and the collection of results.

#### Files

- `__init__.py`: Package initialization file
- `search_executor.py`: Module for executing search queries
- `result_collector.py`: Module for collecting and processing search results
- `api_handlers/`: Directory containing handlers for different search APIs
  - `__init__.py`: Package initialization file
  - `base_handler.py`: Base class for search handlers
  - `serper_handler.py`: Handler for Serper API (Google search)
  - `scholar_handler.py`: Handler for Google Scholar via Serper
  - `google_handler.py`: Handler for Google search
  - `arxiv_handler.py`: Handler for arXiv API

#### Classes

- `SearchExecutor`: Class for executing search queries
  - `execute_search(query_data)`: Executes a search across multiple engines
  - `_execute_search_async(query, engines)`: Executes a search asynchronously
  - `_execute_search_sync(query, engines)`: Executes a search synchronously

- `ResultCollector`: Class for collecting and processing search results
  - `process_results(search_results)`: Processes search results from multiple engines
  - `deduplicate_results(results)`: Deduplicates results based on URL
  - `save_results(results, file_path)`: Saves results to a file

- `BaseSearchHandler`: Base class for search handlers
  - `search(query, num_results)`: Abstract method for searching
  - `_process_response(response)`: Processes the API response

- `SerperSearchHandler`: Handler for Serper API
  - `search(query, num_results)`: Searches using Serper API
  - `_process_response(response)`: Processes the Serper API response

- `ScholarSearchHandler`: Handler for Google Scholar via Serper
  - `search(query, num_results)`: Searches Google Scholar
  - `_process_response(response)`: Processes the Scholar API response

- `ArxivSearchHandler`: Handler for arXiv API
  - `search(query, num_results)`: Searches arXiv
  - `_process_response(response)`: Processes the arXiv API response
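
The handler hierarchy and URL-based deduplication can be sketched as below. This is an illustration under stated assumptions rather than the project's code: `InMemorySearchHandler` is a hypothetical handler that reads a fixed payload instead of calling an API, the `items`/`link` payload fields and the normalized `title`/`url`/`source` result shape are invented for the example, and `_process_response` is assumed to be abstract alongside `search`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Result = Dict[str, Any]


class BaseSearchHandler(ABC):
    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[Result]:
        """Run a search and return normalized results."""

    @abstractmethod
    def _process_response(self, response: Dict[str, Any]) -> List[Result]:
        """Convert a provider-specific response into normalized results."""


class InMemorySearchHandler(BaseSearchHandler):
    """Hypothetical handler that 'searches' a fixed payload, not a real API."""

    def __init__(self, name: str, payload: Dict[str, Any]) -> None:
        self.name = name
        self.payload = payload

    def search(self, query: str, num_results: int = 10) -> List[Result]:
        # The query is ignored here; a real handler would send it to its API.
        return self._process_response(self.payload)[:num_results]

    def _process_response(self, response: Dict[str, Any]) -> List[Result]:
        # Map the provider's field names onto one shared result shape.
        return [
            {"title": item["title"], "url": item["link"], "source": self.name}
            for item in response.get("items", [])
        ]


def deduplicate_results(results: List[Result]) -> List[Result]:
    """Keep the first result seen for each URL, preserving input order."""
    seen = set()
    unique = []
    for result in results:
        if result["url"] not in seen:
            seen.add(result["url"])
            unique.append(result)
    return unique


serper = InMemorySearchHandler(
    "serper",
    {"items": [{"title": "A", "link": "https://example.org/a"},
               {"title": "B", "link": "https://example.org/b"}]},
)
scholar = InMemorySearchHandler(
    "scholar", {"items": [{"title": "A (copy)", "link": "https://example.org/a"}]}
)
merged = deduplicate_results(serper.search("q") + scholar.search("q"))
```

Normalizing inside each handler means the collector only ever has to compare one `url` field, regardless of which engine produced a result.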

### Ranking Module

The `ranking` module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.

#### Files

- `__init__.py`: Package initialization file
- `jina_reranker.py`: Module for reranking documents using Jina AI
- `filter_manager.py`: Module for filtering documents

#### Classes

- `JinaReranker`: Class for reranking documents
  - `rerank(documents, query)`: Reranks documents based on relevance to query
  - `_prepare_inputs(documents, query)`: Prepares inputs for the reranker

- `FilterManager`: Class for filtering documents
  - `filter_by_date(documents, start_date, end_date)`: Filters by date
  - `filter_by_source(documents, sources)`: Filters by source
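
A minimal sketch of the two `FilterManager` methods follows. The document shape is an assumption: each document is taken to be a dictionary with a `date` field holding a `datetime.date` and a `source` field holding a string. Inclusive date bounds and dropping undated documents are also choices made for this example.

```python
from datetime import date
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


class FilterManager:
    def filter_by_date(
        self, documents: Iterable[Document], start_date: date, end_date: date
    ) -> List[Document]:
        # Inclusive bounds; documents without a date are dropped (assumption).
        return [
            doc
            for doc in documents
            if doc.get("date") is not None and start_date <= doc["date"] <= end_date
        ]

    def filter_by_source(
        self, documents: Iterable[Document], sources: Iterable[str]
    ) -> List[Document]:
        allowed = set(sources)  # set lookup keeps each membership test O(1)
        return [doc for doc in documents if doc.get("source") in allowed]


docs = [
    {"title": "old", "date": date(2020, 1, 1), "source": "arxiv"},
    {"title": "new", "date": date(2024, 6, 1), "source": "scholar"},
    {"title": "undated", "source": "arxiv"},
]
manager = FilterManager()
recent = manager.filter_by_date(docs, date(2023, 1, 1), date(2024, 12, 31))
from_arxiv = manager.filter_by_source(docs, ["arxiv"])
```

Both methods return new lists and leave the input untouched, so filters can be chained in any order.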

## Recent Updates

### 2025-02-28: Async Implementation and Reference Formatting

1. **LLM Interface Updates**:
   - Converted key methods to async:
     - `generate_completion`
     - `classify_query`
     - `enhance_query`
     - `generate_search_queries`
   - Added special handling for Gemini models
   - Improved reference formatting instructions

2. **Query Processor Updates**:
   - Updated `process_query` to be async
   - Made `generate_search_queries` async
   - Fixed async/await patterns throughout

3. **Gradio Interface Updates**:
   - Modified `generate_report` to handle async operations
   - Updated report button click handler
   - Improved error handling
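
One way a UI callback could drive the async pipeline is sketched below. This is not the actual Gradio handler: the stubbed `process_query`, the error message format, and the `generate_report_sync` wrapper are hypothetical, and the wrapper is only needed where the caller cannot await a coroutine directly.

```python
import asyncio
from typing import Any, Dict


async def process_query(query: str) -> Dict[str, Any]:
    """Stub for the async query pipeline; a real one would await LLM calls."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if not query.strip():
        raise ValueError("query must not be empty")
    return {"query": query, "search_queries": [query]}


async def generate_report(query: str) -> str:
    # Turn pipeline failures into a message the UI can display.
    try:
        processed = await process_query(query)
    except ValueError as exc:
        return f"Error: {exc}"
    count = len(processed["search_queries"])
    return f"# Report\n\n{count} search queries generated for: {processed['query']}"


def generate_report_sync(query: str) -> str:
    # asyncio.run starts a fresh event loop, so this wrapper must not be
    # called from code that is already running inside an event loop.
    return asyncio.run(generate_report(query))
```

Returning an error string instead of raising keeps the report output populated even when a step fails, which is one possible reading of the error-handling improvement listed above.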