Code Structure
Current Project Organization
project/
│
├── examples/ # Sample data and query examples
├── report/ # Report generation module
│   ├── __init__.py
│   ├── report_generator.py # Module for generating reports
│   ├── report_synthesis.py # Module for synthesizing reports
│   ├── document_processor.py # Module for processing documents
│   ├── document_scraper.py # Module for scraping documents
│   ├── report_detail_levels.py # Module for managing report detail levels
│   ├── report_templates.py # Module for managing report templates
│   └── database/ # Database for storing reports
│       ├── __init__.py
│       └── db_manager.py # Module for managing the database
├── tests/ # Test suite
│   ├── __init__.py
│   ├── execution/ # Search execution tests
│   │   ├── __init__.py
│   │   ├── test_search.py
│   │   ├── test_search_execution.py
│   │   └── test_all_handlers.py
│   ├── integration/ # Integration tests
│   │   ├── __init__.py
│   │   ├── test_ev_query.py
│   │   └── test_query_to_report.py
│   ├── query/ # Query processing tests
│   │   ├── __init__.py
│   │   ├── test_query_processor.py
│   │   ├── test_query_processor_comprehensive.py
│   │   └── test_llm_interface.py
│   ├── ranking/ # Ranking algorithm tests
│   │   ├── __init__.py
│   │   ├── test_reranker.py
│   │   ├── test_similarity.py
│   │   └── test_simple_reranker.py
│   ├── report/ # Report generation tests
│   │   ├── __init__.py
│   │   ├── test_custom_model.py
│   │   ├── test_detail_levels.py
│   │   ├── test_brief_report.py
│   │   └── test_report_templates.py
│   ├── ui/ # UI component tests
│   │   ├── __init__.py
│   │   └── test_ui_search.py
│   ├── test_document_processor.py
│   ├── test_document_scraper.py
│   └── test_report_synthesis.py
├── utils/ # Utility scripts and shared functions
│   ├── __init__.py
│   ├── jina_similarity.py # Module for computing text similarity
│   └── markdown_segmenter.py # Module for segmenting markdown documents
├── config/ # Configuration management
│   ├── __init__.py
│   ├── config.py # Configuration management class
│   └── config.yaml # YAML configuration file with settings for different components
├── query/ # Query processing module
│   ├── __init__.py
│   ├── query_processor.py # Module for processing user queries
│   └── llm_interface.py # Module for interacting with LLM providers
├── execution/ # Search execution module
│   ├── __init__.py
│   ├── search_executor.py # Module for executing search queries
│   ├── result_collector.py # Module for collecting search results
│   └── api_handlers/ # Handlers for different search APIs
│       ├── __init__.py
│       ├── base_handler.py # Base class for search handlers
│       ├── serper_handler.py # Handler for Serper API (Google search)
│       ├── scholar_handler.py # Handler for Google Scholar via Serper
│       ├── google_handler.py # Handler for Google search
│       └── arxiv_handler.py # Handler for arXiv API
├── ranking/ # Ranking module
│   ├── __init__.py
│   └── jina_reranker.py # Module for reranking documents using Jina AI
├── ui/ # UI module
│   ├── __init__.py
│   └── gradio_interface.py # Gradio-based web interface
├── scripts/ # Scripts
│   └── query_to_report.py # Script for generating reports from queries
├── run_ui.py # Script to run the UI
└── requirements.txt # Project dependencies
Module Details
Config Module
The config module manages configuration settings for the entire system, including API keys, model selections, and other parameters.
Files
- __init__.py: Package initialization file
- config.py: Configuration management class
- config.yaml: YAML configuration file with settings for different components
Classes
- Config: Singleton class for loading and accessing configuration settings
  - load_config(config_path): Loads configuration from a YAML file
  - get(key, default=None): Gets a configuration value by key
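A minimal sketch of how a singleton Config exposing load_config and get might be structured. This is not the project's code: the lazy PyYAML import, the private _data attribute, and the dotted-key lookup in get (for example "models.default") are assumptions made for illustration.

```python
import threading
from typing import Any, Optional


class Config:
    """Process-wide configuration singleton (illustrative sketch, not the real class)."""

    _instance: Optional["Config"] = None
    _lock = threading.Lock()

    def __new__(cls) -> "Config":
        # Create the shared instance under a lock so concurrent first calls cannot build two.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._data = {}
                cls._instance = instance
        return cls._instance

    def load_config(self, config_path: str) -> None:
        import yaml  # PyYAML; imported lazily so this sketch loads without it installed

        with open(config_path, encoding="utf-8") as fh:
            self._data = yaml.safe_load(fh) or {}

    def get(self, key: str, default: Any = None) -> Any:
        # Hypothetical dotted-key lookup: walk nested mappings for "section.option".
        node: Any = self._data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

Because Config() always returns the same object, any module can read settings without passing a config instance around; the trade-off is shared global state, which tests may need to reset.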
Query Module
The query module handles the processing and enhancement of user queries, including classification and optimization for search.
Files
- __init__.py: Package initialization file
- query_processor.py: Main module for processing user queries
- query_classifier.py: Module for classifying query types
- llm_interface.py: Interface for interacting with LLM providers
Classes
- QueryProcessor: Main class for processing user queries
  - process_query(query): Processes a user query and returns enhanced results
  - classify_query(query): Classifies a query by type and intent
  - generate_search_queries(query, classification): Generates optimized search queries
- QueryClassifier: Class for classifying queries
  - classify(query): Classifies a query by type, intent, and entities
- LLMInterface: Interface for interacting with LLM providers
  - get_completion(prompt, model=None): Gets a completion from an LLM
  - enhance_query(query): Enhances a query with additional context
  - classify_query(query): Uses an LLM to classify a query
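The QueryProcessor flow (classify the query, then derive per-engine search queries) can be sketched as below. The keyword heuristic is a stand-in for the LLM-backed classification the module actually uses, and the Classification dataclass, the engine keys, and the "research paper" suffix are hypothetical. The methods are synchronous to match the signatures listed above; the 2025-02-28 update notes that several of them are now async.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Classification:
    query_type: str  # "factual", "exploratory", or "comparative"
    intent: str
    entities: List[str] = field(default_factory=list)


class QueryClassifier:
    """Keyword heuristic standing in for LLM-based classification (hypothetical)."""

    COMPARATIVE = ("compare", " vs ", "versus", "difference between")
    FACTUAL = ("what is", "who is", "when did", "how many")

    def classify(self, query: str) -> Classification:
        q = f" {query.lower()} "  # pad so " vs " also matches at the edges
        if any(k in q for k in self.COMPARATIVE):
            qtype = "comparative"
        elif any(k in q for k in self.FACTUAL):
            qtype = "factual"
        else:
            qtype = "exploratory"
        # Crude placeholder: treat capitalised tokens as candidate entities.
        entities = re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", query)
        return Classification(qtype, intent="research", entities=entities)


class QueryProcessor:
    def __init__(self, classifier: QueryClassifier) -> None:
        self.classifier = classifier

    def classify_query(self, query: str) -> Classification:
        return self.classifier.classify(query)

    def generate_search_queries(self, query: str, classification: Classification) -> Dict[str, List[str]]:
        # One query per engine; academic engines get a research-oriented phrasing
        # unless the user wants a single fact.
        base = query.strip()
        academic = base if classification.query_type == "factual" else f"{base} research paper"
        return {"serper": [base], "scholar": [academic], "arxiv": [academic]}

    def process_query(self, query: str) -> Dict[str, object]:
        classification = self.classify_query(query)
        return {
            "original_query": query,
            "classification": classification,
            "search_queries": self.generate_search_queries(query, classification),
        }
```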
Execution Module
The execution module handles the execution of search queries across multiple search engines and the collection of results.
Files
- __init__.py: Package initialization file
- search_executor.py: Module for executing search queries
- result_collector.py: Module for collecting and processing search results
- api_handlers/: Directory containing handlers for different search APIs
  - __init__.py: Package initialization file
  - base_handler.py: Base class for search handlers
  - serper_handler.py: Handler for Serper API (Google search)
  - scholar_handler.py: Handler for Google Scholar via Serper
  - arxiv_handler.py: Handler for arXiv API
Classes
- SearchExecutor: Class for executing search queries
  - execute_search(query_data): Executes a search across multiple engines
  - _execute_search_async(query, engines): Executes a search asynchronously
  - _execute_search_sync(query, engines): Executes a search synchronously
- ResultCollector: Class for collecting and processing search results
  - process_results(search_results): Processes search results from multiple engines
  - deduplicate_results(results): Deduplicates results based on URL
  - save_results(results, file_path): Saves results to a file
- BaseSearchHandler: Base class for search handlers
  - search(query, num_results): Abstract method for searching
  - _process_response(response): Processes the API response
- SerperSearchHandler: Handler for Serper API
  - search(query, num_results): Searches using Serper API
  - _process_response(response): Processes the Serper API response
- ScholarSearchHandler: Handler for Google Scholar via Serper
  - search(query, num_results): Searches Google Scholar
  - _process_response(response): Processes the Scholar API response
- ArxivSearchHandler: Handler for arXiv API
  - search(query, num_results): Searches arXiv
  - _process_response(response): Processes the arXiv API response
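The handler contract, the concurrent fan-out in SearchExecutor, and URL-based deduplication in ResultCollector might fit together as sketched below. FakeHandler, the normalized {"title", "url", "snippet"} result shape, the "results" key in the raw payload, the query_data keys, and the num_results default are all assumptions; real handlers would call the Serper, Scholar, or arXiv APIs.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Result = Dict[str, Any]


class BaseSearchHandler(ABC):
    @abstractmethod
    async def search(self, query: str, num_results: int) -> List[Result]:
        ...

    def _process_response(self, response: Dict[str, Any]) -> List[Result]:
        # Normalise a raw payload into a common shape shared by all engines.
        return [
            {"title": r.get("title", ""), "url": r.get("url", ""), "snippet": r.get("snippet", "")}
            for r in response.get("results", [])
        ]


class FakeHandler(BaseSearchHandler):
    """Offline stand-in that returns a canned payload instead of calling an API."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self.payload = payload

    async def search(self, query: str, num_results: int) -> List[Result]:
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        return self._process_response(self.payload)[:num_results]


class SearchExecutor:
    def __init__(self, handlers: Dict[str, BaseSearchHandler]) -> None:
        self.handlers = handlers

    async def _execute_search_async(self, query: str, engines: List[str], num_results: int = 10) -> Dict[str, List[Result]]:
        tasks = [self.handlers[e].search(query, num_results) for e in engines]
        # return_exceptions=True keeps one failing engine from discarding the others.
        outcomes = await asyncio.gather(*tasks, return_exceptions=True)
        return {e: ([] if isinstance(o, Exception) else o) for e, o in zip(engines, outcomes)}

    def execute_search(self, query_data: Dict[str, Any]) -> Dict[str, List[Result]]:
        # Hypothetical query_data shape: {"query": str, "engines": [str, ...]}.
        # asyncio.run cannot be called from inside an already-running event loop.
        return asyncio.run(self._execute_search_async(query_data["query"], query_data["engines"]))


class ResultCollector:
    @staticmethod
    def deduplicate_results(results: List[Result]) -> List[Result]:
        """Keep the first result seen for each URL, preserving order."""
        seen = set()
        unique = []
        for r in results:
            if r["url"] not in seen:
                seen.add(r["url"])
                unique.append(r)
        return unique
```

URLs are compared verbatim here; normalizing trailing slashes or query strings would catch more duplicates at the risk of merging distinct pages.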
Ranking Module
The ranking module provides functionality for reranking and prioritizing documents based on their relevance to the user's query.
Files
- __init__.py: Package initialization file
- jina_reranker.py: Module for reranking documents using Jina AI
- filter_manager.py: Module for filtering documents
Classes
- JinaReranker: Class for reranking documents
  - rerank(documents, query): Reranks documents based on their relevance to the query
  - _prepare_inputs(documents, query): Prepares inputs for the reranker
- FilterManager: Class for filtering documents
  - filter_by_date(documents, start_date, end_date): Filters documents by date
  - filter_by_source(documents, sources): Filters documents by source
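The reranking and filtering interfaces could be sketched as follows. The Jina AI reranker is deliberately replaced by a plain Jaccard token-overlap score so the example runs offline; the Reranker class name, the pluggable score_fn, the top_n parameter, and the "content", "source", and "published" document keys are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Sequence

Doc = Dict[str, object]


def overlap_score(query: str, text: str) -> float:
    """Jaccard token overlap; an offline stand-in for a learned reranker score."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


class Reranker:
    def __init__(self, score_fn: Callable[[str, str], float] = overlap_score) -> None:
        self.score_fn = score_fn

    def rerank(self, documents: List[Doc], query: str, top_n: Optional[int] = None) -> List[Doc]:
        # Copy each document with a "score" field, then sort best-first.
        scored = [dict(d, score=self.score_fn(query, str(d["content"]))) for d in documents]
        scored.sort(key=lambda d: d["score"], reverse=True)  # stable: ties keep input order
        return scored[:top_n] if top_n is not None else scored


class FilterManager:
    @staticmethod
    def filter_by_date(documents: List[Doc], start_date: date, end_date: date) -> List[Doc]:
        # Inclusive range; documents without a "published" date are dropped.
        return [
            d for d in documents
            if isinstance(d.get("published"), date) and start_date <= d["published"] <= end_date
        ]

    @staticmethod
    def filter_by_source(documents: List[Doc], sources: Sequence[str]) -> List[Doc]:
        allowed = set(sources)
        return [d for d in documents if d.get("source") in allowed]
```

Keeping the scorer pluggable means a Jina-backed function could be swapped in without changing callers.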
Report Templates Module
The report_templates module provides a template system for generating reports with different detail levels and query types.
Files
- __init__.py: Package initialization file
- report_templates.py: Module for managing report templates
Classes
- QueryType (Enum): Defines the types of queries supported by the system
  - FACTUAL: For factual queries seeking specific information
  - EXPLORATORY: For exploratory queries investigating a topic
  - COMPARATIVE: For comparative queries comparing multiple items
- DetailLevel (Enum): Defines the levels of detail for generated reports
  - BRIEF: Short summary with key findings
  - STANDARD: Standard report with introduction, key findings, and analysis
  - DETAILED: Detailed report with methodology and more in-depth analysis
  - COMPREHENSIVE: Comprehensive report with executive summary, literature review, and appendices
- ReportTemplate: Class representing a report template
  - template (str): The template string with placeholders
  - detail_level (DetailLevel): The detail level of the template
  - query_type (QueryType): The query type the template is designed for
  - model (Optional[str]): The LLM model recommended for this template
  - required_sections (Optional[List[str]]): Required sections in the template
  - validate(): Validates that the template contains all required sections
- ReportTemplateManager: Class for managing report templates
  - add_template(template): Adds a template to the manager
  - get_template(query_type, detail_level): Gets a template for a specific query type and detail level
  - get_available_templates(): Gets a list of available templates
  - initialize_default_templates(): Initializes the default templates for all combinations of query types and detail levels
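One way the enums, ReportTemplate.validate(), and the manager could fit together is sketched below. It assumes placeholders are written as {section_name}, that validate() checks each required section for a matching placeholder, and that add_template rejects invalid templates; the section lists are derived from the DetailLevel descriptions above and are much simpler than the real templates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class QueryType(Enum):
    FACTUAL = "factual"
    EXPLORATORY = "exploratory"
    COMPARATIVE = "comparative"


class DetailLevel(Enum):
    BRIEF = "brief"
    STANDARD = "standard"
    DETAILED = "detailed"
    COMPREHENSIVE = "comprehensive"


@dataclass
class ReportTemplate:
    template: str
    detail_level: DetailLevel
    query_type: QueryType
    model: Optional[str] = None
    required_sections: Optional[List[str]] = None

    def validate(self) -> bool:
        # Every required section must appear as a "{section}" placeholder.
        return all(f"{{{s}}}" in self.template for s in self.required_sections or [])


class ReportTemplateManager:
    def __init__(self) -> None:
        self._templates: Dict[Tuple[QueryType, DetailLevel], ReportTemplate] = {}

    def add_template(self, template: ReportTemplate) -> None:
        if not template.validate():
            raise ValueError(f"template is missing required sections: {template.required_sections}")
        self._templates[(template.query_type, template.detail_level)] = template

    def get_template(self, query_type: QueryType, detail_level: DetailLevel) -> Optional[ReportTemplate]:
        return self._templates.get((query_type, detail_level))

    def get_available_templates(self) -> List[ReportTemplate]:
        return list(self._templates.values())

    def initialize_default_templates(self) -> None:
        # Placeholder section lists, one per detail level.
        sections = {
            DetailLevel.BRIEF: ["title", "key_findings"],
            DetailLevel.STANDARD: ["title", "introduction", "key_findings", "analysis"],
            DetailLevel.DETAILED: ["title", "introduction", "methodology", "key_findings", "analysis"],
            DetailLevel.COMPREHENSIVE: [
                "title", "executive_summary", "literature_review", "key_findings", "analysis", "appendices",
            ],
        }
        # 3 query types x 4 detail levels = 12 templates.
        for qt in QueryType:
            for dl in DetailLevel:
                body = "\n\n".join(f"## {{{s}}}" for s in sections[dl])
                self.add_template(ReportTemplate(body, dl, qt, required_sections=sections[dl]))
```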
Recent Updates
2025-03-11: Report Templates Implementation
- Report Templates Module:
  - Created a new module, report_templates.py, for managing report templates
  - Implemented enums for query types (FACTUAL, EXPLORATORY, COMPARATIVE) and detail levels (BRIEF, STANDARD, DETAILED, COMPREHENSIVE)
  - Created a template system with placeholders for different report sections
  - Implemented 12 different templates (3 query types × 4 detail levels)
  - Added validation to ensure templates contain all required sections
Report Synthesis Integration:
- Updated the report synthesis module to use the new template system
- Added support for different templates based on query type and detail level
- Implemented fallback to standard templates when specific templates are not found
- Added better logging for template retrieval process
- Testing:
  - Created test_report_templates.py to test template retrieval and validation
  - Implemented test_brief_report.py to test brief report generation
  - Successfully tested all combinations of detail levels and query types
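The fallback and logging behaviour described under Report Synthesis Integration could look roughly like this. The plain-dict store keyed by (query_type, detail_level) strings, the function name get_template_with_fallback, and the reading of "standard templates" as the standard detail level of the same query type are all assumptions.

```python
import logging
from typing import Dict, Optional, Tuple

logger = logging.getLogger("report_synthesis")

# Hypothetical store: (query_type, detail_level) -> template string.
Templates = Dict[Tuple[str, str], str]


def get_template_with_fallback(templates: Templates, query_type: str, detail_level: str) -> Optional[str]:
    """Return the exact template if present, else the standard-detail template for the query type."""
    requested = (query_type, detail_level)
    for key in (requested, (query_type, "standard")):
        if key in templates:
            if key == requested:
                logger.debug("Using template %s/%s", *key)
            else:
                logger.warning(
                    "No template for %s/%s; falling back to %s/%s", query_type, detail_level, *key
                )
            return templates[key]
    logger.error("No template found for %s/%s", query_type, detail_level)
    return None
```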
2025-02-28: Async Implementation and Reference Formatting
- LLM Interface Updates:
  - Converted key methods to async: generate_completion, classify_query, enhance_query, generate_search_queries
  - Added special handling for Gemini models
  - Improved reference formatting instructions
- Query Processor Updates:
  - Updated process_query to be async
  - Made generate_search_queries async
  - Fixed async/await patterns throughout
- Gradio Interface Updates:
  - Modified generate_report to handle async operations
  - Updated the report button click handler
  - Improved error handling
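The async pattern from these updates is sketched below with a stub LLMInterface, so it runs without any provider. The stub return values, running enhancement and classification concurrently with asyncio.gather in process_query, the module-level processor, and the message strings are illustrative assumptions, not the project's implementation. Recent Gradio versions accept coroutine functions as event handlers, so an async generate_report can be passed to a button's click handler directly.

```python
import asyncio
from typing import Any, Dict, List


class LLMInterface:
    """Stub LLM interface; real methods would await a provider API call."""

    async def generate_completion(self, prompt: str) -> str:
        await asyncio.sleep(0)  # stands in for a network round-trip
        return f"completion for: {prompt}"

    async def classify_query(self, query: str) -> Dict[str, Any]:
        await asyncio.sleep(0)
        return {"type": "exploratory"}

    async def enhance_query(self, query: str) -> str:
        await asyncio.sleep(0)
        return f"{query} (enhanced)"

    async def generate_search_queries(self, query: str, classification: Dict[str, Any]) -> List[str]:
        await asyncio.sleep(0)
        return [query]


class QueryProcessor:
    def __init__(self, llm: LLMInterface) -> None:
        self.llm = llm

    async def process_query(self, query: str) -> Dict[str, Any]:
        # Enhancement and classification are independent, so await them together.
        enhanced, classification = await asyncio.gather(
            self.llm.enhance_query(query), self.llm.classify_query(query)
        )
        searches = await self.llm.generate_search_queries(enhanced, classification)
        return {"enhanced_query": enhanced, "classification": classification, "search_queries": searches}


processor = QueryProcessor(LLMInterface())


async def generate_report(query: str) -> str:
    """Async click handler: awaits the pipeline and reports failures as text."""
    if not query.strip():
        return "Please enter a query."
    try:
        result = await processor.process_query(query)
    except Exception as exc:  # show the error in the UI instead of raising
        return f"Error while generating report: {exc}"
    return "Searched for: " + ", ".join(result["search_queries"])


# Wiring inside a gr.Blocks() context (not executed here):
#   report_button.click(fn=generate_report, inputs=query_box, outputs=report_box)
```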