Update project documentation and memory bank entries. Add new integration tests for query classification.
parent 98754dfdcc
commit ec285c03d4
@ -439,3 +439,59 @@ Implemented and tested successfully with both sample data and real URLs.

- Added duplicate URL fields in the context to ensure URLs are captured
- Updated the reference generation prompt to explicitly request URLs
- Added a separate reference generation step to handle truncated references

## 2025-03-18: LLM-Based Query Classification Implementation

### Context
The project was using a keyword-based approach to classify queries into different domains (academic, code, current events). This approach had several limitations:
- Reliance on static keyword lists that needed constant maintenance
- Inability to understand the semantic meaning of queries
- False classifications for ambiguous queries or those containing keywords with multiple meanings
- Difficulty handling emerging topics without updating keyword lists

### Decision
1. Replace the keyword-based query classification with an LLM-based approach:
   - Implement a new `classify_query_domain` method in the `LLMInterface` class
   - Create a new query structuring method that uses the LLM classification results
   - Retain the keyword-based method as a fallback
   - Add confidence scores and reasoning to the classification results

2. Enhance the structured query format:
   - Add primary domain and confidence
   - Include secondary domains with confidence scores
   - Add classification reasoning
   - Maintain backward compatibility with the existing search executor

3. Use a 0.3 confidence threshold for secondary domains:
   - Set domain flags (is_academic, is_code, is_current_events) based on the primary domain
   - Also set flags for secondary domains with confidence scores of 0.3 or above

### Rationale
- The LLM-based approach provides better semantic understanding of queries
- Multi-domain classification with confidence scores handles complex queries better
- Self-explaining classifications with reasoning aid debugging and transparency
- The approach automatically adapts to new topics without code changes
- Retaining the keyword-based fallback ensures system resilience

### Alternatives Considered
1. Expanding the keyword lists:
   - Would still lack semantic understanding
   - Would increase the maintenance burden
   - False positives would still occur

2. Using embedding similarity to predefined domain descriptions:
   - Potentially more computationally expensive
   - Less explainable than the LLM's reasoning
   - Would require managing embedding models

3. Creating a custom classifier:
   - Would require labeled training data
   - More development effort
   - Less flexible than the LLM approach

### Impact
- More accurate query classification, especially for ambiguous or multi-domain queries
- Reduced maintenance overhead for keyword lists
- Better search engine selection based on query domains
- Improved report generation due to more accurate query understanding
- Enhanced debugging capabilities with classification reasoning

@ -0,0 +1,397 @@
# LLM-Based Query Classification Implementation Plan

## Overview

This document outlines a plan to replace the current keyword-based query classification system with an LLM-based approach. The current system uses predefined keyword lists to determine whether a query is academic, code-related, or about current events. This approach is limited by the static nature of the keywords and doesn't capture the semantic meaning of queries. Switching to LLM-based classification will provide more accurate and adaptable query typing.

## Current Limitations

1. **Keyword Dependency**:
   - The system relies on static lists of keywords that need constant updating
   - Many relevant terms are likely to be missing, especially for emerging topics
   - Some words have different meanings in different contexts (e.g., "model" can refer to code or academic concepts)

2. **False Classifications**:
   - Queries about LLMs are incorrectly classified as code-related instead of academic
   - General queries can be misclassified if they happen to contain certain keywords
   - There is no way to handle queries that span multiple categories

3. **Maintenance Burden**:
   - Keyword lists for each category must be updated regularly
   - Complex if/then logic is needed to determine query types
   - The system is hard to adapt to new research domains or technologies

## Proposed Solution

Replace the keyword-based classification with an LLM-based classification that:
1. Uses semantic understanding to determine query intent and domain
2. Can classify queries into multiple categories with confidence scores
3. Provides reasoning for the classification
4. Can adapt to new topics without code changes

## Technical Implementation

### 1. Extend LLM Interface with Domain Classification

Add a new method to the `LLMInterface` class in `query/llm_interface.py`:

```python
async def classify_query_domain(self, query: str) -> Dict[str, Any]:
    """
    Classify a query's domain type (academic, code, current_events, general).

    Args:
        query: The query to classify

    Returns:
        Dictionary with query domain type and confidence scores
    """
    # Get the model assigned to this function
    model_name = self.config.get_module_model('query_processing', 'classify_query_domain')

    # Create a new interface with the assigned model if different from current
    if model_name != self.model_name:
        interface = LLMInterface(model_name)
        return await interface._classify_query_domain_impl(query)

    return await self._classify_query_domain_impl(query)

async def _classify_query_domain_impl(self, query: str) -> Dict[str, Any]:
    """Implementation of query domain classification."""
    messages = [
        {"role": "system", "content": """You are an expert query classifier.
Analyze the given query and classify it into the following domain types:
- academic: Related to scholarly research, scientific studies, academic papers, formal theories, university-level research topics, or scholarly fields of study
- code: Related to programming, software development, technical implementation, coding languages, frameworks, or technology implementation questions
- current_events: Related to recent news, ongoing developments, time-sensitive information, current politics, breaking stories, or real-time events
- general: General information seeking that doesn't fit the above categories

You may assign multiple types if the query spans several domains.

Respond with a JSON object containing:
{
    "primary_type": "the most appropriate type",
    "confidence": 0.X,
    "secondary_types": [{"type": "another_applicable_type", "confidence": 0.X}, ...],
    "reasoning": "brief explanation of your classification"
}
"""},
        {"role": "user", "content": query}
    ]

    # Generate classification
    response = await self.generate_completion(messages)

    # Parse JSON response
    try:
        classification = json.loads(response)
        return classification
    except json.JSONDecodeError:
        # Fall back to a default classification if parsing fails
        print(f"Error parsing domain classification response: {response}")
        return {
            "primary_type": "general",
            "confidence": 0.5,
            "secondary_types": [],
            "reasoning": "Failed to parse classification response"
        }
```

### 2. Update QueryProcessor Class

Modify the `QueryProcessor` class in `query/query_processor.py` to use the new LLM-based classification:

```python
async def process_query(self, query: str) -> Dict[str, Any]:
    """
    Process a user query.

    Args:
        query: The raw user query

    Returns:
        Dictionary containing the processed query information
    """
    logger.info(f"Processing query: {query}")

    # Enhance the query
    enhanced_query = await self.llm_interface.enhance_query(query)
    logger.info(f"Enhanced query: {enhanced_query}")

    # Classify the query type (factual, exploratory, comparative)
    query_type_classification = await self.llm_interface.classify_query(query)
    logger.info(f"Query type classification: {query_type_classification}")

    # Classify the query domain (academic, code, current_events, general)
    domain_classification = await self.llm_interface.classify_query_domain(query)
    logger.info(f"Query domain classification: {domain_classification}")

    # Structure the query using the new classification approach
    structured_query = self._structure_query_with_llm(
        query,
        enhanced_query,
        query_type_classification,
        domain_classification
    )

    # Decompose the query into sub-questions (if complex enough)
    structured_query = await self.query_decomposer.decompose_query(query, structured_query)

    # Log the number of sub-questions if any
    if 'sub_questions' in structured_query and structured_query['sub_questions']:
        logger.info(f"Decomposed into {len(structured_query['sub_questions'])} sub-questions")
    else:
        logger.info("Query was not decomposed into sub-questions")

    return structured_query

def _structure_query_with_llm(self, original_query: str, enhanced_query: str,
                              type_classification: Dict[str, Any],
                              domain_classification: Dict[str, Any]) -> Dict[str, Any]:
    """
    Structure a query using LLM classification results.

    Args:
        original_query: The original user query
        enhanced_query: The enhanced query
        type_classification: Classification of query type (factual, exploratory, comparative)
        domain_classification: Classification of query domain (academic, code, current_events)

    Returns:
        Dictionary containing the structured query
    """
    # Get primary domain and confidence
    primary_domain = domain_classification.get('primary_type', 'general')
    primary_confidence = domain_classification.get('confidence', 0.5)

    # Get secondary domains
    secondary_domains = domain_classification.get('secondary_types', [])

    # Determine domain flags from the primary domain
    is_academic = primary_domain == 'academic'
    is_code = primary_domain == 'code'
    is_current_events = primary_domain == 'current_events'

    # Secondary domains only set a flag at or above the 0.3 confidence
    # threshold, to avoid false positives
    if primary_domain != 'academic' and any(d['type'] == 'academic' and d['confidence'] >= 0.3 for d in secondary_domains):
        is_academic = True

    if primary_domain != 'code' and any(d['type'] == 'code' and d['confidence'] >= 0.3 for d in secondary_domains):
        is_code = True

    if primary_domain != 'current_events' and any(d['type'] == 'current_events' and d['confidence'] >= 0.3 for d in secondary_domains):
        is_current_events = True

    return {
        'original_query': original_query,
        'enhanced_query': enhanced_query,
        'type': type_classification.get('type', 'unknown'),
        'intent': type_classification.get('intent', 'research'),
        'entities': type_classification.get('entities', []),
        'domain': primary_domain,
        'domain_confidence': primary_confidence,
        'secondary_domains': secondary_domains,
        'classification_reasoning': domain_classification.get('reasoning', ''),
        'timestamp': None,  # Will be filled in by the caller
        'is_current_events': is_current_events,
        'is_academic': is_academic,
        'is_code': is_code,
        'metadata': {
            'type_classification': type_classification,
            'domain_classification': domain_classification
        }
    }
```

### 3. Remove Legacy Keyword-Based Classification Methods

Once the new LLM-based classification is working correctly, remove or deprecate the old keyword-based methods:
- `_is_current_events_query`
- `_is_academic_query`
- `_is_code_query`

Also remove the original `_structure_query` method.

### 4. Update Search Executor Integration

The `SearchExecutor` class already looks for the following flags in the structured query:
- `is_academic`
- `is_code`
- `is_current_events`

So no changes are needed to the `execute_search` method; the improved classification will simply provide more accurate flags.

### 5. Update Configuration

Add the new `classify_query_domain` function to the module model configuration so that different models can be assigned to this function:

```yaml
module_models:
  query_processing:
    enhance_query: llama-3.1-8b-instant            # Fast model for query enhancement
    classify_query: llama-3.1-8b-instant           # Fast model for query type classification
    classify_query_domain: llama-3.1-8b-instant    # Fast model for domain classification
    generate_search_queries: llama-3.1-8b-instant  # Fast model for search query generation
```

### 6. Testing Plan

1. **Unit Tests** (see the sketch after this list):
   - Create test cases for `classify_query_domain` with various query types
   - Verify correct classification of academic, code, and current events queries
   - Test edge cases and queries that span multiple domains

2. **Integration Tests**:
   - Test the full query processing pipeline with the new classification
   - Verify that the correct search engines are selected based on the classification
   - Compare results with the old keyword-based approach

3. **Regression Testing**:
   - Ensure that all existing functionality works with the new classification
   - Verify that no existing test cases fail
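
To make the unit tests concrete, here is a minimal pytest sketch (hypothetical test names; it assumes pytest-asyncio is installed, that `LLMInterface()` is constructible with defaults, and that the JSON contract matches the prompt above):

```python
# test_domain_classification.py -- sketch only, not the final suite
import json
from unittest.mock import AsyncMock, patch

import pytest

from query.llm_interface import LLMInterface


@pytest.mark.asyncio
async def test_classify_query_domain_parses_llm_json():
    fake_response = json.dumps({
        "primary_type": "academic",
        "confidence": 0.9,
        "secondary_types": [{"type": "general", "confidence": 0.4}],
        "reasoning": "Scholarly research topic.",
    })
    interface = LLMInterface()
    # Mock the completion call so the test exercises only the parsing logic.
    with patch.object(interface, "generate_completion",
                      new=AsyncMock(return_value=fake_response)):
        result = await interface._classify_query_domain_impl(
            "What are the implications of large language models?")
    assert result["primary_type"] == "academic"
    assert result["confidence"] == pytest.approx(0.9)


@pytest.mark.asyncio
async def test_classify_query_domain_falls_back_on_bad_json():
    interface = LLMInterface()
    with patch.object(interface, "generate_completion",
                      new=AsyncMock(return_value="not valid json")):
        result = await interface._classify_query_domain_impl("anything")
    # Unparseable responses fall back to the default "general" classification.
    assert result["primary_type"] == "general"
    assert result["secondary_types"] == []
```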

### 7. Logging and Monitoring

Add detailed logging to monitor the performance of the new classification:

```python
logger.info(f"Query domain classification: primary={domain_classification.get('primary_type')} confidence={domain_classification.get('confidence')}")
if domain_classification.get('secondary_types'):
    for sec_type in domain_classification.get('secondary_types'):
        logger.info(f"Secondary domain: {sec_type['type']} confidence={sec_type['confidence']}")
logger.info(f"Classification reasoning: {domain_classification.get('reasoning', 'None provided')}")
```

### 8. Fallback Mechanism

Implement a fallback to the keyword-based approach if the LLM classification fails:

```python
try:
    domain_classification = await self.llm_interface.classify_query_domain(query)
    structured_query = self._structure_query_with_llm(query, enhanced_query, query_type_classification, domain_classification)
except Exception as e:
    logger.error(f"LLM domain classification failed: {e}. Falling back to keyword-based classification.")
    # Fall back to the keyword-based approach
    structured_query = self._structure_query(query, enhanced_query, query_type_classification)
```

## Timeline and Resources

### Phase 1: Development (2-3 days)
- Implement the new `classify_query_domain` method in `LLMInterface`
- Create the new `_structure_query_with_llm` method in `QueryProcessor`
- Update the `process_query` method to use the new approach
- Add configuration for the new function

### Phase 2: Testing (1-2 days)
- Create test cases for the new classification
- Test with various query types
- Compare with the old approach

### Phase 3: Deployment and Monitoring (1 day)
- Deploy the new version
- Monitor logs for classification issues
- Adjust prompts and thresholds as needed

### Phase 4: Cleanup (1 day)
- Remove the old keyword-based methods
- Update documentation

## Expected Outcomes

1. **Improved Classification Accuracy**:
   - More accurate identification of academic, code, and current events queries
   - Better handling of queries that span multiple domains
   - Proper classification of queries about emerging topics (like LLMs)

2. **Reduced Maintenance**:
   - No need to update keyword lists
   - Adaptability to new domains without code changes

3. **Enhanced User Experience**:
   - More relevant search results
   - Better report generation due to proper query classification

4. **System Robustness**:
   - Graceful handling of edge cases
   - Better explanation of classification decisions
   - Proper confidence scoring for ambiguous queries

## Examples

To illustrate how the new approach would work, here are some examples:

### Example 1: Academic Query
**Query**: "What are the technological, economic, and social implications of large language models in today's society?"

**Current Classification**: Might be misclassified as code-related due to "models"

**LLM Classification**:
```json
{
  "primary_type": "academic",
  "confidence": 0.9,
  "secondary_types": [
    {"type": "general", "confidence": 0.4}
  ],
  "reasoning": "This query is asking about implications of LLMs across multiple domains (technological, economic, and social) which is a scholarly research topic that would be well-addressed by academic sources."
}
```

### Example 2: Code Query
**Query**: "How do I implement a transformer model in PyTorch for text classification?"

**Current Classification**: Might be correctly classified as code-related due to "implement", "model", and "PyTorch"

**LLM Classification**:
```json
{
  "primary_type": "code",
  "confidence": 0.95,
  "secondary_types": [
    {"type": "academic", "confidence": 0.4}
  ],
  "reasoning": "This is primarily a programming question about implementing a specific model in PyTorch, which is a coding framework. It has academic aspects since it relates to machine learning models, but the focus is on implementation."
}
```

### Example 3: Current Events Query
**Query**: "What are the latest developments in the Ukraine conflict?"

**Current Classification**: Likely correct if "Ukraine" is in the current events entities list

**LLM Classification**:
```json
{
  "primary_type": "current_events",
  "confidence": 0.95,
  "secondary_types": [],
  "reasoning": "This query is asking about 'latest developments' in an ongoing conflict, which clearly indicates a focus on recent news and time-sensitive information."
}
```

### Example 4: Mixed Query
**Query**: "How are LLMs being used to detect and prevent cyber attacks?"

**Current Classification**: Might have mixed signals from both academic and code keywords

**LLM Classification**:
```json
{
  "primary_type": "academic",
  "confidence": 0.7,
  "secondary_types": [
    {"type": "code", "confidence": 0.6},
    {"type": "current_events", "confidence": 0.3}
  ],
  "reasoning": "This query relates to research on LLM applications in cybersecurity (academic), has technical implementation aspects (code), and could relate to recent developments in the field (current events). The primary focus appears to be on research and study of this application."
}
```

## Conclusion

Replacing the keyword-based classification with an LLM-based approach will significantly improve the accuracy and adaptability of the query classification system. This will lead to better search results and report generation, particularly for complex or multi-domain queries like those about large language models. The implementation can be completed in 5-7 days and will reduce ongoing maintenance work by eliminating the need to update keyword lists.

@ -1,5 +1,45 @@
# Session Log

## Session: 2025-03-19 - Fixed Gradio UI Bug with List Object in Markdown Component

### Overview
Fixed a critical bug in the Gradio UI where a list object was being passed to a Markdown component, causing an AttributeError when the `expandtabs()` method was called on the list.

### Key Activities
1. **Identified the Root Cause**:
   - The error occurred in the Gradio interface, specifically in the Markdown component's postprocess method
   - The error message was: `AttributeError: 'list' object has no attribute 'expandtabs'`
   - The issue was in the `_delete_selected_reports` and `refresh_reports_list` functions, which returned three values (reports_data, choices, status_message) while the click handlers expected only two outputs (reports_checkbox_group, status_message)
   - This caused the list to be passed to the Markdown component, which expected a string

2. **Implemented Fixes** (see the sketch after this list):
   - Updated the click handlers for the delete button and refresh button to handle all three outputs
   - Added the reports_checkbox_group component twice in the outputs list to match the three return values
   - This ensured that the status_message (a string) was correctly passed to the Markdown component
   - Tested the fix by running the UI and verifying that the error no longer occurs
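
A minimal, self-contained sketch of the corrected wiring (component and function names follow the notes above, the demo data is invented, and `gr.update` is assumed to be available as in Gradio 3/4; the actual sim-search code differs):

```python
import gradio as gr

def refresh_reports_list():
    # The real functions return (reports_data, choices, status_message); this
    # sketch returns two checkbox updates plus the status string to mirror the
    # three-output wiring described above.
    choices = ["report_1.md", "report_2.md"]  # hypothetical report names
    status = "Found 2 reports."
    return gr.update(choices=choices), gr.update(value=[]), status

with gr.Blocks() as demo:
    reports_checkbox_group = gr.CheckboxGroup(label="Reports")
    status_message = gr.Markdown()
    refresh_button = gr.Button("Refresh")
    # Listing reports_checkbox_group twice matches the three return values,
    # so the status string (not a list) reaches the Markdown component.
    refresh_button.click(
        fn=refresh_reports_list,
        outputs=[reports_checkbox_group, reports_checkbox_group, status_message],
    )

demo.launch()
```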

3. **Verified the Solution**:
   - Confirmed that the UI now works correctly without any errors
   - Tested various operations (deleting reports, refreshing the list) to ensure they work as expected
   - Verified that the status messages are displayed correctly in the UI

### Insights
- Gradio's component handling requires careful matching between function return values and output components
- When a function returns more values than there are output components, Gradio will try to pass the extra values to the last component
- In this case, the list was being passed to the Markdown component, which expected a string
- Adding the same component multiple times in the outputs list is a valid way to handle multiple return values

### Challenges
- Identifying the root cause of the error required careful analysis of the error message and the code
- Understanding how Gradio handles function return values and output components
- Ensuring that the fix doesn't introduce new issues

### Next Steps
1. Consider adding more comprehensive error handling in the UI components
2. Review other similar functions to ensure they don't have the same issue
3. Add more detailed logging to help diagnose similar issues in the future
4. Consider adding unit tests for the UI components to catch similar issues earlier

## Session: 2025-03-19 - Model Provider Selection Fix in Report Generation

### Overview


@ -665,357 +705,4 @@ Implemented Phase 3 of the Report Generation module, focusing on report synthesi
- Handling edge cases where document chunks contain irrelevant information

### Next Steps
1. Implement support for alternative models with larger context windows
2. Develop progressive report generation for very large research tasks
3. Create visualization components for data mentioned in reports
4. Add interactive elements to the generated reports
5. Implement report versioning and comparison

## Session: 2025-02-27 (Update 2)

### Overview
Successfully tested the end-to-end query-to-report pipeline with a specific query about the environmental and economic impact of electric vehicles, and fixed an issue with the Jina reranker integration.

### Key Activities
1. **Fixed Jina Reranker Integration** (sketched below):
   - Corrected the import statement in query_to_report.py to use the proper function name (get_jina_reranker)
   - Updated the reranker call to properly format the results for the JinaReranker
   - Implemented proper extraction of text from search results for reranking
   - Added mapping of reranked indices back to the original results
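
The mapping step might look roughly like this (a sketch under assumptions: the reranker exposes a `rerank(query, texts)` method returning items with `index` and `score`, which these notes do not confirm):

```python
from typing import Any, Dict, List

def rerank_results(reranker, query: str,
                   results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Extract plain text from each search result for the reranker.
    texts = [f"{r.get('title', '')} {r.get('snippet', '')}".strip() for r in results]
    # Assumed return shape: [{"index": int, "score": float}, ...] sorted by relevance.
    ranked = reranker.rerank(query, texts)
    # Map the reranked indices back to the original result dicts.
    return [results[item["index"]] for item in ranked]
```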

2. **Created EV Query Test Script**:
   - Developed a dedicated test script (test_ev_query.py) for testing the pipeline with a query about electric vehicles
   - Configured the script to use 7 results per search engine for a comprehensive report
   - Added proper error handling and result display

3. **Tested End-to-End Pipeline**:
   - Successfully executed the full query-to-report workflow
   - Verified that all components (query processor, search executor, reranker, report generator) work together seamlessly
   - Generated a comprehensive report on the environmental and economic impact of electric vehicles

4. **Identified Report Detail Configuration Options**:
   - Documented multiple ways to adjust the level of detail in generated reports
   - Identified parameters that can be modified to control report comprehensiveness
   - Created a plan for implementing customizable report detail levels

### Insights
- The end-to-end pipeline successfully connects all major components of the system
- The Jina reranker significantly improves the relevance of search results for report generation
- The map-reduce approach effectively processes document chunks into a coherent report
- Some document sources (like ScienceDirect and ResearchGate) may require special handling due to access restrictions

### Challenges
- Handling API errors and access restrictions for certain document sources
- Ensuring proper formatting of data between different components
- Managing the processing of a large number of document chunks efficiently

### Next Steps
1. **Implement Customizable Report Detail Levels**:
   - Develop a system to allow users to select different levels of detail for generated reports
   - Integrate the customizable detail levels into the report generator
   - Test the new feature with various query types

2. **Add Support for Alternative Models**:
   - Research and implement support for alternative models with larger context windows
   - Test the new models with the report generation pipeline

3. **Develop Progressive Report Generation**:
   - Design and implement a system for progressive report generation
   - Test the new feature with very large research tasks

4. **Create Visualization Components**:
   - Develop visualization components for data mentioned in reports
   - Integrate the visualization components into the report generator

5. **Add Interactive Elements**:
   - Develop interactive elements for the generated reports
   - Integrate the interactive elements into the report generator

## Session: 2025-02-28

### Overview
Implemented customizable report detail levels for the Report Generation Module, allowing users to select different levels of detail for generated reports.

### Key Activities
1. **Created Report Detail Levels Module** (sketched below):
   - Implemented a new module `report_detail_levels.py` with an enum for detail levels (Brief, Standard, Detailed, Comprehensive)
   - Created a `ReportDetailLevelManager` class to manage detail level configurations
   - Defined specific parameters for each detail level (num_results, token_budget, chunk_size, overlap_size, model)
   - Added methods to validate and retrieve detail level configurations
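
A minimal sketch of what `report_detail_levels.py` might contain (the enum and parameter names come from the notes above; the concrete values are illustrative assumptions, not the shipped configuration):

```python
from enum import Enum
from typing import Any, Dict


class DetailLevel(Enum):
    BRIEF = "brief"
    STANDARD = "standard"
    DETAILED = "detailed"
    COMPREHENSIVE = "comprehensive"


class ReportDetailLevelManager:
    # Hypothetical values; each level tunes result count, token budget,
    # chunking, and the model used for synthesis.
    _CONFIGS: Dict[DetailLevel, Dict[str, Any]] = {
        DetailLevel.BRIEF: {
            "num_results": 3, "token_budget": 50_000,
            "chunk_size": 800, "overlap_size": 80,
            "model": "llama-3.1-8b-instant",
        },
        DetailLevel.STANDARD: {
            "num_results": 7, "token_budget": 100_000,
            "chunk_size": 1000, "overlap_size": 100,
            "model": "llama-3.1-8b-instant",
        },
        # DETAILED and COMPREHENSIVE follow the same shape with larger budgets
    }

    def get_detail_level_config(self, level: DetailLevel) -> Dict[str, Any]:
        # Validate before returning, as described in the notes above.
        if level not in self._CONFIGS:
            raise ValueError(f"Unknown detail level: {level}")
        return self._CONFIGS[level]
```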

2. **Updated Report Synthesis Module**:
   - Modified the `ReportSynthesizer` class to accept and use detail level parameters
   - Updated synthesis templates to adapt based on the selected detail level
   - Adjusted the map-reduce process to handle different levels of detail
   - Implemented model selection based on detail level requirements

3. **Enhanced Report Generator**:
   - Added methods to set and get detail levels in the `ReportGenerator` class
   - Updated the document preparation process to use detail level configurations
   - Modified the report generation workflow to incorporate detail level settings
   - Implemented validation for detail level parameters

4. **Updated Query to Report Script**:
   - Added command-line arguments for detail level selection
   - Implemented a `--list-detail-levels` option to display available options
   - Updated the main workflow to pass detail level parameters to the report generator
   - Added documentation for the new parameters

5. **Created Test Scripts**:
   - Updated `test_ev_query.py` to support detail level selection
   - Created a new `test_detail_levels.py` script to generate reports with all detail levels for comparison
   - Added metrics collection (timing, report size, word count) for comparison

### Insights
- Different detail levels significantly affect report length, depth, and generation time
- The brief level is useful for quick summaries, while comprehensive provides exhaustive information
- Using different models for different detail levels offers a good balance between speed and quality
- Configuring multiple parameters (num_results, token_budget, etc.) together creates a coherent detail level experience

### Challenges
- Ensuring that the templates produce appropriate output for each detail level
- Balancing between speed and quality for different detail levels
- Managing token budgets effectively across different detail levels
- Ensuring backward compatibility with existing code

### Next Steps
1. Conduct thorough testing of the detail level features with various query types
2. Gather user feedback on the quality and usefulness of reports at different detail levels
3. Refine the detail level configurations based on testing and feedback
4. Implement progressive report generation for very large research tasks
5. Develop visualization components for data mentioned in reports

## Session: 2025-02-28 - Enhanced Report Detail Levels

### Overview
In this session, we enhanced the report detail levels to focus more on analytical depth rather than just adding additional sections. We improved the document chunk processing to extract more meaningful information from each chunk for detailed and comprehensive reports.

### Key Activities
1. **Enhanced Template Modifiers for Detailed and Comprehensive Reports**:
   - Rewrote the template modifiers to focus on analytical depth, evidence density, and perspective diversity
   - Added explicit instructions to prioritize depth over breadth
   - Emphasized multi-layered analysis, causal relationships, and interconnections
   - Added instructions for exploring second- and third-order effects

2. **Improved Document Chunk Processing** (see the sketch after this list):
   - Created a new `_get_extraction_prompt` method that provides different extraction prompts based on detail level
   - For DETAILED reports: added a focus on underlying principles, causal relationships, and different perspectives
   - For COMPREHENSIVE reports: added a focus on multi-layered analysis, complex causal networks, and theoretical frameworks
   - Modified the `map_document_chunks` method to pass the detail level parameter
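
A rough sketch of how `_get_extraction_prompt` might dispatch on detail level (hypothetical prompt wording, condensed from the foci listed above):

```python
def _get_extraction_prompt(self, detail_level: str) -> str:
    """Return an extraction prompt tailored to the requested detail level."""
    if detail_level == "comprehensive":
        return ("Extract key information from the document chunk, including "
                "multi-layered analysis, complex causal networks, and "
                "relevant theoretical frameworks.")
    if detail_level == "detailed":
        return ("Extract key information from the document chunk, including "
                "underlying principles, causal relationships, and differing "
                "perspectives.")
    # Brief and standard reports use the plain extraction prompt.
    return "Extract the key information from the document chunk."
```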

3. **Enhanced MapReduce Approach**:
   - Updated the map phase to use detail-level-specific extraction prompts
   - Ensured the detail level parameter is passed throughout the process
   - Maintained the efficient processing of document chunks while improving the quality of extraction

### Insights
- The MapReduce approach is well suited to LLM-based report generation, allowing processing of more information than would fit in a single context window
- Different extraction prompts for different detail levels significantly affect the quality and depth of the extracted information
- Focusing on analytical depth rather than additional sections provides more value to the end user
- The enhanced prompts guide the LLM to provide deeper analysis of causal relationships, underlying mechanisms, and interconnections

### Challenges
- Balancing between depth and breadth in detailed reports
- Ensuring that the extraction prompts extract the most relevant information for each detail level
- Managing the increased processing time for detailed and comprehensive reports with enhanced extraction

### Next Steps
1. Conduct thorough testing of the enhanced detail level features with various query types
2. Compare the analytical depth and quality of reports generated with the new prompts
3. Gather user feedback on the improved reports at different detail levels
4. Explore parallel processing for the map phase to reduce overall report generation time
5. Further refine the detail level configurations based on testing and feedback

## Session: 2025-02-28 - Gradio UI Enhancements and Future Planning

### Overview
In this session, we fixed issues in the Gradio UI for report generation and planned future enhancements to improve search quality and user experience.

### Key Activities
1. **Fixed Gradio UI for Report Generation**:
   - Updated the `generate_report` method in the Gradio UI to properly process queries and generate structured queries
   - Integrated the `QueryProcessor` to create structured queries from user input
   - Fixed method calls and parameter passing to the `execute_search` method
   - Implemented functionality to process `<thinking>` tags in the generated report
   - Added support for custom model selection in the UI
   - Updated the interfaces documentation to include the ReportGenerator and ReportDetailLevelManager interfaces

2. **Planned Future Enhancements**:
   - **Multiple Query Variation Generation**:
     - Designed an approach to generate several similar queries with different keywords for better search coverage
     - Planned modifications to the QueryProcessor and SearchExecutor to handle multiple queries
     - Estimated this as a moderate-difficulty task (3-4 days of work)

   - **Threshold-Based Reranking with Larger Document Sets**:
     - Developed a plan to process more initial documents and use reranking to select the most relevant ones
     - Designed new detail level configuration parameters for initial and final result counts
     - Estimated this as an easy-to-moderate-difficulty task (2-3 days of work)

   - **UI Progress Indicators**:
     - Identified the need for chunk processing progress indicators in the UI
     - Planned modifications to report_synthesis.py to add logging during document processing
     - Estimated this as a simple enhancement (15-30 minutes of work)

### Insights
- The modular architecture of the system makes it easy to extend with new features
- Providing progress indicators during report generation would significantly improve user experience
- Generating multiple query variations could substantially improve search coverage and result quality
- Using a two-stage approach (fetch more, then filter) for document retrieval would likely improve report quality

### Challenges
- Balancing between fetching enough documents for comprehensive coverage and maintaining performance
- Ensuring proper deduplication when using multiple query variations
- Managing the increased API usage that would result from processing more queries and documents

### Next Steps
1. Implement the chunk processing progress indicators as a quick win
2. Begin work on the multiple query variation generation feature
3. Test the current implementation with various query types to identify any remaining issues
4. Update the documentation to reflect the new features and future plans

## Session: 2025-03-12 - Query Type Selection in Gradio UI

### Overview
In this session, we enhanced the Gradio UI by adding a query type selection dropdown, allowing users to explicitly select the query type (factual, exploratory, comparative) instead of relying on automatic detection.

### Key Activities
1. **Added Query Type Selection to Gradio UI** (see the sketch after this list):
   - Added a dropdown menu for query type selection in the "Generate Report" tab
   - Included options for "auto-detect", "factual", "exploratory", and "comparative"
   - Added descriptive tooltips explaining each query type
   - Set "auto-detect" as the default option
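
A sketch of the dropdown (hypothetical tooltip wording and variable name; not the actual sim-search UI code):

```python
import gradio as gr

with gr.Blocks() as demo:
    query_type_dropdown = gr.Dropdown(
        choices=["auto-detect", "factual", "exploratory", "comparative"],
        value="auto-detect",  # default: let the system detect the type
        label="Query Type",
        info="factual: precise answers; exploratory: broad overviews; "
             "comparative: side-by-side analysis",
    )
```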

2. **Updated Report Generation Logic**:
   - Modified the `generate_report` method in the `GradioInterface` class to handle the new query_type parameter
   - Updated the report button click handler to pass the query type to the generate_report method
   - Added logging to show when a user-selected query type is being used

3. **Enhanced Report Generator**:
   - Updated the `generate_report` method in the `ReportGenerator` class to accept a query_type parameter
   - Modified the report synthesizer calls to pass the query_type parameter
   - Added logging to track query type usage

4. **Added Documentation**:
   - Added a "Query Types" section to the Gradio UI explaining each query type
   - Included examples of when to use each query type
   - Updated code comments to explain the query type parameter

### Insights
- Explicit query type selection gives users more control over the report generation process
- Different query types benefit from specialized report templates and structures
- The auto-detect option provides convenience while still allowing manual override
- Clear documentation helps users understand when to use each query type

### Challenges
- Ensuring backward compatibility with existing code
- Maintaining the auto-detect functionality while adding manual selection
- Passing the query type parameter through multiple layers of the application
- Providing clear explanations of query types for users

### Next Steps
1. Test the query type selection with various queries to ensure it works correctly
2. Gather user feedback on the usefulness of manual query type selection
3. Consider adding more specialized templates for specific query types
4. Explore adding query type detection confidence scores to help users decide when to override
5. Add examples of each query type to help users understand the differences

## Session: 2025-03-12 - Fixed Query Type Parameter Bug

### Overview
Fixed a bug in the report generation process where the `query_type` parameter was not properly handled, causing an error when it was `None`.

### Key Activities
1. **Fixed NoneType Error in Report Synthesis** (see the sketch after this list):
   - Added a null check in the `_get_extraction_prompt` method in `report_synthesis.py`
   - Modified the condition that checks for comparative queries to handle the case where `query_type` is `None`
   - Ensured the method works correctly regardless of whether a query type is explicitly provided
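
The guard might look like this (a sketch; the surrounding method is not reproduced here and the helper name is hypothetical):

```python
from typing import Optional

def is_comparative(query_type: Optional[str]) -> bool:
    # Check for None before calling .lower(), which previously raised
    # AttributeError when no query type was provided.
    return query_type is not None and query_type.lower() == "comparative"
```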

2. **Root Cause Analysis**:
   - Identified that the error occurred when the `query_type` parameter was `None` and the code tried to call `.lower()` on it
   - Traced the issue through the call chain from the UI to the report generator to the report synthesizer
   - Confirmed that the fix addresses the specific error message: `'NoneType' object has no attribute 'lower'`

### Insights
- Proper null checking is essential when working with optional parameters that are passed through multiple layers
- The error occurred in the report synthesis module but was triggered by the UI's query type selection feature
- The fix maintains backward compatibility while ensuring the new query type selection feature works correctly

### Next Steps
1. Test the fix with various query types to ensure it works correctly
2. Consider adding similar null checks in other parts of the code that handle the query_type parameter
3. Add more comprehensive error handling throughout the report generation process
4. Update the test suite to include tests for null query_type values

## Session: 2025-03-12 - Fixed Template Retrieval for Null Query Type

### Overview
Fixed a second issue in the report generation process where template retrieval was failing when the `query_type` parameter was `None`.

### Key Activities
1. **Fixed Template Retrieval for Null Query Type** (see the sketch after this list):
   - Updated the `_get_template_from_strings` method in `report_synthesis.py` to handle `None` query_type
   - Added a default value of "exploratory" when query_type is `None`
   - Modified the method signature to explicitly indicate that query_type_str can be `None`
   - Added logging to indicate when the default query type is being used
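
A sketch of the defaulting logic (the `QueryType` enum values are inferred from the query types named in these notes; the helper name is hypothetical):

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger(__name__)

class QueryType(Enum):
    FACTUAL = "factual"
    EXPLORATORY = "exploratory"
    COMPARATIVE = "comparative"

def resolve_query_type(query_type_str: Optional[str]) -> QueryType:
    # Default to "exploratory" when no query type is provided, avoiding the
    # "None is not a valid QueryType" error described below.
    if query_type_str is None:
        logger.info("No query type provided; defaulting to 'exploratory'")
        query_type_str = "exploratory"
    return QueryType(query_type_str)
```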

2. **Root Cause Analysis**:
   - Identified that the error occurred when trying to convert `None` to a `QueryType` enum value
   - The error messages were: "No template found for None standard" and "None is not a valid QueryType"
   - The issue was in the template retrieval process, which is used by both standard and progressive report synthesis

### Insights
- When fixing one issue with optional parameters, it's important to check for similar issues in related code paths
- Providing sensible defaults for optional parameters helps maintain robustness
- Proper error handling and logging help diagnose issues in complex systems with multiple layers

### Next Steps
1. Test the fix with comprehensive reports to ensure it works correctly
2. Consider adding similar default values for other optional parameters
3. Review the codebase for other potential null reference issues
4. Update documentation to clarify the behavior when optional parameters are not provided

## Session: 2025-03-19 - Enhanced Report Management UI

### Overview
Implemented significant improvements to the report management UI, focusing on checkbox functionality and visual aesthetics while ensuring proper integration with backend operations.

### Key Activities
1. **Improved Checkbox Display and Organization**:
   - Implemented a custom HTML/JavaScript solution for the checkbox interface
   - Created a visually appealing single-column layout for better readability
   - Added proper styling with a dark theme to match the overall UI aesthetics
   - Implemented scrolling capability for handling long lists of reports

2. **Added Check/Uncheck All Functionality**:
   - Implemented a "Check/Uncheck All" checkbox at the top of the list
   - Created JavaScript functions to handle the toggle behavior client-side
   - Ensured all checkbox state changes are properly tracked

3. **Enhanced Backend Integration**:
   - Implemented JSON-based communication between the UI and backend
   - Added robust error handling for JSON parsing to prevent crashes
   - Improved logging to track user selections and aid debugging
   - Made the download and delete handlers more resilient to input errors

4. **UI Styling Enhancements**:
   - Changed the container background to a dark theme to match the rest of the UI
   - Improved text contrast for better readability
   - Added proper borders, padding, and spacing for visual consistency
   - Ensured the UI is responsive with appropriate scrolling behavior

### Insights
- Custom HTML/JavaScript solutions provide more control over UI layout than Gradio's built-in components
- Dark-themed UI elements create a more consistent and professional look
- Robust error handling is critical for UI components that process user input
- Detailed logging helps identify and fix issues in interactive elements

### Challenges
- Gradio's default checkbox layout wasn't conducive to a single-column display
- JSON parsing between JavaScript and Python required careful error handling
- Ensuring visual consistency with the rest of the application
- Maintaining proper event handler connections when switching to custom HTML

### Next Steps
1. Gather user feedback on the improved checkbox interface
2. Consider adding filtering capabilities to help manage large report lists
3. Explore the possibility of batch operations for report management

@ -0,0 +1,122 @@
# LLM-Based Query Classification

## Overview

This document describes the implementation of LLM-based query domain classification in the sim-search project, replacing the previous keyword-based approach.

## Motivation

The previous keyword-based classification had several limitations:
- It relied on static lists of keywords that needed constant updating
- It could not capture the semantic meaning of queries
- It generated false classifications for ambiguous or novel queries
- It required significant maintenance to keep keyword lists updated

## Implementation

### New Components

1. **LLM Interface Extension**:
   - Added the `classify_query_domain()` method to the `LLMInterface` class
   - Added the `_classify_query_domain_impl()` private implementation method
   - Configured to use the fast Llama-3.1-8b-instant model by default

2. **Query Processor Updates**:
   - Added a `_structure_query_with_llm()` method that uses the LLM classification results
   - Updated `process_query()` to use both query type and domain classification
   - Retained the keyword-based method as a fallback in case of LLM API failures

3. **Structured Query Enhancements**:
   - Added new fields to the structured query:
     - `domain`: Primary domain type (academic, code, current_events, general)
     - `domain_confidence`: Confidence score for the primary domain
     - `secondary_domains`: Array of secondary domains with confidence scores
     - `classification_reasoning`: Explanation of the classification

4. **Configuration Updates**:
   - Added `classify_query_domain` to the module-specific model assignments
   - Used the same Llama-3.1-8b-instant model for domain classification as for other query processing tasks

5. **Logging and Monitoring**:
   - Added detailed logging of domain classification results
   - Logged secondary domains with confidence scores
   - Logged the reasoning behind classifications

6. **Error Handling**:
   - Added a fallback to keyword-based classification if LLM-based classification fails
   - Implemented robust JSON parsing with fallbacks to default values
   - Added explicit error messages for troubleshooting

### Classification Process

The query domain classification process works as follows:

1. The query is sent to the LLM with a prompt specifying the four domain types
2. The LLM returns a JSON response containing:
   - The primary domain type with a confidence score
   - An array of secondary domain types with confidence scores
   - The reasoning for the classification
3. The response is parsed and integrated into the structured query
4. The `is_academic`, `is_code`, and `is_current_events` flags are set based on:
   - The primary domain matching the type
   - Any secondary domain matching the type with a confidence of at least 0.3
5. The structured query is then used by downstream components like the search executor
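
Step 4 can be illustrated with a small helper (a sketch over the documented fields, not the actual sim-search code):

```python
def derive_domain_flags(classification: dict, threshold: float = 0.3) -> dict:
    """Derive boolean domain flags from a classification result."""
    def has_domain(domain: str) -> bool:
        if classification.get("primary_type") == domain:
            return True
        # Secondary domains count only at or above the confidence threshold.
        return any(d["type"] == domain and d["confidence"] >= threshold
                   for d in classification.get("secondary_types", []))

    return {
        "is_academic": has_domain("academic"),
        "is_code": has_domain("code"),
        "is_current_events": has_domain("current_events"),
    }

# Example: a query classified as code with academic as a 0.4-confidence
# secondary domain sets both is_code and is_academic.
print(derive_domain_flags({
    "primary_type": "code",
    "confidence": 0.95,
    "secondary_types": [{"type": "academic", "confidence": 0.4}],
}))
```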

## Benefits

The new approach offers several advantages:

1. **Semantic Understanding**: Captures the meaning and intent of queries rather than just matching keywords
2. **Multi-Domain Recognition**: Recognizes when queries span multiple domains, with confidence scores
3. **Self-Explaining**: Provides reasoning for classifications, aiding debugging and transparency
4. **Adaptability**: Automatically adapts to new topics and terminology without code changes
5. **Confidence Scoring**: Indicates how confident the system is in its classification

## Testing and Validation

A comprehensive test script (`test_domain_classification.py`) was created to:
1. Test the raw domain classification function with a variety of queries
2. Test the query processor's integration with domain classification
3. Compare the LLM-based approach with the previous keyword-based approach

## Examples

### Academic Query Example
**Query**: "What are the technological, economic, and social implications of large language models in today's society?"

**LLM Classification**:
```json
{
  "primary_type": "academic",
  "confidence": 0.9,
  "secondary_types": [
    {"type": "general", "confidence": 0.4}
  ],
  "reasoning": "This query is asking about implications of LLMs across multiple domains (technological, economic, and social) which is a scholarly research topic that would be well-addressed by academic sources."
}
```

### Code Query Example
**Query**: "How do I implement a transformer model in PyTorch for text classification?"

**LLM Classification**:
```json
{
  "primary_type": "code",
  "confidence": 0.95,
  "secondary_types": [
    {"type": "academic", "confidence": 0.4}
  ],
  "reasoning": "This is primarily a programming question about implementing a specific model in PyTorch, which is a coding framework. It has academic aspects since it relates to machine learning models, but the focus is on implementation."
}
```

## Future Improvements

Potential enhancements for the future:

1. **Caching**: Add caching for frequently asked or similar queries to reduce API calls (see the sketch below)
2. **Few-Shot Learning**: Add examples to the prompt to improve classification accuracy
3. **Expanded Domains**: Consider additional domain categories beyond the current four
4. **UI Integration**: Expose classification reasoning in the UI for advanced users
5. **Classification Feedback Loop**: Allow users to correct misclassifications to improve the system over time
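
For the caching idea, one possible shape (entirely hypothetical, keyed on the normalized query text):

```python
import hashlib

_classification_cache: dict = {}

async def classify_query_domain_cached(llm, query: str) -> dict:
    # Normalize and hash the query so trivially different phrasings
    # ("  Foo? " vs "foo?") share a cache entry.
    key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
    if key not in _classification_cache:
        _classification_cache[key] = await llm.classify_query_domain(query)
    return _classification_cache[key]
```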
|
@ -199,6 +199,66 @@ class LLMInterface:

            # Return error message in a user-friendly format
            return f"I encountered an error while processing your request: {str(e)}"

    async def classify_query_domain(self, query: str) -> Dict[str, Any]:
        """
        Classify a query's domain type (academic, code, current_events, general).

        Args:
            query: The query to classify

        Returns:
            Dictionary with query domain type and confidence scores
        """
        # Get the model assigned to this function
        model_name = self.config.get_module_model('query_processing', 'classify_query_domain')

        # Create a new interface with the assigned model if different from current
        if model_name != self.model_name:
            interface = LLMInterface(model_name)
            return await interface._classify_query_domain_impl(query)

        return await self._classify_query_domain_impl(query)

    async def _classify_query_domain_impl(self, query: str) -> Dict[str, Any]:
        """Implementation of query domain classification."""
        messages = [
            {"role": "system", "content": """You are an expert query classifier.
Analyze the given query and classify it into the following domain types:
- academic: Related to scholarly research, scientific studies, academic papers, formal theories, university-level research topics, or scholarly fields of study
- code: Related to programming, software development, technical implementation, coding languages, frameworks, or technology implementation questions
- current_events: Related to recent news, ongoing developments, time-sensitive information, current politics, breaking stories, or real-time events
- general: General information seeking that doesn't fit the above categories

You may assign multiple types if the query spans several domains.

Respond with a JSON object containing:
{
    "primary_type": "the most appropriate type",
    "confidence": 0.X,
    "secondary_types": [{"type": "another_applicable_type", "confidence": 0.X}, ...],
    "reasoning": "brief explanation of your classification"
}
"""},
            {"role": "user", "content": query}
        ]

        # Generate classification
        response = await self.generate_completion(messages)

        # Parse JSON response
        try:
            classification = json.loads(response)
            return classification
        except json.JSONDecodeError:
            # Fallback to default classification if parsing fails
            print(f"Error parsing domain classification response: {response}")
            return {
                "primary_type": "general",
                "confidence": 0.5,
                "secondary_types": [],
                "reasoning": "Failed to parse classification response"
            }

    async def classify_query(self, query: str) -> Dict[str, str]:
        """
        Classify a query as factual, exploratory, or comparative.
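For orientation, this is roughly how a caller consumes the new method — a hypothetical snippet using the `get_llm_interface()` factory that the test scripts below import; the sample query is invented:

```python
import asyncio

from query.llm_interface import get_llm_interface

async def demo():
    llm = get_llm_interface()
    classification = await llm.classify_query_domain(
        "How do I profile memory usage in a Rust web service?"
    )
    # On unparseable model output this falls back to
    # {"primary_type": "general", "confidence": 0.5, ...}
    print(classification["primary_type"], classification.get("confidence"))

asyncio.run(demo())
```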
@ -45,15 +45,27 @@ class QueryProcessor:

        enhanced_query = await self.llm_interface.enhance_query(query)
        logger.info(f"Enhanced query: {enhanced_query}")

        # Classify the query
        classification = await self.llm_interface.classify_query(query)
        logger.info(f"Query classification: {classification}")
        # Classify the query type (factual, exploratory, comparative)
        query_type_classification = await self.llm_interface.classify_query(query)
        logger.info(f"Query type classification: {query_type_classification}")

        # Extract entities from the classification
        entities = classification.get('entities', [])
        # Classify the query domain (academic, code, current_events, general)
        domain_classification = await self.llm_interface.classify_query_domain(query)
        logger.info(f"Query domain classification: {domain_classification}")

        # Structure the query for downstream modules
        structured_query = self._structure_query(query, enhanced_query, classification)
        # Log classification details for monitoring
        if domain_classification.get('secondary_types'):
            for sec_type in domain_classification.get('secondary_types'):
                logger.info(f"Secondary domain: {sec_type['type']} confidence={sec_type['confidence']}")
        logger.info(f"Classification reasoning: {domain_classification.get('reasoning', 'None provided')}")

        try:
            # Structure the query using the new classification approach
            structured_query = self._structure_query_with_llm(query, enhanced_query, query_type_classification, domain_classification)
        except Exception as e:
            logger.error(f"LLM domain classification failed: {e}. Falling back to keyword-based classification.")
            # Fallback to keyword-based approach
            structured_query = self._structure_query(query, enhanced_query, query_type_classification)

        # Decompose the query into sub-questions (if complex enough)
        structured_query = await self.query_decomposer.decompose_query(query, structured_query)
@ -66,10 +78,68 @@ class QueryProcessor:

        return structured_query

    def _structure_query_with_llm(self, original_query: str, enhanced_query: str,
                                  type_classification: Dict[str, Any],
                                  domain_classification: Dict[str, Any]) -> Dict[str, Any]:
        """
        Structure a query using LLM classification results.

        Args:
            original_query: The original user query
            enhanced_query: The enhanced query
            type_classification: Classification of query type (factual, exploratory, comparative)
            domain_classification: Classification of query domain (academic, code, current_events)

        Returns:
            Dictionary containing the structured query
        """
        # Get primary domain and confidence
        primary_domain = domain_classification.get('primary_type', 'general')
        primary_confidence = domain_classification.get('confidence', 0.5)

        # Get secondary domains
        secondary_domains = domain_classification.get('secondary_types', [])

        # Determine domain flags from the primary domain
        is_academic = primary_domain == 'academic'
        is_code = primary_domain == 'code'
        is_current_events = primary_domain == 'current_events'

        # Secondary domains also set a flag, but only above a 0.3 confidence
        # threshold, to avoid false positives from weak signals
        if any(d['type'] == 'academic' and d['confidence'] >= 0.3 for d in secondary_domains):
            is_academic = True

        if any(d['type'] == 'code' and d['confidence'] >= 0.3 for d in secondary_domains):
            is_code = True

        if any(d['type'] == 'current_events' and d['confidence'] >= 0.3 for d in secondary_domains):
            is_current_events = True

        return {
            'original_query': original_query,
            'enhanced_query': enhanced_query,
            'type': type_classification.get('type', 'unknown'),
            'intent': type_classification.get('intent', 'research'),
            'entities': type_classification.get('entities', []),
            'domain': primary_domain,
            'domain_confidence': primary_confidence,
            'secondary_domains': secondary_domains,
            'classification_reasoning': domain_classification.get('reasoning', ''),
            'timestamp': None,  # Will be filled in by the caller
            'is_current_events': is_current_events,
            'is_academic': is_academic,
            'is_code': is_code,
            'metadata': {
                'type_classification': type_classification,
                'domain_classification': domain_classification
            }
        }

    def _structure_query(self, original_query: str, enhanced_query: str,
                         classification: Dict[str, Any]) -> Dict[str, Any]:
        """
        Structure a query for downstream modules.
        Structure a query for downstream modules using keyword-based classification.
        This is a fallback method when LLM classification fails.

        Args:
            original_query: The original user query
@ -79,7 +149,7 @@ class QueryProcessor:

        Returns:
            Dictionary containing the structured query
        """
        # Detect query types
        # Detect query types using keyword-based methods
        is_current_events = self._is_current_events_query(original_query, classification)
        is_academic = self._is_academic_query(original_query, classification)
        is_code = self._is_code_query(original_query, classification)

@ -95,7 +165,8 @@ class QueryProcessor:

            'is_academic': is_academic,
            'is_code': is_code,
            'metadata': {
                'classification': classification
                'classification': classification,
                'classification_method': 'keyword'  # Indicate this used the keyword-based method
            }
        }
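To make the 0.3 threshold concrete, a small standalone check of the flag logic in `_structure_query_with_llm` — the sample classification dict is invented for illustration:

```python
# Sample output shaped like classify_query_domain's result (values invented).
domain_classification = {
    "primary_type": "code",
    "confidence": 0.95,
    "secondary_types": [
        {"type": "academic", "confidence": 0.4},        # above threshold
        {"type": "current_events", "confidence": 0.2},  # below threshold
    ],
}

primary = domain_classification["primary_type"]
secondary = domain_classification["secondary_types"]

is_code = primary == "code"
is_academic = any(d["type"] == "academic" and d["confidence"] >= 0.3 for d in secondary)
is_current_events = any(d["type"] == "current_events" and d["confidence"] >= 0.3 for d in secondary)

assert (is_code, is_academic, is_current_events) == (True, True, False)
```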
Binary file not shown.
@ -463,7 +463,20 @@ def get_progressive_report_synthesizer(model_name: Optional[str] = None) -> Prog

    global progressive_report_synthesizer

    if model_name and model_name != progressive_report_synthesizer.model_name:
        progressive_report_synthesizer = ProgressiveReportSynthesizer(model_name)
        logger.info(f"Creating new progressive report synthesizer with model: {model_name}")
        try:
            previous_model = progressive_report_synthesizer.model_name
            progressive_report_synthesizer = ProgressiveReportSynthesizer(model_name)
            logger.info(f"Successfully changed progressive synthesizer model from {previous_model} to {model_name}")
        except Exception as e:
            logger.error(f"Error creating new progressive report synthesizer with model {model_name}: {str(e)}")
            # Fall back to the existing synthesizer
            logger.info(f"Falling back to existing progressive synthesizer with model {progressive_report_synthesizer.model_name}")
    else:
        if model_name:
            logger.info(f"Using existing progressive report synthesizer with model: {model_name} (already initialized)")
        else:
            logger.info(f"Using existing progressive report synthesizer with default model: {progressive_report_synthesizer.model_name}")

    return progressive_report_synthesizer
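The net effect of this hunk: a failed model swap degrades gracefully instead of crashing report generation. A rough illustration, assuming a model name whose synthesizer constructor raises (hypothetical):

```python
# Hypothetical failure case: if ProgressiveReportSynthesizer("broken-model")
# raises, the module-level singleton is left untouched and returned as-is.
synth = get_progressive_report_synthesizer("broken-model")
fallback = get_progressive_report_synthesizer()
assert synth is fallback  # both are the surviving singleton
```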
@ -0,0 +1,100 @@

"""
Integration test for query classification and search execution.

This test demonstrates how the LLM-based query domain classification
affects the search engines selected for different types of queries.
"""

import os
import sys
import json
import asyncio
from typing import Dict, Any, List

# Add parent directory to path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from query.query_processor import get_query_processor
from execution.search_executor import get_search_executor


async def test_query_classification_search_integration():
    """Test how query classification affects search engine selection."""
    query_processor = get_query_processor()
    search_executor = get_search_executor()

    # Test queries for different domains
    test_queries = [
        {
            "description": "Academic query about quantum computing",
            "query": "What are the latest theoretical advances in quantum computing algorithms?"
        },
        {
            "description": "Code query about implementing a neural network",
            "query": "How do I implement a convolutional neural network in TensorFlow?"
        },
        {
            "description": "Current events query about economic policy",
            "query": "What are the recent changes to Federal Reserve interest rates and their economic impact?"
        },
        {
            "description": "Mixed query with academic and code aspects",
            "query": "How are transformer models being implemented for natural language processing tasks?"
        }
    ]

    results = []

    for test_case in test_queries:
        query = test_case["query"]
        description = test_case["description"]

        print(f"\n=== Testing: {description} ===")
        print(f"Query: {query}")

        # Process the query
        structured_query = await query_processor.process_query(query)

        # Get domain classification results
        domain = structured_query.get('domain', 'general')
        domain_confidence = structured_query.get('domain_confidence', 0.0)
        is_academic = structured_query.get('is_academic', False)
        is_code = structured_query.get('is_code', False)
        is_current_events = structured_query.get('is_current_events', False)

        print(f"Domain: {domain} (confidence: {domain_confidence})")
        print(f"Is academic: {is_academic}")
        print(f"Is code: {is_code}")
        print(f"Is current events: {is_current_events}")

        # Execute search with default search engines based on classification
        search_results = await search_executor.execute_search(structured_query)

        # Get the search engines that were selected
        selected_engines = list(search_results.keys())
        print(f"Selected search engines: {selected_engines}")

        # Store the results (engine_results renamed from results to avoid
        # shadowing the outer accumulator list)
        result = {
            "query": query,
            "description": description,
            "domain": domain,
            "domain_confidence": domain_confidence,
            "is_academic": is_academic,
            "is_code": is_code,
            "is_current_events": is_current_events,
            "selected_engines": selected_engines,
            "num_results_per_engine": {engine: len(engine_results) for engine, engine_results in search_results.items()}
        }

        results.append(result)

    # Save results to a file
    with open('query_classification_search_results.json', 'w') as f:
        json.dump(results, indent=2, fp=f)

    print("\nResults saved to query_classification_search_results.json")


if __name__ == "__main__":
    asyncio.run(test_query_classification_search_integration())
@ -0,0 +1,209 @@

"""
Test the query domain classification functionality.

This script tests the new LLM-based query domain classification functionality
to ensure it correctly classifies queries into academic, code, current_events,
and general categories.
"""

import os
import sys
import json
import asyncio
from typing import Dict, Any, List

# Add parent directory to path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from query.llm_interface import get_llm_interface
from query.query_processor import get_query_processor


async def test_classify_query_domain():
    """Test the classify_query_domain function."""
    llm_interface = get_llm_interface()

    test_queries = [
        # Academic queries
        "What are the technological, economic, and social implications of large language models in today's society?",
        "What is the current state of research on quantum computing algorithms?",
        "How has climate change affected biodiversity in marine ecosystems?",

        # Code queries
        "How do I implement a transformer model in PyTorch for text classification?",
        "What's the best way to optimize a recursive function in Python?",
        "Explain how to use React hooks with TypeScript",

        # Current events queries
        "What are the latest developments in the Ukraine conflict?",
        "How has the Federal Reserve's recent interest rate decision affected the stock market?",
        "What were the outcomes of the recent climate summit?",

        # Mixed or general queries
        "How are LLMs being used to detect and prevent cyber attacks?",
        "What are the best practices for remote work?",
        "Compare electric vehicles to traditional gas-powered cars"
    ]

    results = []

    for query in test_queries:
        print(f"\nClassifying query: {query}")
        domain_classification = await llm_interface.classify_query_domain(query)

        print(f"Primary type: {domain_classification.get('primary_type')} (confidence: {domain_classification.get('confidence')})")

        if domain_classification.get('secondary_types'):
            for sec_type in domain_classification.get('secondary_types'):
                print(f"Secondary type: {sec_type['type']} (confidence: {sec_type['confidence']})")

        print(f"Reasoning: {domain_classification.get('reasoning', 'None provided')}")

        results.append({
            'query': query,
            'classification': domain_classification
        })

    # Save results to a file
    with open('domain_classification_results.json', 'w') as f:
        json.dump(results, indent=2, fp=f)

    print("\nResults saved to domain_classification_results.json")


async def test_query_processor_with_domain_classification():
    """Test the query processor with the new domain classification."""
    query_processor = get_query_processor()

    test_queries = [
        "What are the technological implications of large language models?",
        "How do I implement a transformer model in PyTorch?",
        "What are the latest developments in the Ukraine conflict?",
        "How are LLMs being used to detect cyber attacks?"
    ]

    results = []

    for query in test_queries:
        print(f"\nProcessing query: {query}")
        structured_query = await query_processor.process_query(query)

        print(f"Domain: {structured_query.get('domain')} (confidence: {structured_query.get('domain_confidence')})")
        print(f"Is academic: {structured_query.get('is_academic')}")
        print(f"Is code: {structured_query.get('is_code')}")
        print(f"Is current events: {structured_query.get('is_current_events')}")

        if structured_query.get('secondary_domains'):
            for domain in structured_query.get('secondary_domains'):
                print(f"Secondary domain: {domain['type']} (confidence: {domain['confidence']})")

        print(f"Reasoning: {structured_query.get('classification_reasoning', 'None provided')}")

        results.append({
            'query': query,
            'structured_query': {
                'domain': structured_query.get('domain'),
                'domain_confidence': structured_query.get('domain_confidence'),
                'is_academic': structured_query.get('is_academic'),
                'is_code': structured_query.get('is_code'),
                'is_current_events': structured_query.get('is_current_events'),
                'secondary_domains': structured_query.get('secondary_domains'),
                'classification_reasoning': structured_query.get('classification_reasoning')
            }
        })

    # Save results to a file
    with open('query_processor_domain_results.json', 'w') as f:
        json.dump(results, indent=2, fp=f)

    print("\nResults saved to query_processor_domain_results.json")


async def compare_with_keyword_classification():
    """Compare LLM-based classification with keyword-based classification."""
    query_processor = get_query_processor()

    # Keep a reference so the original method can be restored after patching
    original_structure_query_with_llm = query_processor._structure_query_with_llm

    # Test queries that might be challenging for a keyword-based approach
    test_queries = [
        "How do language models work internally?",  # Could be academic or code
        "What are the best machine learning models for text generation?",  # "models" could trigger code
        "How has ChatGPT changed the AI landscape?",  # Recent but academic topic
        "What techniques help in understanding neural networks?",  # Could be academic or code
        "How are transformers used in NLP applications?",  # Ambiguous - could mean electrical transformers or ML
    ]

    results = []

    for query in test_queries:
        print(f"\nProcessing query with both methods: {query}")

        # First, use LLM-based classification (normal operation)
        structured_query_llm = await query_processor.process_query(query)

        # Now force keyword-based classification by monkey patching.
        # _structure_query takes (query, enhanced_query, classification), so
        # adapt the four-argument _structure_query_with_llm signature rather
        # than relying on a TypeError to trigger the keyword fallback.
        query_processor._structure_query_with_llm = (
            lambda q, eq, tc, dc: query_processor._structure_query(q, eq, tc)
        )
        structured_query_keyword = await query_processor.process_query(query)

        # Restore original method
        query_processor._structure_query_with_llm = original_structure_query_with_llm

        # Compare results
        print("LLM Classification:")
        print(f"  Domain: {structured_query_llm.get('domain')}")
        print(f"  Is academic: {structured_query_llm.get('is_academic')}")
        print(f"  Is code: {structured_query_llm.get('is_code')}")
        print(f"  Is current events: {structured_query_llm.get('is_current_events')}")

        print("Keyword Classification:")
        print(f"  Is academic: {structured_query_keyword.get('is_academic')}")
        print(f"  Is code: {structured_query_keyword.get('is_code')}")
        print(f"  Is current events: {structured_query_keyword.get('is_current_events')}")

        results.append({
            'query': query,
            'llm_classification': {
                'domain': structured_query_llm.get('domain'),
                'is_academic': structured_query_llm.get('is_academic'),
                'is_code': structured_query_llm.get('is_code'),
                'is_current_events': structured_query_llm.get('is_current_events')
            },
            'keyword_classification': {
                'is_academic': structured_query_keyword.get('is_academic'),
                'is_code': structured_query_keyword.get('is_code'),
                'is_current_events': structured_query_keyword.get('is_current_events')
            }
        })

    # Save comparison results to a file
    with open('classification_comparison_results.json', 'w') as f:
        json.dump(results, indent=2, fp=f)

    print("\nComparison results saved to classification_comparison_results.json")


async def main():
    """Run tests for query domain classification."""
    # Choose which test to run: 1, 2, or 3 runs a single test;
    # any other value runs all of them
    test_type = 1

    if test_type == 1:
        print("=== Testing classify_query_domain function ===")
        await test_classify_query_domain()
    elif test_type == 2:
        print("=== Testing query processor with domain classification ===")
        await test_query_processor_with_domain_classification()
    elif test_type == 3:
        print("=== Comparing LLM and keyword classifications ===")
        await compare_with_keyword_classification()
    else:
        print("=== Running all tests ===")
        await test_classify_query_domain()
        await test_query_processor_with_domain_classification()
        await compare_with_keyword_classification()


if __name__ == "__main__":
    asyncio.run(main())
@ -656,9 +656,13 @@ class GradioInterface:

                # Delete the report file
                file_path = report_to_delete.get('file_path')
                print(f"Deleting report: report_id={report_id}, file_path={file_path}")
                if file_path and Path(file_path).exists():
                    print(f"File exists: {Path(file_path).exists()}")
                    Path(file_path).unlink()
                    print(f"Deleted report file: {file_path}")
                else:
                    print("File not found or file_path is missing")

                # Remove from metadata
                all_metadata['reports'] = [r for r in all_metadata.get('reports', []) if r.get('id') != report_id]

@ -783,7 +787,7 @@ class GradioInterface:

            # If no reports are selected, just refresh the display
            reports_data = self._get_reports_for_display()
            choices = self._get_report_choices(reports_data)
            return reports_data, choices, []
            return reports_data, choices, "No reports selected for deletion."

        print(f"Selected choices for deletion: {selected_choices}")
@ -791,12 +795,18 @@ class GradioInterface:

        selected_report_ids = []
        for choice in selected_choices:
            try:
                # Convert to string and handle different input formats
                choice_str = str(choice).strip().strip('"\'')
                print(f"Processing choice: '{choice_str}'")

                # Split at the first colon to get the ID
                if ':' in choice:
                    report_id = choice.split(':', 1)[0].strip()
                if ':' in choice_str:
                    report_id = choice_str.split(':', 1)[0].strip()
                    selected_report_ids.append(report_id)
                else:
                    print(f"Warning: Invalid choice format: {choice}")
                    # If no colon, use the entire string as ID
                    selected_report_ids.append(choice_str)
                    print(f"Using full string as ID: '{choice_str}'")
            except Exception as e:
                print(f"Error processing choice {choice}: {e}")
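Choice strings follow an `"<report_id>: <query>"` convention, so the extraction above reduces to a first-colon split after normalization — a quick standalone check with a made-up choice value:

```python
choice = '  "a1b2c3: What are the latest advances in quantum computing?"  '

# Same normalization as above: strip whitespace and stray quotes, then take
# everything before the first colon as the report ID.
choice_str = str(choice).strip().strip('"\'')
report_id = choice_str.split(':', 1)[0].strip() if ':' in choice_str else choice_str

assert report_id == 'a1b2c3'
```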
@ -816,7 +826,8 @@ class GradioInterface:

            # Refresh the table and choices
            reports_data = self._get_reports_for_display()
            choices = self._get_report_choices(reports_data)
            return reports_data, choices, []
            status_message = f"Deleted {deleted_count} report(s)."
            return reports_data, choices, status_message

    def _download_selected_reports(self, selected_choices):
        """Prepare selected reports for download

@ -836,12 +847,18 @@ class GradioInterface:

        selected_report_ids = []
        for choice in selected_choices:
            try:
                # Convert to string and handle different input formats
                choice_str = str(choice).strip().strip('"\'')
                print(f"Processing choice: '{choice_str}'")

                # Split at the first colon to get the ID
                if ':' in choice:
                    report_id = choice.split(':', 1)[0].strip()
                if ':' in choice_str:
                    report_id = choice_str.split(':', 1)[0].strip()
                    selected_report_ids.append(report_id)
                else:
                    print(f"Warning: Invalid choice format: {choice}")
                    # If no colon, use the entire string as ID
                    selected_report_ids.append(choice_str)
                    print(f"Using full string as ID: '{choice_str}'")
            except Exception as e:
                print(f"Error processing choice {choice}: {e}")
@ -855,6 +872,7 @@ class GradioInterface:

            report = next((r for r in all_reports if r.get('id') == report_id), None)
            if report and "file_path" in report:
                file_path = report["file_path"]
                print(f"Downloading report: report_id={report_id}, file_path={file_path}")
                # Verify the file exists
                if os.path.exists(file_path):
                    files_to_download.append(file_path)

@ -863,7 +881,7 @@ class GradioInterface:

                    print(f"Warning: File does not exist: {file_path}")
            else:
                print(f"Warning: Could not find report with ID {report_id}")

        return files_to_download

    def _get_report_choices(self, reports_data):
@ -904,6 +922,81 @@ class GradioInterface:

                continue

        return choices

    def _refresh_reports_with_html(self):
        """Refresh the reports list with updated HTML

        Returns:
            tuple: Updated reports data, HTML content, and reset hidden field value
        """
        reports_data = self._get_reports_for_display()
        choices = self._get_report_choices(reports_data)
        html_content = create_checkbox_html(choices)
        return reports_data, html_content, "[]"  # Reset the hidden field

    def _delete_selected_reports_with_html(self, selected_json):
        """Delete selected reports and return updated HTML

        Args:
            selected_json (str): JSON string containing selected report IDs

        Returns:
            tuple: Updated reports data, HTML content, reset hidden field value, and status message
        """
        try:
            # Parse JSON with error handling
            if not selected_json or selected_json == "[]":
                selected = []
            else:
                try:
                    selected = json.loads(selected_json)
                    print(f"Parsed JSON selections: {selected}")
                except Exception as json_err:
                    print(f"JSON parse error: {json_err}")
                    # If JSON parsing fails, try to extract values directly
                    selected = [s.strip(' "') for s in selected_json.strip('[]').split(',')]
                    print(f"Fallback parsing to: {selected}")

            # Delete reports
            updated_table, _, message = self._delete_selected_reports(selected)
            choices = self._get_report_choices(updated_table)
            html_content = create_checkbox_html(choices)
            return updated_table, html_content, "[]", f"{message}"
        except Exception as e:
            import traceback
            traceback.print_exc()
            return self._get_reports_for_display(), create_checkbox_html([]), "[]", f"Error: {str(e)}"

    def _download_with_html(self, selected_json):
        """Prepare selected reports for download with improved JSON parsing

        Args:
            selected_json (str): JSON string containing selected report IDs

        Returns:
            list: Files prepared for download
        """
        try:
            # Parse JSON with error handling
            if not selected_json or selected_json == "[]":
                selected = []
            else:
                try:
                    selected = json.loads(selected_json)
                    print(f"Parsed JSON selections for download: {selected}")
                except Exception as json_err:
                    print(f"JSON parse error: {json_err}")
                    # If JSON parsing fails, try to extract values directly
                    selected = [s.strip(' "') for s in selected_json.strip('[]').split(',')]
                    print(f"Fallback parsing to: {selected}")

            # Get file paths for download
            files = self._download_selected_reports(selected)
            return files
        except Exception as e:
            import traceback
            traceback.print_exc()
            return []

    def _cleanup_old_reports(self, days):
        """Delete reports older than the specified number of days
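The bracket-strip fallback above is intentionally crude; here is what it does with a well-formed selection string — a standalone snippet with made-up IDs, and the caveat that any value containing a comma would be split incorrectly:

```python
selected_json = '["abc123: Quantum computing query", "def456: PyTorch query"]'

# Fallback used when json.loads fails: drop the surrounding brackets,
# split on commas, then trim spaces and double quotes from each piece.
selected = [s.strip(' "') for s in selected_json.strip('[]').split(',')]

print(selected)  # ['abc123: Quantum computing query', 'def456: PyTorch query']
```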
@ -1189,138 +1282,57 @@ class GradioInterface:

                    gr.Markdown(f"### Detail Levels\n{detail_levels_info}")
                    gr.Markdown(f"### Query Types\n{query_types_info}")

            # Report Management Tab
            # Report Management Tab - Reimplemented from scratch
            with gr.TabItem("Manage Reports"):
                with gr.Row():
                    gr.Markdown("## Report Management")

                with gr.Row():
                    gr.Markdown("Select reports to download or delete. You can also filter and sort the reports.")
                    gr.Markdown("Select reports to download or delete. You can filter and sort the reports using the table controls.")

                # Create a state to store the current reports
                reports_state = gr.State([])
                # Get the reports data
                reports_data = self._get_reports_for_display()

                # Only include one view of the reports with a clean selection interface
                # Create a state to store selected report IDs
                selected_report_ids = gr.State([])

                # We've removed the DataTable as requested by the user

                # Selection controls
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Reports")
                        # Get the reports data
                        reports_data = self._get_reports_for_display()
                        # This hidden table is just used to store the data
                        reports_table = gr.Dataframe(
                            headers=["ID", "Query", "Model", "Detail Level", "Created", "Size", "Filename"],
                            datatype=["str", "str", "str", "str", "str", "str", "str"],
                            value=reports_data,
                            visible=False,  # Hide this table
                            interactive=False
                        )

                        # Get the choices for the checkbox group
                        initial_choices = self._get_report_choices(reports_data)
                        print(f"Initial choices generated: {len(initial_choices)}")
                        if not initial_choices:
                            initial_choices = ["No reports available"]

                        # Use a cleaner component approach with better styling
                        gr.Markdown("##### Select reports below for download or deletion")

                        # Create a completely custom HTML solution for maximum control
                        # Prepare the HTML for the checkboxes
                        html_choices = []
                        for i, choice in enumerate(initial_choices):
                            html_choices.append(f'<div style="padding: 5px; margin-bottom: 8px;">')
                            html_choices.append(f'<label style="display: block; width: 100%; cursor: pointer; color: #eee;">')
                            html_choices.append(f'<input type="checkbox" id="report-{i}" name="report" value="{choice}"> {choice}')
                            html_choices.append('</label>')
                            html_choices.append('</div>')

                        # Create the HTML string with all the checkbox markup and JavaScript functionality
                        html_content = f"""
                        <div style="border: 1px solid #555; border-radius: 5px; margin-bottom: 15px; background-color: #2d2d2d; color: #eee;">
                            <div style="padding: 10px; border-bottom: 1px solid #555; background-color: #3a3a3a;">
                                <label style="display: block; font-weight: bold; cursor: pointer;">
                                    <input type="checkbox" id="select-all-checkbox" onclick="toggleAllReports()"> Check/Uncheck All
                                </label>
                            </div>

                            <div id="reports-container" style="max-height: 500px; overflow-y: auto; padding: 10px;">
                                {''.join(html_choices)}
                            </div>
                        </div>

                        <script>
                        // Toggle all checkboxes
                        function toggleAllReports() {{
                            const checkAll = document.getElementById('select-all-checkbox');
                            const checkboxes = document.getElementsByName('report');
                            for (let i = 0; i < checkboxes.length; i++) {{
                                checkboxes[i].checked = checkAll.checked;
                            }}
                            updateHiddenField();
                        }}

                        // Get selected values and update the hidden field
                        function updateHiddenField() {{
                            const checkboxes = document.getElementsByName('report');
                            const selected = [];
                            for (let i = 0; i < checkboxes.length; i++) {{
                                if (checkboxes[i].checked) {{
                                    selected.push(checkboxes[i].value);
                                }}
                            }}
                            // Find the hidden field and set its value
                            // This needs to match the ID we give to the gr.CheckboxGroup below
                            const hiddenField = document.querySelector('#reports-hidden-value textarea');
                            if (hiddenField) {{
                                // Make sure we always have valid JSON, even if empty
                                hiddenField.value = JSON.stringify(selected);
                                console.log('Updated hidden field with: ' + hiddenField.value);
                                // Trigger a change event to notify Gradio
                                const event = new Event('input', {{ bubbles: true }});
                                hiddenField.dispatchEvent(event);
                            }}
                        }}

                        // Add event listeners to all checkbox changes
                        document.addEventListener('DOMContentLoaded', function() {{
                            const checkboxes = document.getElementsByName('report');
                            for (let i = 0; i < checkboxes.length; i++) {{
                                checkboxes[i].addEventListener('change', updateHiddenField);
                            }}
                        }});
                        </script>
                        """

                        # Create HTML component with our custom checkbox implementation
                        custom_html = gr.HTML(html_content)

                        # Create a hidden Textbox to store the selected values as JSON
                        reports_checkboxes = gr.Textbox(
                            value="[]",  # Empty array as initial value
                            visible=False,  # Hide this
                            elem_id="reports-hidden-value"
                        )

                        gr.Markdown("*Check the boxes next to the reports you want to manage*")

                # Buttons for report management
                with gr.Row():
                    with gr.Column(scale=1):
                        refresh_button = gr.Button("Refresh List")
                    with gr.Column(scale=1):
                        download_button = gr.Button("Download Selected")
                    with gr.Column(scale=1):
                        delete_button = gr.Button("Delete Selected", variant="stop")
                    with gr.Column(scale=2):
                        cleanup_days = gr.Slider(
                            minimum=0,
                            maximum=90,
                            value=30,
                            step=1,
                            label="Delete Reports Older Than (Days)",
                            info="Set to 0 to disable automatic cleanup"
                        # Create a checkbox group for selecting reports
                        report_choices = self._get_report_choices(reports_data)
                        reports_checkbox_group = gr.CheckboxGroup(
                            choices=report_choices,
                            label="Select Reports",
                            info="Check the reports you want to download or delete",
                            interactive=True
                        )
                        cleanup_button = gr.Button("Clean Up Old Reports")

                    with gr.Column(scale=1):
                        # Action buttons
                        with gr.Row():
                            refresh_button = gr.Button("Refresh List", size="sm")

                        with gr.Row():
                            select_all_button = gr.Button("Select All", size="sm")
                            clear_selection_button = gr.Button("Clear Selection", size="sm")

                        with gr.Row():
                            download_button = gr.Button("Download Selected", size="sm")
                            delete_button = gr.Button("Delete Selected", variant="stop", size="sm")

                        with gr.Row():
                            cleanup_days = gr.Slider(
                                minimum=0,
                                maximum=90,
                                value=30,
                                step=1,
                                label="Delete Reports Older Than (Days)",
                                info="Set to 0 to disable automatic cleanup"
                            )
                            cleanup_button = gr.Button("Clean Up Old Reports", size="sm")

                # File download component
                with gr.Row():
@ -1373,232 +1385,154 @@ class GradioInterface:

            )

            # Report Management Tab Event Handlers
            def refresh_reports():
            # Refresh reports list
            def refresh_reports_list():
                """Refresh the reports list and update the UI components"""
                reports_data = self._get_reports_for_display()
                choices = self._get_report_choices(reports_data)
                return reports_data, choices
                report_choices = self._get_report_choices(reports_data)
                return reports_data, report_choices, "Reports list refreshed."

            refresh_button.click(
                fn=refresh_reports,
                fn=refresh_reports_list,
                inputs=[],
                outputs=[reports_table, reports_checkboxes]
                outputs=[reports_checkbox_group, reports_checkbox_group, status_message]
            )

            # Add wrapper to parse JSON and handle download
            def download_with_logging(selected_json):
                try:
                    # Parse the JSON string from the hidden textbox
                    import json
                    print(f"Raw selected_json: '{selected_json}'")

                    # Make sure we have valid JSON before parsing
                    if not selected_json or selected_json.strip() == "":
                        selected = []
                    else:
                        # Handle potential edge cases by cleaning up the input
                        cleaned_json = selected_json.strip()
                        if not (cleaned_json.startswith('[') and cleaned_json.endswith(']')):
                            cleaned_json = f"[{cleaned_json}]"

                        selected = json.loads(cleaned_json)

                    print(f"Download button clicked with selections: {selected}")
                    files = self._download_selected_reports(selected)
                    print(f"Files prepared for download: {len(files)}")
                    return files
                except Exception as e:
                    print(f"Error processing selections for download: {e}")
                    import traceback
                    traceback.print_exc()
                    return []

            # Select all reports
            def select_all_reports():
                """Select all reports in the checkbox group"""
                report_choices = self._get_report_choices(self._get_reports_for_display())
                return report_choices, "Selected all reports."

            select_all_button.click(
                fn=select_all_reports,
                inputs=[],
                outputs=[reports_checkbox_group, status_message]
            )

            # Clear selection
            def clear_selection():
                """Clear the selection in the checkbox group"""
                return [], "Selection cleared."

            clear_selection_button.click(
                fn=clear_selection,
                inputs=[],
                outputs=[reports_checkbox_group, status_message]
            )

            # Download selected reports
            def download_selected_reports(selected_choices):
                """Download selected reports"""
                if not selected_choices:
                    return [], "No reports selected for download."

                # Connect download button directly to our handler
                print(f"Selected choices for download: {selected_choices}")
                files = self._download_selected_reports(selected_choices)

                if files:
                    return files, f"Prepared {len(files)} report(s) for download."
                else:
                    return [], "No files found for the selected reports."

            download_button.click(
                fn=download_with_logging,
                inputs=reports_checkboxes,  # Now contains JSON string of selections
                outputs=file_output
                fn=download_selected_reports,
                inputs=[reports_checkbox_group],
                outputs=[file_output, status_message]
            )

            # No need for toggle functionality as it's handled by JavaScript in the HTML component
            # Delete selected reports
            def delete_selected_reports(selected_choices):
                """Delete selected reports and update the UI"""
                if not selected_choices:
                    return self._get_reports_for_display(), [], "No reports selected for deletion."

            # Add logging wrapper for delete function
            def delete_with_logging(selected):
                print(f"Delete button clicked with selections: {selected}")
                updated_table, updated_choices, message = self._delete_selected_reports(selected)
                print(f"After deletion: {len(updated_table)} reports, {len(updated_choices)} choices")
                return updated_table, updated_choices, message
                print(f"Selected choices for deletion: {selected_choices}")

            # Update delete handler to parse JSON with improved error handling
            def delete_with_reset(selected_json):
                try:
                    # Parse the JSON string from the hidden textbox
                    import json
                    print(f"Raw selected_json for delete: '{selected_json}'")

                    # Make sure we have valid JSON before parsing
                    if not selected_json or selected_json.strip() == "":
                        selected = []
                    else:
                        # Handle potential edge cases by cleaning up the input
                        cleaned_json = selected_json.strip()
                        if not (cleaned_json.startswith('[') and cleaned_json.endswith(']')):
                            cleaned_json = f"[{cleaned_json}]"

                        selected = json.loads(cleaned_json)

                    print(f"Delete button clicked with selections: {selected}")
                    updated_table, updated_choices, message = self._delete_selected_reports(selected)
                    print(f"After deletion: {len(updated_table)} reports, {len(updated_choices)} choices")

                    # Generate new HTML after deletion
                    html_choices = []
                    for i, choice in enumerate(updated_choices):
                        html_choices.append(f'<div style="padding: 5px; margin-bottom: 8px;">')
                        html_choices.append(f'<label style="display: block; width: 100%; cursor: pointer;">')
                        html_choices.append(f'<input type="checkbox" id="report-{i}" name="report" value="{choice}"> {choice}')
                        html_choices.append('</label>')
                        html_choices.append('</div>')

                    html_content = f"""
                    <div style="border: 1px solid #ddd; border-radius: 5px; margin-bottom: 15px;">
                        <div style="padding: 10px; border-bottom: 1px solid #eee; background-color: #f8f8f8;">
                            <label style="display: block; font-weight: bold; cursor: pointer;">
                                <input type="checkbox" id="select-all-checkbox" onclick="toggleAllReports()"> Check/Uncheck All
                            </label>
                        </div>

                        <div id="reports-container" style="max-height: 500px; overflow-y: auto; padding: 10px;">
                            {''.join(html_choices)}
                        </div>
                    </div>

                    <script>
                    // Toggle all checkboxes
                    function toggleAllReports() {{
                        const checkAll = document.getElementById('select-all-checkbox');
                        const checkboxes = document.getElementsByName('report');
                        for (let i = 0; i < checkboxes.length; i++) {{
                            checkboxes[i].checked = checkAll.checked;
                        }}
                        updateHiddenField();
                    }}

                    // Get selected values and update the hidden field
                    function updateHiddenField() {{
                        const checkboxes = document.getElementsByName('report');
                        const selected = [];
                        for (let i = 0; i < checkboxes.length; i++) {{
                            if (checkboxes[i].checked) {{
                                selected.push(checkboxes[i].value);
                            }}
                        }}
                        // Find the hidden field and set its value
                        const hiddenField = document.querySelector('#reports-hidden-value textarea');
                        if (hiddenField) {{
                            hiddenField.value = JSON.stringify(selected);
                            // Trigger a change event to notify Gradio
                            const event = new Event('input', {{ bubbles: true }});
                            hiddenField.dispatchEvent(event);
                        }}
                    }}

                    // Add event listeners to all checkbox changes
                    document.addEventListener('DOMContentLoaded', function() {{
                        const checkboxes = document.getElementsByName('report');
                        for (let i = 0; i < checkboxes.length; i++) {{
                            checkboxes[i].addEventListener('change', updateHiddenField);
                        }}
                    }});
                    </script>
                    """

                    # Reset hidden field
                    return updated_table, html_content, "[]", message
                except Exception as e:
                    print(f"Error processing selections: {e}")
                    return reports_table, custom_html.value, "[]", f"Error: {str(e)}"
                # Extract report IDs from selected choices
                selected_report_ids = []
                for choice in selected_choices:
                    try:
                        # Split at the first colon to get the ID
                        if ':' in choice:
                            report_id = choice.split(':', 1)[0].strip()
                            selected_report_ids.append(report_id)
                        else:
                            # If no colon, use the entire string as ID
                            selected_report_ids.append(choice)
                    except Exception as e:
                        print(f"Error processing choice {choice}: {e}")

                # Delete selected reports
                deleted_count = 0
                for report_id in selected_report_ids:
                    if self.delete_report(report_id):
                        deleted_count += 1

                # Refresh the table and choices
                updated_reports_data = self._get_reports_for_display()
                updated_choices = self._get_report_choices(updated_reports_data)

                return updated_choices, f"Deleted {deleted_count} report(s)."

            delete_button.click(
                fn=delete_with_reset,
                inputs=reports_checkboxes,
                outputs=[reports_table, custom_html, reports_checkboxes, status_message]
            ).then(
                fn=lambda msg: f"{msg} Selected reports deleted successfully.",
                inputs=[status_message],
                outputs=[status_message]
                fn=delete_selected_reports,
                inputs=[reports_checkbox_group],
                outputs=[reports_checkbox_group, status_message]
            )

            def cleanup_with_refresh(days):
                updated_table = self._cleanup_old_reports(days)
                choices = self._get_report_choices(updated_table)
                message = f"Reports older than {days} days have been deleted."
                print(message)
                return updated_table, choices, message

            # Note: We need to make sure this runs properly and updates both the table and checkboxes
            # The built-in Gradio progress tracking (gr.Progress) is used instead
            # This is passed to the generate_report method and handles progress updates
            # Clean up old reports
            def cleanup_old_reports(days):
                """Delete reports older than the specified number of days"""
                if days <= 0:
                    return self._get_reports_for_display(), self._get_report_choices(self._get_reports_for_display()), "Cleanup skipped - days parameter is 0 or negative."

                updated_reports_data = self._cleanup_old_reports(days)
                updated_choices = self._get_report_choices(updated_reports_data)

                return updated_reports_data, updated_choices, f"Reports older than {days} days have been deleted."

            cleanup_button.click(
                fn=cleanup_with_refresh,
                inputs=cleanup_days,
                outputs=[reports_table, reports_checkboxes, status_message]
            ).then(
                # Add a then function to ensure the UI updates properly
                fn=lambda: "Report list has been refreshed.",
                inputs=[],
                outputs=[status_message]
                fn=cleanup_old_reports,
                inputs=[cleanup_days],
                outputs=[reports_checkbox_group, status_message]
            )

            # Migration button event handler
            def migrate_and_refresh():
            def migrate_existing_reports():
                """Migrate existing reports from the root directory to the reports directory structure"""
                print("Starting migration of existing reports...")
                status = self.migrate_existing_reports()
                print("Migration completed, refreshing display...")
                reports_data = self._get_reports_for_display()
                print(f"Got {len(reports_data)} reports for display")
                choices = self._get_report_choices(reports_data)
                print(f"Generated {len(choices)} choices for selection")
                return status, reports_data, choices

                # Refresh the reports list
                updated_reports_data = self._get_reports_for_display()
                updated_choices = self._get_report_choices(updated_reports_data)

                return status, updated_reports_data, updated_choices

            migrate_button.click(
                fn=migrate_and_refresh,
                fn=migrate_existing_reports,
                inputs=[],
                outputs=[status_message, reports_table, reports_checkboxes]
            ).then(
                # Add a confirmation message after migration completes
                fn=lambda msg: f"{msg} Report list has been refreshed.",
                inputs=[status_message],
                outputs=[status_message]
                outputs=[status_message, reports_checkbox_group]
            )

            # Initialize the checkboxes when the table is first loaded
            # reports_table.change(
            #     fn=lambda table: self._get_report_choices(table),
            #     inputs=reports_table,
            #     outputs=reports_checkboxes
            # )

            # Initialize both the table and checkboxes on page load
            # Initialize the UI on page load
            def init_reports_ui():
                """Initialize the reports UI with current data"""
                print("Initializing reports UI...")
                reports_data = self._get_reports_for_display()
                choices = self._get_report_choices(reports_data)

                # Log the actual choices for debugging
                print(f"Initializing reports UI with {len(reports_data)} reports and {len(choices)} choices")
                for i, choice in enumerate(choices[:5]):
                    print(f"Sample choice {i}: {choice}")
                if len(choices) > 5:
                    print(f"...and {len(choices) - 5} more choices")

                status = "Reports management initialized successfully."
                return reports_data, choices, status

                return choices, "Reports management initialized successfully."

            interface.load(
                fn=init_reports_ui,
                inputs=[],
                outputs=[reports_table, reports_checkboxes, status_message]
                outputs=[reports_checkbox_group, status_message]
            )

        return interface