# 2025-03-18: LLM-Based Query Classification Implementation

## Context

The project was using a keyword-based approach to classify queries into domains (academic, code, current events). This approach had several limitations:

- Reliance on static keyword lists that needed constant maintenance
- Inability to capture the semantic meaning of queries
- Misclassification of ambiguous queries and of queries containing keywords with multiple meanings
- Difficulty handling emerging topics without updating the keyword lists

## Decision

1. Replace the keyword-based query classification with an LLM-based approach:
   - Implement a new `classify_query_domain` method in the `LLMInterface` class
   - Create a new query-structuring method that consumes the LLM classification results
   - Retain the keyword-based method as a fallback
   - Add confidence scores and reasoning to the classification results
2. Enhance the structured query format:
   - Add the primary domain and its confidence score
   - Include secondary domains with confidence scores
   - Add the classification reasoning
   - Maintain backward compatibility with the existing search executor
3. Use a 0.3 confidence threshold for secondary domains:
   - Set the domain flags (`is_academic`, `is_code`, `is_current_events`) based on the primary domain
   - Also set flags for secondary domains whose confidence scores exceed 0.3
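The three points above can be sketched together. This is a minimal illustration, not the project's actual code: the JSON field names (`primary_domain`, `secondary_domains`, `reasoning`) and the sample response are assumptions about what the LLM is prompted to return.

```python
import json

# Hypothetical shape of the LLM's classification response; the real
# classify_query_domain method would prompt the model to return JSON
# like this (field names are illustrative).
SAMPLE_LLM_RESPONSE = json.dumps({
    "primary_domain": "code",
    "primary_confidence": 0.85,
    "secondary_domains": [
        {"domain": "academic", "confidence": 0.45},
        {"domain": "current_events", "confidence": 0.10},
    ],
    "reasoning": "The query asks for an implementation, with some research context.",
})

SECONDARY_CONFIDENCE_THRESHOLD = 0.3  # threshold chosen in this decision


def build_structured_query(llm_response: str) -> dict:
    """Turn the LLM's JSON classification into the structured query format."""
    result = json.loads(llm_response)
    structured = {
        "primary_domain": result["primary_domain"],
        "primary_confidence": result["primary_confidence"],
        "secondary_domains": result["secondary_domains"],
        "classification_reasoning": result["reasoning"],
    }
    # Domain flags keep backward compatibility with the search executor:
    # set the flag for the primary domain, plus any secondary domain
    # whose confidence clears the 0.3 threshold.
    for domain in ("academic", "code", "current_events"):
        structured[f"is_{domain}"] = (
            domain == result["primary_domain"]
            or any(
                d["domain"] == domain
                and d["confidence"] >= SECONDARY_CONFIDENCE_THRESHOLD
                for d in result["secondary_domains"]
            )
        )
    return structured


structured = build_structured_query(SAMPLE_LLM_RESPONSE)
# code is the primary domain; academic (0.45) clears the 0.3 secondary
# threshold; current_events (0.10) does not.
```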

## Rationale

- The LLM-based approach provides better semantic understanding of queries
- Multi-domain classification with confidence scores handles complex queries better
- Self-explaining classifications with reasoning aid debugging and transparency
- The approach adapts to new topics automatically, without code changes
- Retaining the keyword-based fallback ensures system resilience
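The resilience point can be sketched as a wrapper that tries the LLM classifier first and degrades to the legacy keyword matcher on any failure. Both classifier functions and the keyword map below are illustrative stand-ins, not the project's actual code.

```python
# Illustrative keyword lists standing in for the legacy static lists.
KEYWORD_MAP = {
    "code": ["python", "function", "bug"],
    "academic": ["paper", "study", "citation"],
    "current_events": ["news", "today", "election"],
}


def keyword_classify(query: str) -> str:
    """Legacy keyword-based classifier, retained as the fallback."""
    words = query.lower()
    for domain, keywords in KEYWORD_MAP.items():
        if any(k in words for k in keywords):
            return domain
    return "general"


def classify_with_fallback(query: str, llm_classify) -> str:
    """Try the LLM classifier; fall back to keywords if it fails."""
    try:
        return llm_classify(query)
    except Exception:
        # Any LLM failure (timeout, malformed JSON, API error) degrades
        # gracefully to the keyword-based result.
        return keyword_classify(query)


def failing_llm(query):
    """Simulates an unavailable LLM backend."""
    raise RuntimeError("LLM unavailable")


fallback_result = classify_with_fallback("fix this python bug", failing_llm)
```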

## Alternatives Considered

1. Expanding the keyword lists:
   - Would still lack semantic understanding
   - Would increase the maintenance burden
   - False positives would still occur
2. Using embedding similarity against predefined domain descriptions:
   - Potentially more computationally expensive
   - Less explainable than the LLM's reasoning
   - Would require managing embedding models
3. Creating a custom classifier:
   - Would require labeled training data
   - More development effort
   - Less flexible than the LLM approach
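For context on alternative 2, the mechanism can be illustrated with toy vectors: embed the query and each domain description, then pick the domain with the highest cosine similarity. Real embeddings would come from a model; the hand-made 2-D vectors below are assumptions that just demonstrate why this yields a score but no human-readable reasoning.

```python
import math


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Toy embeddings of the domain descriptions (illustrative only).
DOMAIN_EMBEDDINGS = {
    "academic": (0.9, 0.1),
    "code": (0.1, 0.9),
}


def classify_by_embedding(query_embedding):
    """Return the best-matching domain and its similarity score."""
    scores = {d: cosine(query_embedding, e) for d, e in DOMAIN_EMBEDDINGS.items()}
    best = max(scores, key=scores.get)
    # Note: we get a similarity score but no reasoning text, which is
    # the explainability gap noted above.
    return best, scores[best]


domain, score = classify_by_embedding((0.2, 0.8))
```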

## Impact

- More accurate query classification, especially for ambiguous or multi-domain queries
- Reduction in maintenance overhead for keyword lists
- Better search engine selection based on query domains
- Improved report generation due to more accurate query understanding
- Enhanced debugging capabilities via the classification reasoning