# LLM-Based Query Classification

## Overview

This document describes the implementation of LLM-based query domain classification in the sim-search project, replacing the previous keyword-based approach.

## Motivation

The previous keyword-based classification had several limitations:

- It relied on static keyword lists that needed constant updating
- It could not capture the semantic meaning of queries
- It misclassified ambiguous or novel queries
- It required significant maintenance to keep the keyword lists current

## Implementation

### New Components

1. LLM Interface Extension (see the sketch after this list):
   - Added a `classify_query_domain()` method to the `LLMInterface` class
   - Added a `_classify_query_domain_impl()` private implementation method
   - Configured to use the fast Llama-3.1-8b-instant model by default
2. Query Processor Updates:
   - Added a `_structure_query_with_llm()` method that uses the LLM classification results
   - Updated `process_query()` to use both query type and domain classification
   - Retained the keyword-based method as a fallback in case of LLM API failures
3. Structured Query Enhancements:
   - Added new fields to the structured query:
     - `domain`: primary domain type (academic, code, current_events, general)
     - `domain_confidence`: confidence score for the primary domain
     - `secondary_domains`: array of secondary domains with confidence scores
     - `classification_reasoning`: explanation of the classification
4. Configuration Updates:
   - Added `classify_query_domain` to the module-specific model assignments
   - Uses the same Llama-3.1-8b-instant model for domain classification as for the other query processing tasks
5. Logging and Monitoring:
   - Added detailed logging of domain classification results
   - Logs secondary domains with their confidence scores
   - Logs the reasoning behind each classification
6. Error Handling:
   - Added a fallback to keyword-based classification if the LLM-based classification fails
   - Implemented robust JSON parsing with fallbacks to default values
   - Added explicit error messages for troubleshooting
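
The interaction between the new `classify_query_domain()` method, its JSON parsing, and the keyword fallback can be pictured with the minimal sketch below. The prompt wording, the `llm_complete` and `keyword_classify` callables, and the default values are illustrative assumptions, not the project's actual implementation.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Assumed default returned when the LLM response cannot be parsed.
DEFAULT_CLASSIFICATION = {
    "primary_type": "general",
    "confidence": 0.0,
    "secondary_types": [],
    "reasoning": "fallback default",
}


def classify_query_domain(query: str, llm_complete, keyword_classify) -> dict:
    """Classify a query as academic, code, current_events, or general.

    `llm_complete` stands in for a call to the Llama-3.1-8b-instant model
    that returns raw text; `keyword_classify` stands in for the retained
    keyword-based fallback. Both are assumptions for this sketch.
    """
    prompt = (
        "Classify the following query into one primary domain "
        "(academic, code, current_events, general) and any secondary domains. "
        "Reply with JSON containing primary_type, confidence, "
        "secondary_types, and reasoning.\n\n"
        f"Query: {query}"
    )

    try:
        raw = llm_complete(prompt)
    except Exception as exc:
        # LLM API failure: fall back to the keyword-based classifier.
        logger.error("LLM domain classification failed: %s", exc)
        return keyword_classify(query)

    try:
        result = json.loads(raw)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except (json.JSONDecodeError, ValueError) as exc:
        logger.error("Unparseable classification response: %s", exc)
        return dict(DEFAULT_CLASSIFICATION)

    # Robust parsing: fill in defaults for any fields the model omitted.
    return {**DEFAULT_CLASSIFICATION, **result}
```

Keeping the fallback behind the same return shape as the LLM path means downstream code does not need to know which classifier produced the result.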

## Classification Process

The query domain classification process works as follows:

1. The query is sent to the LLM with a prompt that specifies the four domain types.
2. The LLM returns a JSON response containing:
   - the primary domain type with a confidence score
   - an array of secondary domain types with confidence scores
   - the reasoning behind the classification
3. The response is parsed and integrated into the structured query.
4. The `is_academic`, `is_code`, and `is_current_events` flags are set when either (see the sketch after this list):
   - the primary domain matches the type, or
   - any secondary domain matches the type with a confidence above 0.3
5. The structured query is then used by downstream components such as the search executor.
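
The flag-setting rule in step 4 can be expressed as a short sketch. The field names match the structured-query fields listed earlier; the helper name `set_domain_flags()` and the exact dictionary shapes are assumptions made for illustration.

```python
SECONDARY_CONFIDENCE_THRESHOLD = 0.3  # threshold from step 4 above


def set_domain_flags(structured_query: dict, classification: dict) -> dict:
    """Copy the classification into the structured query and derive the
    boolean domain flags from it."""
    primary = classification.get("primary_type", "general")
    secondary = classification.get("secondary_types", [])

    def matches(domain: str) -> bool:
        # A flag is set if the primary domain matches, or if any secondary
        # domain matches with confidence above the threshold.
        if primary == domain:
            return True
        return any(
            entry.get("type") == domain
            and entry.get("confidence", 0.0) > SECONDARY_CONFIDENCE_THRESHOLD
            for entry in secondary
        )

    structured_query.update(
        {
            "domain": primary,
            "domain_confidence": classification.get("confidence", 0.0),
            "secondary_domains": secondary,
            "classification_reasoning": classification.get("reasoning", ""),
            "is_academic": matches("academic"),
            "is_code": matches("code"),
            "is_current_events": matches("current_events"),
        }
    )
    return structured_query
```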

## Benefits

The new approach offers several advantages:

1. Semantic Understanding: captures the meaning and intent of queries rather than just matching keywords
2. Multi-Domain Recognition: recognizes when queries span multiple domains, with confidence scores for each
3. Self-Explaining: provides reasoning for classifications, aiding debugging and transparency
4. Adaptability: automatically adapts to new topics and terminology without code changes
5. Confidence Scoring: indicates how confident the system is in its classification

## Testing and Validation

A comprehensive test script (`test_domain_classification.py`) was created to:

1. Test the raw domain classification function with a variety of queries
2. Test the query processor's integration with domain classification
3. Compare the LLM-based approach with the previous keyword-based approach (a rough sketch of such a comparison follows this list)
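
For illustration only, the comparison in point 3 could be structured along the lines below; the query list, the `classify_llm()` and `classify_keywords()` callables, and the printed format are assumptions, not the actual contents of `test_domain_classification.py`.

```python
# Illustrative queries; both are taken from the Examples section below.
TEST_QUERIES = [
    "What are the technological, economic, and social implications of "
    "large language models in today's society?",
    "How do I implement a transformer model in PyTorch for text classification?",
]


def compare_classifiers(classify_llm, classify_keywords):
    """Print the LLM-based and keyword-based classifications side by side."""
    for query in TEST_QUERIES:
        llm_result = classify_llm(query)
        kw_result = classify_keywords(query)
        print(f"Query: {query}")
        print(
            f"  LLM:      {llm_result.get('primary_type')} "
            f"(confidence {llm_result.get('confidence', 0.0):.2f})"
        )
        print(f"  Keywords: {kw_result.get('primary_type')}")
        print(f"  Reasoning: {llm_result.get('reasoning', '')}")
```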

## Examples

### Academic Query Example

Query: "What are the technological, economic, and social implications of large language models in today's society?"

LLM Classification:

```json
{
  "primary_type": "academic",
  "confidence": 0.9,
  "secondary_types": [
    {"type": "general", "confidence": 0.4}
  ],
  "reasoning": "This query is asking about implications of LLMs across multiple domains (technological, economic, and social) which is a scholarly research topic that would be well-addressed by academic sources."
}
```

### Code Query Example

Query: "How do I implement a transformer model in PyTorch for text classification?"

LLM Classification:

```json
{
  "primary_type": "code",
  "confidence": 0.95,
  "secondary_types": [
    {"type": "academic", "confidence": 0.4}
  ],
  "reasoning": "This is primarily a programming question about implementing a specific model in PyTorch, which is a coding framework. It has academic aspects since it relates to machine learning models, but the focus is on implementation."
}
```

## Future Improvements

Potential future enhancements include:

1. Caching: cache results for frequently asked or similar queries to reduce API calls (see the sketch after this list)
2. Few-Shot Learning: add examples to the prompt to improve classification accuracy
3. Expanded Domains: consider additional domain categories beyond the current four
4. UI Integration: expose classification reasoning in the UI for advanced users
5. Classification Feedback Loop: allow users to correct misclassifications so the system improves over time
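
The caching idea could start as something as simple as an in-memory LRU cache keyed on the normalized query text. This is a speculative sketch of the enhancement, not part of the current implementation; `classify_query_domain` refers to the classifier described above.

```python
from functools import lru_cache


def make_cached_classifier(classify_query_domain, maxsize: int = 1024):
    """Wrap the classifier with an in-memory LRU cache keyed on the
    normalized query text. A production version would likely also need
    TTLs and a cache shared across processes."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query: str) -> dict:
        return classify_query_domain(normalized_query)

    def classify(query: str) -> dict:
        # Normalize whitespace and case so trivially different spellings
        # of the same query share a cache entry.
        normalized = " ".join(query.lower().split())
        # Return a copy so callers cannot mutate the cached result.
        return dict(_cached(normalized))

    return classify
```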