# LLM-Based Query Classification

## Overview

This document describes the implementation of LLM-based query domain classification in the sim-search project, replacing the previous keyword-based approach.

## Motivation

The previous keyword-based classification had several limitations:

- It relied on static keyword lists that needed constant updating
- It could not capture the semantic meaning of queries
- It misclassified ambiguous or novel queries
- It required significant maintenance to keep the keyword lists current

## Implementation

### New Components

1. LLM Interface Extension (see the sketch after this list):
   - Added a `classify_query_domain()` method to the `LLMInterface` class
   - Added a `_classify_query_domain_impl()` private implementation method
   - Configured to use the fast Llama-3.1-8b-instant model by default
2. Query Processor Updates:
   - Added a `_structure_query_with_llm()` method that uses the LLM classification results
   - Updated `process_query()` to use both query type and domain classification
   - Retained the keyword-based method as a fallback in case of LLM API failures
3. Structured Query Enhancements:
   - Added new fields to the structured query:
     - `domain`: primary domain type (academic, code, current_events, general)
     - `domain_confidence`: confidence score for the primary domain
     - `secondary_domains`: array of secondary domains with confidence scores
     - `classification_reasoning`: explanation of the classification
4. Configuration Updates:
   - Added `classify_query_domain` to the module-specific model assignments
   - Uses the same Llama-3.1-8b-instant model for domain classification as for the other query processing tasks
5. Logging and Monitoring:
   - Added detailed logging of domain classification results
   - Logs secondary domains with their confidence scores
   - Logs the reasoning behind each classification
6. Error Handling:
   - Added a fallback to keyword-based classification if the LLM-based classification fails
   - Implemented robust JSON parsing with fallbacks to default values
   - Added explicit error messages for troubleshooting
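
The interaction between the new `classify_query_domain()` method, its JSON parsing, and the keyword fallback can be pictured with the minimal sketch below. The prompt wording, the `llm_complete` and `keyword_classify` callables, and the default values are illustrative assumptions, not the project's actual implementation.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Assumed default returned when the LLM response cannot be parsed.
DEFAULT_CLASSIFICATION = {
    "primary_type": "general",
    "confidence": 0.0,
    "secondary_types": [],
    "reasoning": "fallback default",
}


def classify_query_domain(query: str, llm_complete, keyword_classify) -> dict:
    """Classify a query as academic, code, current_events, or general.

    `llm_complete` stands in for a call to the Llama-3.1-8b-instant model
    that returns raw text; `keyword_classify` stands in for the retained
    keyword-based fallback. Both are assumptions for this sketch.
    """
    prompt = (
        "Classify the following query into one primary domain "
        "(academic, code, current_events, general) and any secondary domains. "
        "Reply with JSON containing primary_type, confidence, "
        "secondary_types, and reasoning.\n\n"
        f"Query: {query}"
    )

    try:
        raw = llm_complete(prompt)
    except Exception as exc:
        # LLM API failure: fall back to the keyword-based classifier.
        logger.error("LLM domain classification failed: %s", exc)
        return keyword_classify(query)

    try:
        result = json.loads(raw)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except (json.JSONDecodeError, ValueError) as exc:
        logger.error("Unparseable classification response: %s", exc)
        return dict(DEFAULT_CLASSIFICATION)

    # Robust parsing: fill in defaults for any fields the model omitted.
    return {**DEFAULT_CLASSIFICATION, **result}
```

Keeping the fallback behind the same return shape as the LLM path means downstream code does not need to know which classifier produced the result.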

## Classification Process

The query domain classification process works as follows:

1. The query is sent to the LLM with a prompt that specifies the four domain types.
2. The LLM returns a JSON response containing:
   - the primary domain type with a confidence score
   - an array of secondary domain types with confidence scores
   - the reasoning behind the classification
3. The response is parsed and integrated into the structured query.
4. The `is_academic`, `is_code`, and `is_current_events` flags are set when either (see the sketch after this list):
   - the primary domain matches the type, or
   - any secondary domain matches the type with a confidence above 0.3
5. The structured query is then used by downstream components such as the search executor.
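
The flag-setting rule in step 4 can be expressed as a short sketch. The field names match the structured-query fields listed earlier; the helper name `set_domain_flags()` and the exact dictionary shapes are assumptions made for illustration.

```python
SECONDARY_CONFIDENCE_THRESHOLD = 0.3  # threshold from step 4 above


def set_domain_flags(structured_query: dict, classification: dict) -> dict:
    """Copy the classification into the structured query and derive the
    boolean domain flags from it."""
    primary = classification.get("primary_type", "general")
    secondary = classification.get("secondary_types", [])

    def matches(domain: str) -> bool:
        # A flag is set if the primary domain matches, or if any secondary
        # domain matches with confidence above the threshold.
        if primary == domain:
            return True
        return any(
            entry.get("type") == domain
            and entry.get("confidence", 0.0) > SECONDARY_CONFIDENCE_THRESHOLD
            for entry in secondary
        )

    structured_query.update(
        {
            "domain": primary,
            "domain_confidence": classification.get("confidence", 0.0),
            "secondary_domains": secondary,
            "classification_reasoning": classification.get("reasoning", ""),
            "is_academic": matches("academic"),
            "is_code": matches("code"),
            "is_current_events": matches("current_events"),
        }
    )
    return structured_query
```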

## Benefits

The new approach offers several advantages:

1. Semantic Understanding: captures the meaning and intent of queries rather than just matching keywords
2. Multi-Domain Recognition: recognizes when queries span multiple domains, with confidence scores for each
3. Self-Explaining: provides reasoning for classifications, aiding debugging and transparency
4. Adaptability: automatically adapts to new topics and terminology without code changes
5. Confidence Scoring: indicates how confident the system is in its classification

## Testing and Validation

A comprehensive test script (`test_domain_classification.py`) was created to:

1. Test the raw domain classification function with a variety of queries
2. Test the query processor's integration with domain classification
3. Compare the LLM-based approach with the previous keyword-based approach (a rough sketch of such a comparison follows this list)
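
For illustration only, the comparison in point 3 could be structured along the lines below; the query list, the `classify_llm()` and `classify_keywords()` callables, and the printed format are assumptions, not the actual contents of `test_domain_classification.py`.

```python
# Illustrative queries; both are taken from the Examples section below.
TEST_QUERIES = [
    "What are the technological, economic, and social implications of "
    "large language models in today's society?",
    "How do I implement a transformer model in PyTorch for text classification?",
]


def compare_classifiers(classify_llm, classify_keywords):
    """Print the LLM-based and keyword-based classifications side by side."""
    for query in TEST_QUERIES:
        llm_result = classify_llm(query)
        kw_result = classify_keywords(query)
        print(f"Query: {query}")
        print(
            f"  LLM:      {llm_result.get('primary_type')} "
            f"(confidence {llm_result.get('confidence', 0.0):.2f})"
        )
        print(f"  Keywords: {kw_result.get('primary_type')}")
        print(f"  Reasoning: {llm_result.get('reasoning', '')}")
```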

## Examples

### Academic Query Example

Query: "What are the technological, economic, and social implications of large language models in today's society?"

LLM Classification:

```json
{
  "primary_type": "academic",
  "confidence": 0.9,
  "secondary_types": [
    {"type": "general", "confidence": 0.4}
  ],
  "reasoning": "This query is asking about implications of LLMs across multiple domains (technological, economic, and social) which is a scholarly research topic that would be well-addressed by academic sources."
}
```

### Code Query Example

Query: "How do I implement a transformer model in PyTorch for text classification?"

LLM Classification:

```json
{
  "primary_type": "code",
  "confidence": 0.95,
  "secondary_types": [
    {"type": "academic", "confidence": 0.4}
  ],
  "reasoning": "This is primarily a programming question about implementing a specific model in PyTorch, which is a coding framework. It has academic aspects since it relates to machine learning models, but the focus is on implementation."
}
```

## Future Improvements

Potential future enhancements include:

1. Caching: cache results for frequently asked or similar queries to reduce API calls (see the sketch after this list)
2. Few-Shot Learning: add examples to the prompt to improve classification accuracy
3. Expanded Domains: consider additional domain categories beyond the current four
4. UI Integration: expose classification reasoning in the UI for advanced users
5. Classification Feedback Loop: allow users to correct misclassifications so the system improves over time
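
The caching idea could start as something as simple as an in-memory LRU cache keyed on the normalized query text. This is a speculative sketch of the enhancement, not part of the current implementation; `classify_query_domain` refers to the classifier described above.

```python
from functools import lru_cache


def make_cached_classifier(classify_query_domain, maxsize: int = 1024):
    """Wrap the classifier with an in-memory LRU cache keyed on the
    normalized query text. A production version would likely also need
    TTLs and a cache shared across processes."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query: str) -> dict:
        return classify_query_domain(normalized_query)

    def classify(query: str) -> dict:
        # Normalize whitespace and case so trivially different spellings
        # of the same query share a cache entry.
        normalized = " ".join(query.lower().split())
        # Return a copy so callers cannot mutate the cached result.
        return dict(_cached(normalized))

    return classify
```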