LLM-Based Query Classification Implementation Plan
Overview
This document outlines a plan to replace the current keyword-based query classification system with an LLM-based approach. The current system uses predefined keyword lists to determine if a query is academic, code-related, or about current events. This approach is limited by the static nature of the keywords and doesn't capture the semantic meaning of queries. Switching to an LLM-based classification will provide more accurate and adaptable query typing.
Current Limitations
- Keyword Dependency:
  - The system relies on static lists of keywords that need constant updating
  - Many relevant terms are likely to be missing, especially for emerging topics
  - Some words have different meanings in different contexts (e.g., "model" can refer to code or academic concepts)
- False Classifications:
  - Queries about LLMs being incorrectly classified as code-related instead of academic
  - General queries potentially being misclassified if they happen to contain certain keywords
  - No way to handle queries that span multiple categories
- Maintenance Burden:
  - Need to regularly update keyword lists for each category
  - Complex if/then logic to determine query types
  - Hard to adapt to new research domains or technologies
Proposed Solution
Replace the keyword-based classification with an LLM-based classification that:
- Uses semantic understanding to determine query intent and domain
- Can classify queries into multiple categories with confidence scores
- Provides reasoning for the classification
- Can adapt to new topics without code changes
Technical Implementation
1. Extend LLM Interface with Domain Classification
Add a new method to the `LLMInterface` class in `query/llm_interface.py`:
```python
async def classify_query_domain(self, query: str) -> Dict[str, Any]:
    """
    Classify a query's domain type (academic, code, current_events, general).

    Args:
        query: The query to classify

    Returns:
        Dictionary with query domain type and confidence scores
    """
    # Get the model assigned to this function
    model_name = self.config.get_module_model('query_processing', 'classify_query_domain')

    # Create a new interface with the assigned model if different from current
    if model_name != self.model_name:
        interface = LLMInterface(model_name)
        return await interface._classify_query_domain_impl(query)

    return await self._classify_query_domain_impl(query)

async def _classify_query_domain_impl(self, query: str) -> Dict[str, Any]:
    """Implementation of query domain classification."""
    messages = [
        {"role": "system", "content": """You are an expert query classifier.
Analyze the given query and classify it into the following domain types:
- academic: Related to scholarly research, scientific studies, academic papers, formal theories, university-level research topics, or scholarly fields of study
- code: Related to programming, software development, technical implementation, coding languages, frameworks, or technology implementation questions
- current_events: Related to recent news, ongoing developments, time-sensitive information, current politics, breaking stories, or real-time events
- general: General information seeking that doesn't fit the above categories

You may assign multiple types if the query spans several domains.

Respond with a JSON object containing:
{
    "primary_type": "the most appropriate type",
    "confidence": 0.X,
    "secondary_types": [{"type": "another_applicable_type", "confidence": 0.X}, ...],
    "reasoning": "brief explanation of your classification"
}
"""},
        {"role": "user", "content": query}
    ]

    # Generate classification
    response = await self.generate_completion(messages)

    # Parse JSON response
    try:
        classification = json.loads(response)
        return classification
    except json.JSONDecodeError:
        # Fall back to a default classification if parsing fails
        logger.error(f"Error parsing domain classification response: {response}")
        return {
            "primary_type": "general",
            "confidence": 0.5,
            "secondary_types": [],
            "reasoning": "Failed to parse classification response"
        }
```
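In practice, models often wrap their JSON in markdown fences or surround it with prose, so a strict `json.loads` on the raw response will sometimes fail unnecessarily. A more tolerant parsing helper could look like the following sketch; the function name `extract_json_object` and its exact heuristics are assumptions, not part of the existing interface:

```python
import json
import re
from typing import Any, Dict, Optional


def extract_json_object(response: str) -> Optional[Dict[str, Any]]:
    """Best-effort extraction of a JSON object from an LLM response.

    Handles raw JSON, JSON wrapped in ```json fences, and JSON embedded
    in surrounding prose. Returns None if nothing parses.
    """
    # 1. Try the whole response as-is.
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass

    # 2. Try the contents of a fenced code block.
    fence = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", response, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Fall back to the first {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", response, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass

    return None
```

`_classify_query_domain_impl` could call this helper before giving up and returning the default classification, which should cut down on spurious fallbacks.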
2. Update QueryProcessor Class
Modify the `QueryProcessor` class in `query/query_processor.py` to use the new LLM-based classification:
```python
async def process_query(self, query: str) -> Dict[str, Any]:
    """
    Process a user query.

    Args:
        query: The raw user query

    Returns:
        Dictionary containing the processed query information
    """
    logger.info(f"Processing query: {query}")

    # Enhance the query
    enhanced_query = await self.llm_interface.enhance_query(query)
    logger.info(f"Enhanced query: {enhanced_query}")

    # Classify the query type (factual, exploratory, comparative)
    query_type_classification = await self.llm_interface.classify_query(query)
    logger.info(f"Query type classification: {query_type_classification}")

    # Classify the query domain (academic, code, current_events, general)
    domain_classification = await self.llm_interface.classify_query_domain(query)
    logger.info(f"Query domain classification: {domain_classification}")

    # Structure the query using the new classification approach
    structured_query = self._structure_query_with_llm(
        query,
        enhanced_query,
        query_type_classification,
        domain_classification
    )

    # Decompose the query into sub-questions (if complex enough)
    structured_query = await self.query_decomposer.decompose_query(query, structured_query)

    # Log the number of sub-questions, if any
    if 'sub_questions' in structured_query and structured_query['sub_questions']:
        logger.info(f"Decomposed into {len(structured_query['sub_questions'])} sub-questions")
    else:
        logger.info("Query was not decomposed into sub-questions")

    return structured_query
```
```python
def _structure_query_with_llm(self, original_query: str, enhanced_query: str,
                              type_classification: Dict[str, Any],
                              domain_classification: Dict[str, Any]) -> Dict[str, Any]:
    """
    Structure a query using LLM classification results.

    Args:
        original_query: The original user query
        enhanced_query: The enhanced query
        type_classification: Classification of query type (factual, exploratory, comparative)
        domain_classification: Classification of query domain (academic, code, current_events)

    Returns:
        Dictionary containing the structured query
    """
    # Get primary domain and confidence
    primary_domain = domain_classification.get('primary_type', 'general')
    primary_confidence = domain_classification.get('confidence', 0.5)

    # Get secondary domains
    secondary_domains = domain_classification.get('secondary_types', [])

    # Determine domain flags. The primary domain always sets its flag; a
    # secondary domain only does so when its confidence clears a threshold,
    # to avoid false positives.
    is_academic = primary_domain == 'academic' or any(
        d['type'] == 'academic' and d.get('confidence', 0.0) >= 0.3
        for d in secondary_domains)
    is_code = primary_domain == 'code' or any(
        d['type'] == 'code' and d.get('confidence', 0.0) >= 0.3
        for d in secondary_domains)
    is_current_events = primary_domain == 'current_events' or any(
        d['type'] == 'current_events' and d.get('confidence', 0.0) >= 0.3
        for d in secondary_domains)

    return {
        'original_query': original_query,
        'enhanced_query': enhanced_query,
        'type': type_classification.get('type', 'unknown'),
        'intent': type_classification.get('intent', 'research'),
        'entities': type_classification.get('entities', []),
        'domain': primary_domain,
        'domain_confidence': primary_confidence,
        'secondary_domains': secondary_domains,
        'classification_reasoning': domain_classification.get('reasoning', ''),
        'timestamp': None,  # Will be filled in by the caller
        'is_current_events': is_current_events,
        'is_academic': is_academic,
        'is_code': is_code,
        'metadata': {
            'type_classification': type_classification,
            'domain_classification': domain_classification
        }
    }
```
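The flag derivation can also be factored into a small pure function, which makes the confidence threshold easy to unit-test in isolation. This is a sketch applying the plan's 0.3 secondary-domain threshold; the function name and module-level constant are illustrative, not part of the existing codebase:

```python
from typing import Any, Dict, List

# Mirrors the secondary-domain confidence threshold in the plan
SECONDARY_CONFIDENCE_THRESHOLD = 0.3


def derive_domain_flags(primary: str,
                        secondary: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Derive boolean domain flags from a classification result.

    A flag is set when its domain is the primary type, or when it appears
    as a secondary type at or above the confidence threshold.
    """
    def flag(domain: str) -> bool:
        if primary == domain:
            return True
        return any(
            d.get("type") == domain
            and d.get("confidence", 0.0) >= SECONDARY_CONFIDENCE_THRESHOLD
            for d in secondary
        )

    return {
        "is_academic": flag("academic"),
        "is_code": flag("code"),
        "is_current_events": flag("current_events"),
    }
```

Keeping this logic pure (no `self`, no I/O) means the threshold can be tuned later and verified with a handful of table-driven tests.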
3. Remove Legacy Keyword-Based Classification Methods
Once the new LLM-based classification is working correctly, remove or deprecate the old keyword-based methods:
- `_is_current_events_query`
- `_is_academic_query`
- `_is_code_query`

Also remove the original `_structure_query` method (note that the fallback mechanism in section 8 still depends on the keyword-based path, so full removal should wait until that fallback is retired).
4. Update Search Executor Integration
The `SearchExecutor` class already reads the following flags from the structured query:
- `is_academic`
- `is_code`
- `is_current_events`

So no changes are needed to the `execute_search` method; the improved classification will simply provide more accurate flags.
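For illustration, engine selection driven by these flags might look like the sketch below. The engine names and the selection policy here are assumptions for the example; the real logic lives in `SearchExecutor` and its handler registry:

```python
from typing import Dict, List


def select_search_engines(structured_query: Dict[str, bool]) -> List[str]:
    """Illustrative mapping from domain flags to search engines.

    The engine names are placeholders; the actual SearchExecutor
    maintains its own set of handlers.
    """
    engines = ["general_web"]  # always include a general-purpose engine
    if structured_query.get("is_academic"):
        engines.append("scholarly_search")
    if structured_query.get("is_code"):
        engines.append("code_search")
    if structured_query.get("is_current_events"):
        engines.append("news_search")
    return engines
```

Because the flags are the only contract between classification and search execution, improving flag accuracy upstream improves engine selection without touching this layer.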
5. Update Configuration
Add the new `classify_query_domain` function to the module model configuration so that a different model can be assigned to it:
```yaml
module_models:
  query_processing:
    enhance_query: llama-3.1-8b-instant            # Fast model for query enhancement
    classify_query: llama-3.1-8b-instant           # Fast model for query type classification
    classify_query_domain: llama-3.1-8b-instant    # Fast model for domain classification
    generate_search_queries: llama-3.1-8b-instant  # Fast model for search query generation
```
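A minimal sketch of how `get_module_model` might resolve this mapping with a default fallback, assuming the config is loaded as a nested dictionary matching the YAML above (the standalone function signature and default model name are assumptions for illustration):

```python
from typing import Any, Dict


def get_module_model(config: Dict[str, Any], module: str, function: str,
                     default: str = "llama-3.1-8b-instant") -> str:
    """Look up the model assigned to a module function, with a default.

    Mirrors the YAML layout: module_models -> module -> function -> model.
    """
    return (config.get("module_models", {})
                  .get(module, {})
                  .get(function, default))
```

With a default in place, adding `classify_query_domain` to the config is optional on day one: an unconfigured function simply inherits the fallback model.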
6. Testing Plan
- Unit Tests:
  - Create test cases for `classify_query_domain` with various query types
  - Verify correct classification of academic, code, and current events queries
  - Test edge cases and queries that span multiple domains
- Integration Tests:
  - Test the full query processing pipeline with the new classification
  - Verify that the correct search engines are selected based on the classification
  - Compare results with the old keyword-based approach
- Regression Testing:
  - Ensure that all existing functionality works with the new classification
  - Verify that no existing test cases fail
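Unit tests along these lines can exercise the pipeline without a live model by stubbing the classifier. The stub class and test below are illustrative; real tests would patch `LLMInterface` in place:

```python
import asyncio
from typing import Any, Dict


class StubLLMInterface:
    """Stand-in for LLMInterface that returns a canned classification."""

    def __init__(self, canned: Dict[str, Any]):
        self.canned = canned

    async def classify_query_domain(self, query: str) -> Dict[str, Any]:
        return self.canned


def test_academic_classification():
    stub = StubLLMInterface({
        "primary_type": "academic",
        "confidence": 0.9,
        "secondary_types": [],
        "reasoning": "scholarly topic",
    })
    result = asyncio.run(
        stub.classify_query_domain("implications of large language models")
    )
    assert result["primary_type"] == "academic"
    assert result["confidence"] >= 0.8
```

The same pattern extends to multi-domain and malformed-response cases by varying the canned payload.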
7. Logging and Monitoring
Add detailed logging to monitor the performance of the new classification:
```python
logger.info(
    f"Query domain classification: primary={domain_classification.get('primary_type')} "
    f"confidence={domain_classification.get('confidence')}"
)
if domain_classification.get('secondary_types'):
    for sec_type in domain_classification.get('secondary_types'):
        logger.info(f"Secondary domain: {sec_type['type']} confidence={sec_type['confidence']}")
logger.info(f"Classification reasoning: {domain_classification.get('reasoning', 'None provided')}")
```
8. Fallback Mechanism
Implement a fallback to the keyword-based approach if the LLM classification fails:
```python
try:
    domain_classification = await self.llm_interface.classify_query_domain(query)
    structured_query = self._structure_query_with_llm(
        query, enhanced_query, query_type_classification, domain_classification)
except Exception as e:
    logger.error(f"LLM domain classification failed: {e}. "
                 "Falling back to keyword-based classification.")
    # Fall back to the keyword-based approach
    structured_query = self._structure_query(query, enhanced_query, query_type_classification)
```
Timeline and Resources
Phase 1: Development (2-3 days)
- Implement the new `classify_query_domain` method in `LLMInterface`
- Create the new `_structure_query_with_llm` method in `QueryProcessor`
- Update the `process_query` method to use the new approach
- Add configuration for the new function
Phase 2: Testing (1-2 days)
- Create test cases for the new classification
- Test with various query types
- Compare with the old approach
Phase 3: Deployment and Monitoring (1 day)
- Deploy the new version
- Monitor logs for classification issues
- Adjust prompts and thresholds as needed
Phase 4: Cleanup (1 day)
- Remove the old keyword-based methods
- Update documentation
Expected Outcomes
- Improved Classification Accuracy:
  - More accurate identification of academic, code, and current events queries
  - Better handling of queries that span multiple domains
  - Proper classification of queries about emerging topics (like LLMs)
- Reduced Maintenance:
  - No need to update keyword lists
  - Adaptability to new domains without code changes
- Enhanced User Experience:
  - More relevant search results
  - Better report generation due to proper query classification
- System Robustness:
  - Graceful handling of edge cases
  - Better explanation of classification decisions
  - Proper confidence scoring for ambiguous queries
Examples
To illustrate how the new approach would work, here are some examples:
Example 1: Academic Query
Query: "What are the technological, economic, and social implications of large language models in today's society?"
Current Classification: Might be misclassified as code-related due to "models"
LLM Classification:
```json
{
  "primary_type": "academic",
  "confidence": 0.9,
  "secondary_types": [
    {"type": "general", "confidence": 0.4}
  ],
  "reasoning": "This query is asking about implications of LLMs across multiple domains (technological, economic, and social), which is a scholarly research topic that would be well-addressed by academic sources."
}
```
Example 2: Code Query
Query: "How do I implement a transformer model in PyTorch for text classification?"
Current Classification: Might be correctly classified as code-related due to "implement", "model", and "PyTorch"
LLM Classification:
```json
{
  "primary_type": "code",
  "confidence": 0.95,
  "secondary_types": [
    {"type": "academic", "confidence": 0.4}
  ],
  "reasoning": "This is primarily a programming question about implementing a specific model in PyTorch, which is a coding framework. It has academic aspects since it relates to machine learning models, but the focus is on implementation."
}
```
Example 3: Current Events Query
Query: "What are the latest developments in the Ukraine conflict?"
Current Classification: Likely correct if "Ukraine" is in the current events entities list
LLM Classification:
```json
{
  "primary_type": "current_events",
  "confidence": 0.95,
  "secondary_types": [],
  "reasoning": "This query is asking about 'latest developments' in an ongoing conflict, which clearly indicates a focus on recent news and time-sensitive information."
}
```
Example 4: Mixed Query
Query: "How are LLMs being used to detect and prevent cyber attacks?"
Current Classification: Might have mixed signals from both academic and code keywords
LLM Classification:
```json
{
  "primary_type": "academic",
  "confidence": 0.7,
  "secondary_types": [
    {"type": "code", "confidence": 0.6},
    {"type": "current_events", "confidence": 0.3}
  ],
  "reasoning": "This query relates to research on LLM applications in cybersecurity (academic), has technical implementation aspects (code), and could relate to recent developments in the field (current events). The primary focus appears to be on research and study of this application."
}
```
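Responses like the examples above can be sanity-checked before the rest of the pipeline consumes them. A minimal validator sketch, with field names following the response schema in the classification prompt (the helper name is illustrative):

```python
from typing import Any, Dict

VALID_TYPES = {"academic", "code", "current_events", "general"}


def is_valid_classification(result: Dict[str, Any]) -> bool:
    """Check a classification dict against the prompt's response schema."""
    if result.get("primary_type") not in VALID_TYPES:
        return False
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        return False
    for sec in result.get("secondary_types", []):
        if sec.get("type") not in VALID_TYPES:
            return False
        c = sec.get("confidence")
        if not isinstance(c, (int, float)) or not 0.0 <= c <= 1.0:
            return False
    return True
```

Running this check immediately after parsing lets the code route invalid responses to the same default-classification fallback used for JSON parse failures.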
Conclusion
Replacing the keyword-based classification with an LLM-based approach will significantly improve the accuracy and adaptability of the query classification system. This will lead to better search results and report generation, particularly for complex or multi-domain queries like those about large language models. The implementation can be completed in 5-7 days and will reduce ongoing maintenance work by eliminating the need to update keyword lists.