From e748c345e295d4c8819084991f95016baff3bbaa Mon Sep 17 00:00:00 2001
From: Steve White
Date: Fri, 28 Feb 2025 08:06:55 -0600
Subject: [PATCH] Update interfaces.md with documentation for reranker functionality

---
 .note/interfaces.md | 111 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/.note/interfaces.md b/.note/interfaces.md
index ed0e24b..54cbac2 100644
--- a/.note/interfaces.md
+++ b/.note/interfaces.md
@@ -514,6 +514,117 @@ rate_limits = handler.get_rate_limit_info()
 - **Description**: Gets information about the API's rate limits
 - **Returns**: Dict[str, Any] - Dictionary with rate limit information
 
+## Ranking Module
+
+### JinaReranker Class
+
+The `JinaReranker` class provides document reranking functionality using Jina AI's Reranker API.
+
+#### Initialization
+```python
+reranker = JinaReranker(
+    api_key=None,  # Optional, will use environment variable if not provided
+    model="jina-reranker-v2-base-multilingual",  # Default model
+    endpoint="https://api.jina.ai/v1/rerank"  # Default endpoint
+)
+```
+- **Description**: Initializes the JinaReranker with the specified API key, model, and endpoint
+- **Parameters**:
+  - `api_key` (Optional[str]): Jina AI API key (defaults to environment variable)
+  - `model` (str): The reranker model to use
+  - `endpoint` (str): The API endpoint
+- **Requirements**: JINA_API_KEY environment variable must be set if api_key is not provided
+- **Raises**: ValueError if API key is not available
+
+#### rerank
+```python
+reranked_docs = reranker.rerank(query, documents, top_n=None)
+```
+- **Description**: Reranks a list of documents based on their relevance to the query
+- **Parameters**:
+  - `query` (str): The query string
+  - `documents` (List[str]): List of document strings to rerank
+  - `top_n` (Optional[int]): Number of top documents to return (defaults to all)
+- **Returns**: List[Dict[str, Any]] - List of reranked documents with scores
+- **Example Return Format**:
+```json
+[
+  {
+    "index": 0,
+    "score": 0.95,
+    "document": "Document content here"
+  },
+  {
+    "index": 3,
+    "score": 0.82,
+    "document": "Another document content"
+  }
+]
+```
+
+#### get_jina_reranker
+```python
+reranker = get_jina_reranker()
+```
+- **Description**: Factory function to get a JinaReranker instance with configuration from the config file
+- **Returns**: JinaReranker - Initialized reranker instance
+- **Raises**: ValueError if API key is not available
+
+### Usage Examples
+
+#### Basic Usage
+```python
+from ranking.jina_reranker import JinaReranker
+
+reranker = JinaReranker()
+query = "What is quantum computing?"
+documents = [
+    "Quantum computing is a computation system that uses quantum mechanics.",
+    "Classical computers use bits while quantum computers use qubits.",
+    "Artificial intelligence is transforming various industries."
+]
+
+reranked = reranker.rerank(query, documents)
+for doc in reranked:
+    print(f"Score: {doc['score']}, Document: {doc['document']}")
+```
+
+#### Integration with ResultCollector
+```python
+from execution.result_collector import ResultCollector
+from ranking.jina_reranker import get_jina_reranker
+
+# Initialize components
+reranker = get_jina_reranker()
+collector = ResultCollector(reranker=reranker)
+
+# Process search results with reranking
+reranked_results = collector.process_results(
+    search_results,
+    dedup=True,
+    max_results=20,
+    use_reranker=True
+)
+```
+
+#### Testing
+```python
+# Simple test script
+import json
+from ranking.jina_reranker import get_jina_reranker
+
+reranker = get_jina_reranker()
+query = "What is quantum computing?"
+documents = [
+    "Quantum computing is a type of computation that harnesses quantum mechanics.",
+    "Classical computers use bits, while quantum computers use qubits.",
+    "Machine learning is a subset of artificial intelligence."
+]
+
+reranked = reranker.rerank(query, documents)
+print(json.dumps(reranked, indent=2))
+```
+
 ## Search Execution Testing
 
 The search execution module has been tested to ensure it correctly executes search queries across multiple search engines and processes the results.
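
When testing result processing without network access or a `JINA_API_KEY`, an offline stand-in that mimics the return format documented for `rerank` (a list of dicts with `index`, `score`, and `document`) may be useful. The sketch below is an assumption for illustration only: `FakeReranker` and its word-overlap scoring are hypothetical, are not part of this patch, and do not reflect how the Jina API scores documents.

```python
from typing import Any, Dict, List, Optional


class FakeReranker:
    """Hypothetical offline stand-in; not part of the documented module."""

    def rerank(
        self,
        query: str,
        documents: List[str],
        top_n: Optional[int] = None,
    ) -> List[Dict[str, Any]]:
        query_terms = set(query.lower().split())
        results = []
        for index, document in enumerate(documents):
            doc_terms = set(document.lower().split())
            # Assumed toy score: fraction of query terms found in the document.
            if query_terms:
                score = len(query_terms & doc_terms) / len(query_terms)
            else:
                score = 0.0
            results.append({"index": index, "score": score, "document": document})
        # Highest score first; the sort is stable, so ties keep input order.
        results.sort(key=lambda item: item["score"], reverse=True)
        return results if top_n is None else results[:top_n]


docs = [
    "Artificial intelligence is transforming various industries.",
    "Quantum computing uses quantum mechanics.",
]
reranked = FakeReranker().rerank("quantum computing", docs, top_n=1)
```

Because the stand-in exposes the same `rerank(query, documents, top_n=None)` signature as documented above, it could presumably be passed wherever a reranker instance is expected in tests, though whether `ResultCollector` accepts it unchanged is not confirmed by the patch.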