# Intelligent Research System
An end-to-end research automation system that handles the entire process from query to final report, leveraging multiple search sources and semantic similarity to produce comprehensive research results.
## Overview
This system automates the research process by:
- Processing and enhancing user queries
- Executing searches across multiple engines (Serper, Google Scholar, arXiv)
- Ranking and filtering results based on relevance
- Generating comprehensive research reports
## Features
- Query Processing: Enhances user queries with additional context and classifies them by type and intent
- Multi-Source Search: Executes searches across Serper (Google), Google Scholar, and arXiv
- Intelligent Ranking: Uses Jina AI's Re-Ranker to prioritize the most relevant results
- Result Deduplication: Removes duplicate results across different search engines
- Modular Architecture: Easily extensible with new search engines and LLM providers (see the sketch below)
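
Because the system is modular, a new search engine can be plugged in without touching the rest of the pipeline. The snippet below is only an illustration of the idea; the actual base class and result shape live in `execution/api_handlers/` and may differ, and `BaseSearchHandler` / `SemanticScholarHandler` are hypothetical names.

```python
# Hypothetical sketch of plugging in a new search engine handler.
# The real base class and result shape in execution/api_handlers/ may differ.
from typing import Dict, List


class BaseSearchHandler:
    """Assumed minimal interface that each search handler implements."""

    def search(self, query: str, num_results: int = 10) -> List[Dict]:
        raise NotImplementedError


class SemanticScholarHandler(BaseSearchHandler):
    """Example of an additional academic search source."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def search(self, query: str, num_results: int = 10) -> List[Dict]:
        # Call the external API here, then normalize each hit into the
        # common result dictionary used by the rest of the pipeline.
        hits = []  # replace with a real API call
        return [
            {
                "title": h["title"],
                "url": h["url"],
                "snippet": h.get("abstract", ""),
                "source": "semantic_scholar",
            }
            for h in hits
        ][:num_results]
```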
## Components
- Query Processor: Enhances and classifies user queries
- Search Executor: Executes searches across multiple engines
- Result Collector: Processes and organizes search results
- Document Ranker: Ranks documents by relevance
- Report Generator: Synthesizes information into a coherent report (coming soon)
## Getting Started
### Prerequisites
- Python 3.8+
- API keys for:
  - Serper API (for Google and Scholar search)
  - Groq (or other LLM provider)
  - Jina AI (for reranking)
### Installation
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/sim-search.git
  cd sim-search
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a configuration file:

  ```bash
  cp config/config.yaml.example config/config.yaml
  ```

- Edit the configuration file to add your API keys:

  ```yaml
  api_keys:
    serper: "your-serper-api-key"
    groq: "your-groq-api-key"
    jina: "your-jina-api-key"
  ```
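
Optionally, you can sanity-check the configuration file before running anything. The project's own `config/` module loads this file for the components at runtime; the standalone snippet below just parses it directly with PyYAML as a quick check.

```python
# Quick sanity check that config/config.yaml parses and contains all keys.
# The project's config module normally loads this for you; this snippet
# only uses PyYAML directly for verification.
import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

for provider in ("serper", "groq", "jina"):
    assert config["api_keys"].get(provider), f"Missing API key for {provider}"
print("Configuration looks complete.")
```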
## Usage
### Basic Usage
```python
from query.query_processor import QueryProcessor
from execution.search_executor import SearchExecutor
from execution.result_collector import ResultCollector

# Initialize components
query_processor = QueryProcessor()
search_executor = SearchExecutor()
result_collector = ResultCollector()

# Process a query
processed_query = query_processor.process_query("What are the latest advancements in quantum computing?")

# Execute search
search_results = search_executor.execute_search(processed_query)

# Process results
processed_results = result_collector.process_results(search_results)

# Print top results
for i, result in enumerate(processed_results[:5]):
    print(f"{i+1}. {result['title']}")
    print(f"   URL: {result['url']}")
    print(f"   Snippet: {result['snippet'][:100]}...")
    print()
```
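
The example above stops at collected results. To apply the Document Ranker described in the Components section, a reranking step can be dropped in between collection and reporting. The continuation below is only a sketch: `JinaReranker`, its module path, and `rerank()` are assumptions about the `ranking` module's interface, not its confirmed API.

```python
# Hypothetical continuation: rerank the collected results by relevance.
# JinaReranker and rerank() are assumed names -- check the ranking module
# for the actual classes and method signatures.
from ranking.jina_reranker import JinaReranker  # module path is an assumption

reranker = JinaReranker()
ranked_results = reranker.rerank(
    query="What are the latest advancements in quantum computing?",
    documents=processed_results,
)

for i, result in enumerate(ranked_results[:5]):
    print(f"{i+1}. {result['title']}")
```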
## Testing
Run the test scripts to verify functionality:
```bash
# Test search execution
python test_search_execution.py

# Test all search handlers
python test_all_handlers.py
```
## Project Structure
```
sim-search/
├── config/              # Configuration management
├── query/               # Query processing
├── execution/           # Search execution
│   └── api_handlers/    # Search API handlers
├── ranking/             # Document ranking
├── test_*.py            # Test scripts
└── requirements.txt     # Dependencies
```
## LLM Providers
The system supports multiple LLM providers through the LiteLLM interface (a minimal example follows the list below):
- Groq (currently using Llama 3.1-8b-instant)
- OpenAI
- Anthropic
- OpenRouter
- Azure OpenAI
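
Because provider access goes through LiteLLM, switching providers is mostly a matter of changing the model string and supplying the matching API key. The standalone call below uses LiteLLM's standard completion interface, independent of the rest of the system; which model names are available depends on your accounts.

```python
# Standalone LiteLLM call showing how a provider/model is selected via the
# model string; it only requires the matching API key in your environment.
from litellm import completion

response = completion(
    model="groq/llama-3.1-8b-instant",  # e.g. "openai/gpt-4o" or "anthropic/claude-3-5-sonnet-20240620"
    messages=[{"role": "user", "content": "Summarize recent advances in quantum computing in one sentence."}],
)
print(response.choices[0].message.content)
```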
## License
This project is licensed under the MIT License - see the LICENSE file for details.