# Intelligent Research System

An end-to-end research automation system that handles the entire process from query to final report, leveraging multiple search sources and semantic similarity to produce comprehensive research results.

## Overview

This system automates the research process by:

1. Processing and enhancing user queries
2. Executing searches across multiple engines (Serper, Google Scholar, arXiv)
3. Ranking and filtering results based on relevance
4. Generating comprehensive research reports

## Features

- **Query Processing**: Enhances user queries with additional context and classifies them by type and intent
- **Multi-Source Search**: Executes searches across general web (Serper/Google), academic sources, and current news
- **Specialized Search Handlers**:
  - **Current Events**: Optimized news search for recent developments
  - **Academic Research**: Specialized academic search with OpenAlex, CORE, arXiv, and Google Scholar
  - **Open Access Detection**: Finds freely available versions of paywalled papers using Unpaywall
  - **Code/Programming**: Specialized code search using GitHub and StackExchange
- **Intelligent Ranking**: Uses Jina AI's Re-Ranker to prioritize the most relevant results
- **Result Deduplication**: Removes duplicate results across different search engines
- **Modular Architecture**: Easily extensible with new search engines and LLM providers

## Components

- **Query Processor**: Enhances and classifies user queries
- **Search Executor**: Executes searches across multiple engines
- **Result Collector**: Processes and organizes search results
- **Document Ranker**: Ranks documents by relevance
- **Report Generator**: Synthesizes information into coherent reports with specialized templates for different query types

## Getting Started

### Prerequisites

- Python 3.8+
- API keys for:
  - Serper API (for Google and Scholar search)
  - NewsAPI (for current events search)
  - CORE API (for open access academic search)
  - GitHub API (for code search)
  - StackExchange API (for programming Q&A content)
  - Groq (or other LLM provider)
  - Jina AI (for reranking)
- Email for OpenAlex and Unpaywall (recommended but not required)

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/sim-search.git
   cd sim-search
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a configuration file:

   ```bash
   cp config/config.yaml.example config/config.yaml
   ```

4. Edit the configuration file to add your API keys:

   ```yaml
   api_keys:
     serper: "your-serper-api-key"
     newsapi: "your-newsapi-key"
     groq: "your-groq-api-key"
     jina: "your-jina-api-key"
     github: "your-github-api-key"
     stackexchange: "your-stackexchange-api-key"
   ```

### Usage

#### Basic Usage

```python
from query.query_processor import QueryProcessor
from execution.search_executor import SearchExecutor
from execution.result_collector import ResultCollector

# Initialize components
query_processor = QueryProcessor()
search_executor = SearchExecutor()
result_collector = ResultCollector()

# Process a query
processed_query = query_processor.process_query("What are the latest advancements in quantum computing?")

# Execute search
search_results = search_executor.execute_search(processed_query)

# Process results
processed_results = result_collector.process_results(search_results)

# Print top results
for i, result in enumerate(processed_results[:5]):
    print(f"{i+1}. {result['title']}")
    print(f" URL: {result['url']}")
    print(f" Snippet: {result['snippet'][:100]}...")
    print()
```

#### Testing

Run the test scripts to verify functionality:

```bash
# Test search execution
python test_search_execution.py

# Test all search handlers
python test_all_handlers.py
```

## Project Structure

```
sim-search/
├── config/              # Configuration management
├── query/               # Query processing
├── execution/           # Search execution
│   └── api_handlers/    # Search API handlers
├── ranking/             # Document ranking
├── test_*.py            # Test scripts
└── requirements.txt     # Dependencies
```

## LLM Providers

The system supports multiple LLM providers through the LiteLLM interface:

- Groq (currently using Llama 3.1-8b-instant)
- OpenAI
- Anthropic
- OpenRouter
- Azure OpenAI

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Jina AI](https://jina.ai/) for their embedding and reranking APIs
- [Serper](https://serper.dev/) for their Google search API
- [NewsAPI](https://newsapi.org/) for their news search API
- [OpenAlex](https://openalex.org/) for their academic search API
- [CORE](https://core.ac.uk/) for their open access academic search API
- [Unpaywall](https://unpaywall.org/) for their open access discovery API
- [Groq](https://groq.com/) for their fast LLM inference
- [GitHub](https://github.com/) for their code search API
- [StackExchange](https://stackexchange.com/) for their programming Q&A API
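
## Example: Result Deduplication

The Result Deduplication feature listed under Features can be pictured with a minimal sketch. It assumes each result is a dict with `title`, `url`, and `snippet` keys, as in the Basic Usage example, plus a hypothetical `source` key. The names `normalize_url` and `deduplicate_results` and the normalization rules are illustrative assumptions, not the project's actual implementation, which may compare results differently.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def normalize_url(url: str) -> Tuple[str, str, str]:
    """Build a comparison key that ignores scheme, 'www.', trailing slash, and fragment.

    Hypothetical helper: these normalization rules are an assumption.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    # The query string is kept because it may identify a distinct page.
    return (host, path, parts.query)


def deduplicate_results(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result seen for each normalized URL, preserving input order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key in seen:
            continue  # Same page already returned by another engine
        seen.add(key)
        unique.append(result)
    return unique


# Illustrative data only: two engines return the same page under different URL forms.
results = [
    {"title": "A", "url": "https://www.example.com/paper/", "snippet": "", "source": "serper"},
    {"title": "A (duplicate)", "url": "http://example.com/paper", "snippet": "", "source": "scholar"},
    {"title": "B", "url": "https://example.org/abs/1", "snippet": "", "source": "arxiv"},
]
unique = deduplicate_results(results)
print([r["title"] for r in unique])  # ['A', 'B']
```

Keeping the first occurrence means the order in which engines are merged decides which copy survives. A real pipeline might instead prefer the copy with the richest metadata.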