Intelligent Research Assistant
Go to file
Steve White 79d2d93af9 Implement API and React frontend specifications
This commit adds:
1. Comprehensive FastAPI routes for search, report, and authentication
2. Fixed Pydantic model compatibility issues with model_dump()
3. Added detailed API specification documentation in api_specification.md
4. Added React implementation plan with component designs and architecture
5. Improved test coverage for API endpoints
6. Added progress tracking for report generation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-03-20 21:57:56 -05:00
.gradio Integrate Jina Reranker with ResultCollector for semantic ranking 2025-02-27 16:59:54 -06:00
.note Implement API and React frontend specifications 2025-03-20 21:57:56 -05:00
config massive changes 2025-03-14 16:14:09 -05:00
docs Update project documentation and memory bank entries. Add new integration tests for query classification. 2025-03-20 08:14:53 -05:00
examples massive changes 2025-03-14 16:14:09 -05:00
execution Claude added decomposition; broke report. 2025-03-18 12:20:23 -05:00
query Update project documentation and memory bank entries. Add new integration tests for query classification. 2025-03-20 08:14:53 -05:00
ranking Fix Jina Reranker API integration with proper request and response handling 2025-02-27 17:16:52 -06:00
report Update project documentation and memory bank entries. Add new integration tests for query classification. 2025-03-20 08:14:53 -05:00
scripts Add code search capability with GitHub and StackExchange APIs 2025-03-14 16:12:26 -05:00
sim-search-api Implement API and React frontend specifications 2025-03-20 21:57:56 -05:00
tests Update project documentation and memory bank entries. Add new integration tests for query classification. 2025-03-20 08:14:53 -05:00
ui Update project documentation and memory bank entries. Add new integration tests for query classification. 2025-03-20 08:14:53 -05:00
utils Clean up repository: Remove unused test files and add new test directories 2025-03-11 16:56:58 -05:00
.clinerules massive changes 2025-03-14 16:14:09 -05:00
.gitignore massive changes 2025-03-14 16:14:09 -05:00
.windsurfrules Clean up repository: Remove unused test files and add new test directories 2025-03-11 16:56:58 -05:00
README.md Add code search capability with GitHub and StackExchange APIs 2025-03-14 16:12:26 -05:00
jina-ai-metaprompt.md Initial commit: Intelligent Research System with search execution module 2025-02-27 16:21:54 -06:00
report.md Clean up repository: Remove unused test files and add new test directories 2025-03-11 16:56:58 -05:00
report_115857.md Add support for custom models and thinking tag processing 2025-02-28 09:19:27 -06:00
report_20250228_090933_deepseek-r1-distill-llama-70b-specdec.md Add support for custom models and thinking tag processing 2025-02-28 09:19:27 -06:00
requirements.txt massive changes 2025-03-14 16:14:09 -05:00
run_ui.py Add progress tracking to report generation UI 2025-03-12 11:20:40 -05:00
test_openrouter.py Claude added decomposition; broke report. 2025-03-18 12:20:23 -05:00
test_openrouter_config.py Claude added decomposition; broke report. 2025-03-18 12:20:23 -05:00
test_report.md Clean up repository: Remove unused test files and add new test directories 2025-03-11 16:56:58 -05:00
update_max_tokens.py Claude added decomposition; broke report. 2025-03-18 12:20:23 -05:00

README.md

Intelligent Research System

An end-to-end research automation system that handles the entire process from query to final report, leveraging multiple search sources and semantic similarity to produce comprehensive research results.

Overview

This system automates the research process by:

  1. Processing and enhancing user queries
  2. Executing searches across multiple engines (Serper, Google Scholar, arXiv)
  3. Ranking and filtering results based on relevance
  4. Generating comprehensive research reports

Features

  • Query Processing: Enhances user queries with additional context and classifies them by type and intent
  • Multi-Source Search: Executes searches across general web (Serper/Google), academic sources, and current news
  • Specialized Search Handlers:
    • Current Events: Optimized news search for recent developments
    • Academic Research: Specialized academic search with OpenAlex, CORE, arXiv, and Google Scholar
    • Open Access Detection: Finds freely available versions of paywalled papers using Unpaywall
    • Code/Programming: Specialized code search using GitHub and StackExchange
  • Intelligent Ranking: Uses Jina AI's Re-Ranker to prioritize the most relevant results
  • Result Deduplication: Removes duplicate results across different search engines
  • Modular Architecture: Easily extensible with new search engines and LLM providers

Components

  • Query Processor: Enhances and classifies user queries
  • Search Executor: Executes searches across multiple engines
  • Result Collector: Processes and organizes search results
  • Document Ranker: Ranks documents by relevance
  • Report Generator: Synthesizes information into coherent reports with specialized templates for different query types

Getting Started

Prerequisites

  • Python 3.8+
  • API keys for:
    • Serper API (for Google and Scholar search)
    • NewsAPI (for current events search)
    • CORE API (for open access academic search)
    • GitHub API (for code search)
    • StackExchange API (for programming Q&A content)
    • Groq (or other LLM provider)
    • Jina AI (for reranking)
    • Email for OpenAlex and Unpaywall (recommended but not required)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/sim-search.git
cd sim-search
  1. Install dependencies:
pip install -r requirements.txt
  1. Create a configuration file:
cp config/config.yaml.example config/config.yaml
  1. Edit the configuration file to add your API keys:
api_keys:
  serper: "your-serper-api-key"
  newsapi: "your-newsapi-key"
  groq: "your-groq-api-key"
  jina: "your-jina-api-key"
  github: "your-github-api-key"
  stackexchange: "your-stackexchange-api-key"

Usage

Basic Usage

from query.query_processor import QueryProcessor
from execution.search_executor import SearchExecutor
from execution.result_collector import ResultCollector

# Initialize components
query_processor = QueryProcessor()
search_executor = SearchExecutor()
result_collector = ResultCollector()

# Process a query
processed_query = query_processor.process_query("What are the latest advancements in quantum computing?")

# Execute search
search_results = search_executor.execute_search(processed_query)

# Process results
processed_results = result_collector.process_results(search_results)

# Print top results
for i, result in enumerate(processed_results[:5]):
    print(f"{i+1}. {result['title']}")
    print(f"   URL: {result['url']}")
    print(f"   Snippet: {result['snippet'][:100]}...")
    print()

Testing

Run the test scripts to verify functionality:

# Test search execution
python test_search_execution.py

# Test all search handlers
python test_all_handlers.py

Project Structure

sim-search/
├── config/                 # Configuration management
├── query/                  # Query processing
├── execution/              # Search execution
│   └── api_handlers/       # Search API handlers
├── ranking/                # Document ranking
├── test_*.py               # Test scripts
└── requirements.txt        # Dependencies

LLM Providers

The system supports multiple LLM providers through the LiteLLM interface:

  • Groq (currently using Llama 3.1-8b-instant)
  • OpenAI
  • Anthropic
  • OpenRouter
  • Azure OpenAI

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Jina AI for their embedding and reranking APIs
  • Serper for their Google search API
  • NewsAPI for their news search API
  • OpenAlex for their academic search API
  • CORE for their open access academic search API
  • Unpaywall for their open access discovery API
  • Groq for their fast LLM inference
  • GitHub for their code search API
  • StackExchange for their programming Q&A API