Papers

A Go CLI tool for fetching, processing, and analyzing academic papers from arXiv using LLM-based evaluation.

Features

  • Fetch papers from the arXiv API based on date range and search query
  • Process papers using configurable LLM models (default: phi-4)
  • Generate both JSON and Markdown outputs
  • Customizable evaluation criteria
  • Rate-limited API requests (2-second delay between requests)
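The rate limiting mentioned above can be pictured as a fixed pause between consecutive requests. The following is a minimal sketch, not the tool's actual code: `eachWithDelay` and the page identifiers are hypothetical names chosen for illustration, and only the 2-second delay comes from this README.

```go
package main

import (
	"fmt"
	"time"
)

// eachWithDelay calls fn for every item, sleeping for delay between
// consecutive calls (not before the first call or after the last one).
// It stops at the first error and reports how many calls succeeded.
func eachWithDelay(items []string, delay time.Duration, fn func(string) error) (int, error) {
	done := 0
	for i, item := range items {
		if i > 0 {
			time.Sleep(delay)
		}
		if err := fn(item); err != nil {
			return done, fmt.Errorf("item %q: %w", item, err)
		}
		done++
	}
	return done, nil
}

func main() {
	// Hypothetical request identifiers; the real request loop may differ.
	pages := []string{"page-0", "page-1", "page-2"}
	n, err := eachWithDelay(pages, 2*time.Second, func(p string) error {
		fmt.Println("requesting", p)
		return nil
	})
	fmt.Println(n, err)
}
```

Passing the delay as a parameter keeps the loop easy to test with a zero delay while the CLI would use the documented 2 seconds.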

Installation

go install gitea.r8z.us/stwhite/papers@latest

Usage

Basic usage:

papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key"

With custom model and output paths:

papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key" \
  -model "gpt-4" -json-output "results.json" -md-output "summary.md"

Fetch papers without processing:

papers -search-only -start 20240101 -end 20240131 -query "machine learning"

Use input file:

papers -input papers.json -api-key "your-key"

Required Flags

  • -start: Start date (YYYYMMDD format)
  • -end: End date (YYYYMMDD format)
  • -query: Search query
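For readers unfamiliar with how Go handles a YYYYMMDD format, the sketch below shows one way such dates could be parsed and checked with the standard `time` package. The function name `parseDateRange` is hypothetical, and rejecting an end date that precedes the start date is an assumption; this README only documents the format itself.

```go
package main

import (
	"fmt"
	"time"
)

// dateLayout is Go's reference-time spelling of YYYYMMDD.
const dateLayout = "20060102"

// parseDateRange parses both dates and rejects malformed or impossible
// values such as "20240230". The end-before-start check is an assumption.
func parseDateRange(start, end string) (time.Time, time.Time, error) {
	s, err := time.Parse(dateLayout, start)
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("invalid -start %q: %w", start, err)
	}
	e, err := time.Parse(dateLayout, end)
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("invalid -end %q: %w", end, err)
	}
	if e.Before(s) {
		return time.Time{}, time.Time{}, fmt.Errorf("-end %s is before -start %s", end, start)
	}
	return s, e, nil
}

func main() {
	s, e, err := parseDateRange("20240101", "20240131")
	fmt.Println(s.Format(dateLayout), e.Format(dateLayout), err)
}
```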

Optional Flags

  • -search-only: Fetch papers from arXiv and save to JSON file without processing
  • -input: Input JSON file of previously fetched papers to process (see the "Use input file" example above)
  • -maxResults: Maximum number of results to fetch (1-2000, default: 100)
  • -model: LLM model to use for processing (default: "phi-4")
  • -api-endpoint: API endpoint URL (default: "http://localhost:1234/v1/chat/completions")
  • -api-key: API key for the LLM endpoint (used in the processing examples above)
  • -criteria: Path to evaluation criteria markdown file (default: "criteria.md")
  • -json-output: Custom JSON output file path (default: YYYYMMDD-YYYYMMDD-query.json)
  • -md-output: Custom Markdown output file path (default: YYYYMMDD-YYYYMMDD-query.md)
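The default -api-endpoint path suggests an OpenAI-compatible chat completions API. As a rough illustration of how -model, -api-endpoint, -api-key, and the criteria text might come together, the sketch below builds (but does not send) such a request. The payload shape, the Bearer authorization header, and the names `chatRequest` and `newEvalRequest` are assumptions for illustration, not taken from the tool's source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest follow the common OpenAI-compatible
// chat-completions shape. This shape is an assumption.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newEvalRequest builds a POST request asking the model to evaluate one
// paper abstract against the criteria text. It does not send the request.
func newEvalRequest(endpoint, apiKey, model, criteria, abstract string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model: model,
		Messages: []chatMessage{
			{Role: "system", Content: criteria},
			{Role: "user", Content: abstract},
		},
	})
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey) // assumed auth scheme
	}
	return req, nil
}

func main() {
	req, err := newEvalRequest("http://localhost:1234/v1/chat/completions",
		"your-key", "phi-4", "Example criteria text.", "Example abstract text.")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Content-Type"))
}
```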

Pipeline

  1. Fetch: Retrieves papers from arXiv based on specified date range and query
  2. Save: Stores raw paper data in JSON format
  3. Process: Evaluates papers using the specified LLM model according to criteria
  4. Format: Generates both JSON and Markdown outputs of the processed results
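To make the Fetch step more concrete, here is a hedged sketch of how a date range and query could be turned into an arXiv API request URL. The arXiv API accepts a `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` filter in `search_query`; searching the `all:` field and the helper names `searchQuery` and `queryURL` are assumptions about how the tool maps its flags, not confirmed behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// arxivAPI is the public arXiv query endpoint.
const arxivAPI = "http://export.arxiv.org/api/query"

// searchQuery combines a free-text query with a submission-date window.
// The window runs from 00:00 on the start date to 23:59 on the end date.
// Searching the "all:" field is an assumption.
func searchQuery(query, start, end string) string {
	return fmt.Sprintf("all:%s AND submittedDate:[%s0000 TO %s2359]", query, start, end)
}

// queryURL returns the full request URL with the parameters escaped.
func queryURL(query, start, end string, maxResults int) string {
	v := url.Values{}
	v.Set("search_query", searchQuery(query, start, end))
	v.Set("max_results", strconv.Itoa(maxResults))
	return arxivAPI + "?" + v.Encode()
}

func main() {
	fmt.Println(queryURL("machine learning", "20240101", "20240131", 100))
}
```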

Output Files

The tool generates two types of output files:

  1. JSON Output: Contains the raw processing results
    • Default name format: YYYYMMDD-YYYYMMDD-query.json
    • Can be customized with -json-output flag
  2. Markdown Output: Human-readable formatted results
    • Default name format: YYYYMMDD-YYYYMMDD-query.md
    • Can be customized with -md-output flag
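The naming scheme above could be derived along the following lines. This is only a sketch: `defaultOutputPaths` and `resolve` are hypothetical helpers, and replacing whitespace in the query with underscores is an assumption, since this README does not say how the query is turned into a file name.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOutputPaths returns the JSON and Markdown file names in the
// documented YYYYMMDD-YYYYMMDD-query form. Joining the query's words
// with underscores is an assumption.
func defaultOutputPaths(start, end, query string) (jsonPath, mdPath string) {
	slug := strings.Join(strings.Fields(query), "_")
	base := fmt.Sprintf("%s-%s-%s", start, end, slug)
	return base + ".json", base + ".md"
}

// resolve prefers an explicit path (as given via -json-output or
// -md-output) and falls back to the default when none was supplied.
func resolve(explicit, fallback string) string {
	if explicit != "" {
		return explicit
	}
	return fallback
}

func main() {
	j, m := defaultOutputPaths("20240101", "20240131", "machine learning")
	fmt.Println(resolve("", j))           // prints 20240101-20240131-machine_learning.json
	fmt.Println(resolve("summary.md", m)) // prints summary.md
}
```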

Dependencies

Error Handling

The tool includes various error checks:

  • Date format validation (YYYYMMDD)
  • Required flag validation
  • Maximum results range validation (1-2000)
  • File system operations verification
  • API request error handling
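Two of the checks listed above, required flag validation and the 1-2000 range for -maxResults, can be sketched with the standard `flag` package as follows. The `options` struct and function names are hypothetical, only a subset of the documented flags is defined, and treating -input as lifting the requirement for -start, -end, and -query is an assumption based on the "Use input file" example.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds a subset of the documented flags.
type options struct {
	start, end, query, input string
	maxResults               int
}

// parseArgs defines the flags with their documented defaults, parses
// args, and validates the result.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("papers", flag.ContinueOnError)
	fs.StringVar(&o.start, "start", "", "start date (YYYYMMDD)")
	fs.StringVar(&o.end, "end", "", "end date (YYYYMMDD)")
	fs.StringVar(&o.query, "query", "", "search query")
	fs.StringVar(&o.input, "input", "", "input JSON file containing papers")
	fs.IntVar(&o.maxResults, "maxResults", 100, "maximum number of results (1-2000)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	return o, validate(o)
}

// validate applies the range and required-flag checks.
func validate(o options) error {
	if o.maxResults < 1 || o.maxResults > 2000 {
		return fmt.Errorf("-maxResults must be between 1 and 2000, got %d", o.maxResults)
	}
	// Assumption: with -input the search flags are not needed.
	if o.input == "" && (o.start == "" || o.end == "" || o.query == "") {
		return errors.New("-start, -end, and -query are required")
	}
	return nil
}

func main() {
	_, err := parseArgs([]string{"-start", "20240101", "-end", "20240131",
		"-query", "machine learning", "-maxResults", "5000"})
	fmt.Println(err) // prints -maxResults must be between 1 and 2000, got 5000
}
```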

License

[License information not provided in source]