Papers
A Go CLI tool for fetching, processing, and analyzing academic papers from arXiv using LLM-based evaluation.
Features
- Fetch papers from the arXiv API based on date range and search query
- Process papers using configurable LLM models (default: phi-4)
- Generate both JSON and Markdown outputs
- Customizable evaluation criteria
- Rate-limited API requests (2-second delay between requests)
Installation
go install gitea.r8z.us/stwhite/papers@latest
Usage
Basic usage:
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key"
With custom model and output paths:
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key" \
-model "gpt-4" -json-output "results.json" -md-output "summary.md"
Fetch papers without processing:
papers -search-only -start 20240101 -end 20240131 -query "machine learning"
Use input file:
papers -input papers.json -api-key "your-key"
Required Flags
- -start: Start date (YYYYMMDD format)
- -end: End date (YYYYMMDD format)
- -query: Search query
Optional Flags
- -search-only: Fetch papers from arXiv and save to JSON file without processing
- -input: Input JSON file containing papers (optional)
- -maxResults: Maximum number of results to fetch (1-2000, default: 100)
- -model: LLM model to use for processing (default: "phi-4")
- -api-endpoint: API endpoint URL (default: "http://localhost:1234/v1/chat/completions")
- -criteria: Path to evaluation criteria markdown file (default: "criteria.md")
- -json-output: Custom JSON output file path (default: YYYYMMDD-YYYYMMDD-query.json)
- -md-output: Custom Markdown output file path (default: YYYYMMDD-YYYYMMDD-query.md)
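For readers who want to see how such a flag set might be declared, here is a minimal sketch using Go's standard flag package. The config struct, its field names, and the parseFlags helper are hypothetical and not taken from papers.go; only the flag names and defaults come from the lists above. The -api-key flag is included because it appears in the usage examples, although its help text here is a guess.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config mirrors the documented flags. The struct and its field names are
// illustrative assumptions, not the tool's actual types.
type config struct {
	start, end, query string
	apiKey            string
	searchOnly        bool
	input             string
	maxResults        int
	model             string
	apiEndpoint       string
	criteria          string
	jsonOutput        string
	mdOutput          string
}

// parseFlags parses args (without the program name) into a config,
// applying the defaults listed in the README.
func parseFlags(args []string) (*config, error) {
	c := &config{}
	fs := flag.NewFlagSet("papers", flag.ContinueOnError)
	fs.StringVar(&c.start, "start", "", "Start date (YYYYMMDD format)")
	fs.StringVar(&c.end, "end", "", "End date (YYYYMMDD format)")
	fs.StringVar(&c.query, "query", "", "Search query")
	fs.StringVar(&c.apiKey, "api-key", "", "API key (seen in the usage examples)")
	fs.BoolVar(&c.searchOnly, "search-only", false, "Fetch papers and save to JSON without processing")
	fs.StringVar(&c.input, "input", "", "Input JSON file containing papers")
	fs.IntVar(&c.maxResults, "maxResults", 100, "Maximum number of results to fetch (1-2000)")
	fs.StringVar(&c.model, "model", "phi-4", "LLM model to use for processing")
	fs.StringVar(&c.apiEndpoint, "api-endpoint", "http://localhost:1234/v1/chat/completions", "API endpoint URL")
	fs.StringVar(&c.criteria, "criteria", "criteria.md", "Path to evaluation criteria markdown file")
	fs.StringVar(&c.jsonOutput, "json-output", "", "Custom JSON output file path")
	fs.StringVar(&c.mdOutput, "md-output", "", "Custom Markdown output file path")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the error and usage
	}
	fmt.Printf("model=%s maxResults=%d searchOnly=%t\n", c.model, c.maxResults, c.searchOnly)
}
```

Using a dedicated FlagSet instead of the global one keeps the parser testable, since it can be fed any argument slice.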
Pipeline
- Fetch: Retrieves papers from arXiv based on specified date range and query
- Save: Stores raw paper data in JSON format
- Process: Evaluates papers using the specified LLM model according to criteria
- Format: Generates both JSON and Markdown outputs of the processed results
Output Files
The tool generates two types of output files:
- JSON Output: Contains the raw processing results
  - Default name format: YYYYMMDD-YYYYMMDD-query.json
  - Can be customized with the -json-output flag
- Markdown Output: Human-readable formatted results
  - Default name format: YYYYMMDD-YYYYMMDD-query.md
  - Can be customized with the -md-output flag
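As a rough illustration of the naming scheme, the sketch below builds the default paths from the start date, end date, and query, and falls back to them when no custom path is given. The helper names defaultOutputPaths and resolve are hypothetical. Replacing spaces in the query with underscores is an assumption made for this example; the README does not say how the tool handles spaces or other special characters.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOutputPaths builds names following the documented
// YYYYMMDD-YYYYMMDD-query pattern. Replacing spaces with underscores is an
// assumption; the tool may sanitize the query differently or not at all.
func defaultOutputPaths(start, end, query string) (jsonPath, mdPath string) {
	base := fmt.Sprintf("%s-%s-%s", start, end, strings.ReplaceAll(query, " ", "_"))
	return base + ".json", base + ".md"
}

// resolve returns custom when it is non-empty, otherwise fallback.
func resolve(custom, fallback string) string {
	if custom != "" {
		return custom
	}
	return fallback
}

func main() {
	j, m := defaultOutputPaths("20240101", "20240131", "machine learning")
	fmt.Println(resolve("", j))           // no -json-output given: default name
	fmt.Println(resolve("summary.md", m)) // -md-output given: custom name wins
}
```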
Dependencies
- arxiva: Paper fetching from arXiv
- paperprocessor: LLM-based paper processing
- paperformatter: Output formatting
Error Handling
The tool performs the following checks:
- Date format validation (YYYYMMDD)
- Required flag validation
- Maximum results range validation (1-2000)
- File system operations verification
- API request error handling
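The first three checks could look roughly like the sketch below. The validate function and its error messages are hypothetical, not copied from papers.go. It relies on Go's time.Parse with the reference layout "20060102", which rejects both malformed strings and impossible calendar dates.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dateLayout is Go's reference-time layout for YYYYMMDD.
const dateLayout = "20060102"

// validate checks the required flags, the date format, and the maxResults
// range described above. The messages are illustrative.
func validate(start, end, query string, maxResults int) error {
	if start == "" || end == "" || query == "" {
		return errors.New("-start, -end and -query are required")
	}
	if _, err := time.Parse(dateLayout, start); err != nil {
		return fmt.Errorf("invalid -start %q: expected YYYYMMDD", start)
	}
	if _, err := time.Parse(dateLayout, end); err != nil {
		return fmt.Errorf("invalid -end %q: expected YYYYMMDD", end)
	}
	if maxResults < 1 || maxResults > 2000 {
		return fmt.Errorf("-maxResults must be between 1 and 2000, got %d", maxResults)
	}
	return nil
}

func main() {
	// A date with dashes does not match the YYYYMMDD layout and is rejected.
	if err := validate("2024-01-01", "20240131", "machine learning", 100); err != nil {
		fmt.Println("error:", err)
	}
	// A well-formed invocation passes all three checks.
	if err := validate("20240101", "20240131", "machine learning", 100); err == nil {
		fmt.Println("ok")
	}
}
```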
License
[License information not provided in source]