# Papers

A Go CLI tool for fetching, processing, and analyzing academic papers from arXiv using LLM-based evaluation.

## Features

- Fetch papers from arXiv API based on date range and search query
- Process papers using configurable LLM models (default: phi-4)
- Generate both JSON and Markdown outputs
- Customizable evaluation criteria
- Rate-limited API requests (2-second delay between requests)

## Installation

```bash
go install gitea.r8z.us/stwhite/papers@latest
```

## Usage

Basic usage:

```bash
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key"
```

With a custom model and output paths:

```bash
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key" \
  -model "gpt-4" -json-output "results.json" -md-output "summary.md"
```

Fetch papers without processing them:

```bash
papers -search-only -start 20240101 -end 20240131 -query "machine learning"
```

Process papers from an existing input file:

```bash
papers -input papers.json -api-key "your-key"
```

### Required Flags

These are needed when fetching papers from arXiv; the input-file example above does not pass them.

- `-start`: Start date (YYYYMMDD format)
- `-end`: End date (YYYYMMDD format)
- `-query`: Search query

### Optional Flags

- `-search-only`: Fetch papers from arXiv and save them to a JSON file without processing
- `-input`: Input JSON file containing papers to process
- `-maxResults`: Maximum number of results to fetch (1-2000, default: 100)
- `-model`: LLM model to use for processing (default: "phi-4")
- `-api-endpoint`: API endpoint URL (default: "http://localhost:1234/v1/chat/completions")
- `-api-key`: API key for the LLM API endpoint (used when processing papers)
- `-criteria`: Path to evaluation criteria markdown file (default: "criteria.md")
- `-json-output`: Custom JSON output file path (default: YYYYMMDD-YYYYMMDD-query.json)
- `-md-output`: Custom Markdown output file path (default: YYYYMMDD-YYYYMMDD-query.md)

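The flags above map naturally onto Go's standard `flag` package. The sketch below is an assumption about how they could be declared, not the tool's source: the `options` struct and `parseOptions` function are hypothetical, while the flag names and defaults come from this README. `-api-key` has no documented default, so it defaults to an empty string here. Using a `flag.FlagSet` with `ContinueOnError` returns parse errors instead of exiting, which keeps the function testable.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options collects the command-line flags documented above (hypothetical struct).
type options struct {
	start, end, query    string
	searchOnly           bool
	input, apiKey        string
	maxResults           int
	model, apiEndpoint   string
	criteria             string
	jsonOutput, mdOutput string
}

// parseOptions declares each flag with the README's default and parses args.
func parseOptions(args []string) (*options, error) {
	fs := flag.NewFlagSet("papers", flag.ContinueOnError)
	o := &options{}
	fs.StringVar(&o.start, "start", "", "start date (YYYYMMDD)")
	fs.StringVar(&o.end, "end", "", "end date (YYYYMMDD)")
	fs.StringVar(&o.query, "query", "", "search query")
	fs.BoolVar(&o.searchOnly, "search-only", false, "fetch and save papers without processing")
	fs.StringVar(&o.input, "input", "", "input JSON file containing papers")
	fs.StringVar(&o.apiKey, "api-key", "", "API key for the LLM endpoint")
	fs.IntVar(&o.maxResults, "maxResults", 100, "maximum results to fetch (1-2000)")
	fs.StringVar(&o.model, "model", "phi-4", "LLM model for processing")
	fs.StringVar(&o.apiEndpoint, "api-endpoint", "http://localhost:1234/v1/chat/completions", "API endpoint URL")
	fs.StringVar(&o.criteria, "criteria", "criteria.md", "evaluation criteria markdown file")
	// Empty output paths mean "derive the default name from dates and query".
	fs.StringVar(&o.jsonOutput, "json-output", "", "JSON output path")
	fs.StringVar(&o.mdOutput, "md-output", "", "Markdown output path")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error and usage
	}
	fmt.Printf("%+v\n", *o)
}
```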
## Pipeline

1. **Fetch**: Retrieves papers from arXiv based on specified date range and query
2. **Save**: Stores raw paper data in JSON format
3. **Process**: Evaluates papers using the specified LLM model according to criteria
4. **Format**: Generates both JSON and Markdown outputs of the processed results

## Output Files

The tool generates two types of output files:

1. **JSON Output**: Contains the raw processing results
   - Default name format: `YYYYMMDD-YYYYMMDD-query.json`
   - Can be customized with `-json-output` flag

2. **Markdown Output**: Human-readable formatted results
   - Default name format: `YYYYMMDD-YYYYMMDD-query.md`
   - Can be customized with `-md-output` flag

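A minimal sketch of how the default output names could be derived. The README gives the pattern `YYYYMMDD-YYYYMMDD-query.json` but does not say how a query such as "machine learning" becomes part of a filename; replacing unsafe characters with `_` is an assumption here, and `defaultOutputName` and `sanitizeQuery` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeQuery (hypothetical) keeps letters, digits, '-' and '_' and replaces
// everything else, such as spaces, quotes, and slashes, with '_'.
func sanitizeQuery(q string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_' {
			return r
		}
		return '_'
	}, q)
}

// defaultOutputName returns override unchanged when -json-output or -md-output
// was given; otherwise it builds YYYYMMDD-YYYYMMDD-query.ext.
func defaultOutputName(override, start, end, query, ext string) string {
	if override != "" {
		return override
	}
	return fmt.Sprintf("%s-%s-%s.%s", start, end, sanitizeQuery(query), ext)
}

func main() {
	fmt.Println(defaultOutputName("", "20240101", "20240131", "machine learning", "json"))
	fmt.Println(defaultOutputName("summary.md", "20240101", "20240131", "machine learning", "md"))
}
```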
## Dependencies

- [arxiva](https://gitea.r8z.us/stwhite/arxiva): Paper fetching from arXiv
- [paperprocessor](https://gitea.r8z.us/stwhite/paperprocessor): LLM-based paper processing
- [paperformatter](https://gitea.r8z.us/stwhite/paperformatter): Output formatting

## Error Handling

The tool includes error checks for:

- Date format (YYYYMMDD)
- Required flags
- Maximum results range (1-2000)
- File system operations
- API requests

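The input checks listed above can be sketched with Go's standard library: `time.Parse` with the layout `20060102` accepts only valid YYYYMMDD dates (it also rejects impossible days such as 20240230), and a simple range test covers `-maxResults`. `validateArgs` is hypothetical, and treating `-input` as making the date and query flags unnecessary is an assumption based on the input-file example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dateLayout is Go's reference-time layout for YYYYMMDD.
const dateLayout = "20060102"

// validateArgs (hypothetical) checks the results range, required flags, and
// date formats. Assumption: with an input file, the arXiv flags are optional.
func validateArgs(start, end, query, input string, maxResults int) error {
	if maxResults < 1 || maxResults > 2000 {
		return fmt.Errorf("maxResults must be between 1 and 2000, got %d", maxResults)
	}
	if input != "" {
		return nil
	}
	if start == "" || end == "" || query == "" {
		return errors.New("-start, -end and -query are required")
	}
	// Check dates in a fixed order so the error message is deterministic.
	for _, d := range []struct{ flag, value string }{{"start", start}, {"end", end}} {
		if _, err := time.Parse(dateLayout, d.value); err != nil {
			return fmt.Errorf("invalid -%s date %q: want YYYYMMDD", d.flag, d.value)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateArgs("20240101", "20240131", "machine learning", "", 100))
	fmt.Println(validateArgs("2024-01-01", "20240131", "machine learning", "", 100))
	fmt.Println(validateArgs("20240101", "20240131", "machine learning", "", 5000))
}
```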
## License

License information has not been provided.