# Papers

A Go CLI tool for fetching, processing, and analyzing academic papers from arXiv using LLM-based evaluation.

## Features

- Fetch papers from arXiv API based on date range and search query
- Process papers using configurable LLM models (default: phi-4)
- Generate both JSON and Markdown outputs
- Customizable evaluation criteria
- Rate-limited API requests (2-second delay between requests)

## Installation

```bash
go install gitea.r8z.us/stwhite/papers@latest
```

## Usage

Basic usage:
```bash
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key"
```

With custom model and output paths:
```bash
papers -start 20240101 -end 20240131 -query "machine learning" -api-key "your-key" \
  -model "gpt-4" -json-output "results.json" -md-output "summary.md"
```

Fetch papers without processing:
```bash
papers -search-only -start 20240101 -end 20240131 -query "machine learning"
```

Use input file:
```bash
papers -input papers.json -api-key "your-key"
```

### Required Flags

Required when fetching from arXiv (the `-input` example above runs without them):

- `-start`: Start date (YYYYMMDD format)
- `-end`: End date (YYYYMMDD format)
- `-query`: Search query

### Optional Flags

- `-search-only`: Fetch papers from arXiv and save to JSON file without processing
- `-input`: Path to an input JSON file containing papers
- `-api-key`: API key for the LLM endpoint, as passed in the processing examples above
- `-maxResults`: Maximum number of results to fetch (1-2000, default: 100)
- `-model`: LLM model to use for processing (default: "phi-4")
- `-api-endpoint`: API endpoint URL (default: "http://localhost:1234/v1/chat/completions")
- `-criteria`: Path to evaluation criteria markdown file (default: "criteria.md")
- `-json-output`: Custom JSON output file path (default: YYYYMMDD-YYYYMMDD-query.json)
- `-md-output`: Custom Markdown output file path (default: YYYYMMDD-YYYYMMDD-query.md)
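
For readers who want to see how such a flag set maps onto Go's standard `flag` package, here is a minimal sketch. The flag names and defaults are taken from the lists and examples above; the `config` struct, its field names, and `parseFlags` are assumptions made for illustration and may not match the tool's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config mirrors the documented flags. The struct and field names are
// hypothetical; only the flag names and defaults come from the README.
type config struct {
	start, end, query    string
	searchOnly           bool
	input                string
	maxResults           int
	model, apiEndpoint   string
	apiKey, criteria     string
	jsonOutput, mdOutput string
}

// parseFlags parses args (without the program name) into a config.
func parseFlags(args []string) (*config, error) {
	fs := flag.NewFlagSet("papers", flag.ContinueOnError)
	c := &config{}
	fs.StringVar(&c.start, "start", "", "Start date (YYYYMMDD)")
	fs.StringVar(&c.end, "end", "", "End date (YYYYMMDD)")
	fs.StringVar(&c.query, "query", "", "Search query")
	fs.BoolVar(&c.searchOnly, "search-only", false, "Fetch and save without processing")
	fs.StringVar(&c.input, "input", "", "Input JSON file containing papers")
	fs.IntVar(&c.maxResults, "maxResults", 100, "Maximum number of results (1-2000)")
	fs.StringVar(&c.model, "model", "phi-4", "LLM model to use")
	fs.StringVar(&c.apiEndpoint, "api-endpoint", "http://localhost:1234/v1/chat/completions", "API endpoint URL")
	fs.StringVar(&c.apiKey, "api-key", "", "API key for the LLM endpoint")
	fs.StringVar(&c.criteria, "criteria", "criteria.md", "Evaluation criteria file")
	fs.StringVar(&c.jsonOutput, "json-output", "", "JSON output path")
	fs.StringVar(&c.mdOutput, "md-output", "", "Markdown output path")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("query=%q model=%q maxResults=%d\n", c.query, c.model, c.maxResults)
}
```

Using a dedicated `flag.FlagSet` rather than the global one keeps the parsing testable with arbitrary argument slices.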

## Pipeline

1. **Fetch**: Retrieves papers from arXiv based on specified date range and query
2. **Save**: Stores raw paper data in JSON format
3. **Process**: Evaluates papers using the specified LLM model according to criteria
4. **Format**: Generates both JSON and Markdown outputs of the processed results

## Output Files

The tool generates two types of output files:

1. **JSON Output**: Contains the raw processing results
   - Default name format: `YYYYMMDD-YYYYMMDD-query.json`
   - Can be customized with `-json-output` flag

2. **Markdown Output**: Human-readable formatted results
   - Default name format: `YYYYMMDD-YYYYMMDD-query.md`
   - Can be customized with `-md-output` flag
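
As a rough sketch of the default naming pattern, the function below builds both file names from the start date, end date, and query. `defaultOutputNames` is a hypothetical helper, and replacing spaces in the query with underscores is an assumption; the README does not say how the tool treats spaces or other special characters.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOutputNames builds the default JSON and Markdown file names in the
// documented YYYYMMDD-YYYYMMDD-query pattern.
// Assumption: spaces in the query become underscores; the real tool may
// sanitize the query differently or not at all.
func defaultOutputNames(start, end, query string) (jsonName, mdName string) {
	safe := strings.ReplaceAll(strings.TrimSpace(query), " ", "_")
	base := fmt.Sprintf("%s-%s-%s", start, end, safe)
	return base + ".json", base + ".md"
}

func main() {
	j, m := defaultOutputNames("20240101", "20240131", "machine learning")
	fmt.Println(j) // 20240101-20240131-machine_learning.json
	fmt.Println(m) // 20240101-20240131-machine_learning.md
}
```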

## Dependencies

- [arxiva](https://gitea.r8z.us/stwhite/arxiva): Paper fetching from arXiv
- [paperprocessor](https://gitea.r8z.us/stwhite/paperprocessor): LLM-based paper processing
- [paperformatter](https://gitea.r8z.us/stwhite/paperformatter): Output formatting

## Error Handling

The tool validates its input and reports errors in the following areas:
- Date format validation (YYYYMMDD)
- Required flag validation
- Maximum results range validation (1-2000)
- File system operations verification
- API request error handling
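
The first three checks could look roughly like this in Go. It is a sketch under assumptions: `validateDate` and `validateArgs` are hypothetical names, the error messages are invented for illustration, and the tool's actual validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dateLayout is Go's reference-time layout for YYYYMMDD.
const dateLayout = "20060102"

// validateDate reports an error if value is not a valid YYYYMMDD date.
func validateDate(name, value string) error {
	if _, err := time.Parse(dateLayout, value); err != nil {
		return fmt.Errorf("invalid -%s %q: expected YYYYMMDD", name, value)
	}
	return nil
}

// validateArgs checks required flags, date formats, and the maxResults range.
func validateArgs(start, end, query string, maxResults int) error {
	if start == "" || end == "" || query == "" {
		return errors.New("-start, -end and -query are required")
	}
	if err := validateDate("start", start); err != nil {
		return err
	}
	if err := validateDate("end", end); err != nil {
		return err
	}
	if maxResults < 1 || maxResults > 2000 {
		return fmt.Errorf("invalid -maxResults %d: must be between 1 and 2000", maxResults)
	}
	return nil
}

func main() {
	fmt.Println(validateArgs("20240101", "20240131", "machine learning", 100))
	fmt.Println(validateArgs("2024-01-01", "20240131", "machine learning", 100))
	fmt.Println(validateArgs("20240101", "20240131", "machine learning", 5000))
}
```

`time.Parse` rejects impossible dates such as `20240230` as well as malformed strings, so a separate pattern check is not needed in this sketch.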

## License

[License information not provided in source]

Updated README.md and papers to add -search-only and -input options 2025-01-29 15:25:20 +00:00