.clinerules and go.sum got left out.

feat: add CORS support for web clients
- Add chi/cors middleware with development configuration
- Update API documentation with CORS details
- Document allowed origins, methods, and headers
- Add production security note about restricting origins
- Update .gitignore to track API documentation
2025-01-29 17:00:59 -06:00 · 2025-01-29 17:00:15 -06:00
6 changed files with 240 additions and 47 deletions
--- a/.clinerules
+++ b/.clinerules
@@ -1,47 +1 @@
-## Here are the api signatures for arxiva
-### FetchPapers(startDate, endDate, query string, maxResults int) ([]Paper, error)
-startDate: Start date in format "YYYYMMDD"
-endDate: End date in format "YYYYMMDD"
-query: Search query
-maxResults: Maximum number of results (1-2000)
-Fetches papers from arXiv API
-
-### SaveToFile(papers []Paper, startDate, endDate, query string) error
-papers: Array of Paper structs
-startDate: Start date in format "YYYYMMDD"
-endDate: End date in format "YYYYMMDD"
-query: Search query
-Saves papers to a JSON file
-
-JSON file is named "YYYMMDD-YYYYMMDD-query.json" (where YYYYMMDD is start date and YYYYMMDD is end date and query is search query)
-
-## here is the API signature for paperprocessor:
-
-### ProcessFile
-`func ProcessFile(inputPath, outputPath, criteriaPath string, config Config, debug bool) error`
-
-Processes papers from input JSON file and writes results to output JSON file
-
-Parameters:
- inputPath: Path to input JSON file containing papers array
- outputPath: Path to write processing results JSON
- criteriaPath: Path to text file with evaluation criteria
- config: Configuration settings for API and processing
- debug: Enable debug logging when true
-
-Returns:
- error: Processing error or nil if successful
-
-You create config like this:
-    config := paperprocessor.Config{
-        APIEndpoint:  "http://localhost:1234/v1/chat/completions",
-        APIKey:       apiKey,
-        Model:       "qwen2-7b-instruct",
-        RequestDelay: 2 * time.Second,  // 2 second delay between requests
-
-
-## Here is the usage for paperformatter:
-err := paperformatter.FormatPapers("input.json", "output.md")
-if err != nil {
-    log.Fatal(err)
-}
+After all major changes, update git with an informative commit message.
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,6 @@
+# Markdown files except documentation
 *.md
+!README.md
+!API.md
 *.json
 papers
--- a/API.md
+++ b/API.md
@@ -0,0 +1,220 @@
+# Papers API Reference
+
+This document describes the HTTP API endpoints available when running Papers in server mode.
+
+## Running the Server
+
+Start the server using:
+```bash
+papers -serve -port 8080
+```
+
+The server will listen on the specified port (default: 8080).
+
+## Important Notes
+
+### CORS
+CORS is enabled on the server with the following configuration:
+- Allowed Origins: All origins (`*`) in development
+- Allowed Methods: `GET`, `POST`, `OPTIONS`
+- Allowed Headers: `Accept`, `Authorization`, `Content-Type`
+- Credentials: Not allowed
+- Max Age: 300 seconds
+
+Note: In production, you should restrict allowed origins to your specific domain(s).
+
+### Authentication
+No authentication is required beyond the API key for LLM processing. The API key should be included in the request body for processing endpoints.
+
+### Timing Considerations
+- Initial paper search: typically < 5 seconds
+- Processing time: up to 30 minutes for large batches
+- Job status polling: recommended interval is 15 seconds
+- LLM rate limiting: 2-second delay between requests
+
+## Endpoints
+
+### Search Papers
+`POST /api/papers/search`
+
+Search for papers on arXiv based on date range and query.
+
+**Request Body:**
+```json
+{
+  "start_date": "20240101",    // Required: Start date in YYYYMMDD format
+  "end_date": "20240131",      // Required: End date in YYYYMMDD format
+  "query": "machine learning", // Required: Search query
+  "max_results": 5            // Optional: Maximum number of results (1-2000, default: 100)
+}
+```
+
+**Response:**
+```json
+[
+  {
+    "title": "Paper Title",
+    "abstract": "Paper Abstract",
+    "arxiv_id": "2401.12345"
+  }
+]
+```
+
+### Process Papers
+`POST /api/papers/process`
+
+Process papers using the specified LLM model. Papers can be provided either directly in the request or by referencing a JSON file.
+
+**Request Body:**
+```json
+{
+  // Option 1: Direct paper data
+  "papers": [                      // Optional: Array of papers
+    {
+      "title": "Paper Title",
+      "abstract": "Paper Abstract",
+      "arxiv_id": "2401.12345"
+    }
+  ],
+  
+  // Option 2: File reference
+  "input_file": "papers.json",     // Optional: Path to input JSON file
+  
+  // Criteria (one of these is required)
+  "criteria": "Accepted papers MUST:\n* primarily address LLMs...", // Optional: Direct criteria text
+  "criteria_file": "criteria.md",  // Optional: Path to criteria markdown file
+  
+  // Required fields
+  "api_key": "your-key",          // Required: API key for LLM service
+  
+  // Optional fields
+  "model": "phi-4"                // Optional: Model to use (default: phi-4)
+}
+```
+
+Notes:
+- Either `papers` or `input_file` must be provided, but not both
+- Either `criteria` or `criteria_file` must be provided, but not both
+
+**Response:**
+```json
+{
+  "job_id": "job-20240129113500"
+}
+```
+
+The endpoint returns immediately with a job ID. Use this ID with the job status endpoint to check progress and get results.
+
+### Search and Process Papers
+`POST /api/papers/search-process`
+
+Combined endpoint to search for papers and process them in one request. This endpoint automatically saves the papers to a file and processes them.
+
+**Request Body:**
+```json
+{
+  "start_date": "20240101",       // Required: Start date in YYYYMMDD format
+  "end_date": "20240131",         // Required: End date in YYYYMMDD format
+  "query": "machine learning",    // Required: Search query
+  "max_results": 5,              // Optional: Maximum number of results (1-2000, default: 100)
+  // Criteria (one of these is required)
+  "criteria": "Accepted papers MUST:\n* primarily address LLMs...", // Optional: Direct criteria text
+  "criteria_file": "criteria.md",  // Optional: Path to criteria markdown file
+  
+  // Required fields
+  "api_key": "your-key",          // Required: API key for LLM service
+  
+  // Optional fields
+  "model": "phi-4"                // Optional: Model to use (default: phi-4)
+}
+```
+
+**Response:**
+```json
+{
+  "job_id": "job-20240101-20240131-machine_learning"
+}
+```
+
+### Get Job Status
+`GET /api/jobs/{jobID}`
+
+Check the status of a processing job and retrieve results when complete.
+
+**Response:**
+```json
+{
+  "id": "job-20240129113500",
+  "status": "completed",  // "pending", "processing", "completed", or "failed"
+  "start_time": "2024-01-29T11:35:00Z",
+  "error": "",  // Error message if status is "failed"
+  "markdown_text": "# Results\n..."  // Full markdown content when completed
+}
+```
+
+## Processing Flow
+
+1. Submit a processing request (either `/api/papers/process` or `/api/papers/search-process`)
+2. Receive a job ID immediately
+3. Poll the job status endpoint until the job is completed
+4. When completed, the markdown content will be in the `markdown_text` field of the response
+
+Example workflow:
+```bash
+# 1. Submit processing request
+curl -X POST -H "Content-Type: application/json" -d '{
+  "start_date": "20240101",
+  "end_date": "20240131",
+  "query": "machine learning",
+  "criteria": "Accepted papers MUST:\n* primarily address LLMs...",
+  "api_key": "your-key",
+  "model": "phi-4"
+}' http://localhost:8080/api/papers/search-process
+
+# Response: {"job_id": "job-20240101-20240131-machine_learning"}
+
+# 2. Check job status and get results
+curl http://localhost:8080/api/jobs/job-20240101-20240131-machine_learning
+
+# Response when completed:
+{
+  "id": "job-20240101-20240131-machine_learning",
+  "status": "completed",
+  "start_time": "2024-01-29T11:35:00Z",
+  "markdown_text": "# Results\n\n## Accepted Papers\n\n1. Paper Title..."
+}
+
+# 3. Save markdown to file (example using jq)
+# The -r flag is important to get raw output without JSON escaping
+curl http://localhost:8080/api/jobs/job-20240101-20240131-machine_learning | jq -r '.markdown_text' > results.md
+
+# Alternative using Python (handles JSON escaping)
+curl http://localhost:8080/api/jobs/job-20240101-20240131-machine_learning | python3 -c '
+import json, sys
+response = json.load(sys.stdin)
+if response.get("status") == "completed":
+    with open("results.md", "w") as f:
+        f.write(response["markdown_text"])
+'
+```
+
+Note: Processing can take up to 30 minutes depending on the number of papers and LLM response times. Poll the job status endpoint periodically (every 15 seconds is the recommended interval) to check progress.
+
+## Error Responses
+
+All endpoints return appropriate HTTP status codes:
+
+- 200: Success
+- 400: Bad Request (invalid parameters)
+- 500: Internal Server Error
+
+Error responses include a message explaining the error:
+```json
+{
+  "error": "Invalid date format"
+}
+```
+
+## Rate Limiting
+
+The server includes built-in rate limiting for LLM API requests (2-second delay between requests) to prevent overwhelming the LLM service.
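Review note: the processing flow documented in API.md (submit, receive a job ID, poll until done) could also be driven from a Go client. The following is a minimal sketch, not part of this commit. The `JobStatus` struct is inferred from the documented response fields, and `fetchJob` and `waitForJob` are hypothetical helper names. The base URL and job ID are the ones used in the curl example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// JobStatus mirrors the fields documented for GET /api/jobs/{jobID}.
// The struct itself is an assumption; the server's own type may differ.
type JobStatus struct {
	ID           string `json:"id"`
	Status       string `json:"status"`
	StartTime    string `json:"start_time"`
	Error        string `json:"error"`
	MarkdownText string `json:"markdown_text"`
}

// fetchJob performs a single GET against the job status endpoint.
func fetchJob(baseURL, jobID string) (JobStatus, error) {
	var js JobStatus
	resp, err := http.Get(baseURL + "/api/jobs/" + jobID)
	if err != nil {
		return js, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return js, fmt.Errorf("unexpected status %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&js)
	return js, err
}

// waitForJob calls fetch until the job is "completed" or "failed",
// sleeping for interval between attempts and giving up after maxAttempts.
func waitForJob(fetch func() (JobStatus, error), interval time.Duration, maxAttempts int) (JobStatus, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		js, err := fetch()
		if err != nil {
			return js, err
		}
		switch js.Status {
		case "completed":
			return js, nil
		case "failed":
			return js, errors.New("job failed: " + js.Error)
		}
		time.Sleep(interval)
	}
	return JobStatus{}, errors.New("gave up waiting for job")
}

func main() {
	fetch := func() (JobStatus, error) {
		return fetchJob("http://localhost:8080", "job-20240101-20240131-machine_learning")
	}
	// 120 attempts at 15 seconds covers the documented 30-minute upper bound.
	js, err := waitForJob(fetch, 15*time.Second, 120)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(js.MarkdownText)
}
```

Passing the fetch step in as a function keeps the polling loop testable without a running server.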
--- a/go.mod
+++ b/go.mod
@@ -9,4 +9,5 @@ require (
 	gitea.r8z.us/stwhite/paperformatter v0.1.3
 	gitea.r8z.us/stwhite/paperprocessor v0.1.8
 	github.com/go-chi/chi/v5 v5.0.11
+	github.com/go-chi/cors v1.2.1
 )
--- a/go.sum
+++ b/go.sum
@@ -6,3 +6,5 @@ gitea.r8z.us/stwhite/paperprocessor v0.1.8 h1:pV810JZQFhuKcle4ix7stUz12LZNIgFCVW
 gitea.r8z.us/stwhite/paperprocessor v0.1.8/go.mod h1:0wHe7XjtQICFrPKbO53SVrUiVw9yi8GOGo9J7znpo+E=
 github.com/go-chi/chi/v5 v5.0.11 h1:BnpYbFZ3T3S1WMpD79r7R5ThWX40TaFB7L31Y8xqSwA=
 github.com/go-chi/chi/v5 v5.0.11/go.mod h1:DslCQbL2OYiznFReuXYUmQ2hGd1aDpCnlMNITLSKoi8=
+github.com/go-chi/cors v1.2.1 h1:xEC8UT3Rlp2QuWNEr4Fs/c2EAGVKBwy/1vHx3bppil4=
+github.com/go-chi/cors v1.2.1/go.mod h1:sSbTewc+6wYHBBCW7ytsFSn836hqM7JxpglAy2Vzc58=
--- a/server.go
+++ b/server.go
@@ -15,6 +15,7 @@ import (
 	"gitea.r8z.us/stwhite/paperprocessor"
 	"github.com/go-chi/chi/v5"
 	"github.com/go-chi/chi/v5/middleware"
+	"github.com/go-chi/cors"
 )

 type ProcessingJob struct {
@@ -48,9 +49,21 @@ func NewServer(port string, apiEndpoint string) *Server {
 }

 func (s *Server) setupRoutes() {
+	// Basic middleware
 	s.router.Use(middleware.Logger)
 	s.router.Use(middleware.Recoverer)

+	// CORS middleware
+	s.router.Use(cors.Handler(cors.Options{
+		AllowedOrigins:   []string{"*"}, // Allow all origins in development
+		AllowedMethods:   []string{"GET", "POST", "OPTIONS"},
+		AllowedHeaders:   []string{"Accept", "Authorization", "Content-Type"},
+		ExposedHeaders:   []string{},
+		AllowCredentials: false,
+		MaxAge:           300, // Maximum value not ignored by any of major browsers
+	}))
+
+	// Routes
 	s.router.Post("/api/papers/search", s.handleSearch)
 	s.router.Post("/api/papers/process", s.handleProcess)
 	s.router.Post("/api/papers/search-process", s.handleSearchAndProcess)
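Review note: API.md says allowed origins should be restricted in production, but `setupRoutes` hard-codes the wildcard. One possible way to close that gap is to read the origin list from configuration. The sketch below is only an illustration under assumptions: `parseAllowedOrigins` and the `PAPERS_ALLOWED_ORIGINS` environment variable are hypothetical names that do not exist in this repository. The helper uses only the standard library, and its result would be passed to the `AllowedOrigins` field already used in this commit.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowedOrigins turns a comma-separated list such as
// "https://app.example.com, https://admin.example.com" into a slice
// suitable for cors.Options.AllowedOrigins. Blank entries are skipped.
// An empty input falls back to the development wildcard from this commit.
func parseAllowedOrigins(raw string) []string {
	var origins []string
	for _, part := range strings.Split(raw, ",") {
		if o := strings.TrimSpace(part); o != "" {
			origins = append(origins, o)
		}
	}
	if len(origins) == 0 {
		return []string{"*"}
	}
	return origins
}

func main() {
	// PAPERS_ALLOWED_ORIGINS is a hypothetical variable name (assumption).
	origins := parseAllowedOrigins(os.Getenv("PAPERS_ALLOWED_ORIGINS"))
	fmt.Println(origins)
	// In setupRoutes this slice would replace the hard-coded wildcard:
	//   AllowedOrigins: origins,
}
```

Falling back to `*` keeps the current development behaviour when the variable is unset. A stricter variant could refuse to start instead.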