# ArXiv Processor

A Go package for fetching and processing papers from arXiv.

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/arxiv-processor.git
   cd arxiv-processor
   ```

2. Initialize the Go module (skip `go mod init` if the repository already contains a `go.mod` file):

   ```bash
   go mod init arxiv-processor
   go mod tidy
   ```

## Usage

### As a Library

To use the package in your Go application:

```go
package main

import (
	"fmt"
	"log"
	"time"

	// The arXiv API client lives in the arxiv/ directory (see Package Structure).
	"github.com/yourusername/arxiv-processor/arxiv"
)

func main() {
	// Create a client.
	client := arxiv.NewClient()

	// Define the search window.
	// "20060102" is Go's reference-time layout: 2006 = year, 01 = month, 02 = day.
	// Note: The arXiv API returns full timestamps including time of day,
	// but the search API only uses the date portion for filtering.
	startDate, err := time.Parse("20060102", "20240101")
	if err != nil {
		log.Fatal(err)
	}
	endDate, err := time.Parse("20060102", "20240131")
	if err != nil {
		log.Fatal(err)
	}

	// Fetch papers. FetchPapers returns all papers at once, after the
	// API request and any necessary pagination have completed.
	papers, err := client.FetchPapers("cat:cs.AI", startDate, endDate)
	if err != nil {
		log.Fatal(err)
	}

	// Use the papers directly (in memory); the slice holds every result.
	for _, paper := range papers {
		fmt.Printf("Title: %s\n", paper.Title)
		fmt.Printf("Abstract: %s\n", paper.Abstract)
	}

	// Optionally save the papers to a file.
	if err := arxiv.SavePapers("papers.json", papers); err != nil {
		log.Fatal(err)
	}
}
```

Note: The example above also writes the results to a file through `SavePapers`. To work with the results in memory only:

1. Remove the `SavePapers` call.
2. Use the returned `papers` slice directly.
3. The `papers` slice contains all paper data as Go structs.
   If you need JSON, marshal the slice with `json.Marshal(papers)` from `encoding/json`.

### Command Line Interface

Run the CLI with:

```bash
go run main.go --search "cat:cs.AI" --date-range "YYYYMMDD-YYYYMMDD"
```

#### Command Line Options

- `--search`: Search query (e.g., `"cat:cs.AI"` for AI papers)
- `--date-range`: Date range in `YYYYMMDD-YYYYMMDD` format
- `--output`: Output file path (default: `papers_data.json`)

#### Example: Fetch AI Papers

```bash
go run main.go --search "cat:cs.AI" --date-range "20250115-20250118"
```

#### Program Output

Fetched papers are saved to `papers_data.json` by default; use `--output` to choose a different path. Each entry in the JSON file contains the paper's metadata:

- Title
- Abstract
- arXiv ID

Example JSON structure:

```json
[
  {
    "title": "Sample Paper Title",
    "abstract": "This is a sample abstract...",
    "arxiv_id": "2501.08565v1"
  }
]
```

## Configuration

### Environment Variables

- `ARXIV_MAX_RESULTS`: Maximum number of results to fetch (default: 100)
- `ARXIV_START_INDEX`: Start index for pagination (default: 0)

## Package Structure

```
arxiv-processor/
├── arxiv/          # arXiv API client
├── storage/        # Data storage handlers
├── llm/            # LLM integration (TODO)
├── main.go         # Main entry point
└── README.md       # This file
```

## Contributing

1. Fork the repository.
2. Create a new branch.
3. Make your changes.
4. Submit a pull request.

## License

MIT License