Steve White 9396e2da3a
ArXiv Processor
A Go package for fetching and processing papers from arXiv.
Installation
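- Clone the repository:
```bash
git clone https://github.com/yourusername/arxiv-processor.git
cd arxiv-processor
```
- Initialize the Go module, then download dependencies. Skip `go mod init` if the repository already contains a go.mod file, because the command fails when one exists. The module path should match the import path that library users will use:
```bash
go mod init github.com/yourusername/arxiv-processor
go mod tidy
```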
Usage
As a Library
To use the package in your Go application:
```go
package main

import (
	"fmt"
	"log"
	"time"

	// The client lives in the arxiv/ subdirectory; the path prefix must match
	// the module path declared in go.mod.
	"github.com/yourusername/arxiv-processor/arxiv"
)

func main() {
	// Create client
	client := arxiv.NewClient()

	// Define search parameters.
	// "20060102" is a layout written with Go's reference time
	// (Mon Jan 2 15:04:05 MST 2006):
	// 2006 = year
	// 01 = month
	// 02 = day
	// Note: The arXiv API returns full timestamps including time of day,
	// but the search API only uses the date portion for filtering.
	startDate, err := time.Parse("20060102", "20240101")
	if err != nil {
		log.Fatal(err)
	}
	endDate, err := time.Parse("20060102", "20240131")
	if err != nil {
		log.Fatal(err)
	}

	// Fetch papers.
	// FetchPapers returns all papers at once, after the API request
	// and any necessary pagination have completed.
	papers, err := client.FetchPapers("cat:cs.AI", startDate, endDate)
	if err != nil {
		log.Fatal(err)
	}

	// Use the papers directly (in memory); the slice holds all results.
	for _, paper := range papers {
		fmt.Printf("Title: %s\n", paper.Title)
		fmt.Printf("Abstract: %s\n", paper.Abstract)
	}

	// Optionally save the papers to a file
	if err := arxiv.SavePapers("papers.json", papers); err != nil {
		log.Fatal(err)
	}
}
```
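FetchPapers is described as returning only after any necessary pagination has finished. One common way to structure such a loop is to request consecutive pages until a short or empty page arrives. The sketch below is a minimal illustration under that assumption, not the package's actual implementation: `pageFetcher` and `fetchAll` are hypothetical names, and the fake data source stands in for real API calls.

```go
package main

import (
	"errors"
	"fmt"
)

// pageFetcher is a hypothetical signature for one API request that returns
// up to size results beginning at offset start.
type pageFetcher func(start, size int) ([]string, error)

// fetchAll requests consecutive pages until a short (or empty) page signals
// the end of the results, and only then returns everything collected.
func fetchAll(fetch pageFetcher, pageSize int) ([]string, error) {
	if pageSize <= 0 {
		return nil, errors.New("page size must be positive")
	}
	var all []string
	for start := 0; ; start += pageSize {
		page, err := fetch(start, pageSize)
		if err != nil {
			return nil, fmt.Errorf("page at offset %d: %w", start, err)
		}
		all = append(all, page...)
		if len(page) < pageSize {
			return all, nil
		}
	}
}

func main() {
	// Fake data source with 7 results, standing in for the arXiv API.
	source := []string{"p1", "p2", "p3", "p4", "p5", "p6", "p7"}
	fake := func(start, size int) ([]string, error) {
		if start >= len(source) {
			return nil, nil
		}
		end := start + size
		if end > len(source) {
			end = len(source)
		}
		return source[start:end], nil
	}

	papers, err := fetchAll(fake, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(papers), papers)
}
```

With a page size of 3, the fake source is read in three requests (3, 3, then 1 result) and the program prints `7 [p1 p2 p3 p4 p5 p6 p7]`. A client talking to the real arXiv API should also pause between requests; the arXiv API user manual asks for a delay of about three seconds between consecutive calls.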
Note: The package currently writes results to a file by default. To keep the results in memory instead, and produce JSON only when you need it:
- Remove the SavePapers call
- Use the returned papers slice directly; it contains all paper data as Go structs, not JSON
- If you need JSON, marshal the slice with json.Marshal(papers) from the encoding/json package
Command Line Interface
To use the CLI:
```bash
go run main.go --search "cat:cs.AI" --date-range "YYYYMMDD-YYYYMMDD"
```
Command Line Options
- `--search`: Search query (e.g., "cat:cs.AI" for AI papers)
- `--date-range`: Date range in YYYYMMDD-YYYYMMDD format
- `--output`: Output file path (default: papers_data.json)
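The `--date-range` value packs both dates into one string. Below is a minimal sketch of how these options could be parsed with Go's standard `flag` package, which accepts both `-search` and `--search`. The `parseDateRange` helper is hypothetical and not necessarily how main.go handles the flag.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"strings"
	"time"
)

// dateLayout is the YYYYMMDD layout, written using Go's reference date.
const dateLayout = "20060102"

// parseDateRange is a hypothetical helper that splits "YYYYMMDD-YYYYMMDD"
// into start and end dates and rejects ranges where end precedes start.
func parseDateRange(s string) (time.Time, time.Time, error) {
	parts := strings.Split(s, "-")
	if len(parts) != 2 {
		return time.Time{}, time.Time{}, fmt.Errorf("date range %q: want YYYYMMDD-YYYYMMDD", s)
	}
	start, err := time.Parse(dateLayout, parts[0])
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("start date: %w", err)
	}
	end, err := time.Parse(dateLayout, parts[1])
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("end date: %w", err)
	}
	if end.Before(start) {
		return time.Time{}, time.Time{}, fmt.Errorf("date range %q: end is before start", s)
	}
	return start, end, nil
}

func main() {
	// Flag names and the --output default follow the options listed above.
	search := flag.String("search", "", "search query, e.g. cat:cs.AI")
	dateRange := flag.String("date-range", "", "date range in YYYYMMDD-YYYYMMDD format")
	output := flag.String("output", "papers_data.json", "output file path")
	flag.Parse()

	start, end, err := parseDateRange(*dateRange)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("query=%q from %s to %s -> %s\n",
		*search, start.Format("2006-01-02"), end.Format("2006-01-02"), *output)
}
```

Given `--search "cat:cs.AI" --date-range "20250115-20250118"`, this prints `query="cat:cs.AI" from 2025-01-15 to 2025-01-18 -> papers_data.json`. Validating the order of the two dates up front gives a clear error instead of an empty result set.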
Example: Fetch AI Papers
```bash
go run main.go --search "cat:cs.AI" --date-range "20250115-20250118"
```
Program Output
- Fetched papers are saved to papers_data.json by default (override the path with --output)
- Example JSON structure:
```json
[
  {
    "title": "Sample Paper Title",
    "abstract": "This is a sample abstract...",
    "arxiv_id": "2501.08565v1"
  }
]
```
- The JSON file contains paper metadata including:
  - Title
  - Abstract
  - arXiv ID
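Because the output file is a plain JSON array, it can be loaded back with `encoding/json`. This sketch assumes a hypothetical `Paper` struct covering only the three fields listed above; any additional fields present in the file are ignored during decoding.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Paper is a hypothetical struct matching the three documented fields.
type Paper struct {
	Title    string `json:"title"`
	Abstract string `json:"abstract"`
	ArxivID  string `json:"arxiv_id"`
}

// decodePapers parses a JSON array in the papers_data.json format.
func decodePapers(data []byte) ([]Paper, error) {
	var papers []Paper
	if err := json.Unmarshal(data, &papers); err != nil {
		return nil, err
	}
	return papers, nil
}

// loadPapers reads and decodes a saved output file.
func loadPapers(path string) ([]Paper, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return decodePapers(data)
}

func main() {
	papers, err := loadPapers("papers_data.json")
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range papers {
		fmt.Printf("%s: %s\n", p.ArxivID, p.Title)
	}
}
```

Splitting decoding from file reading keeps `decodePapers` easy to test without touching the filesystem.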
Configuration
Environment Variables
- `ARXIV_MAX_RESULTS`: Maximum number of results to fetch (default: 100)
- `ARXIV_START_INDEX`: Start index for pagination (default: 0)
Package Structure
```
arxiv-processor/
├── arxiv/       # arXiv API client
├── storage/     # Data storage handlers
├── llm/         # LLM integration (TODO)
├── main.go      # Main entry point
└── README.md    # This file
```
Contributing
- Fork the repository
- Create a new branch
- Make your changes
- Submit a pull request
License
MIT License