CLI Reference

The papertrail command-line interface provides the main entry point for all PaperTrail operations.

Commands

scrape

Scrape papers from your Slack workspace.

papertrail scrape [OPTIONS]

Options:

  • -o, --output TEXT — Output file path (default: papers_raw.json)
  • -c, --channels TEXT — Specific channels to scrape (space-separated)
  • --exclude TEXT — Channels to exclude (space-separated)
  • -d, --days INTEGER — Only scrape last N days
  • --after DATE — Only scrape papers after this date (YYYY-MM-DD)
  • --before DATE — Only scrape papers before this date (YYYY-MM-DD)
  • --min-engagement INTEGER — Minimum engagement threshold
  • --delay FLOAT — Delay between API requests (seconds)
  • --batch-size INTEGER — Batch size for processing
  • --checkpoint FILE — Save/resume from checkpoint
  • --dry-run — Preview without downloading
  • -v, --verbose — Verbose output
  • --format TEXT — Output format (json, csv)
  • --help — Show help message

Example:

papertrail scrape -o papers.json -c general papers --days 30 -v

enrich

Enrich papers with metadata from Semantic Scholar and OpenAlex.

papertrail enrich INPUT_FILE [OPTIONS]

Arguments:

  • INPUT_FILE — Input JSON file from scraper

Options:

  • -o, --output TEXT — Output file path (default: papers_enriched.json)
  • --batch-size INTEGER — Batch size for API requests (default: 10)
  • --delay FLOAT — Delay between requests (seconds)
  • --no-cache — Ignore cache and re-enrich
  • --require-identifier — Skip papers without DOI/arXiv ID
  • --fields TEXT — Only enrich specific fields (comma-separated)
  • --dry-run — Preview without saving
  • -v, --verbose — Verbose output
  • --help — Show help message

Example:

papertrail enrich papers_raw.json -o papers_enriched.json --batch-size 20 --delay 0.5 -v

embed

Compute embeddings, projections, clusters, and FAISS index.

papertrail embed INPUT_FILE [OPTIONS]

Arguments:

  • INPUT_FILE — Input JSON file from enricher

Options:

  • -o, --output TEXT — Output file path (default: papers_final.json)
  • --backend TEXT — Embedding backend (openai, huggingface, local; default: auto-detect)
  • --batch-size INTEGER — Batch size for embedding (default: 32)
  • --delay FLOAT — Delay between requests (seconds)
  • --n-clusters INTEGER — Number of k-means clusters (default: 5)
  • -p, --projections TEXT — Projections to compute (comma-separated: umap,tsne,pca; default: all)
  • --text-fields TEXT — Text fields to embed (default: abstract,title)
  • --faiss-path TEXT — Save FAISS index to this directory
  • --no-cache — Don't use cached models
  • --dry-run — Preview without saving
  • -v, --verbose — Verbose output
  • --help — Show help message

Example:

papertrail embed papers_enriched.json -o papers_final.json --backend openai --n-clusters 8 -p umap,tsne

build

Build an interactive HTML dashboard.

papertrail build INPUT_FILE [OPTIONS]

Arguments:

  • INPUT_FILE — Input JSON file with embeddings

Options:

  • -o, --output TEXT — Output HTML file (default: dashboard.html)
  • --title TEXT — Dashboard title
  • --description TEXT — Dashboard description
  • --default-coloring TEXT — Default color scheme (cluster, channel, user, date, year, citations)
  • --default-projection TEXT — Default projection (umap, tsne, pca)
  • --primary-color TEXT — Primary color (hex code)
  • --accent-color TEXT — Accent color (hex code)
  • --extra-fields TEXT — Extra metadata fields to display
  • --compress — Compress data for smaller file size
  • --template TEXT — Custom HTML template
  • --faiss-path TEXT — Path to FAISS index
  • -v, --verbose — Verbose output
  • --help — Show help message

Example:

papertrail build papers_final.json -o dashboard.html --title "My Papers" --default-coloring cluster --default-projection umap

search

Search papers by semantic similarity.

papertrail search [OPTIONS]

Options:

  • -q, --query TEXT — Search query (required)
  • -k, --top-k INTEGER — Number of results (default: 5)
  • --papers FILE — Papers JSON file (default: papers_final.json)
  • --faiss-path TEXT — FAISS index directory
  • --backend TEXT — Embedding backend (auto-detect by default)
  • --field TEXT — Search field (default: abstract,title)
  • --min-score FLOAT — Minimum similarity score
  • --exclude TEXT — Exclude channels
  • --cluster INTEGER — Filter by cluster
  • --after DATE — Only papers after this date
  • --before DATE — Only papers before this date
  • --format TEXT — Output format (text, json, csv; default: text)
  • -v, --verbose — Verbose output
  • --help — Show help message

Example:

papertrail search -q "transformer attention mechanisms" -k 10 --format json

Global Options

These options work with all commands:

  • --help — Show help message and exit
  • --version — Show version and exit
  • -q, --quiet — Suppress output
  • -v, --verbose — Verbose output

Environment Variables

PaperTrail uses these environment variables:

  • SLACK_BOT_TOKEN — Slack API token (required); e.g. xoxb-...
  • OPENAI_API_KEY — OpenAI API key (for OpenAI embeddings); e.g. sk-...
  • HF_TOKEN — HuggingFace API token (for HuggingFace embeddings); e.g. hf_...
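Since a missing token only surfaces partway through a long run, it can help to check the environment up front. A minimal POSIX-shell sketch (the `check_env` helper is ours, not part of PaperTrail):

```shell
# Pre-flight check for the tokens listed above (helper name is illustrative)
check_env() {
  if [ -z "${SLACK_BOT_TOKEN:-}" ]; then
    echo "error: SLACK_BOT_TOKEN is required" >&2
    return 2  # mirrors PaperTrail's configuration-error exit code
  fi
  if [ -z "${OPENAI_API_KEY:-}" ] && [ -z "${HF_TOKEN:-}" ]; then
    echo "note: no embedding API key set; remote backends will be unavailable" >&2
  fi
}
```

Typical use: check_env && papertrail scrape -o papers_raw.json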

Exit Codes

  • 0 — Success
  • 1 — General error
  • 2 — Configuration error (missing tokens, invalid input)
  • 3 — API error (Slack, OpenAI, HuggingFace)
  • 4 — File error (can't read/write files)
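In scripts, these codes can drive error handling. A sketch of a wrapper that maps each code to an actionable message (the `run_step` helper and its messages are illustrative, not part of PaperTrail):

```shell
# Run a command and translate PaperTrail's documented exit codes
run_step() {
  "$@"
  case $? in
    0) return 0 ;;
    2) echo "configuration error: check tokens and input files" >&2; return 2 ;;
    3) echo "API error: consider increasing --delay and retrying" >&2; return 3 ;;
    4) echo "file error: check paths and permissions" >&2; return 4 ;;
    *) echo "general error" >&2; return 1 ;;
  esac
}
```

Typical use: run_step papertrail scrape -o papers_raw.json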

Examples

Complete Pipeline

# Set environment variables
export SLACK_BOT_TOKEN="xoxb-your-token"
export OPENAI_API_KEY="sk-your-key"

# Run full pipeline
papertrail scrape -o papers_raw.json && \
papertrail enrich papers_raw.json -o papers_enriched.json && \
papertrail embed papers_enriched.json -o papers_final.json && \
papertrail build papers_final.json -o dashboard.html

# Open dashboard
open dashboard.html

Scrape Specific Channels

papertrail scrape -c papers-ml papers-bio papers-physics -o papers.json -v

Enrich with Custom Rate Limiting

papertrail enrich papers_raw.json -o enriched.json --delay 1.0 --batch-size 5

Embed with Local Backend

papertrail embed papers_enriched.json -o final.json --backend local --n-clusters 10

Search with Multiple Options

papertrail search -q "deep learning" -k 20 --min-score 0.7 --format csv

Tips

  • Use --dry-run to preview operations without making changes
  • Use --verbose (-v) to see detailed progress
  • Redirect output: papertrail scrape -o papers.json 2>&1 | tee output.log
  • Chain commands with && to stop on first error
  • Use quotes for arguments with spaces: --title "My Paper Collection"
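The tips above can be combined into a small refresh script. A sketch assuming papertrail is on PATH and tokens are exported; the refresh() wrapper, file names, and log name are ours, not part of PaperTrail:

```shell
#!/bin/sh
# Incremental refresh sketch: scrape only recent days, then rebuild.
set -eu

refresh() {
  papertrail scrape -o papers_raw.json --days 7 --checkpoint scrape.ckpt -v
  papertrail enrich papers_raw.json -o papers_enriched.json
  papertrail embed papers_enriched.json -o papers_final.json
  papertrail build papers_final.json -o dashboard.html --compress
}

# Run only when the CLI is actually installed; log everything
if command -v papertrail >/dev/null 2>&1; then
  refresh 2>&1 | tee refresh.log
fi
```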