CLI Reference
The papertrail command-line interface provides the main entry point for all PaperTrail operations.
Commands
scrape
Scrape papers from your Slack workspace.
Options:
- `-o, --output TEXT`: Output file path (default: `papers_raw.json`)
- `-c, --channels TEXT`: Specific channels to scrape (space-separated)
- `--exclude TEXT`: Channels to exclude (space-separated)
- `-d, --days INTEGER`: Only scrape last N days
- `--after DATE`: Only scrape papers after this date (YYYY-MM-DD)
- `--before DATE`: Only scrape papers before this date (YYYY-MM-DD)
- `--min-engagement INTEGER`: Minimum engagement threshold
- `--delay FLOAT`: Delay between API requests (seconds)
- `--batch-size INTEGER`: Batch size for processing
- `--checkpoint FILE`: Save/resume from checkpoint
- `--dry-run`: Preview without downloading
- `-v, --verbose`: Verbose output
- `--format TEXT`: Output format (json, csv)
- `--help`: Show help message
Example:
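A minimal sketch of a scrape invocation that uses only the options listed above. The wrapper name `scrape_recent` and the chosen values (a 30-day window, an engagement threshold of 2) are illustrative assumptions, not PaperTrail defaults.

```shell
# Hypothetical wrapper: preview a scrape of the last N days (default 30).
# --dry-run previews the scrape without downloading anything.
scrape_recent() {
  days="${1:-30}"
  papertrail scrape -o papers_raw.json -d "$days" --min-engagement 2 --dry-run
}
```

Calling `scrape_recent 7` previews the last week; removing `--dry-run` from the wrapper performs the actual scrape.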
enrich
Enrich papers with metadata from Semantic Scholar and OpenAlex.
Arguments:
- `INPUT_FILE`: Input JSON file from scraper
Options:
- `-o, --output TEXT`: Output file path (default: `papers_enriched.json`)
- `--batch-size INTEGER`: Batch size for API requests (default: 10)
- `--delay FLOAT`: Delay between requests (seconds)
- `--no-cache`: Ignore cache and re-enrich
- `--require-identifier`: Skip papers without DOI/arXiv ID
- `--fields TEXT`: Only enrich specific fields (comma-separated)
- `--dry-run`: Preview without saving
- `-v, --verbose`: Verbose output
- `--help`: Show help message
Example:
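A minimal sketch of an enrich invocation built from the argument and options listed above. The wrapper name `enrich_identified` is hypothetical; the file names are the documented defaults.

```shell
# Hypothetical wrapper: enrich a scraper output file, skipping papers
# that have no DOI or arXiv ID, with verbose progress output.
enrich_identified() {
  input="${1:-papers_raw.json}"
  papertrail enrich "$input" -o papers_enriched.json --require-identifier -v
}
```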
embed
Compute embeddings, projections, clusters, and FAISS index.
Arguments:
- `INPUT_FILE`: Input JSON file from enricher
Options:
- `-o, --output TEXT`: Output file path (default: `papers_final.json`)
- `--backend TEXT`: Embedding backend (openai, huggingface, local; default: auto-detect)
- `--batch-size INTEGER`: Batch size for embedding (default: 32)
- `--delay FLOAT`: Delay between requests (seconds)
- `--n-clusters INTEGER`: Number of k-means clusters (default: 5)
- `-p, --projections TEXT`: Projections to compute (comma-separated: umap,tsne,pca; default: all)
- `--text-fields TEXT`: Text fields to embed (default: abstract,title)
- `--faiss-path TEXT`: Save FAISS index to this directory
- `--no-cache`: Don't use cached models
- `--dry-run`: Preview without saving
- `-v, --verbose`: Verbose output
- `--help`: Show help message
Example:
papertrail embed papers_enriched.json -o papers_final.json --backend openai --n-clusters 8 -p umap,tsne
build
Build an interactive HTML dashboard.
Arguments:
- `INPUT_FILE`: Input JSON file with embeddings
Options:
- `-o, --output TEXT`: Output HTML file (default: `dashboard.html`)
- `--title TEXT`: Dashboard title
- `--description TEXT`: Dashboard description
- `--default-coloring TEXT`: Default color scheme (cluster, channel, user, date, year, citations)
- `--default-projection TEXT`: Default projection (umap, tsne, pca)
- `--primary-color TEXT`: Primary color (hex code)
- `--accent-color TEXT`: Accent color (hex code)
- `--extra-fields TEXT`: Extra metadata fields to display
- `--compress`: Compress data for smaller file size
- `--template TEXT`: Custom HTML template
- `--faiss-path TEXT`: Path to FAISS index
- `-v, --verbose`: Verbose output
- `--help`: Show help message
Example:
papertrail build papers_final.json -o dashboard.html --title "My Papers" --default-coloring cluster --default-projection umap
search
Search papers by semantic similarity.
Options:
- `-q, --query TEXT`: Search query (required)
- `-k, --top-k INTEGER`: Number of results (default: 5)
- `--papers FILE`: Papers JSON file (default: `papers_final.json`)
- `--faiss-path TEXT`: FAISS index directory
- `--backend TEXT`: Embedding backend (auto-detect by default)
- `--field TEXT`: Search field (default: abstract,title)
- `--min-score FLOAT`: Minimum similarity score
- `--exclude TEXT`: Exclude channels
- `--cluster INTEGER`: Filter by cluster
- `--after DATE`: Only papers after this date
- `--before DATE`: Only papers before this date
- `--format TEXT`: Output format (text, json, csv; default: text)
- `-v, --verbose`: Verbose output
- `--help`: Show help message
Example:
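A minimal sketch of a search invocation using the options listed above. The wrapper name `search_papers` is hypothetical. The long form `--query` is used so the call cannot be confused with the global `-q, --quiet` option.

```shell
# Hypothetical wrapper: semantic search over the default papers file.
# $1 is the query text, $2 the number of results (documented default: 5).
search_papers() {
  query="$1"
  top_k="${2:-5}"
  papertrail search --query "$query" --top-k "$top_k" --papers papers_final.json
}
```

For instance, `search_papers "graph neural networks" 10` would request the ten closest matches; the query is quoted so that it reaches the CLI as a single argument.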
Global Options
These options work with all commands:
- `--help`: Show help message and exit
- `--version`: Show version and exit
- `-q, --quiet`: Suppress output
- `-v, --verbose`: Verbose output
Environment Variables
PaperTrail uses these environment variables:
| Variable | Purpose | Example |
|---|---|---|
| `SLACK_BOT_TOKEN` | Slack API token (required) | `xoxb-...` |
| `OPENAI_API_KEY` | OpenAI API key (for OpenAI embeddings) | `sk-...` |
| `HF_TOKEN` | HuggingFace API token (for HuggingFace embeddings) | `hf_...` |
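Because a missing token only surfaces once a command runs, it can help to check the environment first. The following is a sketch of such a preflight check; `require_env` is a hypothetical helper, not part of PaperTrail.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script might call `require_env SLACK_BOT_TOKEN` before scraping, and add `OPENAI_API_KEY` or `HF_TOKEN` only when the corresponding embedding backend is in use.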
Exit Codes
- `0`: Success
- `1`: General error
- `2`: Configuration error (missing tokens, invalid input)
- `3`: API error (Slack, OpenAI, HuggingFace)
- `4`: File error (can't read/write files)
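In scripts, these codes can be mapped back to their documented meaning. The sketch below assumes only the table above; `describe_exit` is a hypothetical helper name.

```shell
# Hypothetical helper: translate a PaperTrail exit code into the
# category documented above.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "general error" ;;
    2) echo "configuration error (missing tokens, invalid input)" ;;
    3) echo "API error (Slack, OpenAI, HuggingFace)" ;;
    4) echo "file error (can't read/write files)" ;;
    *) echo "unknown exit code $1" ;;
  esac
}
```

Running `describe_exit $?` immediately after a `papertrail` command prints the category of its result.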
Examples
Complete Pipeline
# Set environment variables
export SLACK_BOT_TOKEN="xoxb-your-token"
export OPENAI_API_KEY="sk-your-key"
# Run full pipeline
papertrail scrape -o papers_raw.json && \
papertrail enrich papers_raw.json -o papers_enriched.json && \
papertrail embed papers_enriched.json -o papers_final.json && \
papertrail build papers_final.json -o dashboard.html
# Open dashboard
open dashboard.html
Scrape Specific Channels
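A sketch of restricting a scrape to named channels with `-c, --channels` and `--after`. The wrapper name `scrape_channels` and the channel names are hypothetical. It also assumes that the space-separated channel list is passed as one quoted argument, which the option description does not confirm.

```shell
# Hypothetical wrapper: scrape only the given channels, after a given date.
# Assumption: the space-separated channel list is a single quoted argument.
scrape_channels() {
  channels="$1"   # e.g. "papers reading-group" (hypothetical channel names)
  after="$2"      # YYYY-MM-DD
  papertrail scrape -c "$channels" --after "$after" -o papers_raw.json
}
```

An invocation could look like `scrape_channels "papers reading-group" 2024-01-01`.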
Enrich with Custom Rate Limiting
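A sketch of slowing down enrichment with `--batch-size` and `--delay`. The wrapper name `enrich_slowly` and the values (batches of 5, a 2-second delay) are illustrative assumptions; suitable values depend on the rate limits that apply to your API access.

```shell
# Hypothetical wrapper: enrich with smaller batches and a pause between
# requests. $1 is the delay in seconds (illustrative default: 2.0).
enrich_slowly() {
  delay="${1:-2.0}"
  papertrail enrich papers_raw.json -o papers_enriched.json --batch-size 5 --delay "$delay"
}
```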
Embed with Local Backend
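A sketch of selecting the `local` backend and saving the FAISS index. The wrapper name `embed_local`, the batch size of 16, and the directory name `faiss_index` are illustrative assumptions.

```shell
# Hypothetical wrapper: embed with the local backend and save the FAISS
# index to a directory ($1, illustrative default: faiss_index).
embed_local() {
  papertrail embed papers_enriched.json -o papers_final.json \
    --backend local --batch-size 16 --faiss-path "${1:-faiss_index}"
}
```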
Search with Multiple Options
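A sketch that combines several search filters from the options list. The wrapper name `search_filtered` is hypothetical, and the filter values (score threshold 0.5, cluster 2, the date) are placeholders; the scale of `--min-score` is not specified here, so treat the threshold as an assumption.

```shell
# Hypothetical wrapper: top-10 search restricted by score, date, and
# cluster, with JSON output. $1 is the query text.
search_filtered() {
  papertrail search --query "$1" --top-k 10 --min-score 0.5 \
    --after 2023-01-01 --cluster 2 --format json
}
```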
Tips
- Use `--dry-run` to preview operations without making changes
- Use `--verbose` (`-v`) to see detailed progress
- Redirect output: `papertrail scrape -o papers.json 2>&1 | tee output.log`
- Chain commands with `&&` to stop on first error
- Use quotes for arguments with spaces: `--title "My Paper Collection"`