Configuration¶
Before running PaperTrail, you need to configure API tokens and environment variables.
Required Credentials¶
Slack Bot Token¶
PaperTrail reads papers from your Slack workspace using the Slack Bot API.
To create a Slack bot:
- Go to api.slack.com/apps
- Click "Create New App" → "From scratch"
- Give it a name (e.g., "PaperTrail") and select your workspace
- Go to OAuth & Permissions
- Under Scopes, add these bot token scopes:
  - channels:history
  - channels:read
  - users:read
  - reactions:read
- Go to Install App and click "Install to Workspace"
- Copy the Bot User OAuth Token (starts with xoxb-)
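Before storing the token, it can help to confirm that you copied the bot token and not another credential from the same page. A minimal sketch with a placeholder value; the variable names `token` and `token_kind` are illustrative and are not read by PaperTrail:

```shell
# Placeholder: paste the value copied from "Bot User OAuth Token".
token="xoxb-your-token-here"

# Bot tokens start with "xoxb-"; anything else is a different credential.
case "$token" in
  xoxb-*) token_kind="bot" ;;
  *)      token_kind="unknown" ;;
esac
echo "Token kind: $token_kind"
```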
Embedding Backend Token¶
Choose one embedding backend and set its API key:
- OpenAI: get your API key from platform.openai.com/api-keys and set it as OPENAI_API_KEY
- HuggingFace: get your API token from huggingface.co/settings/tokens and set it as HF_TOKEN
Optional Configuration¶
Semantic Scholar / OpenAlex¶
Metadata enrichment uses Semantic Scholar and OpenAlex APIs, which are free and don't require authentication. However, you can optionally add API keys for higher rate limits:
# Not required, but improves rate limits
export SEMANTIC_SCHOLAR_API_KEY="your-key-here"
export OPENALEX_API_KEY="your-key-here"
Custom Enrichment APIs¶
If you want to use custom metadata sources:
Environment Setup¶
Quick Setup (Recommended)¶
Create a .env file in your project directory:
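A sketch of what the file might contain, using the variable names this page refers to elsewhere and placeholder values. Keep only the embedding key for the backend you chose.

```shell
# Write a .env file in the current directory (this overwrites an existing one).
cat > .env <<'EOF'
# Required
SLACK_BOT_TOKEN=xoxb-your-token-here

# Embedding backend: keep one, or neither for the local backend
OPENAI_API_KEY=your-openai-key
# HF_TOKEN=your-huggingface-token

# Optional, for higher rate limits
# SEMANTIC_SCHOLAR_API_KEY=your-key-here
# OPENALEX_API_KEY=your-key-here
EOF

# Make the file readable by your user only.
chmod 600 .env
```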
Then load it before running PaperTrail:
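One way to do this in a POSIX shell is to turn on auto-export while sourcing the file. The helper below is a sketch: `load_env` is a name made up for this example, and it assumes the file contains only plain `KEY=value` lines and comments.

```shell
# load_env FILE: export every KEY=value assignment found in FILE.
load_env() {
  if [ ! -f "$1" ]; then
    echo "load_env: no such file: $1" >&2
    return 1
  fi
  set -a      # auto-export variables assigned from here on
  . "$1"      # run the assignments in the current shell
  set +a      # restore normal behavior
}

# Pass a path containing a slash so the shell does not search $PATH for it.
load_env ./.env || echo "Create .env first, then rerun load_env" >&2
```

After loading, run PaperTrail from the same shell session, for example `papertrail embed papers.json`.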
Or use a tool like python-dotenv:
Per-Session Setup¶
Set variables directly in your shell session:
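For example, with placeholder values (variables set this way disappear when the terminal closes):

```shell
export SLACK_BOT_TOKEN="xoxb-your-token-here"

# Set one of these, or neither to use the local backend.
export OPENAI_API_KEY="your-openai-key"
# export HF_TOKEN="your-huggingface-token"
```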
Persistent Setup¶
Add to your shell profile (~/.bashrc, ~/.zshrc, etc.):
Then reload your shell:
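Both steps can be sketched as follows. The profile path and the marker comment are assumptions for this example; point `profile` at the file your shell actually reads and replace the placeholder values.

```shell
profile="$HOME/.bashrc"   # zsh users: "$HOME/.zshrc"
marker="# PaperTrail credentials"

# Append the exports once; skip if the marker line is already present.
if ! grep -qxF "$marker" "$profile" 2>/dev/null; then
  cat >> "$profile" <<'EOF'

# PaperTrail credentials
export SLACK_BOT_TOKEN="xoxb-your-token-here"
export OPENAI_API_KEY="your-openai-key"
EOF
fi
```

To pick up the change in the current session, run `source ~/.bashrc` (or `source ~/.zshrc`), or open a new terminal.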
Verify Configuration¶
Check that your credentials are set:
# Check all required env vars
python3 << 'EOF'
import os
required = ["SLACK_BOT_TOKEN", "OPENAI_API_KEY"]  # swap in HF_TOKEN, or drop the key entirely for the local backend
for var in required:
    value = os.getenv(var, "NOT SET")
    print(f"{var}: {value[:20]}..." if value != "NOT SET" else f"{var}: NOT SET")
EOF
Or test with the scraper:
This will validate your Slack token without downloading data.
Security Best Practices¶
Warning
Never commit API keys to version control. Always use environment variables or .env files that are ignored by git.
Good practices:
- Add .env to .gitignore
Use restricted API tokens with minimal scopes
- Rotate tokens regularly
- Use different tokens for development vs. production
- Store tokens in a password manager or secrets management system
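The first practice in the list, keeping `.env` out of version control, can be applied from the project root like this. The sketch only appends the entry when it is missing, so it is safe to run more than once:

```shell
# Create .gitignore if needed, then add ".env" unless it is already listed.
touch .gitignore
grep -qxF '.env' .gitignore || echo '.env' >> .gitignore
```

Inside a Git repository, `git check-ignore -v .env` shows which ignore rule matches the file.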
For production deployments:
Use a secrets management system like:
- HashiCorp Vault
- AWS Secrets Manager
- GitHub Secrets (for CI/CD)
- Docker Secrets (for containers)
Embedding Backend Selection¶
PaperTrail auto-detects your embedding backend based on available credentials:
- OpenAI: if OPENAI_API_KEY is set
- HuggingFace: if HF_TOKEN is set (and OpenAI is not)
- Local: otherwise (no API keys required)
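To predict which backend will be picked in your current shell, the documented precedence can be mirrored in a few lines. This is an illustration of the rule above, not PaperTrail's own code; `detect_backend` is a hypothetical helper name.

```shell
# Print the backend implied by the environment, in the documented order.
detect_backend() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
  elif [ -n "${HF_TOKEN:-}" ]; then
    echo "huggingface"
  else
    echo "local"
  fi
}

echo "Auto-detected backend: $(detect_backend)"
```

Note that an empty value counts as unset in this sketch; whether PaperTrail treats empty variables the same way is an assumption.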
Force a Specific Backend¶
Override auto-detection with the --backend flag:
# Force OpenAI
papertrail embed papers.json --backend openai
# Force HuggingFace
papertrail embed papers.json --backend huggingface
# Force local
papertrail embed papers.json --backend local
Troubleshooting¶
Error: SLACK_BOT_TOKEN not found¶
Make sure you set the environment variable:
Verify it's set:
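A sketch of both steps with a placeholder value. The check reports only whether the variable is non-empty, so the token itself never appears in terminal output; `slack_token_status` is a name chosen for this example.

```shell
# Set the variable (placeholder value; use your real token).
export SLACK_BOT_TOKEN="xoxb-your-token-here"

# Verify without printing the secret.
if [ -n "${SLACK_BOT_TOKEN:-}" ]; then
  slack_token_status="set"
else
  slack_token_status="NOT SET"
fi
echo "SLACK_BOT_TOKEN: $slack_token_status"
```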
Error: Invalid Slack token¶
Check that your token is correct and still valid. Tokens can be revoked if:
- You reinstalled the app
- The app was uninstalled from your workspace
- The workspace regenerated tokens
Create a new token and try again.
Error: OPENAI_API_KEY not found (but I set it)¶
Make sure you exported the variable (not just assigned it):
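The difference is easy to see by asking a child process what it inherited. The sketch below uses a placeholder value and `sh -c` as a stand-in for any program started from your shell, including PaperTrail:

```shell
unset OPENAI_API_KEY   # start clean so the demo is reproducible

# Plain assignment: the variable exists only in this shell.
OPENAI_API_KEY="your-openai-key"
sh -c 'echo "assigned only: ${OPENAI_API_KEY:-not visible}"'
# prints: assigned only: not visible

# Exported: child processes inherit it.
export OPENAI_API_KEY
sh -c 'echo "exported: ${OPENAI_API_KEY:-not visible}"'
# prints: exported: your-openai-key
```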
Error: Failed to download embeddings¶
Check your internet connection and API rate limits. OpenAI and HuggingFace have rate limits:
- OpenAI: depends on model and usage tier (3,500 requests/minute is typical for many accounts; check the limits page for yours)
- HuggingFace: Varies by account tier
If you hit rate limits, add delays between requests:
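How PaperTrail spaces individual API requests is not shown here, so the sketch below takes a coarser approach: it reruns a whole command with exponentially growing pauses until it succeeds. `retry_with_backoff` is a hypothetical helper written for this example, not part of PaperTrail.

```shell
# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping 1s, 2s, 4s, ... between attempts.
# Uses global variables because POSIX sh has no "local".
retry_with_backoff() {
  max_attempts="$1"
  shift
  attempt=1
  delay=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 5 papertrail embed papers.json` would try the embedding step up to five times. This only helps if the command can safely be rerun after a partial failure.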
Next Steps¶
- Quick Start — Run the pipeline
- Scraping Papers — Learn about the scraper
- Computing Embeddings — Embedding details and options