PaperTrail
Every paper your team shares — found and mapped.
PaperTrail automatically discovers papers shared across your Slack workspace, enriches them with metadata from Semantic Scholar and OpenAlex, computes semantic embeddings, and serves an interactive dashboard with table view, 2D embedding map, and semantic search.
- GitHub: bschilder/PaperTrail
- PyPI: papertrail-lab
- Author: Brian Schilder, Koo Lab, Cold Spring Harbor Laboratory
Features
PaperTrail provides a complete end-to-end pipeline for paper discovery and analysis:
- Slack Scraping: Automatically detects papers from shared links (DOI, arXiv, bioRxiv, PubMed, and other scholarly URLs) across all channels. Tracks engagement metrics like reactions and thread replies.
- Metadata Enrichment: Fetches rich metadata including title, authors, abstract, journal, year, and affiliated institutions from Semantic Scholar and OpenAlex APIs.
- LLM Embeddings: Generates high-quality semantic embeddings via OpenAI (default), HuggingFace Inference API, or local ONNX models. Stored in a FAISS vector database for sub-millisecond similarity search.
- Interactive Dashboard: Self-contained HTML file with sortable table, d3.js scatter plot (UMAP/t-SNE/PCA projections), color-by filters (cluster/channel/user/date), detail panel, and semantic search chat with autocomplete.
- CLI Pipeline: Simple four-step workflow: scrape → enrich → embed → build.
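To make the link-detection step concrete, here is a minimal sketch of how DOI and arXiv identifiers could be pulled out of a message's text. The regular expressions and the `find_paper_ids` helper are hypothetical and written for illustration; PaperTrail's own detectors may use different patterns and cover more sources (bioRxiv, PubMed, and so on).

```python
import re

# Hypothetical patterns for illustration; PaperTrail's real detectors may differ.
# Slack wraps links as <url|label>, so '<', '>' and '|' end a DOI match.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s<>|\"]+)", re.IGNORECASE)
ARXIV_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def find_paper_ids(message: str) -> list:
    """Return (kind, identifier) pairs found in a message's text."""
    found = []
    for match in ARXIV_RE.finditer(message):
        found.append(("arxiv", match.group(1)))
    for match in DOI_RE.finditer(message):
        # Strip punctuation that commonly trails a pasted link.
        found.append(("doi", match.group(1).rstrip(".,;)>")))
    return found


msg = (
    "Nice read: https://arxiv.org/abs/1706.03762v5 "
    "and https://doi.org/10.1038/s41586-021-03819-2."
)
print(find_paper_ids(msg))
```

Normalizing every link to a bare identifier (rather than keeping the raw URL) makes it easier to deduplicate the same paper shared in different channels.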
Architecture
Slack Workspace
      │
      ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Scraper      │───▶│ Enricher     │───▶│ Embeddings   │───▶│ Preview      │
│              │    │              │    │              │    │              │
│ - Slack API  │    │ - Semantic   │    │ - OpenAI     │    │ - Table view │
│ - URL detect │    │   Scholar    │    │ - HuggingFace│    │ - Map view   │
│ - Engagement │    │ - OpenAlex   │    │ - Local ONNX │    │ - Chat       │
│   metrics    │    │              │    │ - FAISS store│    │ - Detail     │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
Embedding Backends
PaperTrail supports multiple embedding backends. Choose based on your needs:
| Backend | Model | Dimensions | Speed | Quality | API Key |
|---|---|---|---|---|---|
| OpenAI (default) | text-embedding-3-small | 1536 | Fast | Excellent | OPENAI_API_KEY |
| HuggingFace | BAAI/bge-small-en-v1.5 | 384 | Fast | Very Good | HF_TOKEN (optional) |
| Local | BAAI/bge-small-en-v1.5 | 384 | Medium | Very Good | None required |
The embedding backend is auto-detected based on available API keys. You can also override it explicitly with the --backend flag.
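As a rough sketch of what key-based auto-detection can look like, the function below picks a backend from the environment. The precedence order (OpenAI, then HuggingFace, then local) and the backend names `huggingface` and `local` are assumptions made for illustration; only `openai` appears as a `--backend` value in this page, and PaperTrail's actual logic may differ.

```python
import os


def detect_backend(env=None) -> str:
    """Pick an embedding backend from available API keys (assumed precedence)."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("HF_TOKEN"):
        return "huggingface"  # assumed name, not confirmed by the docs
    return "local"  # assumed name; this backend needs no API key


print(detect_backend({"HF_TOKEN": "hf_example"}))
```

Passing the environment as an argument keeps the function easy to test without touching real credentials.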
Quick Start
1. Install
pip install papertrail-lab
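2. Configure
Set your API tokens as environment variables, such as OPENAI_API_KEY for the default OpenAI backend or HF_TOKEN (optional) for HuggingFace. See Configuration for the full list of tokens.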
3. Run the Pipeline
# Step 1: Scrape papers from Slack
papertrail scrape -o papers_raw.json
# Step 2: Enrich with metadata
papertrail enrich papers_raw.json -o papers_enriched.json
# Step 3: Compute embeddings and projections
papertrail embed papers_enriched.json -o papers_final.json --backend openai
# Step 4: Build the interactive dashboard
papertrail build papers_final.json -o dashboard.html
Then open dashboard.html in your browser to explore your papers!
4. Search Papers
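Semantic search ranks papers by how close their embeddings are to the embedding of a query. As a minimal sketch of that idea, assuming nothing about PaperTrail's actual search interface, the example below does a brute-force cosine-similarity ranking in plain Python. PaperTrail stores its vectors in FAISS; the `cosine` and `search` functions and the toy three-dimensional vectors here are hypothetical stand-ins.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, papers, top_k=2):
    """Rank (title, vector) pairs by cosine similarity to the query."""
    scored = [(cosine(query_vec, vec), title) for title, vec in papers]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Toy embeddings; real ones have hundreds of dimensions.
papers = [
    ("Paper A", [1.0, 0.0, 0.0]),
    ("Paper B", [0.9, 0.1, 0.0]),
    ("Paper C", [0.0, 0.0, 1.0]),
]
print(search([1.0, 0.05, 0.0], papers))
```

A vector index such as FAISS answers the same nearest-neighbor question without comparing the query against every stored vector one by one in Python.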
Next Steps
- Installation Guide — Detailed setup instructions
- Configuration — API tokens and environment setup
- Quick Start — Step-by-step walkthrough
- User Guide — In-depth usage documentation
- API Reference — Python API documentation
- Koo Lab Demo — Real-world example and use case
Development
Interested in contributing? Set up the development environment:
git clone https://github.com/bschilder/PaperTrail.git
cd PaperTrail
pip install -e ".[dev]"
# Run tests
pytest
# Serve docs locally
mkdocs serve
See Contributing for more information.
License
MIT License. See LICENSE for details.