PaperTrail¶
Every paper your team shares — found and mapped.
PaperTrail automatically discovers papers shared across your Slack workspace, enriches them with metadata, computes semantic embeddings, and builds an interactive visual dashboard with hierarchical topic clustering, AI-powered search, and full engagement metrics.
Live Demo: Koo Lab Dashboard¶
- GitHub: bschilder/PaperTrail
- PyPI: papertrail-lab
- Author: Brian Schilder, Koo Lab, Cold Spring Harbor Laboratory
Features¶
Web App (Dashboard)¶
A self-contained HTML file — no server required. See the full Dashboard Guide.

- Canvas scatter plot with UMAP/t-SNE/PCA projections and hardware-accelerated rendering
- Hierarchical topic clustering — LLM-generated labels at 3 zoom levels (7 → 15 → 35 topics)
- Topic connection lines — configurable thickness, opacity, curve, and color
- 8 color modes: Cluster, Channel, Year, Citations, Engagement, Density, Contributor, Journal
- Smooth animations — papers fade in/out when filtering, timeline playback with gradual dot appearance
- 3D WebGL view powered by Three.js
- Sortable table with column filters, CSV/XLSX export, Slack message links
- Leaderboard — top contributors, most cited, most engaged
- AI chatbot — natural language search with tool use (HuggingFace, Claude, OpenAI)
- Semantic search — content-based similarity ranking
- Time travel — chronological animation with smooth fade-in
- KDE density background with 7 color palettes
- Lasso & rectangle selection
- URL hash state for shareable views
- Dark theme optimized for readability
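The "content-based similarity ranking" behind semantic search can be illustrated with a minimal, pure-Python sketch. It assumes TF-IDF vectors compared by cosine similarity; the function names (`tokenize`, `tfidf_vectors`, `rank`) and the sample titles are hypothetical and are not taken from PaperTrail's code, which may rank with a different embedding backend.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first, ties broken by index."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms never seen in the corpus carry no weight.
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Illustrative titles only, not real dashboard data.
papers = [
    "Deep learning for regulatory genomics",
    "Protein structure prediction with transformers",
    "Single-cell RNA sequencing atlas",
]
ranked = rank("deep learning genomics", papers)
```

Here `ranked[0]` points at the first title, because it is the only one sharing terms with the query. Dense embeddings (for example from the OpenAI or HuggingFace backends) would replace the sparse vectors but leave the cosine ranking step unchanged.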
Backend (Python Pipeline)¶
A four-step CLI pipeline. See individual guides: Scraping · Enriching · Embeddings · Dashboard
Slack Workspace
       │
       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scraper    │───▶│   Enricher   │───▶│  Embeddings  │───▶│   Preview    │
│              │    │              │    │              │    │              │
│ - Slack API  │    │ - Page scrape│    │ - TF-IDF/SVD │    │ - Map view   │
│ - 30+ domains│    │ - OpenAlex   │    │ - OpenAI     │    │ - 3D view    │
│ - Reactions  │    │ - Crossref   │    │ - HuggingFace│    │ - Table      │
│ - Replies    │    │ - bioRxiv API│    │ - Local ONNX │    │ - AI agent   │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
- Multi-strategy enrichment cascade: page scraping → DOI lookup (OpenAlex, Crossref) → bioRxiv/medRxiv API → title search → Google fallback. Handles 30+ domains, including Nature, arXiv, Cell, Science, and OpenReview.
- Junk title rejection — automatically filters out erratum, corrigendum, correspondence, and site boilerplate
- Dead link detection — removes papers with 404/error pages
- Hierarchical clustering on UMAP projections — 3 levels with LLM-generated labels (HF → OpenAI fallback)
- Automated weekly pipeline via GitHub Actions — scrape, enrich, embed, build, deploy to GitHub Pages
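The enrichment cascade and junk title rejection described above can be sketched as an ordered list of fallback strategies. This is a minimal illustration under stated assumptions, not PaperTrail's implementation: `enrich`, `is_junk_title`, the regular expression, and the three stub resolvers are all hypothetical, and the stubs return canned data instead of calling OpenAlex, Crossref, or any other real service.

```python
import re
from typing import Callable, Optional

# Hypothetical junk patterns, loosely based on the categories listed above.
JUNK_TITLE = re.compile(
    r"^\s*(erratum|corrigendum|correspondence|page not found|access denied)\b",
    re.IGNORECASE,
)


def is_junk_title(title: Optional[str]) -> bool:
    """True for missing or blank titles and for titles that start with a junk marker."""
    return not title or not title.strip() or bool(JUNK_TITLE.match(title))


Resolver = Callable[[str], Optional[dict]]


def enrich(url: str, resolvers: list[tuple[str, Resolver]]) -> Optional[dict]:
    """Try each strategy in order and return the first record with a usable title."""
    for name, resolver in resolvers:
        try:
            record = resolver(url)
        except Exception:
            # One failing strategy (timeout, parse error) should not stop the cascade.
            continue
        if record and not is_junk_title(record.get("title")):
            return {**record, "source": name}
    return None  # every strategy failed or produced junk


# Stub strategies standing in for real lookups (assumptions for illustration only).
def scrape_page(url: str) -> Optional[dict]:
    return {"title": "Erratum: figure 2 corrected"}  # rejected as junk


def lookup_doi(url: str) -> Optional[dict]:
    raise TimeoutError("simulated API timeout")  # skipped by the cascade


def search_title(url: str) -> Optional[dict]:
    return {"title": "An example paper title"}


result = enrich(
    "https://example.org/paper",
    [
        ("page_scrape", scrape_page),
        ("doi_lookup", lookup_doi),
        ("title_search", search_title),
    ],
)
```

In this run the first strategy yields a junk title, the second raises, and the third succeeds, so `result["source"]` is `"title_search"`. Keeping the strategies in a plain list makes the order explicit and lets a new source be added without touching the loop.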
Quick Start¶
Option 1: Automated (Fork & Configure)¶
- Fork this repository
- Edit config.yml with your Slack channels
- Add SLACK_BOT_TOKEN as a GitHub Actions secret
- The pipeline runs weekly and deploys to GitHub Pages
See Deployment Guide for details.
Option 2: CLI¶
# Install
pip install "papertrail-lab[all]"  # quotes keep shells such as zsh from expanding the brackets
# Run the full pipeline
papertrail run-pipeline -c config.yml -o build
# Or step by step:
papertrail scrape --token "$SLACK_BOT_TOKEN" -c CHANNEL_ID -o raw.json
papertrail enrich raw.json -o enriched.json
papertrail embed enriched.json -o final.json
papertrail build final.json -o dashboard.html
Option 3: Python API¶
from papertrail.pipeline import run_pipeline
run_pipeline(config_path="config.yml", output_dir="build")
Configuration¶
Edit config.yml to set up your instance:
title: "PaperTrail — My Lab"
slack_workspace_url: "https://mylab.slack.com"
channels:
  papers-dl: C0123Q7PGGP
  general: CP40S009F
embedding_backend: tfidf # or openai, huggingface
openalex_email: "user@example.com"
schedule: "0 2 * * 0" # Weekly Sunday 2am UTC
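Before running the pipeline, it can help to sanity-check the parsed configuration. The sketch below is hypothetical and not part of PaperTrail: `validate_config` works on an already-parsed dict (for example from PyYAML's `yaml.safe_load`), the channel ID pattern is an assumption about what Slack IDs look like, and the allowed backends are only the three values named in the example above.

```python
import re

# Backend names taken from the example config; other values may also be valid.
ALLOWED_BACKENDS = {"tfidf", "openai", "huggingface"}
# Assumed shape of a Slack conversation ID: a C, G, or D prefix plus uppercase alphanumerics.
CHANNEL_ID = re.compile(r"^[CGD][A-Z0-9]{8,}$")


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []

    channels = cfg.get("channels")
    if not isinstance(channels, dict) or not channels:
        errors.append("channels: expected a non-empty mapping of name to channel ID")
    else:
        for name, channel_id in channels.items():
            if not isinstance(channel_id, str) or not CHANNEL_ID.match(channel_id):
                errors.append(
                    f"channels.{name}: {channel_id!r} does not look like a Slack channel ID"
                )

    backend = cfg.get("embedding_backend")
    if backend is not None and (
        not isinstance(backend, str) or backend not in ALLOWED_BACKENDS
    ):
        errors.append(f"embedding_backend: unexpected value {backend!r}")

    schedule = cfg.get("schedule")
    # A standard cron expression has five whitespace-separated fields.
    if schedule is not None and len(str(schedule).split()) != 5:
        errors.append(f"schedule: {schedule!r} is not a five-field cron expression")

    return errors


# Mirrors the example configuration above.
good = {
    "slack_workspace_url": "https://mylab.slack.com",
    "channels": {"papers-dl": "C0123Q7PGGP", "general": "CP40S009F"},
    "embedding_backend": "tfidf",
    "schedule": "0 2 * * 0",
}
# Deliberately wrong values to show the reported problems.
bad = {
    "channels": {"general": "general"},
    "embedding_backend": "word2vec",
    "schedule": "weekly",
}
good_errors = validate_config(good)
bad_errors = validate_config(bad)
```

`good_errors` is empty, while `bad_errors` holds one message per invalid field. Returning a list instead of raising on the first problem lets all issues be reported in a single pass.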
Next Steps¶
- Dashboard Guide — Full walkthrough of all UI features
- Installation — Detailed setup
- Scraping Guide — Slack integration details
- Enrichment Guide — Metadata resolution strategies
- Embeddings Guide — Backend comparison
- API Reference — Python API docs
- Koo Lab Example — Real-world deployment
- Contributing — Development setup
License¶
MIT License. See LICENSE.