PaperTrail

Every paper your team shares — found and mapped.

PaperTrail automatically discovers papers shared across your Slack workspace, enriches them with metadata, computes semantic embeddings, and builds an interactive visual dashboard with hierarchical topic clustering, AI-powered search, and full engagement metrics.

Live Demos

Landing page (lab picker): papertrail-portal.vercel.app
Koo Lab — Vercel · GitHub Pages
Standard Model Bio — Vercel · GitHub Pages
GitHub: bschilder/PaperTrail
PyPI: papertrail-lab
Author: Brian Schilder, Koo Lab, Cold Spring Harbor Laboratory

Features

Web App (Dashboard)

A self-contained HTML file — no server required. See the full Dashboard Guide.

Map View

Canvas scatter plot with UMAP/t-SNE/PCA projections and hardware-accelerated rendering
Hierarchical topic clustering — LLM-generated labels at 3 zoom levels (7 → 15 → 35 topics)
Topic connection lines — configurable thickness, opacity, curve, and color
8 color modes: Cluster, Channel, Year, Citations, Engagement, Density, Contributor, Journal
Smooth animations — papers fade in/out when filtering, timeline playback with gradual dot appearance
3D WebGL view powered by Three.js
Sortable table with column filters, CSV/XLSX export, Slack message links
Leaderboard — top contributors, most cited, most engaged
AI chatbot — natural language search with tool use (HuggingFace, Claude, OpenAI)
Semantic search — content-based similarity ranking
Time travel — chronological animation with smooth fade-in
KDE density background with 7 color palettes
Lasso & rectangle selection
URL hash state for shareable views
Dark theme optimized for readability
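Semantic search ranks papers by how close their embedding vectors are to the query's vector. A minimal sketch of that kind of ranking, assuming cosine similarity and toy 3-dimensional vectors; the function names and data are illustrative and are not taken from PaperTrail's code:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_papers(query_vec, papers, top_k=5):
    """Return up to top_k (title, score) pairs, most similar first."""
    scored = [(p["title"], cosine_similarity(query_vec, p["embedding"])) for p in papers]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (hypothetical); real ones would come from the embedding step.
papers = [
    {"title": "Paper A", "embedding": [1.0, 0.0, 0.0]},
    {"title": "Paper B", "embedding": [0.7, 0.7, 0.0]},
    {"title": "Paper C", "embedding": [0.0, 0.0, 1.0]},
]
results = rank_papers([1.0, 0.1, 0.0], papers, top_k=2)
```

Here `results` lists Paper A first and Paper B second, since the query vector points almost entirely along Paper A's direction.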

Backend (Python Pipeline)

A four-step CLI pipeline. See individual guides: Scraping · Enriching · Embeddings · Dashboard

Slack Workspace
      │
      ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scraper    │───▶│   Enricher   │───▶│  Embeddings  │───▶│   Preview    │
│              │    │              │    │              │    │              │
│ - Slack API  │    │ - Page scrape│    │ - TF-IDF/SVD │    │ - Map view   │
│ - 30+ domains│    │ - OpenAlex   │    │ - OpenAI     │    │ - 3D view    │
│ - Reactions  │    │ - Crossref   │    │ - HuggingFace│    │ - Table      │
│ - Replies    │    │ - bioRxiv API│    │ - Local ONNX │    │ - AI agent   │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘

Multi-strategy enrichment cascade — page scraping → DOI lookup (OpenAlex, Crossref) → bioRxiv/medRxiv API → title search → Google fallback. Handles 30+ domains, including Nature, arXiv, Cell, Science, and OpenReview.
Junk title rejection — automatically filters out errata, corrigenda, correspondence, and site boilerplate
Dead link detection — removes papers with 404/error pages
Hierarchical clustering on UMAP projections — 3 levels with LLM-generated labels (HuggingFace first, falling back to OpenAI)
Automated weekly pipeline via GitHub Actions — scrape, enrich, embed, build, deploy to both Vercel and GitHub Pages
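The enrichment cascade and junk title rejection described above can be pictured as an ordered list of fallbacks, where a result is only accepted if its title passes the junk check. The sketch below is a hypothetical illustration of that pattern, not PaperTrail's implementation: the resolver functions are stubs, and the regular expression is an assumed stand-in for whatever patterns the project actually uses.

```python
import re

# Assumed junk patterns, loosely based on the categories listed above.
JUNK_TITLE = re.compile(
    r"^\s*(erratum|corrigendum|correspondence|page not found|access denied)\b",
    re.IGNORECASE,
)

def is_junk_title(title):
    """True for missing/blank titles or titles matching a junk pattern."""
    return not title or not title.strip() or bool(JUNK_TITLE.search(title))

def enrich(url, strategies):
    """Try each (name, resolver) in order and return the first usable metadata.

    A resolver takes the URL and returns a dict with at least a "title",
    or None when it cannot resolve the paper.
    """
    for name, resolver in strategies:
        try:
            meta = resolver(url)
        except Exception:
            # A failing strategy should not abort the whole cascade.
            continue
        if meta and not is_junk_title(meta.get("title")):
            return {**meta, "source": name}
    return None  # every strategy failed or returned junk

# Stub resolvers (hypothetical) standing in for page scraping and API lookups.
def scrape_page(url):
    return {"title": "Erratum: publisher notice"}  # junk, so it is skipped

def doi_lookup(url):
    raise TimeoutError("simulated API timeout")

def title_search(url):
    return {"title": "A Real Paper Title", "year": 2023}

strategies = [
    ("page_scrape", scrape_page),
    ("doi_lookup", doi_lookup),
    ("title_search", title_search),
]
result = enrich("https://example.org/paper", strategies)
```

In this run the first strategy returns a junk title and the second raises, so `result` comes from `title_search`. Keeping the strategies in a plain list makes the order easy to change.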

Quick Start

Option 1: Automated (Fork & Configure)

Fork this repository
Edit config.yml with your Slack channels
Add SLACK_BOT_TOKEN as a GitHub Actions secret
(Optional) Add VERCEL_TOKEN, VERCEL_ORG_ID, VERCEL_PROJECT_ID secrets to also deploy to Vercel
The pipeline runs weekly (Sunday 2am UTC) and deploys to GitHub Pages, plus Vercel if the optional secrets are set

See Configuration → Deployment for details.

Option 2: CLI

# Install (the quotes keep shells such as zsh from globbing the brackets)
pip install "papertrail-lab[all]"

# Run the full pipeline
papertrail run-pipeline -c config.yml -o build

# Or step by step:
papertrail scrape --token "$SLACK_BOT_TOKEN" -c CHANNEL_ID -o raw.json
papertrail enrich raw.json -o enriched.json
papertrail embed enriched.json -o final.json
papertrail build final.json -o dashboard.html
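To drive the step-by-step commands above from a script, one option is to build them as argument lists and run them in order, stopping at the first failure. This is a sketch under assumptions: it uses only the subcommands and flags shown above, and the helper names `pipeline_commands` and `run_steps` are hypothetical, not part of the package.

```python
import subprocess

def pipeline_commands(channel_id, token, out_html="dashboard.html"):
    """Build the four CLI invocations shown above as argument lists."""
    return [
        ["papertrail", "scrape", "--token", token, "-c", channel_id, "-o", "raw.json"],
        ["papertrail", "enrich", "raw.json", "-o", "enriched.json"],
        ["papertrail", "embed", "enriched.json", "-o", "final.json"],
        ["papertrail", "build", "final.json", "-o", out_html],
    ]

def run_steps(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on the first non-zero exit."""
    for cmd in commands:
        runner(cmd, check=True)

# Example call (needs papertrail on PATH and a real token, so it is left commented out):
# run_steps(pipeline_commands("CHANNEL_ID", "<your Slack bot token>"))
```

Passing argument lists rather than a shell string avoids quoting problems with the token, and the injectable `runner` makes it possible to dry-run the sequence without invoking the CLI.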

Option 3: Python API

from papertrail.pipeline import run_pipeline

run_pipeline(config_path="config.yml", output_dir="build")

Configuration

Edit config.yml to set up your instance:

title: "PaperTrail — My Lab"
slack_workspace_url: "https://mylab.slack.com"

channels:
  papers-dl: C0123Q7PGGP
  general: CP40S009F

embedding_backend: tfidf  # or openai, huggingface
openalex_email: "user@example.com"
schedule: "0 2 * * 0"  # Weekly Sunday 2am UTC
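The `schedule` value is a five-field cron expression: minute, hour, day of month, month, day of week, with 0 meaning Sunday. As a rough illustration of how `"0 2 * * 0"` maps to "Sunday 2am UTC", the sketch below computes the next run time for this simple weekly form only. The helper `next_weekly_run` is hypothetical and is not a general cron parser.

```python
from datetime import datetime, timedelta, timezone

def next_weekly_run(schedule, now):
    """Next run time for a cron string of the form 'M H * * DOW' (numeric fields only)."""
    minute, hour, dom, month, dow = schedule.split()
    if dom != "*" or month != "*":
        raise ValueError("this sketch only supports weekly 'M H * * DOW' schedules")
    cron_dow = int(dow) % 7        # cron: 0 (or 7) = Sunday
    py_dow = (cron_dow - 1) % 7    # Python weekday(): Monday = 0 ... Sunday = 6
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    candidate += timedelta(days=(py_dow - now.weekday()) % 7)
    if candidate <= now:
        # Already past this week's slot, so move to next week.
        candidate += timedelta(days=7)
    return candidate

now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)  # a Wednesday
print(next_weekly_run("0 2 * * 0", now))  # 2024-01-07 02:00:00+00:00
```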

Next Steps

Dashboard Guide — Full walkthrough of all UI features
Installation — Detailed setup
Scraping Guide — Slack integration details
Enrichment Guide — Metadata resolution strategies
Embeddings Guide — Backend comparison
API Reference — Python API docs
Koo Lab Example — Real-world deployment
Contributing — Development setup

License

MIT License. See LICENSE.