Contributing

We welcome contributions! This guide explains how to get involved with PaperTrail.

Getting Started

Fork and Clone

# Fork the repository on GitHub, then:
git clone https://github.com/your-username/PaperTrail.git
cd PaperTrail

Set Up Development Environment

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Set up pre-commit hooks (optional but recommended)
pre-commit install

Verify Installation

# Run tests
pytest

# Serve docs locally
mkdocs serve

Development Workflow

1. Create a Branch

git checkout -b feature/your-feature-name

Use descriptive names:

  • feature/add-bge-embeddings
  • fix/handle-missing-dois
  • docs/improve-quickstart
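
The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the three prefixes shown in the examples are the only ones in use. `is_valid_branch_name` is a hypothetical helper, not part of PaperTrail:

```python
import re

# Prefixes taken from the examples above; the project may accept others.
ALLOWED_PREFIXES = ("feature", "fix", "docs")

# A prefix, a slash, then lowercase words separated by single hyphens.
_BRANCH_RE = re.compile(
    r"^(?:%s)/[a-z0-9]+(?:-[a-z0-9]+)*$" % "|".join(ALLOWED_PREFIXES)
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if name looks like 'feature/add-bge-embeddings'."""
    return _BRANCH_RE.match(name) is not None
```

A check like this could run in a pre-commit hook, although whether the project enforces branch names at all is not stated here.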

2. Make Changes

Follow these guidelines:

Code Style

  • Use PEP 8 style
  • Format with black: black papertrail/
  • Lint with flake8: flake8 papertrail/
  • Type hints are encouraged where helpful

from typing import List, Dict, Optional

def search_papers(
    query: str,
    top_k: int = 5,
    min_score: Optional[float] = None
) -> List[Dict]:
    """Search papers by semantic similarity.

    Args:
        query: Search query text
        top_k: Number of results to return
        min_score: Minimum similarity score filter

    Returns:
        List of papers ranked by similarity
    """
    pass
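
The body of `search_papers` is left as a stub above. To illustrate what "ranked by similarity" with `top_k` and `min_score` could mean, here is a minimal sketch using cosine similarity over precomputed vectors. It assumes each paper dict carries an `embedding` list and that the query has already been embedded. `rank_by_similarity` and the `score` key are hypothetical names, not PaperTrail's API:

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    papers: List[Dict],
    top_k: int = 5,
    min_score: Optional[float] = None,
) -> List[Dict]:
    """Score every paper, drop those below min_score, return the best top_k."""
    scored = []
    for paper in papers:
        score = cosine_similarity(query_vec, paper["embedding"])
        if min_score is not None and score < min_score:
            continue
        scored.append({**paper, "score": score})
    scored.sort(key=lambda p: p["score"], reverse=True)
    return scored[:top_k]
```

Filtering before sorting keeps `min_score` independent of `top_k`: a high threshold can return fewer than `top_k` results.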

Documentation

  • Add docstrings to all functions and classes
  • Use Google-style docstrings
  • Include type hints in docstrings

def enrich_papers(papers: List[Dict]) -> List[Dict]:
    """Enrich papers with metadata from external APIs.

    Fetches title, authors, abstract, and other metadata from
    Semantic Scholar and OpenAlex APIs.

    Args:
        papers: List of paper objects with at least 'url' or 'doi'

    Returns:
        List of enriched paper objects

    Raises:
        APIError: If external API calls fail
        ValueError: If papers list is empty

    Example:
        ```python
        papers = scraper.scrape()
        enriched = enricher.enrich_papers(papers)
        ```
    """
    pass
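
The docstring above promises specific error behavior. As a sketch of how a body might honor it, the version below takes the metadata lookup as a parameter so it can run without network access. `enrich_papers_sketch`, the `fetch_metadata` callable, and the locally defined `APIError` are all assumptions made for illustration, not the project's actual interface:

```python
from typing import Callable, Dict, List


class APIError(Exception):
    """Stand-in for the project's API error type, defined here so the sketch runs."""


def enrich_papers_sketch(
    papers: List[Dict],
    fetch_metadata: Callable[[str], Dict],
) -> List[Dict]:
    """Merge looked-up metadata into each paper, following the documented contract."""
    if not papers:
        raise ValueError("papers list is empty")

    enriched = []
    for paper in papers:
        identifier = paper.get("doi") or paper.get("url")
        if identifier is None:
            raise ValueError("each paper needs a 'doi' or 'url'")
        try:
            metadata = fetch_metadata(identifier)
        except OSError as exc:  # network-level failures
            raise APIError(f"lookup failed for {identifier}") from exc
        # Copy rather than mutate, so the caller's input stays unchanged.
        enriched.append({**paper, **metadata})
    return enriched
```

Passing the lookup in as a callable is one design option among several. It makes the function easy to test with a fake, at the cost of a signature that differs from the one documented above.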

Tests

Add tests for new features. Tests go in tests/:

# tests/test_scraper.py
from papertrail.scraper import Scraper  # assumed module path; adjust if Scraper lives elsewhere
def test_scrape_papers():
    """Test paper scraping from mock Slack."""
    scraper = Scraper(token="test-token")
    papers = scraper.scrape(channels=["test"])
    assert len(papers) > 0
    assert "url" in papers[0]

def test_scrape_empty_channel():
    """Test scraping channel with no papers."""
    scraper = Scraper(token="test-token")
    papers = scraper.scrape(channels=["empty"])
    assert len(papers) == 0
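
The tests above mention a mock Slack but do not show how the mock is wired in. One common approach is to inject a client object and replace it with `unittest.mock.Mock` in tests. The sketch below uses a stand-in `Scraper` with a hypothetical `client.fetch_messages` method. The real class may be constructed differently (the examples above pass a `token`), so treat this as a pattern rather than the project's API:

```python
from unittest.mock import Mock


class Scraper:
    """Stand-in for the project's scraper, reduced to what the test needs."""

    def __init__(self, client):
        self.client = client

    def scrape(self, channels):
        papers = []
        for channel in channels:
            for message in self.client.fetch_messages(channel):
                # Treat any whitespace-separated token starting with "http" as a link.
                for word in message.split():
                    if word.startswith("http"):
                        papers.append({"url": word, "channel": channel})
        return papers


def test_scrape_with_mock_client():
    """No network access: the mock returns canned messages."""
    client = Mock()
    client.fetch_messages.return_value = [
        "read https://doi.org/10.1000/xyz123 today"
    ]
    papers = Scraper(client).scrape(channels=["test"])
    assert papers == [{"url": "https://doi.org/10.1000/xyz123", "channel": "test"}]
    client.fetch_messages.assert_called_once_with("test")
```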

Run tests before committing:

pytest -v

3. Commit Changes

git add .
git commit -m "Add feature: description of changes"

Write good commit messages:

  • First line: short summary (50 chars max)
  • Blank line
  • Detailed explanation (if needed)

Good messages:

Add local ONNX embedding backend

This adds support for running embeddings locally without API keys,
using the fastembed library with BAAI/bge-small-en-v1.5 models.
Useful for privacy-sensitive data and offline environments.

Bad messages:

fix stuff
update
changes
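
The format rules above (a summary of at most 50 characters, then a blank line, then an optional body) are simple enough to check in code. This is a minimal sketch, and `check_commit_message` is a hypothetical helper, not something shipped with PaperTrail:

```python
from typing import List


def check_commit_message(message: str, max_summary: int = 50) -> List[str]:
    """Return a list of problems; an empty list means the format is fine."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]

    problems = []
    if len(lines[0]) > max_summary:
        problems.append(f"summary is {len(lines[0])} chars (max {max_summary})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    return problems
```

This only checks layout. A message such as "fix stuff" passes the format check and is still a poor message, so review remains necessary.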

4. Push and Pull Request

git push origin feature/your-feature-name

Then open a Pull Request on GitHub. Include:

  • Description of changes
  • Why they're needed
  • Any breaking changes
  • Screenshots, if the UI changes

Areas for Contribution

Code

  • New embedding backends — Add support for more models
  • Performance improvements — Optimize vectorization, caching
  • Bug fixes — See Issues
  • New features — See Discussions

Documentation

  • Tutorials — Step-by-step guides for specific workflows
  • API docs — Improve docstrings and examples
  • Troubleshooting — Common issues and solutions
  • Examples — Real-world use cases

Community

  • Answer questions — Help others in Issues/Discussions
  • Report bugs — File detailed issue reports
  • Suggest features — Discuss ideas in Discussions
  • Share use cases — Tell us how you use PaperTrail

Testing

Run Tests

# All tests
pytest

# Specific test file
pytest tests/test_scraper.py

# Specific test function
pytest tests/test_scraper.py::test_scrape_papers

# Verbose output
pytest -v

# Coverage report
pytest --cov=papertrail

Write Tests

Create test files in tests/:

# tests/test_embeddings.py
import pytest
from papertrail.embeddings import Embedder

def test_embed_papers():
    """Test embedding papers."""
    embedder = Embedder(backend="local")
    papers = [
        {"title": "Paper 1", "abstract": "Test abstract 1"},
        {"title": "Paper 2", "abstract": "Test abstract 2"},
    ]
    result = embedder.embed(papers)

    assert len(result) == 2
    assert "embedding" in result[0]
    assert len(result[0]["embedding"]) == 384  # Local backend dimensions

@pytest.mark.parametrize("backend", ["openai", "huggingface", "local"])
def test_embed_backends(backend):
    """Test all embedding backends."""
    embedder = Embedder(backend=backend)
    papers = [{"title": "Test", "abstract": "test"}]
    result = embedder.embed(papers)
    assert "embedding" in result[0]

Documentation

Documentation lives in the docs/ directory, is written in Markdown, and is built with MkDocs.

Building Docs Locally

# Install mkdocs
pip install mkdocs mkdocs-material mkdocstrings

# Serve locally
mkdocs serve

# Build static HTML
mkdocs build

While mkdocs serve is running, open http://localhost:8000 in your browser. mkdocs build only writes static HTML to the site/ directory.

Writing Documentation

  • Use clear, concise language
  • Include code examples
  • Explain the "why" not just the "how"
  • Add admonitions for tips and warnings:

!!! tip
    Use `--backend local` to avoid API costs during development.

!!! warning
    Never commit API keys to git. Always use environment variables.

Release Process

Maintainers follow this process for releases:

  1. Update version: papertrail/__init__.py
  2. Update changelog: CHANGELOG.md
  3. Tag commit: git tag v0.1.0
  4. Push tags: git push origin --tags
  5. Build package: python -m build
  6. Upload to PyPI: python -m twine upload dist/*
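
Steps 1 and 3 must agree: the tag should be the package version with a leading "v". A small pre-release check can catch a mismatch. This is a sketch that assumes versions follow the MAJOR.MINOR.PATCH shape of the `v0.1.0` example, and `tag_matches_version` is a hypothetical helper:

```python
import re


def tag_matches_version(tag: str, version: str) -> bool:
    """True if tag is 'v' + version and version looks like MAJOR.MINOR.PATCH."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        return False
    return tag == f"v{version}"
```

The version string would come from wherever the project stores it, which the list above gives as papertrail/__init__.py.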

Code of Conduct

We're committed to providing a welcoming and inclusive community. Please:

  • Be respectful and kind
  • Welcome newcomers
  • Give credit for contributions
  • Report harassment to maintainers

Getting Help

Useful Resources

Examples of Great Contributions

Adding a New Embedding Backend

# papertrail/embeddings.py
import numpy as np

class CustomEmbedder(BaseEmbedder):
    """Custom embedding backend."""

    def __init__(self, model_name: str = "custom-model"):
        """Initialize custom embedder.

        Args:
            model_name: Name of custom model to use
        """
        self.model = self.load_model(model_name)

    def embed_text(self, text: str) -> np.ndarray:
        """Embed text to vector."""
        return self.model.encode(text)
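
The snippet above depends on `BaseEmbedder` and `load_model`, which are not shown here. For a version that runs on its own, the sketch below defines a stand-in base class and a toy backend that derives deterministic vectors from a hash. Both classes are assumptions for illustration. The real base class may require other methods, and the vectors carry no semantic meaning:

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class BaseEmbedder(ABC):
    """Stand-in for PaperTrail's base class; the real interface may differ."""

    @abstractmethod
    def embed_text(self, text: str) -> List[float]:
        """Embed text to a vector."""


class HashingEmbedder(BaseEmbedder):
    """Toy backend: deterministic pseudo-embeddings derived from SHA-256."""

    def __init__(self, dimensions: int = 8):
        # A SHA-256 digest has 32 bytes, which caps the vector length.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_text(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map each byte to the range [0, 1].
        return [byte / 255 for byte in digest[: self.dimensions]]
```

A backend like this can serve as a fast, offline test double when the behavior under test does not depend on embedding quality.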

Improving Documentation

# Advanced: Custom Metadata Sources

You can extend PaperTrail to fetch metadata from custom APIs.

## Example: Adding a Custom API

```python
from papertrail.enricher import Enricher

class CustomEnricher(Enricher):
    def enrich_from_custom_api(self, paper):
        # Your custom logic here
        pass
```

Fixing a Bug

# Before
def parse_doi(url):
    # Naive parsing: raises IndexError when the URL has no "doi.org/" segment
    return url.split("doi.org/")[1]

# After
import re
from typing import Optional

def parse_doi(url: str) -> Optional[str]:
    """Extract DOI from URL.

    Args:
        url: URL that may contain a DOI

    Returns:
        DOI string or None if not found
    """
    # Capture everything after "doi.org/" or "DOI:" up to the next whitespace
    match = re.search(r'(?:doi\.org/|DOI:\s*)(.+?)(?:\s|$)', url)
    return match.group(1) if match else None
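
To see what the fix changes in practice, the sketch below repeats the "after" version and runs it on three inputs. The DOIs and URLs are made-up examples:

```python
import re
from typing import Optional


def parse_doi(url: str) -> Optional[str]:
    """Extract a DOI from text, or return None if there is none."""
    match = re.search(r'(?:doi\.org/|DOI:\s*)(.+?)(?:\s|$)', url)
    return match.group(1) if match else None


print(parse_doi("https://doi.org/10.1000/xyz123"))    # 10.1000/xyz123
print(parse_doi("see DOI: 10.1000/abc for details"))  # 10.1000/abc
print(parse_doi("https://example.com/paper"))         # None
```

The third input is the case that mattered: the original `split`-based version raises `IndexError` on it instead of returning `None`. The pattern is case-sensitive, so a lowercase "doi:" prefix would not match as written.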

Thank You!

Thank you for contributing to PaperTrail! Your work helps the research community.


Questions? Contact us