Contributing¶

We welcome contributions! This guide explains how to get involved with PaperTrail.

Getting Started¶

Fork and Clone¶

# Fork the repository on GitHub, then:
git clone https://github.com/your-username/PaperTrail.git
cd PaperTrail

Set Up Development Environment¶

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Set up pre-commit hooks (optional but recommended)
pre-commit install

Verify Installation¶

# Run tests
pytest

# Serve docs locally
mkdocs serve

Development Workflow¶

1. Create a Branch¶

git checkout -b feature/your-feature-name

Use descriptive names:

feature/add-bge-embeddings
fix/handle-missing-dois
docs/improve-quickstart

2. Make Changes¶

Follow these guidelines:

Code Style¶

Use PEP 8 style
Format with black: black papertrail/
Lint with flake8: flake8 papertrail/
Type hints encouraged where helpful

from typing import List, Dict, Optional

def search_papers(
    query: str,
    top_k: int = 5,
    min_score: Optional[float] = None
) -> List[Dict]:
    """Search papers by semantic similarity.

    Args:
        query: Search query text
        top_k: Number of results to return
        min_score: Minimum similarity score filter

    Returns:
        List of papers ranked by similarity
    """
    pass

Documentation¶

Add docstrings to all functions and classes
Use Google-style docstrings
Put type hints in function signatures and describe each argument in the docstring

def enrich_papers(papers: List[Dict]) -> List[Dict]:
    """Enrich papers with metadata from external APIs.

    Fetches title, authors, abstract, and other metadata from
    Semantic Scholar and OpenAlex APIs.

    Args:
        papers: List of paper objects with at least 'url' or 'doi'

    Returns:
        List of enriched paper objects

    Raises:
        APIError: If external API calls fail
        ValueError: If papers list is empty

    Example:
        ```python
        papers = scraper.scrape()
        enriched = enricher.enrich_papers(papers)
        ```
    """
    pass
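As a rough illustration of how the documented Raises contract might map onto code, the sketch below validates its input and wraps lookup failures. `APIError` and the `fetch_metadata` callable are stand-ins defined here for the example; they are assumptions, not PaperTrail's actual API.

```python
from typing import Callable, Dict, List


class APIError(Exception):
    """Stand-in for the project's API error type (assumed for this sketch)."""


def enrich_papers(
    papers: List[Dict],
    fetch_metadata: Callable[[Dict], Dict],
) -> List[Dict]:
    """Enrich papers using a caller-supplied metadata lookup.

    Args:
        papers: Paper objects with at least 'url' or 'doi'
        fetch_metadata: Hypothetical callable returning extra fields for one paper

    Returns:
        New list of paper dicts merged with the fetched metadata

    Raises:
        ValueError: If papers list is empty
        APIError: If the lookup fails for any paper
    """
    if not papers:
        raise ValueError("papers list is empty")

    enriched = []
    for paper in papers:
        try:
            metadata = fetch_metadata(paper)
        except Exception as exc:
            # Surface lookup failures under the documented exception type.
            raise APIError(f"metadata lookup failed for {paper!r}") from exc
        # Merge into a new dict so the caller's input is not mutated.
        enriched.append({**paper, **metadata})
    return enriched
```

Passing the lookup in as an argument is only one possible design; it keeps the function easy to test without network access.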

Tests¶

Add tests for new features. Tests go in tests/:

# tests/test_scraper.py
from papertrail.scraper import Scraper  # module path assumed; adjust to the actual location

def test_scrape_papers():
    """Test paper scraping from mock Slack."""
    scraper = Scraper(token="test-token")
    papers = scraper.scrape(channels=["test"])
    assert len(papers) > 0
    assert "url" in papers[0]

def test_scrape_empty_channel():
    """Test scraping channel with no papers."""
    scraper = Scraper(token="test-token")
    papers = scraper.scrape(channels=["empty"])
    assert len(papers) == 0
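The tests above refer to a mock Slack, but the mocking itself is not shown. One possible approach, sketched here with the standard library's `unittest.mock`, is to inject a fake client. The `Scraper` class below and its `client.fetch_messages` method are hypothetical stand-ins so the example runs on its own; PaperTrail's real `Scraper` may be wired differently.

```python
from unittest.mock import Mock


class Scraper:
    """Hypothetical stand-in: collects messages that carry a URL."""

    def __init__(self, token, client=None):
        self.token = token
        self.client = client  # injected so tests never touch the network

    def scrape(self, channels):
        papers = []
        for channel in channels:
            for message in self.client.fetch_messages(channel):
                if "url" in message:
                    papers.append({"url": message["url"], "channel": channel})
        return papers


def make_fake_client(messages_by_channel):
    """Build a Mock whose fetch_messages returns canned data per channel."""
    client = Mock()
    client.fetch_messages.side_effect = (
        lambda channel: messages_by_channel.get(channel, [])
    )
    return client


def test_scrape_papers_with_fake_client():
    client = make_fake_client({"test": [{"url": "https://doi.org/10.1000/xyz"}]})
    scraper = Scraper(token="test-token", client=client)
    papers = scraper.scrape(channels=["test"])
    assert len(papers) == 1
    assert papers[0]["url"] == "https://doi.org/10.1000/xyz"
    client.fetch_messages.assert_called_once_with("test")


def test_scrape_empty_channel_with_fake_client():
    scraper = Scraper(token="test-token", client=make_fake_client({}))
    assert scraper.scrape(channels=["empty"]) == []
```

Asserting on exact values rather than `len(papers) > 0` tends to catch more regressions, since the fake client makes the expected output fully known.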

Run tests before committing:

pytest -v

3. Commit Changes¶

git add .
git commit -m "Add feature: description of changes"

Write good commit messages:

First line: short summary (50 chars max)
Blank line
Detailed explanation (if needed)

Good messages:

Add local ONNX embedding backend

This adds support for running embeddings locally without API keys,
using the fastembed library with BAAI/bge-small-en-v1.5 models.
Useful for privacy-sensitive data and offline environments.

Bad messages:

fix stuff
update
changes
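The format rules above (a summary of at most 50 characters, then a blank line) can be checked mechanically. The following is a minimal sketch of such a check; the function name is made up for illustration and nothing here is part of PaperTrail's tooling.

```python
from typing import List

MAX_SUMMARY_LENGTH = 50  # limit taken from the guideline above


def check_commit_message(message: str) -> List[str]:
    """Return a list of format problems found in a commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["summary line is missing"]

    problems = []
    if len(lines[0]) > MAX_SUMMARY_LENGTH:
        problems.append(
            f"summary line is {len(lines[0])} chars (max {MAX_SUMMARY_LENGTH})"
        )
    # When a body is present, it must be separated by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems
```

A check like this only covers layout. A message such as "fix stuff" passes it, so whether a summary is descriptive still needs human review.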

4. Push and Pull Request¶

git push origin feature/your-feature-name

Then open a Pull Request on GitHub. Include:

Description of changes
Why they're needed
Any breaking changes
Screenshots if UI changes

Areas for Contribution¶

Code¶

New embedding backends — Add support for more models
Performance improvements — Optimize vectorization, caching
Bug fixes — See Issues
New features — See Discussions

Documentation¶

Tutorials — Step-by-step guides for specific workflows
API docs — Improve docstrings and examples
Troubleshooting — Common issues and solutions
Examples — Real-world use cases

Community¶

Answer questions — Help others in Issues/Discussions
Report bugs — File detailed issue reports
Suggest features — Discuss ideas in Discussions
Share use cases — Tell us how you use PaperTrail

Testing¶

Run Tests¶

# All tests
pytest

# Specific test file
pytest tests/test_scraper.py

# Specific test function
pytest tests/test_scraper.py::test_scrape_papers

# Verbose output
pytest -v

# Coverage report
pytest --cov=papertrail

Write Tests¶

Create test files in tests/:

# tests/test_embeddings.py
import pytest
from papertrail.embeddings import Embedder

def test_embed_papers():
    """Test embedding papers."""
    embedder = Embedder(backend="local")
    papers = [
        {"title": "Paper 1", "abstract": "Test abstract 1"},
        {"title": "Paper 2", "abstract": "Test abstract 2"},
    ]
    result = embedder.embed(papers)

    assert len(result) == 2
    assert "embedding" in result[0]
    assert len(result[0]["embedding"]) == 384  # Local backend dimensions

@pytest.mark.parametrize("backend", ["openai", "huggingface", "local"])
def test_embed_backends(backend):
    """Test all embedding backends."""
    embedder = Embedder(backend=backend)
    papers = [{"title": "Test", "abstract": "test"}]
    result = embedder.embed(papers)
    assert "embedding" in result[0]
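Hosted backends such as "openai" presumably need credentials, so the parametrized test above may fail on machines without API keys. A hedged sketch of one way to work out which backends can run is shown below; the mapping and the environment variable names are assumptions for illustration, not settings confirmed by PaperTrail.

```python
import os
from typing import Dict, List, Mapping, Optional

# Assumed mapping from backend name to the environment variable it needs.
# None means the backend runs without credentials.
REQUIRED_ENV: Dict[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "huggingface": "HF_TOKEN",
    "local": None,
}


def runnable_backends(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the backends whose required credentials are present in env."""
    env = os.environ if env is None else env
    return [
        name
        for name, variable in REQUIRED_ENV.items()
        if variable is None or env.get(variable)
    ]
```

The result could be passed to `pytest.mark.parametrize` in place of the hard-coded list, so contributors without keys still exercise the local backend.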

Documentation¶

Documentation lives in the docs/ directory. It is written in Markdown and built with MkDocs.

Building Docs Locally¶

# Install mkdocs
pip install mkdocs mkdocs-material mkdocstrings

# Serve locally
mkdocs serve

# Build static HTML
mkdocs build

While mkdocs serve is running, open http://localhost:8000 in your browser.

Writing Documentation¶

Use clear, concise language
Include code examples
Explain the "why" not just the "how"
Add admonitions for tips and warnings:

!!! tip
    Use `--backend local` to avoid API costs during development.

!!! warning
    Never commit API keys to git. Always use environment variables.

Release Process¶

Maintainers follow this process for releases:

Update version — papertrail/__init__.py
Update changelog — CHANGELOG.md
Tag commit — git tag v0.1.0
Push tags — git push origin --tags
Build package — python -m build
Upload to PyPI — python -m twine upload dist/*
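Because the version and the tag are updated in separate steps, they can drift apart. The sketch below shows one way a maintainer might check them before pushing. It assumes the version is stored as a `__version__ = "..."` assignment in papertrail/__init__.py and that tags are the version prefixed with "v"; both are guesses based on the steps above.

```python
import re
from typing import Optional


def read_version(init_source: str) -> Optional[str]:
    """Extract the value of __version__ from the text of an __init__.py file."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']',
        init_source,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def tag_matches_version(tag: str, init_source: str) -> bool:
    """Check that a tag such as 'v0.1.0' matches the package version."""
    version = read_version(init_source)
    return version is not None and tag == f"v{version}"
```

In practice the source text would come from reading the file, for example with `pathlib.Path("papertrail/__init__.py").read_text()`.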

Code of Conduct¶

We're committed to providing a welcoming and inclusive community. Please:

Be respectful and kind
Welcome newcomers
Give credit for contributions
Report harassment to maintainers

Getting Help¶

Questions? — Ask in GitHub Discussions
Found a bug? — Open an Issue
Want to discuss features? — Start a Discussion

Useful Resources¶

Examples of Great Contributions¶

Adding a New Embedding Backend¶

# papertrail/embeddings.py
import numpy as np


class CustomEmbedder(BaseEmbedder):
    """Custom embedding backend."""

    def __init__(self, model_name: str = "custom-model"):
        """Initialize custom embedder.

        Args:
            model_name: Name of custom model to use
        """
        # load_model is assumed to be provided by BaseEmbedder or defined on this class
        self.model = self.load_model(model_name)

    def embed_text(self, text: str) -> np.ndarray:
        """Embed text to vector."""
        return self.model.encode(text)
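For a version that runs without any model files, here is a toy backend written against an assumed interface. The `BaseEmbedder` class below is a stand-in defined only for this sketch, and the method returns a plain list of floats rather than an `np.ndarray` to avoid the NumPy dependency. The vectors are token-hash counts, not semantic embeddings, so this is mainly useful as a template or as a fast test double.

```python
import hashlib
import math
from abc import ABC, abstractmethod
from typing import List


class BaseEmbedder(ABC):
    """Stand-in for the project's base class (assumed interface)."""

    @abstractmethod
    def embed_text(self, text: str) -> List[float]:
        """Embed text to a vector."""


class HashingEmbedder(BaseEmbedder):
    """Toy backend: hashes tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dimensions: int = 384):
        self.dimensions = dimensions

    def embed_text(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # A stable hash keeps embeddings reproducible across runs.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[index] += 1.0
        norm = math.sqrt(sum(value * value for value in vector))
        # Empty input yields the zero vector instead of dividing by zero.
        return [value / norm for value in vector] if norm else vector
```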

Improving Documentation¶

# Advanced: Custom Metadata Sources

You can extend PaperTrail to fetch metadata from custom APIs.

## Example: Adding a Custom API

```python
from papertrail.enricher import Enricher

class CustomEnricher(Enricher):
    def enrich_from_custom_api(self, paper):
        # Your custom logic here
        pass
```

Fixing a Bug¶

# Before
def parse_doi(url):
    # Naive parsing that breaks on some URLs
    return url.split("doi.org/")[1]

# After
import re
from typing import Optional

def parse_doi(url: str) -> Optional[str]:
    """Extract DOI from URL.

    Args:
        url: URL that may contain a DOI

    Returns:
        DOI string or None if not found
    """
    # Match text after "doi.org/" or "DOI:" up to the next whitespace or end of string
    match = re.search(r'(?:doi\.org/|DOI:\s*)(.+?)(?:\s|$)', url)
    return match.group(1) if match else None
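A bug fix is easier to review when it comes with a regression test. The sketch below repeats both versions of the function so it runs on its own, then checks the behavior that changed. The URLs are made-up examples, and `parse_doi_naive` is just a label for the "before" version.

```python
import re
from typing import Optional


def parse_doi_naive(url):
    # "Before" version: raises IndexError when "doi.org/" is absent.
    return url.split("doi.org/")[1]


def parse_doi(url: str) -> Optional[str]:
    # "After" version, same pattern as in the fix above.
    match = re.search(r'(?:doi\.org/|DOI:\s*)(.+?)(?:\s|$)', url)
    return match.group(1) if match else None


def test_parse_doi_from_url():
    assert parse_doi("https://doi.org/10.1000/xyz123") == "10.1000/xyz123"


def test_parse_doi_from_text_prefix():
    assert parse_doi("DOI: 10.1000/xyz123 (preprint)") == "10.1000/xyz123"


def test_parse_doi_returns_none_without_doi():
    # The naive version crashes on this input; the fix returns None.
    assert parse_doi("https://example.com/paper") is None
```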

Thank You!¶

Thank you for contributing to PaperTrail! Your work helps the research community.

Questions? Contact us