Overview
This document establishes error handling standards for the Tux Discord bot codebase. Our approach prioritizes user experience, debugging capability, and system reliability through consistent error handling patterns.
Core Principles
1. Fail Gracefully
- Always provide meaningful feedback to users
- Never expose internal errors or stack traces to end users
- Degrade functionality gracefully when possible
2. Log Everything
- Use structured logging with appropriate levels
- Include context for debugging (user ID, guild ID, command, etc.)
- Log both successful operations and failures
3. Be Specific
- Catch specific exceptions when possible
- Avoid broad except Exception unless necessary
- Chain exceptions to preserve error context
4. Consistent Patterns
- Follow established patterns across the codebase
- Use the global error handler for command-level errors
- Handle infrastructure errors locally with proper fallbacks
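A minimal sketch of principle 3, showing how chaining with `from` preserves the original error. The two exception classes below are local stand-ins for the ones in tux.shared.exceptions, and `load_port` is a hypothetical helper used only for illustration.

```python
class TuxError(Exception):
    """Stand-in for the project's base exception (illustrative only)."""


class TuxConfigurationError(TuxError):
    """Stand-in for a configuration failure (illustrative only)."""


def load_port(raw: str) -> int:
    """Parse a port number, translating the low-level error into a domain error."""
    try:
        return int(raw)
    except ValueError as e:
        # "from e" stores the original exception on __cause__, so logs and
        # tracebacks still show what actually went wrong.
        raise TuxConfigurationError(f"Invalid port value: {raw!r}") from e


try:
    load_port("eighty")
except TuxConfigurationError as err:
    print(type(err.__cause__).__name__)  # ValueError
```

Catching only ValueError keeps unrelated bugs visible instead of folding them into a configuration error.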
Error Handling Architecture
Global Error Handler
Located in src/tux/services/handlers/error/handler.py
Responsibilities:
- Command-level error handling
- User-friendly error messages
- Sentry integration for error tracking
- Automatic error categorization
When to use:
- Command execution errors
- Permission errors
- Validation errors
- Most user-facing errors
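One way a global handler might categorize errors is a lookup table keyed by exception type. The sketch below is an assumption about the general technique, not the actual contents of handler.py; the message table, `DEFAULT_MESSAGE`, and `user_message_for` are hypothetical names, and the exception classes are local stand-ins.

```python
class TuxError(Exception):
    """Stand-in base class (illustrative only)."""


class TuxPermissionError(TuxError):
    pass


class TuxAPIError(TuxError):
    pass


class TuxAPIConnectionError(TuxAPIError):
    pass


# Hypothetical mapping from exception type to a user-facing message.
USER_MESSAGES = {
    TuxPermissionError: "You do not have permission to do that.",
    TuxAPIError: "An external service returned an error.",
    TuxAPIConnectionError: "An external service is currently unavailable.",
}
DEFAULT_MESSAGE = "Something went wrong. Please try again later."


def user_message_for(error: Exception) -> str:
    """Return the message for the most specific registered class of the error."""
    # The MRO lists the error's own class first, then its parents, so the
    # first hit is always the most specific registered entry.
    for cls in type(error).__mro__:
        if cls in USER_MESSAGES:
            return USER_MESSAGES[cls]
    return DEFAULT_MESSAGE
```

Unregistered exceptions fall through to the generic message, so internal details never reach the user.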
Local Error Handling
When to use:
- Infrastructure operations (HTTP, database, file I/O)
- Background tasks
- Service initialization
- Operations that need graceful degradation
Tux Exception Hierarchy
All Tux-specific exceptions inherit from the TuxError base class for consistent error handling:
TuxError
├── TuxConfigurationError
├── TuxRuntimeError
├── TuxDatabaseError
│   ├── TuxDatabaseConnectionError
│   ├── TuxDatabaseMigrationError
│   └── TuxDatabaseQueryError
├── TuxPermissionError
│   ├── TuxPermissionLevelError
│   └── TuxAppCommandPermissionLevelError
├── TuxAPIError
│   ├── TuxAPIConnectionError
│   ├── TuxAPIRequestError
│   ├── TuxAPIResourceNotFoundError
│   └── TuxAPIPermissionError
├── TuxCodeExecutionError
│   ├── TuxMissingCodeError
│   ├── TuxInvalidCodeFormatError
│   ├── TuxUnsupportedLanguageError
│   └── TuxCompilationError
└── TuxServiceError
    ├── TuxCogLoadError
    └── TuxHotReloadError
        ├── TuxDependencyResolutionError
        ├── TuxFileWatchError
        ├── TuxModuleReloadError
        └── TuxConfigurationError
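A hierarchy like this lets callers choose how specific a handler should be. The following sketch redeclares a small subset of the tree locally (the real classes live in tux.shared.exceptions) and uses a hypothetical `classify` function to show that except clauses are tried in order, so the most specific type should come first.

```python
class TuxError(Exception):
    """Stand-in base class (illustrative only)."""


class TuxDatabaseError(TuxError):
    pass


class TuxDatabaseConnectionError(TuxDatabaseError):
    pass


class TuxDatabaseQueryError(TuxDatabaseError):
    pass


def classify(error: Exception) -> str:
    """Report which handler would catch the given error."""
    try:
        raise error
    except TuxDatabaseConnectionError:
        return "connection"  # most specific handler first
    except TuxDatabaseError:
        return "database"  # any other database error
    except TuxError:
        return "tux"  # any other Tux-specific error
    except Exception:
        return "unknown"  # everything else


print(classify(TuxDatabaseQueryError("bad query")))  # database
```

Reversing the order would let the TuxError clause swallow every subclass before the specific handlers are reached.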
Using Specific Exceptions
# ✅ GOOD: Use specific exception types
from tux.shared.exceptions import TuxDatabaseConnectionError

raise TuxDatabaseConnectionError("Cannot connect to PostgreSQL")

# ❌ BAD: Generic exceptions
raise Exception("Database connection failed")

# ❌ BAD: String matching for error types
try:
    ...  # some operation
except Exception as e:
    if "connection" in str(e).lower():
        ...  # handle connection error

# ✅ GOOD: Catch specific exception types
try:
    ...  # some operation
except ConnectionError as e:
    raise TuxDatabaseConnectionError("Database connection failed") from e
except TuxDatabaseError:
    ...  # handle database errors
except Exception as e:
    ...  # handle other errors
Patterns & Examples
✅ Sentry Integration
# ✅ GOOD: Use unified Sentry utilities
from tux.services.sentry import capture_database_error, capture_api_error, capture_exception_safe

# Database errors
try:
    await db.execute(query)
except Exception as e:
    capture_database_error(e, operation="insert", table="cases")
    raise TuxDatabaseQueryError("Failed to insert case") from e

# API errors
try:
    response = await client.get(url)
except httpx.RequestError as e:
    capture_api_error(e, service_name="GitHub", endpoint=url)
    raise TuxAPIConnectionError("GitHub API unavailable") from e

# Generic errors with context
try:
    ...  # some operation
except Exception as e:
    capture_exception_safe(e, extra_context={"operation": "startup", "component": "bot"})
    raise TuxRuntimeError("Operation failed") from e

# ❌ BAD: Raw Sentry calls
import sentry_sdk

sentry_sdk.capture_exception(e)  # Missing context and standardization
✅ HTTP Operations
async def fetch_data(self, url: str) -> dict | None:
    """Fetch data from API with proper error handling."""
    try:
        response = await http_client.get(url, timeout=10)
        response.raise_for_status()
        return response.json()
    except httpx.HTTPStatusError as e:
        logger.warning(f"HTTP error {e.response.status_code} for {url}")
        return None
    except httpx.TimeoutException:
        logger.warning(f"Timeout fetching {url}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error fetching {url}: {e}")
        return None
✅ Database Operations
async def create_user_record(self, user_id: int, data: dict) -> bool:
    """Create user record with error handling."""
    try:
        await self.db.user.create(user_id=user_id, **data)
        logger.info(f"Created user record for {user_id}")
        return True
    except IntegrityError:
        logger.warning(f"User {user_id} already exists")
        return False
    except Exception as e:
        logger.error(f"Failed to create user {user_id}: {e}")
        return False
✅ Command Error Handling
@commands.command()
async def my_command(self, ctx: commands.Context[Tux]) -> None:
    """Command with proper validation."""
    # Let global error handler catch validation errors
    if not ctx.guild:
        raise commands.NoPrivateMessage()

    # Handle infrastructure errors locally
    data = await self.fetch_user_data(ctx.author.id)
    if data is None:
        await ctx.reply("Unable to fetch user data. Please try again later.")
        return

    # Process data...
✅ Service Initialization
async def initialize_service(self) -> None:
    """Initialize service with graceful degradation."""
    try:
        await self.connect_to_external_api()
        self.enabled = True
        logger.info("Service initialized successfully")
    except Exception as e:
        logger.error(f"Service initialization failed: {e}")
        self.enabled = False
        # Continue without this service
Anti-Patterns to Avoid
❌ Silent Failures
# BAD: Silent failure
try:
    await some_operation()
except Exception:
    return None  # User gets no feedback

# GOOD: Proper error handling
try:
    await some_operation()
except Exception as e:
    logger.error(f"Operation failed: {e}")
    raise  # Let global handler provide user feedback
❌ Exposing Internal Errors
# BAD: Exposing internal error details
try:
    await some_operation()
except Exception as e:
    await ctx.reply(f"Error: {e}")

# GOOD: User-friendly messages
try:
    await some_operation()
except Exception as e:
    logger.error(f"Command failed: {e}")
    await ctx.reply("Something went wrong. Please try again later.")
❌ Overly Broad Catches
# BAD: Too broad
try:
    data = response.json()
except Exception:
    return None

# GOOD: Specific exceptions
try:
    data = response.json()
except JSONDecodeError as e:  # from json; parsing alone cannot raise KeyError
    logger.warning(f"Invalid JSON response: {e}")
    return None
Error Categories & Handling
User Errors
Examples: Invalid input, missing permissions, rate limits
Handling: Global error handler with helpful messages
# Let global handler catch these
raise commands.BadArgument("Invalid user ID format")
raise commands.MissingPermissions(["manage_messages"])
Infrastructure Errors
Examples: Network failures, database timeouts, file I/O errors
Handling: Local handling with graceful degradation
try:
    result = await external_api_call()
except (httpx.TimeoutException, httpx.ConnectError):
    # Graceful fallback
    result = get_cached_result()
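The cached fallback can be sketched end to end with only the standard library. Everything here is hypothetical: `flaky_api_call` stands in for a real network call, `_cache` is a plain in-memory dict, and built-in TimeoutError and ConnectionError stand in for the httpx exceptions.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory cache of the last good result per key.
_cache: dict = {}


async def flaky_api_call(key: str, fail: bool) -> str:
    """Stand-in for an external call that may time out."""
    if fail:
        raise TimeoutError("upstream timed out")
    return f"fresh:{key}"


async def get_with_fallback(key: str, *, fail: bool = False) -> Optional[str]:
    """Return fresh data when possible, otherwise the last cached value."""
    try:
        result = await flaky_api_call(key, fail)
    except (TimeoutError, ConnectionError):
        # Graceful fallback: serve stale data, or None if nothing is cached.
        return _cache.get(key)
    _cache[key] = result  # refresh the cache only on success
    return result
```

Callers still need to handle the None case, which occurs when the first call for a key fails before anything has been cached.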
System Errors
Examples: Configuration errors, startup failures, critical bugs
Handling: Log and fail fast or disable functionality
try:
    self.config = load_config()
except ConfigError as e:
    logger.critical(f"Invalid configuration: {e}")
    raise SystemExit(1) from e
Logging Standards
Log Levels
- DEBUG: Detailed diagnostic information
- INFO: General operational messages
- WARNING: Recoverable errors, degraded functionality
- ERROR: Serious errors that need attention
- CRITICAL: System-threatening errors
Log Format
# Include context for debugging
logger.info(f"User {user_id} executed command '{command}' in guild {guild_id}")
logger.error(f"Database query failed for user {user_id}: {error}")
logger.warning(f"Rate limit hit for guild {guild_id}, using cached data")
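When the same context appears in many messages, it can be attached once instead of being repeated in every f-string. This sketch uses the standard library's logging.LoggerAdapter, which may differ from the logger the Tux codebase actually uses; `ContextAdapter` and the logger name are hypothetical.

```python
import io
import logging


class ContextAdapter(logging.LoggerAdapter):
    """Prefix every message with the key=value context given at construction."""

    def process(self, msg, kwargs):
        context = " ".join(f"{k}={v}" for k, v in self.extra.items())
        return f"[{context}] {msg}", kwargs


# Log to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

base = logging.getLogger("tux.context_example")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

log = ContextAdapter(base, {"user_id": 42, "guild_id": 1001, "command": "ping"})
log.warning("Rate limit hit, using cached data")

print(stream.getvalue().strip())
# WARNING [user_id=42 guild_id=1001 command=ping] Rate limit hit, using cached data
```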
Error Recovery & Graceful Degradation
Service Initialization
class MyService:
    def __init__(self):
        try:
            self._initialize()
        except Exception as e:
            capture_exception_safe(e, extra_context={"service": "MyService"})
            raise TuxConfigurationError(f"Failed to initialize MyService: {e}") from e
Graceful Degradation
try:
    # Try primary operation
    result = await primary_api_call()
except TuxAPIConnectionError:
    logger.warning("Primary API unavailable, using fallback")
    result = await fallback_operation()
except TuxConfigurationError as e:
    logger.warning(f"Skipping feature due to configuration: {e}")
    return  # Skip feature gracefully
Command Error Handling
@commands.command()
async def my_command(self, ctx: commands.Context[Tux]) -> None:
    try:
        result = await some_operation()
        await ctx.send(f"Result: {result}")
    except TuxAPIConnectionError:
        await ctx.send("❌ External service is currently unavailable")
    except TuxPermissionError as e:
        await ctx.send(f"❌ {e}")
    except Exception as e:
        capture_exception_safe(e, extra_context={"command": "my_command", "user_id": ctx.author.id})
        await ctx.send("❌ An unexpected error occurred")
        raise  # Re-raise for global error handler
Testing Error Handling
Unit Tests
async def test_http_error_handling():
    """Test HTTP error handling."""
    with patch('httpx.AsyncClient.get') as mock_get:
        # httpx exceptions require a message argument
        mock_get.side_effect = httpx.TimeoutException("Request timed out")
        result = await service.fetch_data("http://example.com")
    assert result is None
Integration Tests
async def test_command_with_db_error():
    """Test command behavior during database errors."""
    with patch.object(db, 'create_user') as mock_create:
        mock_create.side_effect = DatabaseError()
        # Verify graceful handling
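The skeleton above stops at the verification step. A minimal self-contained sketch of such a check might look like the following, assuming a hypothetical `UserService` wrapper, a mocked database object built with unittest.mock.AsyncMock, and the built-in ConnectionError standing in for a real database exception.

```python
import asyncio
from unittest.mock import AsyncMock


class UserService:
    """Hypothetical service that degrades gracefully on database failure."""

    def __init__(self, db):
        self.db = db

    async def create_user_record(self, user_id: int) -> bool:
        try:
            await self.db.create_user(user_id=user_id)
            return True
        except ConnectionError:
            return False  # report failure instead of crashing the command


async def check_db_failure_is_handled() -> bool:
    db = AsyncMock()
    # Awaiting the mocked method raises the configured exception.
    db.create_user.side_effect = ConnectionError("database unavailable")
    service = UserService(db)

    created = await service.create_user_record(42)

    # The call was attempted exactly once, with the expected arguments.
    db.create_user.assert_awaited_once_with(user_id=42)
    return created


print(asyncio.run(check_db_failure_is_handled()))  # False
```

Under pytest with an async plugin, the body of `check_db_failure_is_handled` could be used directly as the test function.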
Sentry Integration
Automatic Error Tracking
- All unhandled exceptions are automatically sent to Sentry
- Include user context (ID, guild, command) for debugging
- Use Sentry's breadcrumbs for operation tracking
Manual Error Reporting
from tux.services.sentry import capture_exception_safe

try:
    critical_operation()
except Exception as e:
    logger.error(f"Critical operation failed: {e}")
    capture_exception_safe(e, extra_context={"user_id": user_id})
    raise
Migration Guidelines
Existing Code
- Identify critical paths: HTTP, database, file operations
- Add specific error handling: Replace broad catches
- Improve user feedback: Replace generic error messages
- Add logging: Include context for debugging
New Code
- Plan error scenarios: What can go wrong?
- Choose handling strategy: Global vs local
- Implement graceful degradation: Fallback options
- Add comprehensive logging: Success and failure cases
Code Review Checklist
Error Handling Review
- [ ] Are all external operations (HTTP, DB, file I/O) wrapped in try/except?
- [ ] Are exceptions specific rather than broad Exception catches?
- [ ] Do error messages provide helpful information to users?
- [ ] Is appropriate logging included for debugging?
- [ ] Are errors properly chained to preserve context?
- [ ] Does the code degrade gracefully on errors?
- [ ] Are critical errors properly escalated?
User Experience Review
- [ ] Do users receive meaningful feedback on errors?
- [ ] Are internal errors hidden from users?
- [ ] Is the bot still functional after errors?
- [ ] Are error messages actionable when possible?
Performance Considerations
Error Handling Overhead
- Keep error handling lightweight
- Avoid expensive operations in exception handlers
- Use lazy evaluation for error context
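Lazy evaluation of error context can be sketched with the standard library's logging module, which formats %-style arguments only when a record is actually emitted. `LazyContext` is a hypothetical helper, and the codebase's own logger may offer a different mechanism.

```python
import io
import logging


class LazyContext:
    """Defer building an expensive context string until it is formatted."""

    def __init__(self, build):
        self._build = build
        self.calls = 0  # counts how often the context was actually built

    def __str__(self) -> str:
        self.calls += 1
        return self._build()


stream = io.StringIO()
logger = logging.getLogger("tux.lazy_example")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False

ctx = LazyContext(lambda: f"members={sum(range(1000))}")

# DEBUG is below the logger's level, so the argument is never formatted.
logger.debug("Context: %s", ctx)
calls_after_debug = ctx.calls

# ERROR is emitted, so the context is built once while the handler formats it.
logger.error("Context: %s", ctx)
calls_after_error = ctx.calls
```

An f-string would build the context eagerly in both calls, even for the discarded DEBUG message.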
Resource Cleanup
async def process_file(self, file_path: str) -> None:
    """Process file with proper cleanup."""
    file_handle = None
    try:
        file_handle = await aiofiles.open(file_path)
        await self.process_data(file_handle)
    except Exception as e:
        logger.error(f"File processing failed: {e}")
        raise
    finally:
        if file_handle:
            await file_handle.close()
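The same guarantee can be expressed with an async context manager, which removes the need for a manual None check. The sketch below uses only the standard library; `open_resource` is a hypothetical stand-in for a real file or connection, and the `events` list exists only to make the ordering visible.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


@asynccontextmanager
async def open_resource(name: str):
    """Hypothetical resource whose cleanup always runs, even on error."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")


async def process(name: str) -> None:
    async with open_resource(name) as handle:
        raise RuntimeError(f"processing {handle} failed")


async def main() -> None:
    try:
        await process("data.txt")
    except RuntimeError:
        events.append("error propagated")


asyncio.run(main())
print(events)
# ['open data.txt', 'close data.txt', 'error propagated']
```

The resource is closed before the exception reaches the caller, which matches the behavior of the try/finally version.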
Monitoring & Alerting
Key Metrics
- Error rate by command/module
- Response time degradation during errors
- User-facing error frequency
- Critical system error alerts
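As an illustration of the first metric, error rate per command can be tracked with two counters. This is a minimal in-process sketch under the assumption that no metrics backend is involved; `CommandStats` and its methods are hypothetical names.

```python
from collections import Counter


class CommandStats:
    """Track invocations and failures per command name."""

    def __init__(self) -> None:
        self.calls = Counter()
        self.errors = Counter()

    def record(self, command: str, *, ok: bool) -> None:
        self.calls[command] += 1
        if not ok:
            self.errors[command] += 1

    def error_rate(self, command: str) -> float:
        """Return failures divided by invocations, or 0.0 if never invoked."""
        total = self.calls[command]
        return self.errors[command] / total if total else 0.0


stats = CommandStats()
for ok in (True, True, True, False):
    stats.record("ping", ok=ok)

print(stats.error_rate("ping"))  # 0.25
```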
Dashboards
- Real-time error tracking via Sentry
- Command success/failure rates
- Infrastructure health monitoring
- User experience impact metrics
Quick Reference
Common Patterns
# HTTP with fallback
try:
    response = await http_client.get(url)
    response.raise_for_status()
    return response.json()
except Exception as e:
    logger.warning(f"API call failed: {e}")
    return fallback_data

# Database with user feedback
try:
    await db.operation()
except Exception as e:
    logger.error(f"Database error: {e}")
    await ctx.reply("Database temporarily unavailable.")

# Service initialization
try:
    await service.initialize()
except Exception as e:
    logger.error(f"Service init failed: {e}")
    self.enabled = False
When to Use Global vs Local
- Global: User input errors, command validation, permissions
- Local: Infrastructure, background tasks, service initialization
Error Message Guidelines
- Be specific but not technical
- Suggest solutions when possible
- Include relevant context (what failed)
- Maintain consistent tone and format
This guide should be updated as error handling patterns evolve. All team members should follow these standards for consistent, reliable error handling across the Tux codebase.