Error Handling Best Practices¶

Work in progress

This section is a work in progress. Please help us by contributing to the documentation.

Error handling is crucial for building reliable Discord bots. Users expect your bot to handle failures gracefully, provide helpful feedback, and continue operating even when things go wrong. Good error handling separates professional bots from fragile ones.

Errors vs Exceptions in Python¶

Understanding the difference between errors and exceptions helps you handle them appropriately:

Errors are fundamental coding mistakes that prevent a program from running altogether. Errors are commonly detected during compilation, and the code won't execute until they are fixed. Examples include syntax errors and indentation errors.

Exceptions are a subcategory of errors that occur during program execution (runtime) when your code encounters an unexpected situation, such as trying to divide by zero or accessing a non-existent dictionary key. Unlike errors, exceptions can be caught and handled within your code.

Python

# Error: Syntax error - prevents execution
print("Hello World"  # Missing closing parenthesis

# Exception: Runtime error - can be handled
result = 10 / 0  # Raises ZeroDivisionError - can be caught

In Tux, we focus on handling exceptions gracefully while preventing errors through proper code review and linting.

Common Python Exceptions¶

Python provides built-in exception classes for different error scenarios. Understanding these helps you catch and handle them appropriately:

Type and Value Exceptions¶

TypeError: Raised when an operation is performed on incompatible types

Python

print("10" + 5)  # Raises TypeError

ValueError: Occurs when a function receives an argument with an unsuitable value

Python

num = int("abc")  # Raises ValueError

Data Structure Exceptions¶

KeyError: Raised when trying to access a non-existent dictionary key

Python

data = {"name": "Alice"}
print(data["age"])  # Raises KeyError

IndexError: Occurs when accessing an out-of-range index

Python

numbers = [1, 2, 3]
print(numbers[5])  # Raises IndexError

Mathematical Exceptions¶

ZeroDivisionError: Triggered when attempting to divide by zero

Python

result = 10 / 0  # Raises ZeroDivisionError

OverflowError: Triggered when the resulting value exceeds the allowed number range

File Operation Exceptions¶

FileNotFoundError: Raised when attempting to access a non-existent file

Python

with open("missing_file.txt", "r") as file:
    content = file.read()  # Raises FileNotFoundError

IOError: Raised when an I/O operation fails (file operations, network operations)
PermissionError: Raised when trying to perform an operation without adequate permissions

Import and Module Exceptions¶

ImportError: Raised when importing a module fails
ModuleNotFoundError: Raised when a module cannot be found (subclass of ImportError)

Runtime Exceptions¶

RuntimeError: Raised when an error doesn't fall into any specific category
AttributeError: Raised when attribute assignment or reference fails
AssertionError: Raised when an assert statement fails

System Exceptions¶

KeyboardInterrupt: Raised when the user presses interrupt keys (Ctrl+C or Delete)
SystemExit: Raised when sys.exit() is called
MemoryError: Raised when programs run out of memory

System Exceptions

KeyboardInterrupt and SystemExit inherit from BaseException, not Exception. They should generally be allowed to propagate for proper program shutdown. See the section on The Critical Difference: except: vs except Exception: for more details.

Understanding Tux's Exception System¶

Tux uses a hierarchical exception system built on TuxError. This hierarchy lets you catch errors at the right level of specificity. Database errors inherit from TuxDatabaseError, API errors from TuxAPIError, and permission errors from TuxPermissionError.

Use specific exception types instead of generic ones. When you raise TuxDatabaseConnectionError, callers can handle database connection issues specifically. They can retry, fall back to cached data, or show appropriate error messages. Generic exceptions force callers to guess what went wrong.

The exception hierarchy includes:

Database errors - Connection failures, query errors, migration issues
API errors - HTTP failures, rate limits, resource not found
Permission errors - Insufficient permissions, invalid access levels
Configuration errors - Invalid settings, missing required values
Service errors - Cog loading failures, hot reload issues

When you catch exceptions, chain them properly with raise ... from e. This preserves the original error context, making debugging easier. The error chain shows what happened at each level, from the original cause to the final error.

Error Categories and Strategies¶

User Errors¶

User errors happen when users provide invalid input or lack permissions. These errors should be handled globally and shown to users with friendly messages. Don't log these as errors—they're expected user behavior.

Examples include invalid command arguments, missing permissions, rate limits, and command not found. The global error handler catches these and converts them to user-friendly messages automatically.

Raise specific exceptions for user errors. Use commands.BadArgument for invalid input, commands.MissingPermissions for permission issues, and TuxPermissionLevelError for custom permission checks. The global handler knows how to format these for users.

Infrastructure Errors¶

Infrastructure errors occur when external systems fail—databases timeout, APIs go down, networks disconnect. These errors need local handling with graceful degradation. Your bot should continue operating even when some services are unavailable.

Handle infrastructure errors locally with fallbacks. If an external API fails, use cached data. If the database is temporarily unavailable, queue operations for later. If file operations fail, log the error and continue with reduced functionality.

Log infrastructure errors appropriately. Use warning level for transient failures that you handle gracefully. Use error level for failures that require attention but don't stop the bot. Use critical level only for failures that prevent core functionality.

System Errors¶

System errors indicate serious problems—invalid configuration, missing dependencies, critical bugs. These errors should fail fast with clear error messages. Don't try to continue operating with broken configuration or missing dependencies.

For configuration errors, validate early and fail immediately. Don't let the bot start with invalid configuration—it will cause confusing errors later. Provide clear error messages explaining what's wrong and how to fix it.

For critical bugs, log the error with full context and shut down gracefully. Don't try to recover from programming errors—they indicate bugs that need fixing, not runtime conditions to handle.

Core Principles¶

Fail Gracefully, Log Aggressively¶

When operations fail, handle them gracefully but log everything. Users shouldn't see technical errors, but you need detailed logs for debugging. Return None or empty results instead of crashing, but log the failure with full context.

Include context in your logs. Log user IDs, command names, operation types, and relevant parameters. This context helps you understand what was happening when errors occurred. Without context, error logs are useless for debugging.

Use appropriate log levels. Debug for detailed execution flow. Info for normal operations and recoverable issues. Warning for problems that don't prevent functionality. Error for failures that need attention. Critical only for failures that stop core functionality.

Be Specific, Not Generic¶

Catch specific exceptions, not generic ones. When you catch TuxDatabaseConnectionError, you know it's a connection issue and can retry or use cached data. When you catch Exception, you don't know what went wrong and can't handle it appropriately.

Handle exceptions at the right level. Catch specific exceptions where you can handle them meaningfully. Let exceptions propagate to the global handler when you can't handle them locally. Don't catch exceptions just to log them—let them propagate to handlers that can actually deal with them.

Keep Try Blocks Focused¶

The scope of your try blocks directly impacts how effectively you can diagnose and fix issues. Large, encompassing try blocks often mask the true source of errors and make debugging significantly more challenging.

Problem: Large try blocks¶

Python

# ❌ BAD: Too broad, impossible to determine which operation failed
try:
    user = await db.get_user(user_id)
    profile = await process_user_data(user)
    await send_notification(profile)
except Exception as e:
    logger.error(f"Operation failed: {e}")
    # Which operation failed? Database? Processing? Notification?

Solution: Focused try blocks¶

Python

# ✅ GOOD: Each operation has its own error boundary
try:
    user = await db.get_user(user_id)
except TuxDatabaseConnectionError as e:
    logger.error(f"Database connection failed: {e}")
    return None

try:
    profile = await process_user_data(user)
except TuxValidationError as e:
    logger.error(f"User data invalid: {e}")
    return None

try:
    await send_notification(profile)
except TuxAPIConnectionError as e:
    logger.error(f"Notification delivery failed: {e}")
    # Continue execution - notification failure shouldn't stop the command

Benefits of focused try blocks:

Clear error source: You immediately know which operation failed
Operation-specific context: Error messages contain relevant details
Appropriate handling: Different failures can be handled differently
Easier debugging: The error source is immediately apparent
Specific recovery: Each operation can implement its own recovery strategy

When to use focused try blocks:

When operations have different failure modes
When you need different error handling for each operation
When you want to continue execution after some failures
When debugging production issues (focused blocks make diagnosis easier)

The extra code is a worthwhile trade-off for the clarity and control it provides. While it might seem verbose, this pattern becomes invaluable as applications grow in complexity and when debugging production issues.

The Critical Difference: `except:` vs `except Exception:`¶

Critical Distinction

Never use bare except: - it catches system exceptions that should propagate.

There's an important difference between except: and except Exception::

except: (bare except) - Catches EVERYTHING:

Catches all exceptions, including KeyboardInterrupt (Ctrl+C) and SystemExit (sys.exit())
These system exceptions should normally propagate to allow proper shutdown
Avoid this pattern unless you have a very specific reason (like a top-level error handler in a web service)

except Exception: - Catches user exceptions only:

Catches exceptions that inherit from Exception
Does NOT catch KeyboardInterrupt or SystemExit (they inherit from BaseException)
Use this when you need to catch all user exceptions but allow system signals to propagate

Exception Hierarchy:

Python

BaseException
├── KeyboardInterrupt  # Ctrl+C - should propagate
├── SystemExit         # sys.exit() - should propagate
└── Exception          # User exceptions - can be caught
    ├── ValueError
    ├── TypeError
    ├── TuxError
    └── ... (all other exceptions)

Example of the problem:

Python

# ❌ BAD: Bare except catches KeyboardInterrupt
try:
    process_data()
except:  # This catches Ctrl+C!
    logger.error("Error occurred")
    # User can't interrupt the program with Ctrl+C

# ✅ GOOD: except Exception allows KeyboardInterrupt to propagate
try:
    process_data()
except Exception as e:  # KeyboardInterrupt will propagate
    logger.error(f"Error occurred: {e}")

When to use each:

except Exception: - Use when you need to catch all user exceptions (e.g., top-level error handlers, cleanup code)
except: (bare) - Only use in very specific cases like web service error handlers where you must catch everything to prevent crashes
Specific exceptions - Always prefer catching specific exception types when possible

Best practice: Always use except Exception: instead of bare except: unless you have a compelling reason to catch system exceptions.

Chain exceptions properly. When you catch an exception and raise a new one, use raise ... from e to preserve the chain. This shows the full error path from original cause to final error, making debugging much easier.

Exception Re-Raising Patterns¶

Python provides several patterns for re-raising exceptions, each with different behaviors regarding traceback preservation. Understanding these patterns helps you choose the right approach for each situation.

Pattern 1: Bare `raise` (Re-raise Same Exception)¶

Use when: You want to log or perform cleanup, then re-raise the same exception with its original traceback.

Python

try:
    result = risky_operation()
except ValueError as e:
    logger.error(f"Validation failed: {e}")
    raise  # Re-raises the same exception with original traceback

Characteristics:

Preserves the original exception type and traceback
Shows exactly where the original error occurred
Good default pattern for logging before re-raising

Note: raise e is almost equivalent to bare raise, but bare raise is preferred as it's more explicit about preserving the traceback.

Pattern 2: Raise New Exception (Without `from`)¶

Use when: You want to raise a different exception type but still preserve the original traceback context.

Python

try:
    result = divide(x, y)
except ZeroDivisionError:
    raise ValueError("Division by zero is not allowed")

Characteristics:

Raises a new exception type with a custom message
Still includes the original exception's traceback in the output
The original exception is available but not explicitly chained

When to avoid: When you need explicit exception chaining (use Pattern 4 instead).

Pattern 3: Raise New Exception `from None`¶

Use when: You want to hide the original exception's traceback for security or abstraction reasons.

Python

try:
    result = authenticate_user(email, password)
except InvalidCredentialsError:
    raise AuthenticationError("Invalid credentials") from None

Characteristics:

Suppresses the original exception's traceback
Only shows the new exception in the traceback
Useful for hiding implementation details or sensitive information

Security use case: When the original exception might leak sensitive information:

Python

try:
    user = login(email="user@example.com", password="secret")
except InvalidPasswordFormatError:
    # Don't expose password format details to potential attackers
    raise AuthenticationError("Authentication failed") from None

When to avoid: When you need the original traceback for debugging (use Pattern 4 instead).

Pattern 4: Exception Chaining (`from e`) - Recommended¶

Use when: You want to raise a new exception type while preserving the original exception as the cause.

Python

try:
    result = divide(x, y)
except ZeroDivisionError as e:
    raise ValueError("Division by zero is not allowed") from e

Characteristics:

Explicitly chains the original exception as the cause
Shows both exceptions in the traceback
Allows access to original exception via e.__cause__
Best practice for raising new exceptions from caught ones

Example with custom exceptions:

Python

try:
    content = read_file(file_path)
except FileNotFoundError as e:
    raise TuxAPIResourceNotFoundError(
        service_name="FileSystem",
        resource_identifier=file_path,
    ) from e

When to use:

Converting library exceptions to your own exception types
Providing more context while preserving original error
Debugging scenarios where you need the full error chain

Pattern Comparison¶

Pattern	Preserves Original Traceback	Shows Both Exceptions	Use Case
Bare `raise`	✅ Yes	N/A (same exception)	Logging before re-raising
Raise new (no `from`)	✅ Yes	⚠️ Implicit	Changing exception type
`from None`	❌ No	❌ No	Security/abstraction
`from e`	✅ Yes	✅ Yes	Best practice

Exception Re-Raising Best Practices¶

Use from e when raising a new exception - This is the recommended pattern for preserving error context.
Use bare raise for logging - When you just need to log and re-raise the same exception.
Use from None sparingly - Only when you need to hide sensitive information or implementation details.
Avoid Pattern 2 - Prefer Pattern 4 (from e) over raising new exceptions without from, as it's more explicit and informative.
Don't suppress exceptions unnecessarily - Suppressing exceptions makes debugging harder, so only use from None when absolutely necessary.

Using try/except/else/finally Blocks¶

Python's exception handling supports four blocks that work together:

try: Execute code that may raise an exception
except: Catch and handle exceptions
else: Execute code only if no exception occurs (optional)
finally: Run code regardless of whether an exception was raised (always executes)

Python

try:
    file = open("example.txt", "r")
except FileNotFoundError:
    print("File not found!")
else:
    # Only executes if no exception occurred
    print("File opened successfully!")
    content = file.read()
    print(content)
    file.close()
finally:
    # Always executes, even if exception occurred
    print("End of file operation.")

Use finally blocks for cleanup operations that must always run, such as closing files, releasing locks, or cleaning up temporary resources.

Catching Multiple Exceptions¶

You can catch multiple exception types by chaining except blocks or using a tuple:

Python

# Method 1: Tuple in single except block
try:
    x = int(input("Enter a number: "))
    print(10 / x)
except (ValueError, ZeroDivisionError) as e:
    print(f"Error: {e}")

# Method 2: Separate except blocks for different handling
try:
    x = int(input("Enter a number: "))
    print(10 / x)
except ValueError:
    print("Invalid input! Please enter a number.")
except ZeroDivisionError:
    print("Cannot divide by zero!")

Use separate except blocks when you need different handling for each exception type. Use a tuple when the handling is the same.

Nested try-except Blocks¶

Nested try-except blocks allow you to handle exceptions at different levels of code execution. This is useful when you need different error handling strategies for different parts of an operation:

Python

try:
    # Outer try block - handles high-level errors
    try:
        # Inner try block - handles specific operation errors
        file = open("data.txt", "r")
        content = file.read()
        file.close()
    except FileNotFoundError:
        # Handle file not found at the inner level
        logger.warning("Data file not found, using defaults")
        content = get_default_data()
    except PermissionError:
        # Handle permission errors at the inner level
        logger.error("Permission denied accessing data file")
        raise  # Re-raise to outer handler
except Exception as e:
    # Outer handler catches any unhandled errors
    logger.error(f"Unexpected error in data loading: {e}")
    capture_exception_safe(e)
    raise

In this example, FileNotFoundError is handled gracefully at the inner level with a fallback, while PermissionError is re-raised to the outer handler. The outer handler catches any unexpected exceptions and logs them to Sentry.

Use nested try-except blocks when:

Different parts of an operation need different error handling
You want to handle some errors locally and let others propagate
You need to perform cleanup at different levels

Using traceback for Detailed Error Information¶

The Python traceback module provides detailed information about exceptions, including the full call stack:

Python

import traceback

try:
    num = int("abc")
except ValueError as e:
    print("An error occurred!")
    traceback.print_exc()  # Prints full traceback information

In Tux, we use traceback in our error logging to provide complete context for debugging. The global error handler automatically includes tracebacks for errors sent to Sentry.

Error Handling Patterns¶

Database Operations¶

Database operations can fail for many reasons—connection issues, query errors, constraint violations. Handle each type of failure appropriately. Connection errors might be transient and worth retrying. Query errors indicate problems with the query itself and shouldn't be retried.

Implement retry logic for transient failures. Use exponential backoff to avoid overwhelming the database during outages. Limit retry attempts to prevent infinite loops. After retries are exhausted, fail gracefully with appropriate error messages.

For query errors, don't retry—the query itself is wrong. Log the error with the query details and raise an appropriate exception. Let callers handle the failure or let it propagate to the global handler.

Use database transactions for multi-step operations. If any step fails, roll back the entire transaction. This keeps your database consistent and prevents partial updates that cause data corruption.

External API Calls¶

External APIs can fail in many ways—timeouts, rate limits, server errors, network issues. Handle each failure mode appropriately. Timeouts might be worth retrying. Rate limits need backoff. Server errors might be transient.

Check HTTP status codes to determine appropriate handling. 404 means the resource doesn't exist—handle it gracefully. 403 might mean rate limiting—back off and retry. 500 means server error—might be transient, worth retrying.

Use timeouts for all API calls. Don't let API calls hang indefinitely. Set reasonable timeouts based on expected response times. If an API call times out, log it and handle it appropriately—retry, use cached data, or fail gracefully.

Convert HTTP exceptions to Tux exceptions. Use convert_httpx_error() to automatically convert httpx exceptions to TuxAPIError subclasses and report them to Sentry. This provides consistent error handling throughout your code and eliminates duplicated error handling logic.

Python

from tux.services.sentry import convert_httpx_error

try:
    response = await httpx.get(endpoint)
except Exception as e:
    convert_httpx_error(
        e,
        service_name="GitHub",
        endpoint="repos.get",
        not_found_resource=f"{owner}/{repo}",  # Optional: for 404 errors
    )

The convert_httpx_error() function automatically:

Converts 404 errors to TuxAPIResourceNotFoundError
Converts 403 errors to TuxAPIPermissionError
Converts other HTTP status errors to TuxAPIRequestError
Converts connection errors to TuxAPIConnectionError
Reports all errors to Sentry with appropriate context

File Operations¶

File operations can fail due to permissions, disk space, or I/O errors. Handle these failures gracefully. Permission errors should be logged and raised as TuxPermissionError. I/O errors might be transient and worth retrying.

Use atomic file operations when possible. Write to temporary files first, then rename them atomically. This prevents partial writes from corrupting files. If the write fails, the original file remains intact.

Clean up temporary files in finally blocks. Even if operations fail, ensure temporary files are deleted. Use context managers to ensure cleanup happens automatically, even when exceptions occur.

Command Error Handling¶

Global Error Handler Integration¶

Tux's global error handler automatically catches command errors and converts them to user-friendly messages. Focus on raising appropriate exceptions in your commands—the handler takes care of formatting and sending responses.

The global handler categorizes errors automatically. It knows how to format Discord API errors, permission errors, validation errors, and Tux exceptions. It provides user-friendly messages, logs errors appropriately, and reports to Sentry with context.

Don't try to handle user-facing errors in commands. Raise appropriate exceptions and let the global handler format them. This keeps your command code focused on business logic and ensures consistent error handling across all commands.

For infrastructure errors in commands, handle them locally and provide fallback behavior. If a database query fails, return a cached result or show a message explaining the issue. Don't let infrastructure failures become user-facing errors unless necessary.

Custom Validation Errors¶

Create custom validation errors for complex validation logic. When validation fails, raise a TuxValidationError with details about what failed and why. The global handler formats these for users automatically.

Include field names and values in validation errors. This helps users understand what they entered incorrectly. Explain why validation failed—"must be at least 3 characters" is more helpful than "invalid input".

Validate early and fail fast. Check input validity before doing expensive operations. Don't waste resources processing invalid input. Return clear error messages explaining what's wrong and how to fix it.

Handling Exceptions in Loops¶

When processing multiple items in a loop, handle exceptions for each iteration independently. This prevents a single failure from stopping the entire loop:

Python

users = [1, 2, 3, 4, 5]
results = []

for user_id in users:
    try:
        user = await db.get_user(user_id)
        results.append(user)
    except TuxAPIResourceNotFoundError:
        logger.warning(f"User {user_id} not found, skipping")
        # Continue to next iteration
    except Exception as e:
        logger.error(f"Unexpected error fetching user {user_id}: {e}")
        capture_exception_safe(e)
        # Continue to next iteration - don't let one failure stop the loop

logger.info(f"Successfully processed {len(results)}/{len(users)} users")

Key principles for loop exception handling:

Catch specific exceptions for expected failures (like missing resources)
Log and continue for recoverable errors
Track successes and failures separately to understand overall health
Don't let one failure stop the entire batch unless it's critical

For batch operations, consider collecting exceptions and processing them after the loop completes:

Python

users = [1, 2, 3, 4, 5]
successes = []
failures = []

for user_id in users:
    try:
        user = await db.get_user(user_id)
        successes.append(user)
    except Exception as e:
        failures.append((user_id, e))
        logger.error(f"Failed to fetch user {user_id}: {e}")

logger.info(f"Batch complete: {len(successes)} succeeded, {len(failures)} failed")
if failures:
    # Process failures separately
    for user_id, error in failures:
        capture_exception_safe(error, extra_context={"user_id": user_id})

Async Error Handling¶

Task Exception Handling¶

When running multiple operations concurrently, handle exceptions in each task independently. Use asyncio.gather() with return_exceptions=True to collect exceptions as results instead of stopping at the first failure.

Process results after gathering completes. Check each result to see if it's an exception or a value. Handle exceptions appropriately—log them, retry operations, or collect them for batch processing. Successful operations shouldn't be affected by failures in other operations.

For batch operations, track successes and failures separately. Log summary information about how many operations succeeded and failed. This helps you understand the overall health of batch operations and identify patterns in failures.

Timeout Handling¶

Always use timeouts for long-running operations. Don't let operations hang indefinitely. Set reasonable timeouts based on expected operation duration. If operations consistently timeout, investigate why they're slow.

When timeouts occur, cancel the operation and handle the cancellation gracefully. Log timeout information to help identify slow operations. Consider whether timeouts indicate problems that need fixing or just slow external services.

Use asyncio.wait_for() for timeout handling. This automatically cancels operations that exceed their timeout. Handle TimeoutError appropriately—retry, use fallbacks, or fail gracefully with clear error messages.

Example: Handling Exceptions in Async Functions¶

Python

import asyncio

async def fetch_user_data(user_id: int):
    """Fetch user data from an external API."""
    try:
        response = await httpx.get(f"https://api.example.com/users/{user_id}")
        response.raise_for_status()
        return response.json()
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 404:
            raise TuxAPIResourceNotFoundError(
                service_name="ExampleAPI",
                resource_identifier=f"user_{user_id}",
            ) from e
        raise TuxAPIRequestError(
            service_name="ExampleAPI",
            status_code=e.response.status_code,
        ) from e
    except httpx.RequestError as e:
        raise TuxAPIConnectionError(
            service_name="ExampleAPI",
            original_error=e,
        ) from e

async def main():
    try:
        user_data = await fetch_user_data(123)
        print(f"User: {user_data}")
    except TuxAPIResourceNotFoundError:
        logger.warning("User not found")
    except TuxAPIConnectionError as e:
        logger.error(f"Connection error: {e}")
        capture_exception_safe(e)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        capture_exception_safe(e)

asyncio.run(main())

Important notes for async exception handling:

Always await async functions - calling without await returns a coroutine object, not the result
Handle asyncio.CancelledError carefully - don't catch it unless you're handling cancellation specifically
Use asyncio.gather() with return_exceptions=True for concurrent operations where you want all results, even if some fail

Context Managers for Error Handling¶

Context managers in Python shine when it comes to resource management. While they're commonly associated with file operations, their scope extends far beyond basic file handling. The real power of context managers lies in their ability to handle setup and cleanup operations automatically, whether your code executes successfully or raises an exception.

When to Use Context Managers¶

Context managers are particularly crucial when dealing with:

Database connections and transactions - Ensure connections are closed and transactions are committed or rolled back
Network sockets and API sessions - Close connections even when errors occur
Memory-intensive operations - Release resources promptly
Thread locks and semaphores - Ensure locks are released
Temporary file management - Clean up temporary files automatically

Database Transactions¶

Use context managers for database transactions. They ensure transactions are committed on success and rolled back on failure. This keeps your database consistent and prevents partial updates.

Example: Custom Transaction Context Manager¶

Python

from contextlib import contextmanager

@contextmanager
def db_transaction(db_service):
    """Context manager for database transactions."""
    async with db_service.session() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
        # Connection automatically closed by session context manager

Usage:

Python

async def update_user_profile(user_id: int, data: dict) -> None:
    async with db_transaction(db_service) as session:
        user = await session.get(User, user_id)
        if user:
            user.profile = data
            # Auto-committed on success, rolled back on exception

Benefits:

Automatic commit on success
Automatic rollback on failure
Connection cleanup guaranteed
No manual transaction management needed

Log transaction outcomes appropriately. Log successful commits at debug level. Log rollbacks at warning level with error details. This helps you understand transaction patterns and identify problematic operations.

Resource Cleanup¶

Use context managers for all resources that need cleanup—files, network connections, temporary data. Context managers ensure cleanup happens even when exceptions occur. Don't rely on manual cleanup in finally blocks—use context managers instead.

Example: File Operations¶

Python

# ✅ GOOD: Context manager ensures file is closed
with open("data.txt", "r") as file:
    content = file.read()
    # File automatically closed, even if an exception occurs

# ❌ BAD: Manual cleanup can be forgotten
file = open("data.txt", "r")
try:
    content = file.read()
finally:
    file.close()  # Easy to forget or miss in complex code

Example: Custom Resource Context Manager¶

Python

from contextlib import contextmanager

@contextmanager
def temporary_cache():
    """Context manager for temporary cache operations."""
    cache = {}
    try:
        yield cache
    finally:
        cache.clear()  # Always cleanup, even on exceptions

# Usage
with temporary_cache() as cache:
    cache["key"] = "value"
    # Cache automatically cleared when exiting

Best practices:

Identify paired operations: Look for operations that require setup and cleanup
Reserve for resource management: Not every operation needs a context manager
Use for guaranteed cleanup: When cleanup must happen regardless of success or failure
Centralize logic: Context managers centralize setup and cleanup logic, making code more maintainable

Context managers particularly excel in production environments where resource leaks can cause significant issues. They enforce clean resource management patterns and make code more maintainable by centralizing setup and cleanup logic.

Testing Error Conditions¶

Exception Testing¶

Test that your code raises appropriate exceptions. Use pytest.raises() to verify exceptions are raised for invalid inputs or error conditions. Test that exception messages are helpful and include relevant context.

Test error handling paths. Don't just test happy paths—test what happens when things go wrong. Verify that errors are handled gracefully, resources are cleaned up, and appropriate exceptions are raised.

Mock external dependencies to simulate failures. Test how your code handles database failures, API timeouts, and network errors. Verify that fallbacks work correctly and errors are logged appropriately.

Integration Testing¶

Test error scenarios in integration tests. Verify that errors propagate correctly through your system. Test that global error handlers format errors appropriately and that infrastructure errors are handled gracefully.

Test graceful degradation. Verify that your bot continues operating when some services fail. Test that cached data is used when APIs are unavailable and that operations queue correctly when databases are down.

Performance Considerations¶

Avoid Expensive Operations in Error Paths¶

Don't do expensive computations in error handling code. Error paths should be fast—they're already handling failures, so don't make them slow too. Pre-compute expensive values before operations, or use lazy evaluation.

If you need expensive data for error logging, compute it before the operation starts. That way, if the operation fails, you already have the data for logging. Don't compute it only when errors occur—that makes error handling slow.

Use lightweight error information. Include IDs, names, and simple values in error logs. Don't serialize large objects or compute complex summaries in error handlers. Keep error handling fast so it doesn't slow down your bot.

Exception Creation Cost¶

Create exceptions only when needed. Don't create exceptions during validation—collect validation errors and create a single exception at the end. This avoids the overhead of exception creation for common validation failures.

Use exception chaining efficiently. Don't create multiple exception objects unnecessarily. Chain exceptions properly to preserve context without creating redundant exception objects.

Security: Protecting Sensitive Data¶

Critical Security Concern

Never log passwords, API keys, tokens, or other sensitive information in error messages or logs.

The Problem¶

When exceptions occur, error messages and stack traces may contain sensitive data that gets logged or sent to error monitoring services. This can expose:

Database connection strings with passwords
API keys in URLs or request headers
Authentication tokens
User credentials
Secret keys

The Solution¶

Tux automatically sanitizes sensitive data before sending events to Sentry. Our before_send handler removes or masks:

Database URLs: postgresql://user:password@host → postgresql://***:***@host
API keys in URLs: ?api_key=xxx → ?***
Bearer tokens: Bearer xxxxx → Bearer ***
Authorization headers: Authorization: xxxxx → Authorization: ***

Best Practices¶

Never include secrets in error messages: When raising exceptions, don't include sensitive data in the message string.

Python

# BAD: Exposes API key in error message
raise ValueError(f"API call failed with key {api_key}")

# GOOD: Generic error message
raise ValueError("API call failed - check configuration")

Sanitize before logging: If you must log data that might contain secrets, sanitize it first.

Python

# BAD: Logs full URL with API key
logger.error(f"Request failed: {request_url}")

# GOOD: Sanitize sensitive parts
sanitized_url = _sanitize_url(request_url)
logger.error(f"Request failed: {sanitized_url}")

Use environment variables: Store secrets in environment variables, never in code or error messages.
Validate error context: Before adding context to Sentry events, ensure it doesn't contain sensitive data.

Example: Secure Error Handling¶

Python

import os
import httpx

api_key = os.getenv("SECRET_API_KEY")

try:
    response = httpx.get(f"https://api.example.com/data?key={api_key}")
    response.raise_for_status()
except httpx.HTTPError as e:
    # BAD: This would log the API key
    # logger.error(f"API request failed: {e}")

    # GOOD: Generic error message
    logger.error("API request failed - check connection and configuration")
    capture_api_error(e, endpoint="api.example.com/data")
    # Note: Our Sentry handler automatically sanitizes any sensitive data

Anti-Patterns¶

Silent Failures¶

Never silently swallow exceptions. At minimum, log exceptions with context. Silent failures make debugging impossible—you don't know what went wrong or where. Always log exceptions, even if you handle them gracefully.

If you must handle exceptions silently, log them at debug level. This preserves information for debugging while not cluttering error logs. But prefer explicit error handling—silent failures are usually bugs waiting to happen.

Re-raising with Generic Exceptions¶

Don't lose exception context when re-raising. Always use raise ... from e to chain exceptions. Generic re-raising loses the original error context, making debugging much harder.

Preserve exception chains throughout your code. When you catch an exception and raise a new one, chain them properly. This shows the full error path from original cause to final error.

Overly Broad Exception Handling¶

Don't catch Exception unless you're at the top level of error handling. Catching Exception catches everything, including KeyboardInterrupt and SystemExit, which should propagate. Catch specific exceptions where you can handle them meaningfully.

Handle exceptions at the right level. Catch specific exceptions where you can handle them. Let exceptions propagate to handlers that can actually deal with them. Don't catch exceptions just to log them—let them propagate to appropriate handlers.

Error Monitoring¶

Sentry Integration¶

Tux provides specialized Sentry utilities for different error types:

capture_database_error() - For database failures
convert_httpx_error() - For HTTPX API failures (recommended for API wrappers)
capture_api_error() - For other API failures
capture_cog_error() - For cog errors

These utilities automatically add relevant context.

Use specialized capture functions instead of generic capture_exception(). They add domain-specific context automatically, making errors easier to debug. Database errors include query details, API errors include endpoint information, and cog errors include command context.

Set context before operations, not after. If you set context after an error occurs, you'll miss valuable debugging information. Set user context, command context, and custom tags before doing work, so they're available when errors occur.

Error Metrics¶

Track error patterns to identify problematic areas. Monitor error rates by command, by module, and by error type. This helps you identify which parts of your bot are most error-prone and prioritize fixes.

Set up alerts for critical errors. When database connections fail or configuration errors occur, you need to know immediately. Don't wait for users to report problems—monitor error rates and alert on anomalies.

Monitor user-facing error frequency. If users see many errors, investigate why. User-facing errors indicate problems that need immediate attention. Track these separately from internal errors that don't affect users.

Why Exception Handling Matters¶

Effective error and exception handling is essential for building robust applications. Here's why:

1. Prevents Unexpected Program Crashes¶

When an unhandled exception occurs, Python terminates the program immediately. By handling exceptions, you prevent crashes and provide graceful recovery mechanisms. Your bot continues operating even when individual operations fail.

2. Provides Meaningful Feedback to Users¶

Displaying informative error messages helps users understand what went wrong and how to correct their input or actions. Instead of showing cryptic tracebacks, users get clear, actionable messages.

3. Simplifies Debugging and Enables Detailed Logging¶

Exception handling allows you to log errors for later analysis, helping pinpoint the source of issues. Logging is especially useful for debugging applications running in production, where you can't interactively debug the code.

4. Ensures Proper Cleanup of Resources¶

Handling exceptions properly ensures that resources like open files, database connections, and network sockets are properly closed, preventing memory leaks and data corruption. Use context managers (with statements) to ensure cleanup happens automatically.

5. Reduces Security Vulnerabilities¶

Proper exception handling prevents sensitive information from being exposed in error messages. Never log passwords, API keys, tokens, or other secrets. Our Sentry integration automatically sanitizes sensitive data before sending events.

Code Quality Tools¶

Linting Your Code¶

Tools like ruff and basedpyright can help detect syntax and indentation issues before execution. These work similarly to a spell checker for your code, helping you catch errors early before they cause runtime issues.

Tux uses:

ruff: Fast Python linter and formatter
basedpyright: Static type checker with strict mode

Run quality checks before committing:

Bash

uv run dev all  # Runs all quality checks including linting

Best Practices Summary¶

Code Structure¶

Handle errors at the right level. Catch specific exceptions where you can handle them meaningfully. Let exceptions propagate to global handlers when you can't handle them locally. Don't catch exceptions just to log them.

Use Tux-specific exceptions instead of generic ones. They provide better error categorization and let handlers format errors appropriately. Generic exceptions force handlers to guess what went wrong.

Chain exceptions properly with raise ... from e. This preserves error context and makes debugging easier. Don't lose exception chains when re-raising.

Use try/except/else/finally blocks appropriately. Use finally for cleanup that must always run. Use else for code that should only run when no exception occurs.

User Experience¶

Provide helpful error messages. Users should understand what went wrong and how to fix it. Don't show technical errors to users—convert them to friendly messages.

Handle errors gracefully. Don't crash on recoverable errors. Use fallbacks, cached data, or reduced functionality to keep your bot operating even when some services fail.

Log errors aggressively with context. Include user IDs, command names, operation types, and relevant parameters. This context is essential for debugging production issues.

Strategic logging principles:

Log levels that mean something: Use DEBUG for detailed execution flow, INFO for normal operations, WARNING for recoverable issues, ERROR for failures that need attention, CRITICAL only for failures that stop core functionality
Contextual information: Include user IDs, command names, operation types, and relevant parameters that help reconstruct the error state
Stack traces when they matter: Use logger.exception() or logger.error(..., exc_info=True) for exceptions, but only when the full traceback is needed
Structured logging in production: Use structured logging formats (JSON) for easier parsing and analysis
Log rotation: Manage log storage efficiently to prevent disk space issues

Example: Comprehensive Error Logging¶

Python

from loguru import logger
from tux.services.sentry import capture_exception_safe

async def process_command(ctx, user_id: int, data: dict) -> None:
    logger.info(f"Processing command for user {user_id}", extra={
        "user_id": user_id,
        "command": ctx.command.name,
        "guild_id": ctx.guild.id if ctx.guild else None,
    })

    try:
        result = await risky_operation(user_id, data)
        logger.debug(f"Operation successful for user {user_id}")
        return result
    except TuxValidationError as e:
        # User error - log as warning, don't send to Sentry
        logger.warning(f"Validation failed for user {user_id}: {e}", extra={
            "user_id": user_id,
            "error_type": type(e).__name__,
            "validation_field": getattr(e, "field", None),
        })
        raise
    except Exception as e:
        # System error - log as error, send to Sentry
        logger.error(
            f"Operation failed for user {user_id}: {e}",
            extra={
                "user_id": user_id,
                "command": ctx.command.name,
                "error_type": type(e).__name__,
            },
        )
        capture_exception_safe(e, extra_context={
            "user_id": user_id,
            "command": ctx.command.name,
        })
        raise

What makes logging exceptional:

Tells a story: When an exception occurs, your logs should tell a story about what led to the failure
Track patterns: Strategic logging helps track user behavior patterns, performance bottlenecks, and system health
Production debugging: In production, logs are often your only window into what happened
Correlation: Include correlation IDs or request IDs to trace operations across services

Performance¶

Avoid expensive operations in error paths. Error handling should be fast—don't make failures slow. Pre-compute expensive values or use lazy evaluation.

Create exceptions only when needed. Don't create exceptions during validation—collect errors and create a single exception at the end. This avoids overhead for common validation failures.

Documentation¶

Document expected exceptions in function docstrings. This helps other developers (and your future self) understand what exceptions a function can raise and how to handle them:

Python

async def fetch_user(user_id: int) -> User:
    """
    Fetch a user from the database.

    Parameters
    ----------
    user_id : int
        The ID of the user to fetch.

    Returns
    -------
    User
        The user object.

    Raises
    ------
    TuxAPIResourceNotFoundError
        If the user does not exist.
    TuxDatabaseConnectionError
        If the database connection fails.
    """
    # Implementation...

Clear documentation of expected exceptions:

Helps callers write appropriate error handling
Makes the API contract explicit
Reduces guesswork about what can go wrong
Improves code maintainability

Use NumPy-style docstrings (as shown above) to document exceptions in the Raises section. This is the standard format used throughout Tux.

Modern Python Exception Handling Features¶

Tux uses Python 3.13+, which includes several modern exception handling features introduced in Python 3.10 and 3.11.

Pattern Matching with `match` Statements (Python 3.10+)¶

Python 3.10 introduced structural pattern matching with match statements, which can elegantly handle multiple exception types:

Python

try:
    result = risky_operation()
except Exception as e:
    match e:
        case ZeroDivisionError():
            logger.error("Division by zero detected")
            raise ValueError("Cannot divide by zero") from e
        case FileNotFoundError():
            logger.warning("File not found, using defaults")
            result = get_default_data()
        case TuxAPIResourceNotFoundError():
            logger.warning(f"Resource not found: {e.resource_identifier}")
            raise
        case _:
            logger.error(f"Unexpected error: {e}")
            capture_exception_safe(e)
            raise

Benefits:

More readable than multiple elif isinstance() checks
Exhaustiveness checking (can use case _: as catch-all)
Pattern matching on exception attributes

When to use:

When handling multiple exception types with different logic
When you want more readable exception handling
When pattern matching on exception attributes

Exception Groups and `except*` (Python 3.11+)¶

Python 3.11 introduced exception groups to handle multiple exceptions simultaneously, particularly useful for concurrent operations:

Python

from exceptiongroups import ExceptionGroup

try:
    # Multiple operations that might raise different exceptions
    results = await asyncio.gather(
        fetch_user_data(user_id),
        fetch_guild_data(guild_id),
        fetch_channel_data(channel_id),
        return_exceptions=True,
    )

    # Check for exceptions in results
    exceptions = [r for r in results if isinstance(r, Exception)]
    if exceptions:
        raise ExceptionGroup("Multiple fetch errors", exceptions)

except* TuxAPIResourceNotFoundError as eg:
    # Handle all resource not found errors
    for error in eg.exceptions:
        logger.warning(f"Resource not found: {error.resource_identifier}")

except* TuxAPIConnectionError as eg:
    # Handle all connection errors
    for error in eg.exceptions:
        logger.error(f"Connection error: {error}")
        capture_exception_safe(error)

except* Exception as eg:
    # Handle any other exceptions
    for error in eg.exceptions:
        logger.error(f"Unexpected error: {error}")
        capture_exception_safe(error)

Benefits:

Handle multiple exceptions of the same type together
Useful for concurrent operations (asyncio, threading)
Cleaner than checking each result individually

When to use:

Concurrent operations where multiple exceptions might occur
Batch processing where you want to handle all errors together
When you need to process all exceptions of a specific type

Adding Notes to Exceptions (Python 3.11+)¶

Python 3.11 added the add_note() method to exceptions, allowing you to add contextual information:

Python

try:
    result = process_user_input(user_data)
except ValueError as e:
    e.add_note("This error occurred while processing user input")
    e.add_note("Consider validating input before processing")
    e.add_note(f"User ID: {user_data.get('id', 'unknown')}")
    logger.error(f"Validation error: {e}")
    raise

Benefits:

Add context without modifying exception message
Notes appear in traceback output
Useful for debugging and error reporting

When to use:

When you want to add debugging context to exceptions
When re-raising exceptions with additional information
When you need to preserve original exception message but add context

Suppressing Exceptions with `contextlib.suppress`¶

Use contextlib.suppress to intentionally ignore specific exceptions:

Python

from contextlib import suppress

# Suppress FileNotFoundError when removing a file
with suppress(FileNotFoundError):
    os.remove("temp_file.txt")

# Suppress multiple exception types
with suppress(KeyError, AttributeError):
    value = data["key"].attribute

Benefits:

Explicitly documents intent to ignore exceptions
More readable than empty except blocks
Type-safe exception suppression

When to use:

Cleanup operations where exceptions are expected
Optional operations that shouldn't fail the program
When you want to explicitly document ignored exceptions

Use Sparingly

Only suppress exceptions when you're certain they're safe to ignore. Suppressing exceptions can hide bugs and make debugging difficult.

Example from Tux codebase:

Python

from contextlib import suppress

# Suppress CancelledError when cleaning up tasks
for task in tasks:
    if task.done():
        with suppress(asyncio.CancelledError):
            await task

Creating Custom Exception Classes¶

You can define custom exception classes to extend Python's Exception class and handle specific cases in your application:

Python

class CustomError(Exception):
    """Custom exception for specific application errors."""
    pass

# Usage
if some_condition:
    raise CustomError("This is a custom exception!")

In Tux, we use a hierarchical exception system. All custom exceptions inherit from TuxError:

Python

from tux.shared.exceptions import TuxError

class MyCustomError(TuxError):
    """Custom error for my module."""
    pass

This allows the global error handler to format and handle your custom exceptions appropriately.

Resources¶

Python Exception Handling: Python's official documentation on exception handling
Python traceback Module: traceback documentation for detailed error information
Tux Exception Hierarchy: See src/tux/shared/exceptions.py for all Tux-specific exceptions
Global Error Handler: See src/tux/services/handlers/error/cog.py for error handling implementation
Sentry Integration: See sentry/index.md for error tracking and monitoring details
Security Best Practices: See our logging best practices for more on protecting sensitive data

Error Handling Best Practices¶

Errors vs Exceptions in Python¶

Common Python Exceptions¶

Type and Value Exceptions¶

Data Structure Exceptions¶

Mathematical Exceptions¶

File Operation Exceptions¶

Import and Module Exceptions¶

Runtime Exceptions¶

System Exceptions¶

Understanding Tux's Exception System¶

Error Categories and Strategies¶

User Errors¶

Infrastructure Errors¶

System Errors¶

Core Principles¶

Fail Gracefully, Log Aggressively¶

Be Specific, Not Generic¶

Keep Try Blocks Focused¶

Problem: Large try blocks¶

Solution: Focused try blocks¶

The Critical Difference: except: vs except Exception:¶

Exception Re-Raising Patterns¶

Pattern 1: Bare raise (Re-raise Same Exception)¶

Pattern 2: Raise New Exception (Without from)¶

Pattern 3: Raise New Exception from None¶

Pattern 4: Exception Chaining (from e) - Recommended¶

Pattern Comparison¶

Exception Re-Raising Best Practices¶

Using try/except/else/finally Blocks¶

Catching Multiple Exceptions¶

Nested try-except Blocks¶

Using traceback for Detailed Error Information¶

Error Handling Patterns¶

Database Operations¶

External API Calls¶

File Operations¶

Command Error Handling¶

Global Error Handler Integration¶

Custom Validation Errors¶

Handling Exceptions in Loops¶

Async Error Handling¶

Task Exception Handling¶

Timeout Handling¶

Example: Handling Exceptions in Async Functions¶

Context Managers for Error Handling¶

When to Use Context Managers¶

Database Transactions¶

Example: Custom Transaction Context Manager¶

Resource Cleanup¶

Example: File Operations¶

Example: Custom Resource Context Manager¶

Testing Error Conditions¶

Exception Testing¶

Integration Testing¶

Performance Considerations¶

Avoid Expensive Operations in Error Paths¶

Exception Creation Cost¶

Security: Protecting Sensitive Data¶

The Problem¶

The Solution¶

Best Practices¶

Example: Secure Error Handling¶

Anti-Patterns¶

Silent Failures¶

Re-raising with Generic Exceptions¶

Overly Broad Exception Handling¶

Error Monitoring¶

Sentry Integration¶

Error Metrics¶

Why Exception Handling Matters¶

1. Prevents Unexpected Program Crashes¶

2. Provides Meaningful Feedback to Users¶

3. Simplifies Debugging and Enables Detailed Logging¶

4. Ensures Proper Cleanup of Resources¶

5. Reduces Security Vulnerabilities¶

Code Quality Tools¶

Linting Your Code¶

Best Practices Summary¶

Code Structure¶

The Critical Difference: `except:` vs `except Exception:`¶

Pattern 1: Bare `raise` (Re-raise Same Exception)¶

Pattern 2: Raise New Exception (Without `from`)¶

Pattern 3: Raise New Exception `from None`¶

Pattern 4: Exception Chaining (`from e`) - Recommended¶

Pattern Matching with `match` Statements (Python 3.10+)¶

Exception Groups and `except*` (Python 3.11+)¶

Suppressing Exceptions with `contextlib.suppress`¶