This guide covers how to monitor your Flash deployments, debug issues, and resolve common errors.
Monitoring and debugging
Viewing logs
When running Flash functions, logs are displayed in your terminal:
```
2025-11-19 12:35:15,109 | INFO | Created endpoint: rb50waqznmn2kg - flash-quickstart-fb
2025-11-19 12:35:15,114 | INFO | Endpoint:rb50waqznmn2kg | API /run
2025-11-19 12:35:15,655 | INFO | Endpoint:rb50waqznmn2kg | Started Job:b0b341e7-...
2025-11-19 12:35:15,762 | INFO | Job:b0b341e7-... | Status: IN_QUEUE
2025-11-19 12:36:09,983 | INFO | Job:b0b341e7-... | Status: COMPLETED
2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Delay Time: 51842 ms
2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Execution Time: 1533 ms
```
Control log verbosity with the LOG_LEVEL environment variable:
```sh
LOG_LEVEL=DEBUG python your_script.py
```
Available levels: DEBUG, INFO, WARNING, ERROR.
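If you want the same convention in your own helper scripts, a minimal sketch using Python's standard logging module looks like the following. This mirrors the documented levels but is not Flash's internal logger setup:

```python
import logging
import os

# Map the LOG_LEVEL environment variable onto Python's standard logging
# levels (DEBUG, INFO, WARNING, ERROR), defaulting to INFO when unset
# or unrecognized. A sketch for your own scripts, not Flash internals.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(
    level=level,
    format="%(asctime)s | %(levelname)s | %(message)s",
)

logging.info("Logging configured at %s", logging.getLevelName(level))
```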
Runpod console
View detailed metrics and logs in the Runpod console:
- Navigate to the Serverless section.
- Click on your endpoint to view:
  - Active workers and queue depth.
  - Request history and job status.
  - Worker logs and execution details.
The console provides metrics including request rate, queue depth, latency, worker count, and error rate.
View worker logs
Access detailed logs for specific workers:
- Go to the Serverless console.
- Select your endpoint.
- Click on a worker to view its logs.
Logs include dependency installation output, function execution output (print statements, errors), and system-level messages.
Add logging to functions
Include print statements in your endpoint functions for debugging:
```python
@Endpoint(name="processor", gpu=GpuGroup.ANY)
async def process(data: dict) -> dict:
    print(f"Received data: {data}")  # Visible in worker logs
    result = do_processing(data)
    print(f"Processing complete: {result}")
    return result
```
Configuration errors
API key not set
Error:
```
No RunPod API key found. Set one with:

  flash login                           # interactive setup
or
  export RUNPOD_API_KEY=<your-api-key>  # environment variable
or
  echo 'RUNPOD_API_KEY=<your-api-key>' >> .env

Get a key: https://docs.runpod.io/get-started/api-keys
```
Cause: Flash requires a valid Runpod API key to provision and manage endpoints.
Solution:

- Generate an API key from Settings > API Keys in the Runpod console. The key needs All access permissions.
- Authenticate using one of these methods:

Option 1: Use `flash login` (recommended). Opens your browser for authentication and saves your credentials.

Option 2: Environment variable:

```sh
export RUNPOD_API_KEY="your_api_key"
```

Option 3: `.env` file for local CLI use:

```sh
echo "RUNPOD_API_KEY=your_api_key" >> .env
```

Values in your `.env` file are only available locally for CLI commands. They are not passed to deployed endpoints.

Option 4: Shell profile for persistent local access:

```sh
echo 'export RUNPOD_API_KEY="your_api_key"' >> ~/.bashrc
source ~/.bashrc
```
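Whichever option you choose, you can verify the variable is actually visible to your process before invoking Flash. The helper below is hypothetical (not part of the Flash CLI, which performs its own check):

```python
import os

def api_key_configured() -> bool:
    """Return True if RUNPOD_API_KEY is set and non-empty.
    Hypothetical preflight helper; Flash performs its own check."""
    return bool(os.environ.get("RUNPOD_API_KEY", "").strip())

if not api_key_configured():
    print("RUNPOD_API_KEY is not set - run 'flash login' or export the variable.")
```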
Invalid route configuration
Error:
```
Load-balanced endpoints require route decorators
```
Cause: Load-balanced endpoints require HTTP method decorators for each route.
Solution: Ensure all routes use the correct decorator pattern:
```python
from runpod_flash import Endpoint

api = Endpoint(name="api", cpu="cpu5c-4-8", workers=(1, 5))

# Correct - using route decorators
@api.post("/process")
async def process_data(data: dict) -> dict:
    return {"result": "processed"}

@api.get("/health")
async def health_check() -> dict:
    return {"status": "healthy"}
```
Invalid HTTP method
Error:
```
method must be one of {'GET', 'POST', 'PUT', 'DELETE', 'PATCH'}
```
Cause: The HTTP method specified is not supported.
Solution: Use one of the supported HTTP methods: GET, POST, PUT, DELETE, or PATCH.
Invalid route path
Cause: HTTP paths must begin with a forward slash.
Solution: Ensure paths start with `/`:
```python
# Correct
@api.get("/health")

# Incorrect - missing leading slash
@api.get("health")
```
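This check can also be applied locally before deploying; `validate_route_path` below is a hypothetical helper, not part of the Flash API:

```python
def validate_route_path(path: str) -> str:
    """Raise if a route path does not start with '/', mirroring the
    constraint described above. Hypothetical helper, not Flash API."""
    if not path.startswith("/"):
        raise ValueError(f"Route path must start with '/': {path!r}")
    return path
```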
Duplicate routes
Error:
```
Duplicate route 'POST /process' in endpoint 'my-api'
```
Cause: Two functions define the same HTTP method and path combination.
Solution: Ensure each route is unique within an endpoint. Either change the path or method of one function.
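Before deploying, you can scan your own route table for collisions. A sketch (`find_duplicate_routes` is a hypothetical helper for local preflight checks, not part of Flash):

```python
from collections import Counter

def find_duplicate_routes(routes):
    """Given (method, path) pairs, return the combinations registered
    more than once. HTTP methods are compared case-insensitively.
    Hypothetical helper for local preflight checks."""
    counts = Counter((method.upper(), path) for method, path in routes)
    return sorted(route for route, n in counts.items() if n > 1)
```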
Build errors
Unsupported Python version
Error:
```
Python 3.13 is not supported for Flash deployment.
Supported versions: 3.10, 3.11, 3.12
```
Cause: Flash requires Python 3.10, 3.11, or 3.12.
Solution:
Switch to a supported Python version using a virtual environment:
```sh
# Using pyenv
pyenv install 3.12
pyenv local 3.12

# Or using uv
uv venv --python 3.12
source .venv/bin/activate
```
Alternatively, use a Docker container with a supported Python version for your build environment.
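A quick way to confirm your interpreter before building; the supported set below is taken from the error message, and the helper itself is illustrative rather than part of the Flash CLI:

```python
import sys

SUPPORTED_VERSIONS = {(3, 10), (3, 11), (3, 12)}  # from the error message

def flash_supports_current_python() -> bool:
    """True if the running interpreter's major.minor is in the
    supported set. Hypothetical preflight helper."""
    return sys.version_info[:2] in SUPPORTED_VERSIONS

print(sys.version_info[:2], flash_supports_current_python())
```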
Deployment errors
Tarball too large
Error:
```
Tarball exceeds maximum size. File size: 1.6GB, Max: 1.5GB
```
Cause: The deployment package exceeds the 1.5GB limit.
Solution:
- Check for large files that shouldn’t be included (datasets, model weights, logs).
- Add large files to `.flashignore` to exclude them from the build.
- Use network volumes to store large models instead of bundling them.
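To find what is inflating the tarball, you can list the largest files under your project root. A stdlib sketch (not a Flash command):

```python
import os

def largest_files(root=".", top=10):
    """Return (size_bytes, path) pairs for the largest files under root,
    largest first - candidates for .flashignore. Illustrative only."""
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # skip files that vanish or are unreadable
    return sorted(sizes, reverse=True)[:top]
```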
Invalid gzip file
Error:
```
File is not a valid gzip file. Expected magic bytes (31, 139)
```
Cause: The build artifact is corrupted or not a valid gzip file.
Solution: Delete the .flash directory and rebuild:
```sh
rm -rf .flash
flash build
```
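If rebuilds keep producing the same error, you can inspect the artifact's magic bytes directly: 31 and 139 decimal are 0x1f and 0x8b, the standard gzip header. A stdlib sketch:

```python
import gzip
import os
import tempfile

def looks_like_gzip(path: str) -> bool:
    """True if the file starts with the gzip magic bytes (0x1f, 0x8b),
    i.e. decimal (31, 139) from the error message."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# Demo with a throwaway valid/invalid pair:
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "ok.tar.gz")
with gzip.open(good, "wb") as f:
    f.write(b"payload")
bad = os.path.join(tmp, "broken.tar.gz")
with open(bad, "wb") as f:
    f.write(b"not a gzip file")
print(looks_like_gzip(good), looks_like_gzip(bad))  # True False
```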
SSL certificate verification failed
Error:
```
SSL certificate verification failed. This usually means Python cannot find your system's CA certificates.
```
Cause: Python cannot locate the system’s trusted CA certificates, preventing secure connections during deployment. This commonly occurs on fresh Python installations, especially on macOS.
Solution: Try one of these fixes:
- Install certifi and set the certificate bundle path:
  ```sh
  pip install certifi
  export REQUESTS_CA_BUNDLE=$(python -c "import certifi; print(certifi.where())")
  ```
- macOS only: Run the certificate installer that comes with Python. Find it in your Python installation folder (typically /Applications/Python 3.x/) and run Install Certificates.command.
- Add to shell profile for persistence:
  ```sh
  echo 'export REQUESTS_CA_BUNDLE=$(python -c "import certifi; print(certifi.where())")' >> ~/.bashrc
  source ~/.bashrc
  ```
Transient SSL errors (like connection resets) are automatically retried during upload. The certificate verification error requires manual intervention because it indicates a system configuration issue.
Resource provisioning failed
Error:
```
Failed to provision resources: [error details]
```
Cause: Flash couldn’t create the Serverless endpoint on Runpod.
Solutions:
- Check GPU availability: The requested GPU types may not be available. Add fallback options:
  ```python
  gpu=[GpuType.NVIDIA_A100_80GB_PCIe, GpuType.NVIDIA_RTX_A6000, GpuType.NVIDIA_GEFORCE_RTX_4090]
  ```
- Check account limits: You may have hit worker capacity limits. Contact Runpod support to increase limits.
- Check network volume: If using `volume=`, verify the volume exists and is in a compatible datacenter.
Runtime errors
Endpoint not deployed
Error:
```
Endpoint URL not available - endpoint may not be deployed
```
Cause: The endpoint function was called before the endpoint finished provisioning.
Solutions:
- For standalone scripts: Ensure the endpoint has time to provision. Flash handles this automatically, but network issues can cause delays.
- For Flash apps: Deploy the app first with `flash deploy`, then call the endpoint.
- Check endpoint status: View your endpoints in the Serverless console.
Execution timeout
Error:
```
Execution timeout on [endpoint] after [N]s
```
Cause: The endpoint function took longer than the configured timeout.
Solutions:
- Increase timeout: Set `execution_timeout_ms` in your configuration:
  ```python
  @Endpoint(
      name="long-running",
      gpu=GpuType.NVIDIA_A100_80GB_PCIe,
      execution_timeout_ms=600000,  # 10 minutes
  )
  ```
- Optimize function: Profile your function to identify bottlenecks.
- Use queue-based endpoints: For long-running tasks, use the `@Endpoint` decorator pattern. Queue-based endpoints are designed for longer operations.
Connection failed
Error:
```
Failed to connect to endpoint [name] ([url])
```
Cause: Network connectivity issue between your local environment and the Runpod endpoint.
Solutions:
- Check internet connection: Verify you have network access.
- Retry: Transient network issues often resolve on retry. Flash includes automatic retry logic.
- Check endpoint status: Verify the endpoint is running in the Serverless console.
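Flash already retries transient failures for you; if you are wrapping calls yourself (for example, around plain HTTP requests), a generic exponential-backoff sketch looks like this. It is not Flash's built-in retry logic:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn, retrying on ConnectionError with exponential backoff
    (base_delay, 2*base_delay, ...). Generic sketch - Flash's built-in
    retry logic is separate."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo: a callable that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky))  # ok
```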
HTTP errors from endpoint
Error:
```
HTTP error from endpoint [name]: 500 - Internal Server Error
```
Cause: The endpoint function raised an exception during execution.
Solutions:
- Check logs: View worker logs in the Serverless console for detailed error messages.
- Test locally: Use `flash run` to test your function locally before deploying.
- Add error handling: Wrap your function logic in try/except to provide better error messages:
  ```python
  @Endpoint(name="processor", gpu=GpuGroup.ANY)
  async def process(data: dict) -> dict:
      try:
          # Your logic here
          return {"result": "success"}
      except Exception as e:
          return {"error": str(e)}
  ```
Serialization errors
Error:
```
Failed to deserialize result: [error]
```
Cause: The function’s return value cannot be serialized/deserialized.
Solutions:
- Use simple types: Return dictionaries, lists, strings, numbers, and other JSON-serializable types.
- Avoid complex objects: Don’t return PyTorch tensors, NumPy arrays, or custom classes directly. Convert them first:
  ```python
  # Correct
  return {"result": tensor.tolist()}

  # Incorrect - tensor is not serializable
  return {"result": tensor}
  ```
- Check argument types: Input arguments must also be serializable.
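A quick preflight check for return values, using the stdlib json module. This is an illustrative proxy; Flash's wire format may differ in detail:

```python
import json

def is_json_serializable(value) -> bool:
    """True if value survives json.dumps - a reasonable proxy for what an
    endpoint can safely return. Illustrative helper, not Flash API."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False
```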
Circuit breaker open
Error:
```
Circuit breaker is open. Retry in [N] seconds
```
Cause: Too many consecutive failures to the endpoint triggered the circuit breaker protection.
Solutions:
- Wait and retry: The circuit breaker will automatically attempt recovery after the timeout (typically 60 seconds).
- Check endpoint health: Multiple failures usually indicate an underlying issue. Check logs and endpoint status.
- Fix the root cause: Address whatever is causing the repeated failures before retrying.
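For intuition, the pattern behind this error can be sketched in a few lines. The class below is a minimal illustration of a circuit breaker, not Flash's actual implementation (which differs in thresholds and recovery behavior):

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: after max_failures consecutive
    errors the breaker opens and rejects calls until reset_after seconds
    pass, then allows one trial call (half-open). Illustrative only."""

    def __init__(self, max_failures=3, reset_after=60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("Circuit breaker is open")
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```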
GPU availability issues
Job stuck in queue
Symptom: Job status shows IN_QUEUE for extended periods.
Cause: The requested GPU types are not available.
Solutions:
- Add fallback GPUs: Expand your `gpu` list with additional options:
  ```python
  @Endpoint(
      name="flexible",
      gpu=[
          GpuType.NVIDIA_A100_80GB_PCIe,    # First choice
          GpuType.NVIDIA_RTX_A6000,         # Fallback
          GpuType.NVIDIA_GEFORCE_RTX_4090,  # Second fallback
      ],
  )
  ```
- Use GpuGroup.ANY: For development, accept any available GPU with `gpu=GpuGroup.ANY`.
- Check availability: View GPU availability in the Serverless console.
- Contact support: For guaranteed capacity, contact Runpod support.
Dependency errors
Module not found
Error (in worker logs):
```
ModuleNotFoundError: No module named 'transformers'
```
Cause: A required dependency was not specified in the @Endpoint decorator.
Solution: Add all required packages to the dependencies parameter:
```python
@Endpoint(
    name="processor",
    gpu=GpuGroup.ANY,
    dependencies=["transformers", "torch", "pillow"],
)
async def process(data: dict) -> dict:
    from transformers import pipeline
    # ...
```
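You can approximate this check locally before deploying. One caveat: a package's pip name can differ from its import name (for example, pillow installs PIL), so this hypothetical helper checks import names, not pip names:

```python
import importlib.util

def missing_modules(module_names):
    """Return the module names that cannot be imported in the current
    environment. Hypothetical preflight helper; checks import names,
    which may differ from pip package names (e.g. pillow -> PIL)."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```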
Version conflicts
Symptom: Function fails with import errors or unexpected behavior.
Cause: Dependency version conflicts between packages.
Solution: Pin specific versions:
```python
@Endpoint(
    name="processor",
    gpu=GpuGroup.ANY,
    dependencies=[
        "transformers==4.36.0",
        "torch==2.1.0",
        "accelerate>=0.25.0",
    ],
)
```
Getting help
If you’re still stuck:
- Discord: Join the Runpod Discord for community support.
- GitHub Issues: Report bugs or request features on the Flash repository.
- Support: Contact Runpod support for account-specific issues.