# Docker-Based Architecture

Kubiya SDK is built on a Docker-first architecture that lets you use existing container images as the building blocks for all your tools. Because each tool runs on top of an image that already ships the runtime and tooling it needs, you rarely have to rebuild complex functionality from scratch, and you gain scalability, portability, and infrastructure flexibility.
## Core Architecture Principles
- Docker Images as Tool Foundation: Every tool in Kubiya is backed by a Docker image
- Infrastructure Independence: Run your tools on any infrastructure that supports Docker (local, Kubernetes, cloud)
- Declarative Tool Definition: Define inputs, outputs, and Docker configuration in a simple, declarative way
- Seamless Workflow Integration: Combine Docker-based tools into powerful workflows
## Creating Tools with Docker Images

The heart of Kubiya's architecture is the ability to use any Docker image as the foundation for your tools. Here's a simple example:
```python
from kubiya_sdk import tool

@tool(image="python:3.12-slim")
def process_text(text: str) -> str:
    """Process text using a Python-based tool"""
    return text.upper()
```
In this simple example, the tool uses the `python:3.12-slim` Docker image. Kubiya handles:
- Pulling the image if needed
- Creating a container with the right environment
- Executing your code within the container
- Passing input/output between your workflow and the container
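Once defined, the tool composes with others like any other Kubiya tool. As a minimal sketch, using the `Workflow` class that appears in the Real-World Use Cases examples later on this page:

```python
from kubiya_sdk import Workflow

# Minimal sketch: wrap the Docker-backed tool in a workflow, following the
# Workflow(id=..., description=..., tools=[...]) pattern used later on this page.
text_workflow = Workflow(
    id="text-processing",
    description="Uppercase incoming text inside a python:3.12-slim container",
    tools=[process_text],
)
```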
## Pre-Configured Image Profiles
Kubiya provides ready-to-use image profiles for common use cases:
```python
from kubiya_sdk import tool, Image

# Data science tools using specialized packages
@tool(image=Image.Python.DATA_SCIENCE)
def analyze_data(data: str) -> dict:
    """Analyze data using pandas, numpy, sklearn"""
    import pandas as pd
    import numpy as np
    from sklearn.cluster import KMeans

    # Data analysis code here...
    return {"result": "analysis complete"}

# Web tools for API integration
@tool(image=Image.Python.WEB)
def fetch_data(url: str) -> dict:
    """Fetch data from a web API"""
    import requests

    response = requests.get(url)
    return response.json()

# AWS tools for cloud operations
@tool(image=Image.Python.AWS)
def list_s3_buckets() -> list:
    """List all S3 buckets"""
    import boto3

    s3 = boto3.client('s3')
    response = s3.list_buckets()
    return [bucket['Name'] for bucket in response['Buckets']]
```
Each profile automatically includes the necessary packages without you having to specify them manually.
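Conceptually, a profile is a curated base image plus a dependency list, so it behaves much like a custom profile whose requirements are maintained for you. As a rough, illustrative sketch only (the actual base image and package set behind `Image.Python.DATA_SCIENCE` are defined by the SDK, not by this list), it is loosely equivalent to building a profile yourself with `Image.Custom.create`, covered in the next section:

```python
from kubiya_sdk import Image

# Illustrative approximation of a pre-configured profile.
# The real base image and package set shipped with
# Image.Python.DATA_SCIENCE may differ.
data_science_like = Image.Custom.create(
    image="python:3.12-slim",
    requirements=["pandas", "numpy", "scikit-learn"],
)
```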
## Custom Docker Image Configuration
You can create custom Docker configurations to meet specific needs:
```python
from kubiya_sdk import tool, Image

# Create a custom image profile
custom_profile = Image.Custom.create(
    image="python:3.11-slim",
    requirements=["requests", "beautifulsoup4", "lxml"]
)

@tool(image=custom_profile)
def scrape_website(url: str) -> dict:
    """Scrape content from a website"""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    title = soup.title.string if soup.title else "No title"
    return {
        "title": title,
        "links": [a['href'] for a in soup.find_all('a', href=True)]
    }
```
## Specialized Tools Using Existing Docker Images
The real power of Kubiya's Docker-based architecture is the ability to use specialized Docker images for complex tasks:
```python
from kubiya_sdk import tool

@tool(
    image="postgres:14",
    command=["psql", "-h", "${DB_HOST}", "-U", "${DB_USER}", "-d", "${DB_NAME}", "-c", "${QUERY}"],
    environment={
        "DB_HOST": "postgres.example.com",
        "DB_USER": "${DB_USERNAME}",     # Dynamic value from input or config
        "DB_NAME": "analytics",
        "PGPASSWORD": "${DB_PASSWORD}",  # Secret from configuration
        "QUERY": "${query}"              # Value from input
    }
)
def run_database_query(query: str, db_username: str, db_password: str) -> str:
    """Run a PostgreSQL query using the official Postgres image"""
    # No Python code needed - execution happens directly in the container
    pass

@tool(
    image="aquasec/trivy:latest",
    command=["image", "--format", "json", "${IMAGE_TO_SCAN}"],
    environment={
        "IMAGE_TO_SCAN": "${image_name}"  # Dynamic value from input
    }
)
def scan_container_image(image_name: str) -> dict:
    """Scan a container image for vulnerabilities using Trivy"""
    # The tool runs directly in the specialized container
    pass
```
## File and Volume Mounting
Kubiya tools can access local files and directories through volume mounting:
```python
import os
from kubiya_sdk import tool

@tool(
    image="sonarsource/sonar-scanner-cli:latest",
    command=[
        "-Dsonar.projectKey=my-project",
        "-Dsonar.sources=/usr/src"
    ],
    volumes={
        os.path.abspath("./src"): "/usr/src"  # Mount local source code
    },
    environment={
        "SONAR_HOST_URL": "http://sonarqube:9000",
        "SONAR_LOGIN": "${SONAR_TOKEN}"  # Reference to secret
    }
)
def analyze_code_quality(sonar_token: str) -> str:
    """Analyze code quality using SonarQube"""
    # Execution happens in the SonarQube container
    pass
```
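Because `volumes` maps host paths (keys) to container paths (values), one tool can mount several directories at once, for example a source tree to read plus an output location to write. The following is a small sketch under that assumption; the Trivy filesystem scan and the `./reports` path are illustrative choices, not part of the SonarQube example above:

```python
import os
from kubiya_sdk import tool

# Sketch: mount both the code to scan and a directory for the report,
# reusing the {host_path: container_path} convention shown above.
@tool(
    image="aquasec/trivy:latest",
    command=["fs", "--format", "json", "--output", "/reports/scan.json", "/workspace"],
    volumes={
        os.path.abspath("./src"): "/workspace",    # Source code to scan
        os.path.abspath("./reports"): "/reports",  # Where the report is written
    }
)
def scan_source_tree() -> str:
    """Scan a local source tree with Trivy and write a JSON report"""
    # Execution happens entirely in the Trivy container
    pass
```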
## Dynamic Configuration
Tools can use dynamic configuration for connecting to external systems:
```python
from kubiya_sdk import tool, config_model, Image
from pydantic import BaseModel

@config_model(name="aws_config", description="AWS Configuration")
class AWSConfig(BaseModel):
    """AWS Configuration Schema"""
    region: str = "us-east-1"
    access_key_id: str
    secret_access_key: str

@tool(
    image=Image.Python.AWS,
    required_configs=["aws_config"]
)
def list_ec2_instances(config=None) -> list:
    """List all EC2 instances in a region"""
    import boto3

    # Use the configuration provided by Kubiya
    if not config:
        raise ValueError("AWS configuration is required")

    session = boto3.Session(
        region_name=config.get("region"),
        aws_access_key_id=config.get("access_key_id"),
        aws_secret_access_key=config.get("secret_access_key")
    )
    ec2 = session.client('ec2')
    instances = ec2.describe_instances()
    return [
        {
            "id": instance["InstanceId"],
            "type": instance["InstanceType"],
            "state": instance["State"]["Name"]
        }
        for reservation in instances["Reservations"]
        for instance in reservation["Instances"]
    ]
```
## Kubernetes Integration
Kubiya tools can be scheduled on Kubernetes for enhanced scalability:
```python
from kubiya_sdk import tool
from kubiya_sdk.infrastructure import KubernetesConfig

# Configure Kubernetes execution
k8s_config = KubernetesConfig(
    namespace="kubiya-tools",
    service_account="tool-runner",
    resources={
        "requests": {
            "memory": "256Mi",
            "cpu": "100m"
        },
        "limits": {
            "memory": "512Mi",
            "cpu": "200m"
        }
    }
)

@tool(
    name="large-data-processor",
    image="python:3.12-slim",
    requirements=["pandas", "numpy", "scikit-learn"],
    infrastructure=k8s_config  # Specify Kubernetes execution
)
def process_large_dataset(dataset_url: str) -> dict:
    """Process large datasets on Kubernetes"""
    import pandas as pd
    import numpy as np
    from sklearn.cluster import KMeans

    # Heavy data processing that benefits from Kubernetes resources
    df = pd.read_csv(dataset_url)
    # Process the data
    result = {"clusters": 5, "points": len(df)}
    return result
```
## Benefits of Container-First Architecture
Kubiya's Docker-based architecture provides several key advantages:
- Leverage Existing Solutions: Use any of the thousands of available Docker images
- Zero Boilerplate: Focus on the business logic, not container orchestration
- Infrastructure Flexibility: Run the same tools on local Docker, Kubernetes, or cloud services
- Language Agnostic: Use tools written in Python, Go, Node.js, or any language with a Docker image
- Security: Each tool runs in an isolated container with defined resources
- Scalability: Easily scale to handle increased workloads using Kubernetes
## Real-World Use Cases

### DevOps Automation
```python
from kubiya_sdk import Workflow, tool

@tool(image="bitnami/kubectl:latest")
def deploy_application(namespace: str, deployment_file: str) -> str:
    """Deploy an application to Kubernetes"""
    # kubectl commands execute in the container
    pass

@tool(image="hashicorp/terraform:latest")
def provision_infrastructure(tf_directory: str, vars: dict) -> dict:
    """Provision cloud infrastructure with Terraform"""
    # Terraform commands execute in the container
    pass

# Create an automation workflow
deployment_workflow = Workflow(
    id="infrastructure-deployment",
    description="Provision infrastructure and deploy applications",
    tools=[provision_infrastructure, deploy_application]
)
```
### Data Processing Pipeline
```python
from kubiya_sdk import Workflow, tool

@tool(image="python:3.12-slim", requirements=["pandas", "requests"])
def extract_data(api_url: str, api_key: str) -> dict:
    """Extract data from an API"""
    import requests
    import pandas as pd

    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.get(api_url, headers=headers)
    data = response.json()

    # Convert to DataFrame
    df = pd.DataFrame(data)
    return {
        "data": df.to_json(orient="records"),
        "count": len(df)
    }

@tool(image="python:3.12-slim", requirements=["pandas", "numpy", "scikit-learn"])
def transform_data(data: str) -> dict:
    """Transform and analyze data"""
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Parse the JSON data
    df = pd.read_json(data)
    # Perform transformations
    # ...
    return {
        "transformed_data": df.to_json(orient="records"),
        "stats": {
            # numeric_only avoids errors on non-numeric columns
            "mean": df.mean(numeric_only=True).to_dict(),
            "std": df.std(numeric_only=True).to_dict()
        }
    }

@tool(image="python:3.12-slim", requirements=["pandas", "matplotlib", "seaborn"])
def visualize_data(data: str, stats: dict) -> str:
    """Create visualizations from data"""
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import os

    # Create the output directory
    os.makedirs("./output", exist_ok=True)

    # Parse data
    df = pd.read_json(data)

    # Create visualizations
    plt.figure(figsize=(10, 6))
    # ...

    # Save the chart
    chart_path = "./output/chart.png"
    plt.savefig(chart_path)
    return chart_path

# Create a data pipeline workflow
data_pipeline = Workflow(
    id="data-pipeline",
    description="Extract, transform and visualize data",
    tools=[extract_data, transform_data, visualize_data]
)
```
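How a workflow is actually executed (locally, through the Kubiya platform, or on a dedicated runner) is beyond the scope of this page. Purely as a hypothetical usage sketch, assuming a `run` method and keyword inputs that are not documented here:

```python
# Hypothetical sketch only: the run() method, its argument names, and the
# shape of the result are assumptions, not the documented SDK API.
result = data_pipeline.run(
    api_url="https://api.example.com/records",
    api_key="YOUR_API_KEY",
)
print(result)
```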
## Conclusion
Kubiya's Docker-based architecture represents a fundamental shift in how developers build automation tools and workflows. By leveraging existing Docker images as building blocks, you can focus on your business logic rather than infrastructure concerns.
This approach brings the benefits of containerization (portability, scalability, and isolation) to your workflows while maintaining a simple, consistent interface for creating and composing tools.