# Docker-Based Architecture

Kubiya SDK is built on a Docker-first architecture that lets you use existing container images as the building blocks for all your tools. Because each tool runs on top of an image that already ships the runtime and tooling it needs, you rarely have to rebuild complex functionality from scratch, and you gain scalability, portability, and infrastructure flexibility.
## Core Architecture Principles
- Docker Images as Tool Foundation: Every tool in Kubiya is backed by a Docker image
- Infrastructure Independence: Run your tools on any infrastructure that supports Docker (local, Kubernetes, cloud)
- Declarative Tool Definition: Define inputs, outputs, and Docker configuration in a simple, declarative way
- Seamless Workflow Integration: Combine Docker-based tools into powerful workflows
## Creating Tools with Docker Images

The heart of Kubiya's architecture is the ability to use any Docker image as the foundation for your tools. Here's a simple example:
```python
from kubiya_sdk import tool

@tool(image="python:3.12-slim")
def process_text(text: str) -> str:
    """Process text using a Python-based tool"""
    return text.upper()
```
In this simple example, the tool uses the `python:3.12-slim` Docker image. Kubiya handles:
- Pulling the image if needed
- Creating a container with the right environment
- Executing your code within the container
- Passing input/output between your workflow and the container
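Once defined, the tool composes with others like any other Kubiya tool. As a minimal sketch, using the `Workflow` class that appears in the Real-World Use Cases examples later on this page:

```python
from kubiya_sdk import Workflow

# Minimal sketch: wrap the Docker-backed tool in a workflow, following the
# Workflow(id=..., description=..., tools=[...]) pattern used later on this page.
text_workflow = Workflow(
    id="text-processing",
    description="Uppercase incoming text inside a python:3.12-slim container",
    tools=[process_text],
)
```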
## Pre-Configured Image Profiles
Kubiya provides ready-to-use image profiles for common use cases:
```python
from kubiya_sdk import tool, Image

# Data science tools using specialized packages
@tool(image=Image.Python.DATA_SCIENCE)
def analyze_data(data: str) -> dict:
    """Analyze data using pandas, numpy, sklearn"""
    import pandas as pd
    import numpy as np
    from sklearn.cluster import KMeans

    # Data analysis code here...
    return {"result": "analysis complete"}

# Web tools for API integration
@tool(image=Image.Python.WEB)
def fetch_data(url: str) -> dict:
    """Fetch data from a web API"""
    import requests

    response = requests.get(url)
    return response.json()

# AWS tools for cloud operations
@tool(image=Image.Python.AWS)
def list_s3_buckets() -> list:
    """List all S3 buckets"""
    import boto3

    s3 = boto3.client('s3')
    response = s3.list_buckets()
    return [bucket['Name'] for bucket in response['Buckets']]
```
Each profile automatically includes the necessary packages without you having to specify them manually.
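Conceptually, a profile is a curated base image plus a dependency list, so it behaves much like a custom profile whose requirements are maintained for you. As a rough, illustrative sketch only (the actual base image and package set behind `Image.Python.DATA_SCIENCE` are defined by the SDK, not by this list), it is loosely equivalent to building a profile yourself with `Image.Custom.create`, covered in the next section:

```python
from kubiya_sdk import Image

# Illustrative approximation of a pre-configured profile.
# The real base image and package set shipped with
# Image.Python.DATA_SCIENCE may differ.
data_science_like = Image.Custom.create(
    image="python:3.12-slim",
    requirements=["pandas", "numpy", "scikit-learn"],
)
```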
## Custom Docker Image Configuration
You can create custom Docker configurations to meet specific needs:
```python
from kubiya_sdk import tool, Image

# Create a custom image profile
custom_profile = Image.Custom.create(
    image="python:3.11-slim",
    requirements=["requests", "beautifulsoup4", "lxml"]
)

@tool(image=custom_profile)
def scrape_website(url: str) -> dict:
    """Scrape content from a website"""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    title = soup.title.string if soup.title else "No title"
    return {
        "title": title,
        "links": [a['href'] for a in soup.find_all('a', href=True)]
    }
```
## Specialized Tools Using Existing Docker Images
The real power of Kubiya's Docker-based architecture is the ability to use specialized Docker images for complex tasks:
```python
from kubiya_sdk import tool

@tool(
    image="postgres:14",
    command=["psql", "-h", "${DB_HOST}", "-U", "${DB_USER}", "-d", "${DB_NAME}", "-c", "${QUERY}"],
    environment={
        "DB_HOST": "postgres.example.com",
        "DB_USER": "${DB_USERNAME}",     # Dynamic value from input or config
        "DB_NAME": "analytics",
        "PGPASSWORD": "${DB_PASSWORD}",  # Secret from configuration
        "QUERY": "${query}"              # Value from input
    }
)
def run_database_query(query: str, db_username: str, db_password: str) -> str:
    """Run a PostgreSQL query using the official Postgres image"""
    # No Python code needed - execution happens directly in the container
    pass

@tool(
    image="aquasec/trivy:latest",
    command=["image", "--format", "json", "${IMAGE_TO_SCAN}"],
    environment={
        "IMAGE_TO_SCAN": "${image_name}"  # Dynamic value from input
    }
)
def scan_container_image(image_name: str) -> dict:
    """Scan a container image for vulnerabilities using Trivy"""
    # The tool runs directly in the specialized container
    pass
```
## File and Volume Mounting
Kubiya tools can access local files and directories through volume mounting:
```python
import os
from kubiya_sdk import tool

@tool(
    image="sonarsource/sonar-scanner-cli:latest",
    command=[
        "-Dsonar.projectKey=my-project",
        "-Dsonar.sources=/usr/src"
    ],
    volumes={
        os.path.abspath("./src"): "/usr/src"  # Mount local source code
    },
    environment={
        "SONAR_HOST_URL": "http://sonarqube:9000",
        "SONAR_LOGIN": "${SONAR_TOKEN}"  # Reference to secret
    }
)
def analyze_code_quality(sonar_token: str) -> str:
    """Analyze code quality using SonarQube"""
    # Execution happens in the SonarQube container
    pass
```
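Because `volumes` maps host paths (keys) to container paths (values), one tool can mount several directories at once, for example a source tree to read plus an output location to write. The following is a small sketch under that assumption; the Trivy filesystem scan and the `./reports` path are illustrative choices, not part of the SonarQube example above:

```python
import os
from kubiya_sdk import tool

# Sketch: mount both the code to scan and a directory for the report,
# reusing the {host_path: container_path} convention shown above.
@tool(
    image="aquasec/trivy:latest",
    command=["fs", "--format", "json", "--output", "/reports/scan.json", "/workspace"],
    volumes={
        os.path.abspath("./src"): "/workspace",    # Source code to scan
        os.path.abspath("./reports"): "/reports",  # Where the report is written
    }
)
def scan_source_tree() -> str:
    """Scan a local source tree with Trivy and write a JSON report"""
    # Execution happens entirely in the Trivy container
    pass
```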
## Dynamic Configuration
Tools can use dynamic configuration for connecting to external systems:
```python
from kubiya_sdk import tool, config_model, Image
from pydantic import BaseModel

@config_model(name="aws_config", description="AWS Configuration")
class AWSConfig(BaseModel):
    """AWS Configuration Schema"""
    region: str = "us-east-1"
    access_key_id: str
    secret_access_key: str

@tool(
    image=Image.Python.AWS,
    required_configs=["aws_config"]
)
def list_ec2_instances(config=None) -> list:
    """List all EC2 instances in a region"""
    import boto3

    # Use the configuration provided by Kubiya
    if not config:
        raise ValueError("AWS configuration is required")

    session = boto3.Session(
        region_name=config.get("region"),
        aws_access_key_id=config.get("access_key_id"),
        aws_secret_access_key=config.get("secret_access_key")
    )
    ec2 = session.client('ec2')
    instances = ec2.describe_instances()
    return [
        {
            "id": instance["InstanceId"],
            "type": instance["InstanceType"],
            "state": instance["State"]["Name"]
        }
        for reservation in instances["Reservations"]
        for instance in reservation["Instances"]
    ]
```
## Kubernetes Integration
Kubiya tools can be scheduled on Kubernetes for enhanced scalability:
```python
from kubiya_sdk import tool
from kubiya_sdk.infrastructure import KubernetesConfig

# Configure Kubernetes execution
k8s_config = KubernetesConfig(
    namespace="kubiya-tools",
    service_account="tool-runner",
    resources={
        "requests": {
            "memory": "256Mi",
            "cpu": "100m"
        },
        "limits": {
            "memory": "512Mi",
            "cpu": "200m"
        }
    }
)

@tool(
    name="large-data-processor",
    image="python:3.12-slim",
    requirements=["pandas", "numpy", "scikit-learn"],
    infrastructure=k8s_config  # Specify Kubernetes execution
)
def process_large_dataset(dataset_url: str) -> dict:
    """Process large datasets on Kubernetes"""
    import pandas as pd
    import numpy as np
    from sklearn.cluster import KMeans

    # Heavy data processing that benefits from Kubernetes resources
    df = pd.read_csv(dataset_url)
    # Process the data
    result = {"clusters": 5, "points": len(df)}
    return result
```
## Benefits of Container-First Architecture
Kubiya's Docker-based architecture provides several key advantages:
- Leverage Existing Solutions: Use any of the thousands of available Docker images
- Zero Boilerplate: Focus on the business logic, not container orchestration
- Infrastructure Flexibility: Run the same tools on local Docker, Kubernetes, or cloud services
- Language Agnostic: Use tools written in Python, Go, Node.js, or any language with a Docker image
- Security: Each tool runs in an isolated container with defined resources
- Scalability: Easily scale to handle increased workloads using Kubernetes
## Real-World Use Cases

### DevOps Automation
```python
from kubiya_sdk import Workflow, tool

@tool(image="bitnami/kubectl:latest")
def deploy_application(namespace: str, deployment_file: str) -> str:
    """Deploy an application to Kubernetes"""
    # kubectl commands execute in the container
    pass

@tool(image="hashicorp/terraform:latest")
def provision_infrastructure(tf_directory: str, vars: dict) -> dict:
    """Provision cloud infrastructure with Terraform"""
    # Terraform commands execute in the container
    pass

# Create an automation workflow
deployment_workflow = Workflow(
    id="infrastructure-deployment",
    description="Provision infrastructure and deploy applications",
    tools=[provision_infrastructure, deploy_application]
)
```
### Data Processing Pipeline
```python
from kubiya_sdk import Workflow, tool

@tool(image="python:3.12-slim", requirements=["pandas", "requests"])
def extract_data(api_url: str, api_key: str) -> dict:
    """Extract data from an API"""
    import requests
    import pandas as pd

    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.get(api_url, headers=headers)
    data = response.json()

    # Convert to DataFrame
    df = pd.DataFrame(data)
    return {
        "data": df.to_json(orient="records"),
        "count": len(df)
    }

@tool(image="python:3.12-slim", requirements=["pandas", "numpy", "scikit-learn"])
def transform_data(data: str) -> dict:
    """Transform and analyze data"""
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Parse the JSON data
    df = pd.read_json(data)
    # Perform transformations
    # ...
    return {
        "transformed_data": df.to_json(orient="records"),
        "stats": {
            # numeric_only avoids errors on non-numeric columns
            "mean": df.mean(numeric_only=True).to_dict(),
            "std": df.std(numeric_only=True).to_dict()
        }
    }

@tool(image="python:3.12-slim", requirements=["pandas", "matplotlib", "seaborn"])
def visualize_data(data: str, stats: dict) -> str:
    """Create visualizations from data"""
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import os

    # Create the output directory
    os.makedirs("./output", exist_ok=True)

    # Parse data
    df = pd.read_json(data)

    # Create visualizations
    plt.figure(figsize=(10, 6))
    # ...

    # Save the chart
    chart_path = "./output/chart.png"
    plt.savefig(chart_path)
    return chart_path

# Create a data pipeline workflow
data_pipeline = Workflow(
    id="data-pipeline",
    description="Extract, transform and visualize data",
    tools=[extract_data, transform_data, visualize_data]
)
```
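How a workflow is actually executed (locally, through the Kubiya platform, or on a dedicated runner) is beyond the scope of this page. Purely as a hypothetical usage sketch, assuming a `run` method and keyword inputs that are not documented here:

```python
# Hypothetical sketch only: the run() method, its argument names, and the
# shape of the result are assumptions, not the documented SDK API.
result = data_pipeline.run(
    api_url="https://api.example.com/records",
    api_key="YOUR_API_KEY",
)
print(result)
```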
## Conclusion
Kubiya's Docker-based architecture represents a fundamental shift in how developers build automation tools and workflows. By leveraging existing Docker images as building blocks, you can focus on your business logic rather than infrastructure concerns.
This approach brings the benefits of containerization (portability, scalability, and isolation) to your workflows while maintaining a simple, consistent interface for creating and composing tools.