
Creating Your First Tool

This tutorial will guide you through creating your first Docker-based tool with Kubiya SDK.

Prerequisites

Before you begin, make sure you have:

  1. Installed Kubiya SDK: pip install kubiya-sdk
  2. Docker installed and running on your system
  3. Basic knowledge of Python

Understanding Docker-Based Tools

Kubiya's tools are powered by Docker containers. This means:

  • Each tool runs in its own isolated environment
  • You can leverage existing Docker images for specialized functionality
  • You don't need to reinvent the wheel - use existing solutions as building blocks

A Simple Text Processing Tool

Let's start with a simple tool that processes text:

Python
# text_tool.py
from kubiya_sdk import tool

@tool(image="python:3.12-slim")
def process_text(text: str, operation: str = "uppercase") -> str:
    """Process text with various operations

    Args:
        text: The text to process
        operation: The operation to perform (uppercase, lowercase, capitalize, reverse)

    Returns:
        The processed text
    """
    if operation == "uppercase":
        return text.upper()
    elif operation == "lowercase":
        return text.lower()
    elif operation == "capitalize":
        return text.capitalize()
    elif operation == "reverse":
        return text[::-1]
    else:
        return f"Unknown operation: {operation}"

Let's break down what's happening:

  1. We import the tool decorator from kubiya_sdk
  2. We use @tool(image="python:3.12-slim") to specify that this tool should run in the Python 3.12 slim Docker image
  3. We define a function that takes text and an operation, and returns the processed text
  4. The docstring provides information about the tool's purpose and parameters
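As a style note, the if/elif chain in the tool body can also be written as a dispatch table, which keeps the function short as you add operations. This is plain Python, independent of the SDK:

```python
# Dispatch-table variant of the process_text body (plain Python, no SDK needed)
OPERATIONS = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "capitalize": str.capitalize,
    "reverse": lambda s: s[::-1],
}

def apply_operation(text: str, operation: str = "uppercase") -> str:
    """Look up the operation and apply it, mirroring process_text."""
    func = OPERATIONS.get(operation)
    if func is None:
        return f"Unknown operation: {operation}"
    return func(text)
```

Adding a new operation then only requires one new dictionary entry.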

Testing Your Tool

You can test your tool directly:

Python
# Run in the same file or in a Python shell after importing
result = process_text("hello world", "uppercase")
print(result)  # Output: HELLO WORLD

Behind the scenes, Kubiya:

  1. Pulls the Python Docker image if needed
  2. Creates a container with your code
  3. Executes the function inside the container
  4. Returns the result
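Those steps can be sketched in plain Python. Note that `build_docker_command` and its flags are illustrative, not the SDK's actual internals:

```python
# Illustrative sketch of the container lifecycle described above.
# build_docker_command is hypothetical, not part of kubiya_sdk.
def build_docker_command(image: str, script: str) -> list[str]:
    """Build the argv for running a Python snippet in a throwaway container."""
    return [
        "docker", "run", "--rm",    # remove the container after it exits
        image,                      # e.g. "python:3.12-slim" (pulled if missing)
        "python", "-c", script,     # execute the tool code inside the container
    ]
```

Running the returned command with `subprocess.run` would print the result on stdout, which would then be captured as the return value.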

Using External Dependencies

Most real-world tools need external libraries. Kubiya makes this easy:

Python
# nlp_tool.py
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["nltk"]
)
def analyze_sentiment(text: str) -> dict:
    """Analyze sentiment using NLTK

    Args:
        text: The text to analyze

    Returns:
        A dictionary with sentiment scores
    """
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    # Download NLTK data if needed
    try:
        nltk.data.find('vader_lexicon')
    except LookupError:
        nltk.download('vader_lexicon')

    # Create analyzer
    sia = SentimentIntensityAnalyzer()

    # Analyze sentiment
    scores = sia.polarity_scores(text)

    return {
        "positive": scores["pos"],
        "negative": scores["neg"],
        "neutral": scores["neu"],
        "compound": scores["compound"]
    }

The requirements=["nltk"] parameter tells Kubiya to install NLTK in the container before executing the tool.

Using a Specialized Docker Image

Instead of using a generic Python image and installing requirements, you can use specialized images:

Python
# weather_tool.py
from kubiya_sdk import tool

@tool(
    image="ghcr.io/chubin/wttr.in:latest",
    command=["/app/bin/srv.py", "${LOCATION}"],
    environment={
        "LOCATION": "${location}"  # Map the input parameter to environment variable
    }
)
def get_weather(location: str) -> str:
    """Get weather information for a location

    Args:
        location: The location to get weather for

    Returns:
        Weather information as text
    """
    # No code needed - execution happens directly in the container
    # The output from the container command becomes the return value
    pass

In this example:

  1. We use the wttr.in Docker image, which provides weather information
  2. We specify the command to run in the container
  3. We map the location input parameter to the LOCATION environment variable
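The `${LOCATION}` placeholder suggests simple template substitution into the command. A minimal sketch of that resolution step, using only the standard library (the SDK's real implementation may differ):

```python
from string import Template

# Hypothetical sketch of how ${...} placeholders could be resolved
# against the environment mapping before the container starts.
def resolve_placeholders(command: list[str], params: dict[str, str]) -> list[str]:
    """Substitute ${NAME} tokens in each command argument."""
    return [Template(arg).safe_substitute(params) for arg in command]
```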

Tool with File Input/Output

Many tools need to work with files:

Python
# image_tool.py
import os
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["pillow"],
    volumes={
        os.path.abspath("./input"): "/input",  # Mount input directory
        os.path.abspath("./output"): "/output"  # Mount output directory
    }
)
def resize_image(
    input_file: str,
    width: int,
    height: int,
    output_file: str | None = None
) -> str:
    """Resize an image

    Args:
        input_file: Name of input file (in ./input directory)
        width: Target width in pixels
        height: Target height in pixels
        output_file: Name for output file (in ./output directory)

    Returns:
        Path to the resized image
    """
    from PIL import Image

    # Set default output filename if not provided
    if not output_file:
        name, ext = os.path.splitext(input_file)
        output_file = f"{name}_resized{ext}"

    # Load the image
    input_path = f"/input/{input_file}"
    img = Image.open(input_path)

    # Resize the image
    resized_img = img.resize((width, height))

    # Save the resized image
    output_path = f"/output/{output_file}"
    resized_img.save(output_path)

    # Return the output path on the host
    return f"./output/{output_file}"

In this example:

  1. We mount the local ./input and ./output directories to the container
  2. The tool reads a file from the input directory and writes to the output directory
  3. The paths in the container are different from the paths on the host
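The host-to-container path translation implied by the volume mounts can be captured in a small helper. This is a hypothetical utility for illustration, not an SDK function:

```python
import os

# Hypothetical helper mirroring the volume mapping in the example:
# host ./input -> container /input, host ./output -> container /output
def to_container_path(host_path: str, mounts: dict[str, str]) -> str:
    """Translate a host path into its in-container equivalent."""
    abs_path = os.path.abspath(host_path)
    for host_dir, container_dir in mounts.items():
        if abs_path.startswith(os.path.abspath(host_dir) + os.sep):
            rel = os.path.relpath(abs_path, host_dir)
            return f"{container_dir}/{rel}"
    raise ValueError(f"{host_path} is not inside a mounted directory")
```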

Creating a Tool with External Configuration

Many tools need configuration, such as API keys:

Python
# api_tool.py
from kubiya_sdk import tool, config_model
from pydantic import BaseModel

@config_model(name="weather_api_config", description="Weather API Configuration")
class WeatherAPIConfig(BaseModel):
    """Weather API Configuration Schema"""
    api_key: str
    base_url: str = "https://api.weatherapi.com/v1"

@tool(
    image="python:3.12-slim",
    requirements=["requests"],
    required_configs=["weather_api_config"]  # Specify that this tool requires the config
)
def get_detailed_weather(city: str, config=None) -> dict:
    """Get detailed weather information for a city

    Args:
        city: The city to get weather for
        config: Configuration (automatically injected by Kubiya)

    Returns:
        Detailed weather information
    """
    import requests

    if not config:
        raise ValueError("Weather API configuration is required")

    api_key = config.get("api_key")
    base_url = config.get("base_url")

    response = requests.get(
        f"{base_url}/current.json",
        params={"key": api_key, "q": city}  # requests URL-encodes the city name
    )

    if response.status_code != 200:
        return {"error": response.json().get("error", {}).get("message", "Unknown error")}

    data = response.json()

    return {
        "location": {
            "name": data["location"]["name"],
            "country": data["location"]["country"]
        },
        "current": {
            "temperature_c": data["current"]["temp_c"],
            "temperature_f": data["current"]["temp_f"],
            "condition": data["current"]["condition"]["text"],
            "humidity": data["current"]["humidity"],
            "wind_kph": data["current"]["wind_kph"]
        }
    }

To use this tool, you need to set the configuration:

Python
from kubiya_sdk.tools.registry import tool_registry

# Set the configuration
tool_registry.set_dynamic_config({
    "weather_api_config": {
        "api_key": "your_api_key_here",
        "base_url": "https://api.weatherapi.com/v1"
    }
})

# Now you can use the tool
result = get_detailed_weather("London")
print(result)

Building a More Complex Tool

Let's build a more complex tool that analyzes GitHub repositories:

Python
# github_tool.py
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["requests", "pandas", "matplotlib"]
)
def analyze_github_repo(repo_url: str) -> dict:
    """Analyze a GitHub repository

    Args:
        repo_url: URL of the GitHub repository (e.g., https://github.com/username/repo)

    Returns:
        Analysis results
    """
    import requests
    import pandas as pd
    import matplotlib.pyplot as plt
    import base64
    import io
    from datetime import datetime

    # Extract owner and repo from URL
    parts = repo_url.rstrip('/').split('/')
    owner = parts[-2]
    repo = parts[-1]

    # GitHub API URLs
    api_base = "https://api.github.com"
    commits_url = f"{api_base}/repos/{owner}/{repo}/commits"
    issues_url = f"{api_base}/repos/{owner}/{repo}/issues"
    repo_api_url = f"{api_base}/repos/{owner}/{repo}"  # don't shadow the repo_url parameter

    # Fetch repository information
    repo_response = requests.get(repo_api_url)
    repo_data = repo_response.json()

    # Fetch commits (last 100)
    commits_response = requests.get(commits_url, params={"per_page": 100})
    commits_data = commits_response.json()

    # Fetch issues (last 100)
    issues_response = requests.get(issues_url, params={"per_page": 100, "state": "all"})
    issues_data = issues_response.json()

    # Process commits data
    commit_dates = []
    for commit in commits_data:
        if isinstance(commit, dict) and "commit" in commit:
            date_str = commit["commit"]["committer"]["date"]
            date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
            commit_dates.append(date.strftime("%Y-%m-%d"))

    # Create commits DataFrame and count by date
    commits_df = pd.DataFrame({"date": commit_dates})
    commits_by_date = commits_df.groupby("date").size().reset_index(name="count")
    commits_by_date = commits_by_date.sort_values("date")

    # Create commits chart
    plt.figure(figsize=(10, 5))
    plt.bar(commits_by_date["date"], commits_by_date["count"])
    plt.title("Commits by Date")
    plt.xticks(rotation=45)
    plt.tight_layout()

    # Convert plot to base64
    buffer = io.BytesIO()
    plt.savefig(buffer, format="png")
    buffer.seek(0)
    commits_chart = base64.b64encode(buffer.read()).decode("utf-8")
    plt.close()

    # Basic statistics
    stats = {
        "repository": {
            "name": repo_data.get("name", "Unknown"),
            "owner": repo_data.get("owner", {}).get("login", "Unknown"),
            "stars": repo_data.get("stargazers_count", 0),
            "forks": repo_data.get("forks_count", 0),
            "open_issues": repo_data.get("open_issues_count", 0),
            "language": repo_data.get("language", "Unknown"),
            "created_at": repo_data.get("created_at", "Unknown")
        },
        "commits": {
            "count": len(commits_data),
            "by_date": commits_by_date.to_dict(orient="records")
        },
        "issues": {
            "count": len(issues_data),
            "open": sum(1 for issue in issues_data if issue.get("state") == "open"),
            "closed": sum(1 for issue in issues_data if issue.get("state") == "closed")
        }
    }

    return {
        "stats": stats,
        "commits_chart": commits_chart
    }

This tool:

  1. Takes a GitHub repository URL as input
  2. Fetches repository information, commits, and issues from the GitHub API
  3. Processes the data using pandas
  4. Generates a chart using matplotlib
  5. Returns detailed statistics and a base64-encoded chart image
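Because the chart comes back as a base64 string rather than a file, a caller has to decode it before viewing. For example, assuming `result` is the dict the tool returns:

```python
import base64

def save_chart(chart_b64: str, path: str) -> str:
    """Decode a base64-encoded PNG string and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(chart_b64))
    return path

# Example usage with the tool's return value:
# save_chart(result["commits_chart"], "commits.png")
```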

Testing the Tool with Kubiya CLI

The Kubiya CLI makes it easy to test tools:

Bash
# Create a test file
cat > github_tool.py << 'EOF'
# Paste the GitHub tool code here
EOF

# Test the tool with the CLI
kubiya tool test analyze_github_repo --param repo_url="https://github.com/kubiya/sdk-py"

Next Steps

Now that you've created your first tool, you can:

  1. Learn about building workflows to combine multiple tools
  2. Explore Docker image integration to use specialized Docker images
  3. Discover how to run tools on Kubernetes for scalability