# Creating Your First Tool
This tutorial will guide you through creating your first Docker-based tool with Kubiya SDK.
## Prerequisites

Before you begin, make sure you have:

- Kubiya SDK installed: `pip install kubiya-sdk`
- Docker installed and running on your system
- Basic knowledge of Python
## Understanding Docker-Based Tools
Kubiya's tools are powered by Docker containers. This means:
- Each tool runs in its own isolated environment
- You can leverage existing Docker images for specialized functionality
- You don't need to reinvent the wheel - use existing solutions as building blocks
## A Simple Text Processing Tool
Let's start with a simple tool that processes text:
```python
# text_tool.py
from kubiya_sdk import tool

@tool(image="python:3.12-slim")
def process_text(text: str, operation: str = "uppercase") -> str:
    """Process text with various operations.

    Args:
        text: The text to process
        operation: The operation to perform (uppercase, lowercase, capitalize, reverse)

    Returns:
        The processed text
    """
    if operation == "uppercase":
        return text.upper()
    elif operation == "lowercase":
        return text.lower()
    elif operation == "capitalize":
        return text.capitalize()
    elif operation == "reverse":
        return text[::-1]
    else:
        return f"Unknown operation: {operation}"
```
Let's break down what's happening:

- We import the `tool` decorator from `kubiya_sdk`
- We use `@tool(image="python:3.12-slim")` to specify that this tool should run in the Python 3.12 slim Docker image
- We define a function that takes text and an operation, and returns the processed text
- The docstring provides information about the tool's purpose and parameters
## Testing Your Tool
You can test your tool directly:
```python
# Run in the same file or in a Python shell after importing
result = process_text("hello world", "uppercase")
print(result)  # Output: HELLO WORLD
```
Behind the scenes, Kubiya:

1. Pulls the Python Docker image if needed
2. Creates a container with your code
3. Executes the function inside the container
4. Returns the result
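To make that lifecycle concrete, the sketch below drives the Docker CLI directly from Python. This is purely illustrative of what "pull, run, return output" means; it is not the SDK's actual implementation.

```python
# Illustrative only: a rough equivalent of the container lifecycle,
# not how the SDK actually executes tools.
import subprocess

# 1. Pull the image if needed
subprocess.run(["docker", "pull", "python:3.12-slim"], check=True)

# 2-3. Run the code inside a throwaway container
result = subprocess.run(
    ["docker", "run", "--rm", "python:3.12-slim",
     "python", "-c", "print('hello world'.upper())"],
    capture_output=True, text=True, check=True,
)

# 4. The container's output becomes the result
print(result.stdout.strip())  # HELLO WORLD
```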
## Using External Dependencies
Most real-world tools need external libraries. Kubiya makes this easy:
```python
# nlp_tool.py
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["nltk"]
)
def analyze_sentiment(text: str) -> dict:
    """Analyze sentiment using NLTK.

    Args:
        text: The text to analyze

    Returns:
        A dictionary with sentiment scores
    """
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    # Download NLTK data if needed
    try:
        nltk.data.find('sentiment/vader_lexicon.zip')
    except LookupError:
        nltk.download('vader_lexicon')

    # Create analyzer
    sia = SentimentIntensityAnalyzer()

    # Analyze sentiment
    scores = sia.polarity_scores(text)

    return {
        "positive": scores["pos"],
        "negative": scores["neg"],
        "neutral": scores["neu"],
        "compound": scores["compound"]
    }
```
The `requirements=["nltk"]` parameter tells Kubiya to install NLTK in the container before executing the tool.
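Invoking it looks the same as the earlier example; the exact scores will vary with the NLTK version and lexicon, so they are elided here:

```python
# Example invocation; exact scores depend on the NLTK version and lexicon
scores = analyze_sentiment("Kubiya makes building tools easy!")
print(scores)  # {'positive': ..., 'negative': ..., 'neutral': ..., 'compound': ...}
```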
## Using a Specialized Docker Image
Instead of using a generic Python image and installing requirements, you can use specialized images:
```python
# weather_tool.py
from kubiya_sdk import tool

@tool(
    image="ghcr.io/chubin/wttr.in:latest",
    command=["/app/bin/srv.py", "${LOCATION}"],
    environment={
        "LOCATION": "${location}"  # Map the input parameter to an environment variable
    }
)
def get_weather(location: str) -> str:
    """Get weather information for a location.

    Args:
        location: The location to get weather for

    Returns:
        Weather information as text
    """
    # No code needed - execution happens directly in the container
    # The output from the container command becomes the return value
    pass
```
In this example (see the usage sketch after this list):

1. We use the wttr.in Docker image, which provides weather information
2. We specify the command to run in the container
3. We map the `location` input parameter to the `LOCATION` environment variable
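Since the container command's output becomes the return value, invoking the tool is a one-liner:

```python
# The container's stdout becomes the tool's return value
report = get_weather("London")
print(report)
```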
## Tool with File Input/Output
Many tools need to work with files:
```python
# image_tool.py
import os
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["pillow"],
    volumes={
        os.path.abspath("./input"): "/input",    # Mount input directory
        os.path.abspath("./output"): "/output"   # Mount output directory
    }
)
def resize_image(
    input_file: str,
    width: int,
    height: int,
    output_file: str = None
) -> str:
    """Resize an image.

    Args:
        input_file: Name of input file (in ./input directory)
        width: Target width in pixels
        height: Target height in pixels
        output_file: Name for output file (in ./output directory)

    Returns:
        Path to the resized image
    """
    from PIL import Image

    # Set default output filename if not provided
    if not output_file:
        name, ext = os.path.splitext(input_file)
        output_file = f"{name}_resized{ext}"

    # Load the image
    input_path = f"/input/{input_file}"
    img = Image.open(input_path)

    # Resize the image
    resized_img = img.resize((width, height))

    # Save the resized image
    output_path = f"/output/{output_file}"
    resized_img.save(output_path)

    # Return the output path on the host
    return f"./output/{output_file}"
```
In this example (a usage sketch follows the list):

1. We mount the local `./input` and `./output` directories into the container
2. The tool reads a file from the input directory and writes to the output directory
3. The paths inside the container are different from the paths on the host
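A quick usage sketch, assuming a file named `photo.png` (a hypothetical example file) already exists in `./input` on the host:

```python
# Assumes ./input/photo.png exists on the host (hypothetical example file)
result_path = resize_image("photo.png", width=800, height=600)
print(result_path)  # ./output/photo_resized.png
```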
## Creating a Tool with External Configuration
Many tools need configuration, such as API keys:
```python
# api_tool.py
from kubiya_sdk import tool, config_model
from pydantic import BaseModel

@config_model(name="weather_api_config", description="Weather API Configuration")
class WeatherAPIConfig(BaseModel):
    """Weather API Configuration Schema"""
    api_key: str
    base_url: str = "https://api.weatherapi.com/v1"

@tool(
    image="python:3.12-slim",
    requirements=["requests"],
    required_configs=["weather_api_config"]  # Specify that this tool requires the config
)
def get_detailed_weather(city: str, config=None) -> dict:
    """Get detailed weather information for a city.

    Args:
        city: The city to get weather for
        config: Configuration (automatically injected by Kubiya)

    Returns:
        Detailed weather information
    """
    import requests

    if not config:
        raise ValueError("Weather API configuration is required")

    api_key = config.get("api_key")
    base_url = config.get("base_url")

    url = f"{base_url}/current.json?key={api_key}&q={city}"
    response = requests.get(url)

    if response.status_code != 200:
        return {"error": response.json().get("error", {}).get("message", "Unknown error")}

    data = response.json()
    return {
        "location": {
            "name": data["location"]["name"],
            "country": data["location"]["country"]
        },
        "current": {
            "temperature_c": data["current"]["temp_c"],
            "temperature_f": data["current"]["temp_f"],
            "condition": data["current"]["condition"]["text"],
            "humidity": data["current"]["humidity"],
            "wind_kph": data["current"]["wind_kph"]
        }
    }
```
To use this tool, you need to set the configuration:
```python
from kubiya_sdk.tools.registry import tool_registry

# Set the configuration
tool_registry.set_dynamic_config({
    "weather_api_config": {
        "api_key": "your_api_key_here",
        "base_url": "https://api.weatherapi.com/v1"
    }
})

# Now you can use the tool
result = get_detailed_weather("London")
print(result)
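```

In real projects you will usually want to keep the API key out of source code. One common pattern, sketched below with a hypothetical `WEATHER_API_KEY` environment variable, is to read the secret from the environment before registering the config:

```python
import os
from kubiya_sdk.tools.registry import tool_registry

# Read the secret from the environment instead of hardcoding it.
# WEATHER_API_KEY is a hypothetical variable name for this example.
tool_registry.set_dynamic_config({
    "weather_api_config": {
        "api_key": os.environ["WEATHER_API_KEY"],
        "base_url": "https://api.weatherapi.com/v1",
    }
})
```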
## Building a More Complex Tool
Let's build a more complex tool that analyzes GitHub repositories:
```python
# github_tool.py
from kubiya_sdk import tool

@tool(
    image="python:3.12-slim",
    requirements=["requests", "pandas", "matplotlib"]
)
def analyze_github_repo(repo_url: str) -> dict:
    """Analyze a GitHub repository.

    Args:
        repo_url: URL of the GitHub repository (e.g., https://github.com/username/repo)

    Returns:
        Analysis results
    """
    import requests
    import pandas as pd
    import matplotlib.pyplot as plt
    import base64
    import io
    from datetime import datetime

    # Extract owner and repo from URL
    parts = repo_url.rstrip('/').split('/')
    owner = parts[-2]
    repo = parts[-1]

    # GitHub API URLs (named so they don't shadow the repo_url parameter)
    api_base = "https://api.github.com"
    commits_url = f"{api_base}/repos/{owner}/{repo}/commits"
    issues_url = f"{api_base}/repos/{owner}/{repo}/issues"
    repo_api_url = f"{api_base}/repos/{owner}/{repo}"

    # Fetch repository information
    repo_response = requests.get(repo_api_url)
    repo_data = repo_response.json()

    # Fetch commits (last 100)
    commits_response = requests.get(commits_url, params={"per_page": 100})
    commits_data = commits_response.json()

    # Fetch issues (last 100)
    issues_response = requests.get(issues_url, params={"per_page": 100, "state": "all"})
    issues_data = issues_response.json()

    # Process commits data
    commit_dates = []
    for commit in commits_data:
        if isinstance(commit, dict) and "commit" in commit:
            date_str = commit["commit"]["committer"]["date"]
            date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
            commit_dates.append(date.strftime("%Y-%m-%d"))

    # Create commits DataFrame and count by date
    commits_df = pd.DataFrame({"date": commit_dates})
    commits_by_date = commits_df.groupby("date").size().reset_index(name="count")
    commits_by_date = commits_by_date.sort_values("date")

    # Create commits chart
    plt.figure(figsize=(10, 5))
    plt.bar(commits_by_date["date"], commits_by_date["count"])
    plt.title("Commits by Date")
    plt.xticks(rotation=45)
    plt.tight_layout()

    # Convert plot to base64
    buffer = io.BytesIO()
    plt.savefig(buffer, format="png")
    buffer.seek(0)
    commits_chart = base64.b64encode(buffer.read()).decode("utf-8")
    plt.close()

    # Basic statistics
    stats = {
        "repository": {
            "name": repo_data.get("name", "Unknown"),
            "owner": repo_data.get("owner", {}).get("login", "Unknown"),
            "stars": repo_data.get("stargazers_count", 0),
            "forks": repo_data.get("forks_count", 0),
            "open_issues": repo_data.get("open_issues_count", 0),
            "language": repo_data.get("language", "Unknown"),
            "created_at": repo_data.get("created_at", "Unknown")
        },
        "commits": {
            "count": len(commits_data),
            "by_date": commits_by_date.to_dict(orient="records")
        },
        "issues": {
            "count": len(issues_data),
            "open": sum(1 for issue in issues_data if issue.get("state") == "open"),
            "closed": sum(1 for issue in issues_data if issue.get("state") == "closed")
        }
    }

    return {
        "stats": stats,
        "commits_chart": commits_chart
    }
```
This tool:

1. Takes a GitHub repository URL as input
2. Fetches repository information, commits, and issues from the GitHub API
3. Processes the data using pandas
4. Generates a chart using matplotlib
5. Returns detailed statistics and a base64-encoded chart image
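Because the chart comes back base64-encoded, a caller typically decodes it back into a PNG. A minimal sketch (the repository URL is the same illustrative one used in the CLI example below):

```python
import base64

# Run the analysis
analysis = analyze_github_repo("https://github.com/kubiya/sdk-py")
print(analysis["stats"]["repository"]["stars"])

# Decode the base64 chart and write it to a PNG file on the host
with open("commits_chart.png", "wb") as f:
    f.write(base64.b64decode(analysis["commits_chart"]))
```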
## Testing the Tool with Kubiya CLI
The Kubiya CLI makes it easy to test tools:
```bash
# Create a test file
cat > github_tool.py << 'EOF'
# Paste the GitHub tool code here
EOF

# Test the tool with the CLI
kubiya tool test analyze_github_repo --param repo_url="https://github.com/kubiya/sdk-py"
```
## Next Steps
Now that you've created your first tool, you can:
- Learn about building workflows to combine multiple tools
- Explore Docker image integration to use specialized Docker images
- Discover how to run tools on Kubernetes for scalability