text-to-video

Hybrid ai.video

Create and track text-to-video generation jobs through OpenAI's Videos API. Supports prompt-based generation, optional reference images, and polling until completion.

by @web3xiaoba

README

# text-to-video

A Hybrid Gene that turns a text prompt into an OpenAI video generation job and can poll that job until completion.

## Environment

- `ROTIFER_OPENAI_API_KEY` or `OPENAI_API_KEY`: required
- `ROTIFER_OPENAI_BASE_URL`: optional, defaults to `https://api.openai.com/v1`
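Here is a minimal sketch of how a client might resolve these variables. It assumes `ROTIFER_OPENAI_API_KEY` takes precedence when both keys are set, because the README does not state the precedence. `resolveOpenAIConfig` is a hypothetical helper, not the gene's exported API, and it does not model the Keychain fallback.

```typescript
// Hypothetical helper: resolves the variables listed in the Environment section.
interface OpenAIConfig {
  apiKey: string;
  baseUrl: string;
}

const DEFAULT_BASE_URL = "https://api.openai.com/v1";

function resolveOpenAIConfig(env: Record<string, string | undefined>): OpenAIConfig {
  // Assumed precedence: the Rotifer-specific key wins over OPENAI_API_KEY.
  const apiKey = env.ROTIFER_OPENAI_API_KEY || env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("Set ROTIFER_OPENAI_API_KEY or OPENAI_API_KEY");
  }
  // Strip trailing slashes so paths such as `${baseUrl}/videos` join cleanly.
  const baseUrl = (env.ROTIFER_OPENAI_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```

In Node you would call it as `resolveOpenAIConfig(process.env)`.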

On macOS, Rotifer can auto-load the OpenAI key from Keychain after you store it with:

```bash
rotifer secret set-openai
```

## Features

- Create text-to-video jobs with `sora-2` or `sora-2-pro`
- Preview the optimized prompt with `operation: "prepare"`
- Optional image/file reference input
- Poll job status until `completed` or `failed`
- Automatically infer an optimization profile from the prompt text and inject continuity and negative constraints
- Return stable metadata including `videoId`, `statusUrl`, `downloadUrl`, `optimizedPrompt`, and `shotPlan`
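The README does not describe how profile inference works. The sketch below shows one way keyword matching could map a prompt to an `optimizationProfile`. The keyword table and the `inferProfile` name are hypothetical, and the gene may use entirely different rules.

```typescript
// Profiles accepted by optimizationProfile, excluding "auto".
type Profile =
  | "general" | "portrait" | "landscape" | "anime"
  | "advertising" | "product" | "cinematic";

// Hypothetical keyword table; the gene's real inference rules are not documented.
const PROFILE_KEYWORDS: Array<[Profile, RegExp]> = [
  ["anime", /\b(anime|manga|cel[- ]shaded)\b/i],
  ["advertising", /\b(ad|advert|commercial|brand)\b/i],
  ["product", /\b(product|packshot|unboxing|bottle|sneaker)\b/i],
  ["portrait", /\b(portrait|close-up|face|headshot)\b/i],
  ["landscape", /\b(landscape|mountain|forest|ocean|valley|aerial)\b/i],
  ["cinematic", /\b(cinematic|film|drone shot|tracking shot)\b/i],
];

function inferProfile(prompt: string): Profile {
  // First match wins, so more specific profiles are listed earlier.
  for (const [profile, pattern] of PROFILE_KEYWORDS) {
    if (pattern.test(prompt)) return profile;
  }
  return "general";
}
```

Ordering the table from specific to broad keeps a prompt such as "anime girl in a forest" from being classified as a landscape.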

## Usage

Create a job:

```ts
await express({
  prompt: "A cinematic drone shot flying through a neon rainy city at night",
  model: "sora-2",
  seconds: "8",
  size: "1280x720"
}, { gatewayFetch });
```
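A `create` call has to turn these inputs into a job request. The sketch below applies the inputSchema defaults. `buildCreateBody` is hypothetical. The body's field names (`model`, `prompt`, `seconds`, `size`) are assumed to mirror OpenAI's Videos API create request, so check the current OpenAI reference before relying on them.

```typescript
type Seconds = "4" | "8" | "12";
type Size = "720x1280" | "1280x720" | "1024x1792" | "1792x1024";

interface CreateInput {
  prompt: string;
  model?: string;
  seconds?: Seconds;
  size?: Size;
}

// Hypothetical shape of the body sent to POST {baseUrl}/videos.
interface CreateBody {
  model: string;
  prompt: string;
  seconds: Seconds;
  size: Size;
}

function buildCreateBody(input: CreateInput): CreateBody {
  if (!input.prompt.trim()) throw new Error("prompt must not be empty");
  return {
    // Defaults mirror the inputSchema: sora-2, "4" seconds, 720x1280.
    model: input.model ?? "sora-2",
    prompt: input.prompt,
    seconds: input.seconds ?? "4",
    size: input.size ?? "720x1280",
  };
}
```

In the schema, `seconds` is a string enum, so pass `"8"` rather than `8`.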

Preview how the gene will optimize a raw prompt before generating:

```ts
await express({
  operation: "prepare",
  prompt: "A lone traveler walking through a neon rainy alley at night"
});
```
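According to the output schema, the `prepare` result includes a `shotPlan` with `opening`, `middle`, and `ending` beats. The template below is a hypothetical illustration of such a three-beat plan. `buildShotPlan` and its wording are assumptions, not the gene's actual output.

```typescript
// Matches the shotPlan object in the output schema.
interface ShotPlan {
  opening: string;
  middle: string;
  ending: string;
}

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Hypothetical three-beat template; the gene's real wording is not documented.
function buildShotPlan(subject: string, action: string, camera: string = "a steady medium shot"): ShotPlan {
  return {
    opening: `Open on ${subject} in ${camera}, establishing the setting.`,
    // Restating the subject in every beat is one way to encourage continuity.
    middle: `${capitalize(subject)} ${action}, keeping face, outfit, and proportions unchanged.`,
    ending: `Hold on ${subject} as the motion settles, with no new characters entering.`,
  };
}
```

For example: `buildShotPlan("a lone traveler", "walks through a neon rainy alley at night")`.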

Use structured fields for better prompt adherence:

```ts
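await express({
  subject: "A young woman in a long white dress",
  action: "Bending down to wash her hands at a forest stream",
  scene: "Woods and a shallow stream in the early-morning mist",
  style: "Realistic, cinematic",
  camera: "Medium shot, slow push-in",
  lighting: "Soft morning light",
  mood: "Quiet, dreamlike",
  avoid: ["subtitles", "watermarks", "extra people", "distorted limbs"],
  model: "sora-2",
  seconds: "4",
  size: "720x1280",
  pollUntilComplete: true
}, { gatewayFetch });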
```
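The sketch below shows one plausible way to fold the structured fields into a prompt. Non-empty fields become labeled lines in a fixed order, and `avoid` becomes a separate negative prompt. `composePrompt`, the field order, and the `Avoid:` phrasing are assumptions for illustration only.

```typescript
// Structured fields from the inputSchema.
interface StructuredFields {
  subject?: string;
  action?: string;
  scene?: string;
  style?: string;
  camera?: string;
  lighting?: string;
  mood?: string;
  avoid?: string[];
}

// Hypothetical assembly: labeled lines in a fixed order, plus a negative prompt.
function composePrompt(f: StructuredFields): { prompt: string; negativePrompt: string | null } {
  const parts: Array<[string, string | undefined]> = [
    ["Subject", f.subject],
    ["Action", f.action],
    ["Scene", f.scene],
    ["Style", f.style],
    ["Camera", f.camera],
    ["Lighting", f.lighting],
    ["Mood", f.mood],
  ];
  const prompt = parts
    .filter(([, value]) => value && value.trim())
    .map(([label, value]) => `${label}: ${value!.trim()}`)
    .join("\n");
  const avoid = (f.avoid ?? []).map((s) => s.trim()).filter(Boolean);
  return {
    prompt,
    // null mirrors the output schema, where negativePrompt is string | null.
    negativePrompt: avoid.length > 0 ? `Avoid: ${avoid.join(", ")}` : null,
  };
}
```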

Create and poll until done:

```ts
await express({
  prompt: "A paper boat sailing across a glowing galaxy river",
  pollUntilComplete: true,
  pollIntervalMs: 5000,
  maxPollAttempts: 12
}, { gatewayFetch });
```
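Polling can be sketched as a bounded loop. This version assumes the gene waits `pollIntervalMs` before each status request and stops on `completed` or `failed`. `pollUntilDone` and the injected `fetchStatus` callback are hypothetical stand-ins for the gene's internal `gatewayFetch` calls.

```typescript
type VideoStatus = "queued" | "in_progress" | "completed" | "failed" | "not_started";

interface StatusSnapshot {
  status: VideoStatus;
  progress: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical loop; fetchStatus stands in for a GET on the job's status URL.
async function pollUntilDone(
  fetchStatus: () => Promise<StatusSnapshot>,
  pollIntervalMs = 5000,
  maxPollAttempts = 12,
): Promise<{ last: StatusSnapshot | null; pollAttempts: number }> {
  let last: StatusSnapshot | null = null;
  let pollAttempts = 0;
  while (pollAttempts < maxPollAttempts) {
    // Wait before each request so a just-created job has time to progress.
    await sleep(pollIntervalMs);
    last = await fetchStatus();
    pollAttempts += 1;
    if (last.status === "completed" || last.status === "failed") break;
  }
  return { last, pollAttempts };
}
```

With the defaults (5000 ms, 12 attempts), the sketch sleeps 60 seconds in total before it gives up and returns the last non-terminal snapshot. A longer render then needs a follow-up `status` call or larger limits.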

Fetch an existing job:

```ts
await express({
  operation: "status",
  videoId: "video_123"
}, { gatewayFetch });
```
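The `statusUrl` and `downloadUrl` fields can be derived from the job ID. The sketch below assumes OpenAI's paths `GET /videos/{video_id}` for job status and `GET /videos/{video_id}/content` for the video itself. Because the gene returns the content endpoint only once a job is completed, the sketch leaves `downloadUrl` null until then. `videoUrls` is a hypothetical helper.

```typescript
type VideoStatus = "queued" | "in_progress" | "completed" | "failed" | "not_started";

// Hypothetical helper; the paths are assumed from OpenAI's Videos API.
function videoUrls(
  baseUrl: string,
  videoId: string,
  status: VideoStatus,
): { statusUrl: string; downloadUrl: string | null } {
  const root = baseUrl.replace(/\/+$/, "");
  const id = encodeURIComponent(videoId);
  return {
    statusUrl: `${root}/videos/${id}`,
    // Only a completed job has content to download.
    downloadUrl: status === "completed" ? `${root}/videos/${id}/content` : null,
  };
}
```

Both URLs point at authenticated OpenAI API endpoints. A client that downloads the MP4 itself would need to send the same `Authorization: Bearer` header it uses for other requests.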

## Notes

- This gene returns the OpenAI content endpoint as `downloadUrl` when a job is completed.
- The current Rotifer network gateway reads response bodies as text, so this gene tracks jobs and exposes the content URL rather than downloading MP4 bytes directly.
- For short prompts, the gene automatically expands the instruction with consistency constraints unless `enhancePrompt` is `false`.
- Prompt enhancement keeps the output in the same language as the input prompt and adds a `shotPlan`, an inferred `optimizationProfile`, and a `negativePrompt`.
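The README does not define what counts as a short prompt. Here is a minimal sketch of the decision with placeholder thresholds: 20 words, or 40 characters for text containing CJK ideographs, which are written without spaces between words. `shouldEnhance`, `isShortPrompt`, and both numbers are hypothetical.

```typescript
// Placeholder thresholds: the gene's real cut-offs are not documented.
const SHORT_PROMPT_WORDS = 20;
const SHORT_PROMPT_CJK_CHARS = 40;

function isShortPrompt(prompt: string): boolean {
  const text = prompt.trim();
  // CJK Unified Ideographs (U+4E00 to U+9FFF) have no word spaces, so count characters.
  if (/[\u4e00-\u9fff]/.test(text)) {
    return Array.from(text).length < SHORT_PROMPT_CJK_CHARS;
  }
  return text.split(/\s+/).filter(Boolean).length < SHORT_PROMPT_WORDS;
}

// Enhancement runs only for short prompts, and only when enhancePrompt is not false.
function shouldEnhance(prompt: string, enhancePrompt: boolean = true): boolean {
  return enhancePrompt && isShortPrompt(prompt);
}
```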

Phenotype

Input

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| mood | string | | Target emotional tone, such as calm, suspenseful, dreamy, or energetic. |
| size | 720x1280 \| 1280x720 \| 1024x1792 \| 1792x1024 | 720x1280 | Output resolution (width x height). |
| avoid | array of string | | Elements to explicitly avoid in the generated clip. |
| model | string | sora-2 | Video model name. OpenAI currently documents sora-2 and sora-2-pro. |
| scene | string | | Scene or location description for prompt construction. |
| style | string | | Visual style, such as realistic cinema, watercolor, anime, or ad film. |
| action | string | | Main action for prompt construction. |
| camera | string | | Camera direction, such as close-up, dolly in, tracking shot, or handheld. |
| prompt | string | | Text prompt describing the video to generate. |
| seconds | "4" \| "8" \| "12" (string) | "4" | Clip duration in seconds. |
| subject | string | | Primary subject for prompt construction. |
| videoId | string | | Existing video job ID. Required when operation=status. |
| lighting | string | | Lighting description, such as soft morning light or neon backlight. |
| operation | create \| status \| prepare | create | 'create' starts a new video job, 'status' fetches an existing job, and 'prepare' previews the optimized prompt without calling the API. |
| enhancePrompt | boolean | true | Expand short prompts with subject-consistency and composition constraints. |
| inputReference | object { imageUrl?, fileId? } | | Optional reference asset. Provide exactly one of imageUrl (a public image URL or base64 data URL) or fileId (an uploaded OpenAI file ID). |
| pollIntervalMs | integer, 1000 to 60000 | 5000 | Delay between status polls in milliseconds. |
| maxPollAttempts | integer, 0 to 60 | 12 | Maximum number of status requests to make while polling. |
| pollUntilComplete | boolean | false | Whether to poll the job until it reaches completed or failed. |
| optimizationProfile | auto \| general \| portrait \| landscape \| anime \| advertising \| product \| cinematic | auto | Prompt optimization profile. Use 'auto' to infer the best profile from the user's prompt. |
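Some rules in this table, such as `videoId` being required for `status` and `inputReference` taking exactly one of `imageUrl` or `fileId`, are stated only in the descriptions and are not enforced by the JSON Schema below. A hypothetical `validateInput` that checks these rules, plus the schema's polling ranges, might look like the following. It returns messages instead of throwing.

```typescript
interface GeneInput {
  operation?: "create" | "status" | "prepare";
  prompt?: string;
  videoId?: string;
  inputReference?: { imageUrl?: string; fileId?: string };
  pollIntervalMs?: number;
  maxPollAttempts?: number;
}

// Hypothetical checks derived from the Input table and inputSchema.
function validateInput(input: GeneInput): string[] {
  const errors: string[] = [];
  const op = input.operation ?? "create";
  if (op === "status" && !input.videoId) {
    errors.push("videoId is required when operation=status");
  }
  const ref = input.inputReference;
  // Exactly one of the two must be set: both or neither is an error.
  if (ref && Boolean(ref.imageUrl) === Boolean(ref.fileId)) {
    errors.push("inputReference needs exactly one of imageUrl or fileId");
  }
  const interval = input.pollIntervalMs ?? 5000;
  if (!Number.isInteger(interval) || interval < 1000 || interval > 60000) {
    errors.push("pollIntervalMs must be an integer from 1000 to 60000");
  }
  const attempts = input.maxPollAttempts ?? 12;
  if (!Number.isInteger(attempts) || attempts < 0 || attempts > 60) {
    errors.push("maxPollAttempts must be an integer from 0 to 60");
  }
  return errors;
}
```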

Output

| Property | Type | Required |
| --- | --- | --- |
| ok | boolean | ✓ |
| size | string \| null | ✓ |
| error | string \| null | ✓ |
| model | string \| null | ✓ |
| prompt | string \| null | ✓ |
| status | queued \| in_progress \| completed \| failed \| not_started | ✓ |
| quality | string \| null | ✓ |
| seconds | string \| null | ✓ |
| videoId | string \| null | ✓ |
| progress | number | ✓ |
| shotPlan | object { opening, middle, ending } \| null | ✓ |
| createdAt | number \| null | ✓ |
| expiresAt | number \| null | ✓ |
| operation | create \| status \| prepare | ✓ |
| statusUrl | string \| null | ✓ |
| completedAt | number \| null | ✓ |
| downloadUrl | string \| null | ✓ |
| pollAttempts | integer | ✓ |
| negativePrompt | string \| null | ✓ |
| originalPrompt | string \| null | ✓ |
| optimizedPrompt | string \| null | ✓ |
| optimizationProfile | general \| portrait \| landscape \| anime \| advertising \| product \| cinematic \| null | ✓ |
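For TypeScript callers, the output schema translates to the type below. `TextToVideoResult`, `isTerminal`, and `readyToDownload` are hypothetical names written from the schema. The gene does not export them.

```typescript
type VideoStatus = "queued" | "in_progress" | "completed" | "failed" | "not_started";
type Profile =
  | "general" | "portrait" | "landscape" | "anime"
  | "advertising" | "product" | "cinematic";

// Mirrors the outputSchema: every field is always present, and many may be null.
interface TextToVideoResult {
  ok: boolean;
  operation: "create" | "status" | "prepare";
  videoId: string | null;
  status: VideoStatus;
  progress: number;
  prompt: string | null;
  originalPrompt: string | null;
  optimizedPrompt: string | null;
  optimizationProfile: Profile | null;
  negativePrompt: string | null;
  shotPlan: { opening: string; middle: string; ending: string } | null;
  model: string | null;
  seconds: string | null;
  size: string | null;
  quality: string | null;
  createdAt: number | null;
  completedAt: number | null;
  expiresAt: number | null;
  statusUrl: string | null;
  downloadUrl: string | null;
  pollAttempts: number;
  error: string | null;
}

// Hypothetical helper: true once the job status will not change further.
function isTerminal(result: Pick<TextToVideoResult, "status">): boolean {
  return result.status === "completed" || result.status === "failed";
}

// Hypothetical type guard: narrows downloadUrl to string for completed jobs.
function readyToDownload(
  result: TextToVideoResult,
): result is TextToVideoResult & { downloadUrl: string } {
  return result.ok && result.status === "completed" && result.downloadUrl !== null;
}
```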

Raw JSON Schema

inputSchema

{
  "type": "object",
  "required": [],
  "properties": {
    "mood": {
      "type": "string",
      "description": "Target emotional tone, such as calm, suspenseful, dreamy, or energetic."
    },
    "size": {
      "enum": [
        "720x1280",
        "1280x720",
        "1024x1792",
        "1792x1024"
      ],
      "type": "string",
      "default": "720x1280",
      "description": "Output resolution."
    },
    "avoid": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Elements to explicitly avoid in the generated clip."
    },
    "model": {
      "type": "string",
      "default": "sora-2",
      "description": "Video model name. OpenAI currently documents sora-2 and sora-2-pro."
    },
    "scene": {
      "type": "string",
      "description": "Scene or location description for prompt construction."
    },
    "style": {
      "type": "string",
      "description": "Visual style, such as realistic cinema, watercolor, anime, or ad film."
    },
    "action": {
      "type": "string",
      "description": "Main action for prompt construction."
    },
    "camera": {
      "type": "string",
      "description": "Camera direction, such as close-up, dolly in, tracking shot, or handheld."
    },
    "prompt": {
      "type": "string",
      "description": "Text prompt describing the video to generate."
    },
    "seconds": {
      "enum": [
        "4",
        "8",
        "12"
      ],
      "type": "string",
      "default": "4",
      "description": "Clip duration in seconds."
    },
    "subject": {
      "type": "string",
      "description": "Primary subject for prompt construction."
    },
    "videoId": {
      "type": "string",
      "description": "Existing video job ID. Required when operation=status."
    },
    "lighting": {
      "type": "string",
      "description": "Lighting description, such as soft morning light or neon backlight."
    },
    "operation": {
      "enum": [
        "create",
        "status",
        "prepare"
      ],
      "type": "string",
      "default": "create",
      "description": "Use 'create' to start a new video job, 'status' to fetch an existing job, or 'prepare' to preview the optimized prompt without calling the API."
    },
    "enhancePrompt": {
      "type": "boolean",
      "default": true,
      "description": "Expand short prompts with subject-consistency and composition constraints."
    },
    "inputReference": {
      "type": "object",
      "properties": {
        "fileId": {
          "type": "string",
          "description": "Uploaded OpenAI file ID to use as a visual reference."
        },
        "imageUrl": {
          "type": "string",
          "description": "A public image URL or base64 data URL."
        }
      },
      "description": "Optional reference asset. Provide exactly one of imageUrl or fileId."
    },
    "pollIntervalMs": {
      "type": "integer",
      "default": 5000,
      "maximum": 60000,
      "minimum": 1000,
      "description": "Delay between status polls in milliseconds."
    },
    "maxPollAttempts": {
      "type": "integer",
      "default": 12,
      "maximum": 60,
      "minimum": 0,
      "description": "Maximum number of status requests to make while polling."
    },
    "pollUntilComplete": {
      "type": "boolean",
      "default": false,
      "description": "Whether to poll the job until it reaches completed or failed."
    },
    "optimizationProfile": {
      "enum": [
        "auto",
        "general",
        "portrait",
        "landscape",
        "anime",
        "advertising",
        "product",
        "cinematic"
      ],
      "type": "string",
      "default": "auto",
      "description": "Prompt optimization profile. Use 'auto' to infer the best profile from the user's prompt."
    }
  }
}

outputSchema

{
  "type": "object",
  "required": [
    "ok",
    "operation",
    "videoId",
    "status",
    "progress",
    "prompt",
    "originalPrompt",
    "optimizedPrompt",
    "optimizationProfile",
    "negativePrompt",
    "shotPlan",
    "model",
    "seconds",
    "size",
    "quality",
    "createdAt",
    "completedAt",
    "expiresAt",
    "statusUrl",
    "downloadUrl",
    "pollAttempts",
    "error"
  ],
  "properties": {
    "ok": {
      "type": "boolean"
    },
    "size": {
      "type": [
        "string",
        "null"
      ]
    },
    "error": {
      "type": [
        "string",
        "null"
      ]
    },
    "model": {
      "type": [
        "string",
        "null"
      ]
    },
    "prompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "status": {
      "enum": [
        "queued",
        "in_progress",
        "completed",
        "failed",
        "not_started"
      ],
      "type": "string"
    },
    "quality": {
      "type": [
        "string",
        "null"
      ]
    },
    "seconds": {
      "type": [
        "string",
        "null"
      ]
    },
    "videoId": {
      "type": [
        "string",
        "null"
      ]
    },
    "progress": {
      "type": "number"
    },
    "shotPlan": {
      "type": [
        "object",
        "null"
      ],
      "required": [
        "opening",
        "middle",
        "ending"
      ],
      "properties": {
        "ending": {
          "type": "string"
        },
        "middle": {
          "type": "string"
        },
        "opening": {
          "type": "string"
        }
      }
    },
    "createdAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "expiresAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "operation": {
      "enum": [
        "create",
        "status",
        "prepare"
      ],
      "type": "string"
    },
    "statusUrl": {
      "type": [
        "string",
        "null"
      ]
    },
    "completedAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "downloadUrl": {
      "type": [
        "string",
        "null"
      ]
    },
    "pollAttempts": {
      "type": "integer"
    },
    "negativePrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "originalPrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "optimizedPrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "optimizationProfile": {
      "enum": [
        "general",
        "portrait",
        "landscape",
        "anime",
        "advertising",
        "product",
        "cinematic",
        null
      ],
      "type": [
        "string",
        "null"
      ]
    }
  }
}

Arena History

| Date | Fitness | Safety | Calls |
| --- | --- | --- | --- |
| Mar 19 | 0.5000 | 1.00 | 1 |