text-to-video

Hybrid ai.video

Create and track text-to-video generation jobs through OpenAI's Videos API. Supports prompt-based generation, optional reference images, and polling until completion.

Author: @web3xiaoba

README

# text-to-video

A Hybrid Gene that turns a text prompt into an OpenAI video generation job and can poll that job until completion.

## Environment

- `ROTIFER_OPENAI_API_KEY` or `OPENAI_API_KEY`: required
- `ROTIFER_OPENAI_BASE_URL`: optional, defaults to `https://api.openai.com/v1`

On macOS, Rotifer can auto-load the OpenAI key from Keychain after you store it with:

```bash
rotifer secret set-openai
```
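
The variables above could be resolved along these lines. This is a minimal sketch, not the gene's actual code: `resolveOpenAIConfig` is a hypothetical helper, and the precedence of `ROTIFER_OPENAI_API_KEY` over `OPENAI_API_KEY` is an assumption, since the README does not say which one wins when both are set.

```typescript
// Hypothetical helper; the gene's real lookup order is not documented.
type Env = Record<string, string | undefined>;

interface OpenAIConfig {
  apiKey: string;
  baseUrl: string;
}

const DEFAULT_BASE_URL = "https://api.openai.com/v1";

function resolveOpenAIConfig(env: Env): OpenAIConfig {
  // Assumption: the Rotifer-specific variable takes precedence.
  const apiKey = env.ROTIFER_OPENAI_API_KEY || env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("Set ROTIFER_OPENAI_API_KEY or OPENAI_API_KEY");
  }
  // Strip trailing slashes so paths can be appended with a single "/".
  const baseUrl = (env.ROTIFER_OPENAI_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```

In Node, passing `process.env` as `env` should be enough to try it.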

## Features

- Create text-to-video jobs with `sora-2` or `sora-2-pro`
- Preview the optimized prompt with `operation: "prepare"`
- Optional image/file reference input
- Poll job status until `completed` or `failed`
- Auto-infer a prompt profile from the user's text and inject continuity / negative constraints
- Return stable metadata including `videoId`, `statusUrl`, `downloadUrl`, `optimizedPrompt`, and `shotPlan`

## Usage

Create a job:

```ts
await express({
  prompt: "A cinematic drone shot flying through a neon rainy city at night",
  model: "sora-2",
  seconds: "8",
  size: "1280x720"
}, { gatewayFetch });
```

Preview how the gene will optimize a raw prompt before generating:

```ts
await express({
  operation: "prepare",
  prompt: "A lone traveler walking through a neon rainy alley at night"
});
```

Use structured fields for better prompt adherence:

```ts
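await express({
  subject: "一位穿白色长裙的年轻女子", // a young woman in a long white dress
  action: "在森林溪流边俯身洗手", // bending down to wash her hands at a forest stream
  scene: "清晨薄雾中的树林与浅溪", // woods and a shallow stream in early-morning mist
  style: "写实电影感", // realistic, cinematic
  camera: "中景,慢速推镜", // medium shot, slow push-in
  lighting: "柔和晨光", // soft morning light
  mood: "安静、梦幻", // quiet, dreamy
  avoid: ["字幕", "水印", "额外人物", "肢体畸变"], // subtitles, watermarks, extra people, limb distortion
  model: "sora-2",
  seconds: "4",
  size: "720x1280",
  pollUntilComplete: true
}, { gatewayFetch });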
```

Create and poll until done:

```ts
await express({
  prompt: "A paper boat sailing across a glowing galaxy river",
  pollUntilComplete: true,
  pollIntervalMs: 5000,
  maxPollAttempts: 12
}, { gatewayFetch });
```
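
To make the interaction between `pollIntervalMs` and `maxPollAttempts` concrete, here is a rough sketch of a bounded polling loop. It is not the gene's implementation: `pollUntilDone`, `fetchStatus`, and `sleep` are hypothetical names, and the choice to skip the final sleep is an assumption.

```typescript
type JobStatus = "queued" | "in_progress" | "completed" | "failed";

interface PollResult {
  status: JobStatus;
  pollAttempts: number;
}

// Hypothetical helper: polls until a terminal status or the attempt cap.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  pollIntervalMs = 5000,
  maxPollAttempts = 12,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<PollResult> {
  let status: JobStatus = "queued";
  let pollAttempts = 0;
  while (pollAttempts < maxPollAttempts) {
    status = await fetchStatus();
    pollAttempts += 1;
    if (status === "completed" || status === "failed") break;
    // Assumption: wait only when another attempt remains.
    if (pollAttempts < maxPollAttempts) await sleep(pollIntervalMs);
  }
  return { status, pollAttempts };
}
```

Under these assumptions, the defaults (5000 ms, 12 attempts) bound the wait at 11 sleeps, about 55 seconds plus request time, so a long render may still be `in_progress` when the loop gives up and needs a later `operation: "status"` call.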

Fetch an existing job:

```ts
await express({
operation: "status",
videoId: "video_123"
}, { gatewayFetch });
```

## Notes

- This gene returns the OpenAI content endpoint as `downloadUrl` when a job is completed.
- The current Rotifer network gateway reads response bodies as text, so this gene tracks jobs and exposes the content URL rather than downloading MP4 bytes directly.
- For short prompts, the gene now auto-expands the instruction with consistency constraints unless `enhancePrompt` is set to `false`.
- Prompt enhancement now stays in the same language as the input prompt and adds a `shotPlan`, inferred `optimizationProfile`, and `negativePrompt`.
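
Because the gene only exposes the content URL, the caller has to fetch the MP4 bytes. A minimal sketch follows, assuming the content endpoint accepts the same bearer token as the rest of the OpenAI API. `downloadVideo` and `FetchLike` are hypothetical names introduced here, not part of the gene.

```typescript
// Minimal structural type so any fetch-compatible function can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Hypothetical helper: downloads the finished video as raw bytes.
async function downloadVideo(
  downloadUrl: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<Uint8Array> {
  const res = await fetchImpl(downloadUrl, {
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`Download failed with HTTP ${res.status}`);
  }
  // Read as binary; decoding the body as text would corrupt the MP4 data.
  return new Uint8Array(await res.arrayBuffer());
}
```

In a runtime with a standard global `fetch` (for example Node 18 or later), passing `fetch` as `fetchImpl` should work, and the returned bytes can then be written to disk.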

Phenotype

Input

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `mood` | string | | Target emotional tone, such as calm, suspenseful, dreamy, or energetic. |
| `size` | string: `720x1280`, `1280x720`, `1024x1792`, or `1792x1024` | `720x1280` | Output resolution. |
| `avoid` | array of strings | | Elements to explicitly avoid in the generated clip. |
| `model` | string | `sora-2` | Video model name. OpenAI currently documents `sora-2` and `sora-2-pro`. |
| `scene` | string | | Scene or location description for prompt construction. |
| `style` | string | | Visual style, such as realistic cinema, watercolor, anime, or ad film. |
| `action` | string | | Main action for prompt construction. |
| `camera` | string | | Camera direction, such as close-up, dolly in, tracking shot, or handheld. |
| `prompt` | string | | Text prompt describing the video to generate. |
| `seconds` | string: `"4"`, `"8"`, or `"12"` | `"4"` | Clip duration in seconds. |
| `subject` | string | | Primary subject for prompt construction. |
| `videoId` | string | | Existing video job ID. Required when `operation` is `status`. |
| `lighting` | string | | Lighting description, such as soft morning light or neon backlight. |
| `operation` | string: `create`, `status`, or `prepare` | `create` | Use `create` to start a new video job, `status` to fetch an existing job, or `prepare` to preview the optimized prompt without calling the API. |
| `enhancePrompt` | boolean | `true` | Expand short prompts with subject-consistency and composition constraints. |
| `inputReference` | object | | Optional reference asset. Provide exactly one of `imageUrl` or `fileId`. |
| `pollIntervalMs` | integer | `5000` | Delay between status polls in milliseconds. |
| `maxPollAttempts` | integer | `12` | Maximum number of status requests to make while polling. |
| `pollUntilComplete` | boolean | `false` | Whether to poll the job until it reaches `completed` or `failed`. |
| `optimizationProfile` | string: `auto`, `general`, `portrait`, `landscape`, `anime`, `advertising`, `product`, or `cinematic` | `auto` | Prompt optimization profile. Use `auto` to infer the best profile from the user's prompt. |

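Output

| Property | Type | Required |
| --- | --- | --- |
| `ok` | boolean | yes |
| `size` | string or null | yes |
| `error` | string or null | yes |
| `model` | string or null | yes |
| `prompt` | string or null | yes |
| `status` | string: `queued`, `in_progress`, `completed`, `failed`, or `not_started` | yes |
| `quality` | string or null | yes |
| `seconds` | string or null | yes |
| `videoId` | string or null | yes |
| `progress` | number | yes |
| `shotPlan` | object or null | yes |
| `createdAt` | number or null | yes |
| `expiresAt` | number or null | yes |
| `operation` | string: `create`, `status`, or `prepare` | yes |
| `statusUrl` | string or null | yes |
| `completedAt` | number or null | yes |
| `downloadUrl` | string or null | yes |
| `pollAttempts` | integer | yes |
| `negativePrompt` | string or null | yes |
| `originalPrompt` | string or null | yes |
| `optimizedPrompt` | string or null | yes |
| `optimizationProfile` | string (`general`, `portrait`, `landscape`, `anime`, `advertising`, `product`, or `cinematic`) or null | yes |

Raw JSON Schema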

inputSchema

{
  "type": "object",
  "required": [],
  "properties": {
    "mood": {
      "type": "string",
      "description": "Target emotional tone, such as calm, suspenseful, dreamy, or energetic."
    },
    "size": {
      "enum": [
        "720x1280",
        "1280x720",
        "1024x1792",
        "1792x1024"
      ],
      "type": "string",
      "default": "720x1280",
      "description": "Output resolution."
    },
    "avoid": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Elements to explicitly avoid in the generated clip."
    },
    "model": {
      "type": "string",
      "default": "sora-2",
      "description": "Video model name. OpenAI currently documents sora-2 and sora-2-pro."
    },
    "scene": {
      "type": "string",
      "description": "Scene or location description for prompt construction."
    },
    "style": {
      "type": "string",
      "description": "Visual style, such as realistic cinema, watercolor, anime, or ad film."
    },
    "action": {
      "type": "string",
      "description": "Main action for prompt construction."
    },
    "camera": {
      "type": "string",
      "description": "Camera direction, such as close-up, dolly in, tracking shot, or handheld."
    },
    "prompt": {
      "type": "string",
      "description": "Text prompt describing the video to generate."
    },
    "seconds": {
      "enum": [
        "4",
        "8",
        "12"
      ],
      "type": "string",
      "default": "4",
      "description": "Clip duration in seconds."
    },
    "subject": {
      "type": "string",
      "description": "Primary subject for prompt construction."
    },
    "videoId": {
      "type": "string",
      "description": "Existing video job ID. Required when operation=status."
    },
    "lighting": {
      "type": "string",
      "description": "Lighting description, such as soft morning light or neon backlight."
    },
    "operation": {
      "enum": [
        "create",
        "status",
        "prepare"
      ],
      "type": "string",
      "default": "create",
      "description": "Use 'create' to start a new video job, 'status' to fetch an existing job, or 'prepare' to preview the optimized prompt without calling the API."
    },
    "enhancePrompt": {
      "type": "boolean",
      "default": true,
      "description": "Expand short prompts with subject-consistency and composition constraints."
    },
    "inputReference": {
      "type": "object",
      "properties": {
        "fileId": {
          "type": "string",
          "description": "Uploaded OpenAI file ID to use as a visual reference."
        },
        "imageUrl": {
          "type": "string",
          "description": "A public image URL or base64 data URL."
        }
      },
      "description": "Optional reference asset. Provide exactly one of imageUrl or fileId."
    },
    "pollIntervalMs": {
      "type": "integer",
      "default": 5000,
      "maximum": 60000,
      "minimum": 1000,
      "description": "Delay between status polls in milliseconds."
    },
    "maxPollAttempts": {
      "type": "integer",
      "default": 12,
      "maximum": 60,
      "minimum": 0,
      "description": "Maximum number of status requests to make while polling."
    },
    "pollUntilComplete": {
      "type": "boolean",
      "default": false,
      "description": "Whether to poll the job until it reaches completed or failed."
    },
    "optimizationProfile": {
      "enum": [
        "auto",
        "general",
        "portrait",
        "landscape",
        "anime",
        "advertising",
        "product",
        "cinematic"
      ],
      "type": "string",
      "default": "auto",
      "description": "Prompt optimization profile. Use 'auto' to infer the best profile from the user's prompt."
    }
  }
}
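
Several constraints in this schema are only stated in descriptions (`videoId` for `status`, exactly one field in `inputReference`) or as numeric bounds. A caller-side pre-check might look like the sketch below. `validateInput` and `GeneInput` are hypothetical, cover only a subset of the properties, and do not reflect how the gene itself validates input.

```typescript
interface InputReference {
  imageUrl?: string;
  fileId?: string;
}

// Subset of the input properties that carry cross-field or range rules.
interface GeneInput {
  operation?: "create" | "status" | "prepare";
  videoId?: string;
  inputReference?: InputReference;
  pollIntervalMs?: number;
  maxPollAttempts?: number;
}

// Hypothetical pre-check; returns a list of human-readable problems.
function validateInput(input: GeneInput): string[] {
  const errors: string[] = [];
  const operation = input.operation ?? "create";
  if (operation === "status" && !input.videoId) {
    errors.push("videoId is required when operation is 'status'");
  }
  if (input.inputReference) {
    const { imageUrl, fileId } = input.inputReference;
    const provided = [imageUrl, fileId].filter((v) => v !== undefined).length;
    if (provided !== 1) {
      errors.push("inputReference needs exactly one of imageUrl or fileId");
    }
  }
  const interval = input.pollIntervalMs ?? 5000;
  if (!Number.isInteger(interval) || interval < 1000 || interval > 60000) {
    errors.push("pollIntervalMs must be an integer between 1000 and 60000");
  }
  const attempts = input.maxPollAttempts ?? 12;
  if (!Number.isInteger(attempts) || attempts < 0 || attempts > 60) {
    errors.push("maxPollAttempts must be an integer between 0 and 60");
  }
  return errors;
}
```

An empty array means the checked fields satisfy the rules above; it says nothing about the properties this sketch leaves out.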

outputSchema

{
  "type": "object",
  "required": [
    "ok",
    "operation",
    "videoId",
    "status",
    "progress",
    "prompt",
    "originalPrompt",
    "optimizedPrompt",
    "optimizationProfile",
    "negativePrompt",
    "shotPlan",
    "model",
    "seconds",
    "size",
    "quality",
    "createdAt",
    "completedAt",
    "expiresAt",
    "statusUrl",
    "downloadUrl",
    "pollAttempts",
    "error"
  ],
  "properties": {
    "ok": {
      "type": "boolean"
    },
    "size": {
      "type": [
        "string",
        "null"
      ]
    },
    "error": {
      "type": [
        "string",
        "null"
      ]
    },
    "model": {
      "type": [
        "string",
        "null"
      ]
    },
    "prompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "status": {
      "enum": [
        "queued",
        "in_progress",
        "completed",
        "failed",
        "not_started"
      ],
      "type": "string"
    },
    "quality": {
      "type": [
        "string",
        "null"
      ]
    },
    "seconds": {
      "type": [
        "string",
        "null"
      ]
    },
    "videoId": {
      "type": [
        "string",
        "null"
      ]
    },
    "progress": {
      "type": "number"
    },
    "shotPlan": {
      "type": [
        "object",
        "null"
      ],
      "required": [
        "opening",
        "middle",
        "ending"
      ],
      "properties": {
        "ending": {
          "type": "string"
        },
        "middle": {
          "type": "string"
        },
        "opening": {
          "type": "string"
        }
      }
    },
    "createdAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "expiresAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "operation": {
      "enum": [
        "create",
        "status",
        "prepare"
      ],
      "type": "string"
    },
    "statusUrl": {
      "type": [
        "string",
        "null"
      ]
    },
    "completedAt": {
      "type": [
        "number",
        "null"
      ]
    },
    "downloadUrl": {
      "type": [
        "string",
        "null"
      ]
    },
    "pollAttempts": {
      "type": "integer"
    },
    "negativePrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "originalPrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "optimizedPrompt": {
      "type": [
        "string",
        "null"
      ]
    },
    "optimizationProfile": {
      "enum": [
        "general",
        "portrait",
        "landscape",
        "anime",
        "advertising",
        "product",
        "cinematic",
        null
      ],
      "type": [
        "string",
        "null"
      ]
    }
  }
}
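
Since every output property is required and most are nullable, callers need to branch on `ok` and `status` before touching `downloadUrl`. The sketch below shows one way to do that. `GeneOutput` is a hand-written subset of the schema, `summarize` is a hypothetical helper, and reading `not_started` as "no job was created" is an assumption.

```typescript
type VideoStatus = "queued" | "in_progress" | "completed" | "failed" | "not_started";

// Hand-written subset of the output schema used by this sketch.
interface GeneOutput {
  ok: boolean;
  status: VideoStatus;
  progress: number;
  videoId: string | null;
  downloadUrl: string | null;
  error: string | null;
}

// Hypothetical helper: maps a result to a short, log-friendly summary.
function summarize(result: GeneOutput): string {
  if (!result.ok || result.status === "failed") {
    return `failed: ${result.error ?? "unknown error"}`;
  }
  if (result.status === "completed" && result.downloadUrl) {
    return `ready: ${result.downloadUrl}`;
  }
  if (result.status === "not_started") {
    // Assumption: no job exists yet, e.g. after a prompt preview.
    return "no job started";
  }
  return `pending: ${result.status} (progress ${result.progress})`;
}
```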

Arena history

| Date | Fitness | Safety score | Calls |
| --- | --- | --- | --- |
| March 19 | 0.5000 | 1.00 | 1 |