
url-extractor

Native text.extract

Extracts and categorizes all URLs from text content, with optional validation of link accessibility.

README

# url-extractor

A Native Gene that extracts and categorizes all URLs from text content.

## Usage

```bash
rotifer test url-extractor --input '{
"text": "Visit https://rotifer.dev for docs. Contact [email protected] for help.",
"includeEmails": true
}'
```

## Features

- Extract HTTP/HTTPS/FTP URLs from any text
- Optional email address extraction
- Automatic deduplication
- Domain categorization
- Character position tracking for each URL
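The URL-related features above can be illustrated with a minimal Python sketch. This is not the Gene's actual implementation: the regular expression, the `extract_urls` helper, and the trailing-punctuation rule are all assumptions made for illustration. Only the record fields (`url`, `protocol`, `domain`, `position`) come from the documented output.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: an http/https/ftp scheme followed by any run of characters
# that are not whitespace, angle brackets, double quotes, or square brackets.
URL_PATTERN = re.compile(r'\b(?:https?|ftp)://[^\s<>"\[\]]+', re.IGNORECASE)


def extract_urls(text: str, deduplicate: bool = True) -> list[dict]:
    """Return one record per URL with its protocol, domain, and character offset."""
    records = []
    seen = set()
    for match in URL_PATTERN.finditer(text):
        # Assumption: trailing sentence punctuation is not part of the URL.
        url = match.group(0).rstrip(".,;:!?)'")
        if deduplicate and url in seen:
            continue  # keep only the first occurrence
        seen.add(url)
        parts = urlsplit(url)
        records.append({
            "url": url,
            "protocol": parts.scheme.lower(),
            "domain": (parts.hostname or "").lower(),
            "position": match.start(),  # offset of the URL in the source text
        })
    return records


if __name__ == "__main__":
    sample = "Visit https://rotifer.dev for docs. Mirror: ftp://files.example.org/pub."
    for record in extract_urls(sample):
        print(record)
```

In this sketch a duplicate URL keeps the position of its first occurrence; the real Gene may resolve duplicates differently.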

## Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | Yes | Text to extract URLs from |
| `includeEmails` | boolean | No | Also extract emails (default: false) |
| `deduplicate` | boolean | No | Remove duplicates (default: true) |

## Output

| Field | Type | Description |
|-------|------|-------------|
| `urls` | array | Extracted URLs with protocol, domain, position |
| `emails` | array | Extracted emails (if enabled) |
| `totalFound` | number | Total URL count |
| `uniqueDomains` | string[] | List of unique domains found |
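To show how the four output fields relate to each other, here is a small Python sketch that assembles a result object from already-extracted URL records. The `build_output` helper is hypothetical. It also assumes that `totalFound` counts the entries in `urls` and that `emails` is omitted when email extraction is off; the source does not specify either behavior.

```python
def build_output(url_records, emails=None):
    """Assemble a result object with the documented output fields."""
    unique_domains = []
    for record in url_records:
        domain = record["domain"]
        if domain not in unique_domains:  # preserve first-seen order
            unique_domains.append(domain)

    result = {
        "urls": url_records,
        "totalFound": len(url_records),  # assumption: counts entries in `urls`
        "uniqueDomains": unique_domains,
    }
    if emails is not None:  # assumption: field omitted unless emails were requested
        result["emails"] = emails
    return result


if __name__ == "__main__":
    records = [
        {"url": "https://rotifer.dev", "protocol": "https",
         "domain": "rotifer.dev", "position": 6},
        {"url": "https://rotifer.dev/docs", "protocol": "https",
         "domain": "rotifer.dev", "position": 40},
    ]
    print(build_output(records))
```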

Phenotype

Input

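| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `text` | string | Yes | Text content to extract URLs from |
| `deduplicate` | boolean = true | No | Remove duplicate URLs |
| `includeEmails` | boolean = false | No | Also extract email addresses |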

Output

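| Property | Type | Description |
|----------|------|-------------|
| `urls` | array | Extracted URLs with metadata |
| `emails` | array | Extracted email addresses (if `includeEmails` is true) |
| `totalFound` | number | Total URLs found |
| `uniqueDomains` | array | List of unique domains |

Raw JSON Schema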

inputSchema

{
  "type": "object",
  "required": [
    "text"
  ],
  "properties": {
    "text": {
      "type": "string",
      "description": "Text content to extract URLs from"
    },
    "deduplicate": {
      "type": "boolean",
      "default": true,
      "description": "Remove duplicate URLs"
    },
    "includeEmails": {
      "type": "boolean",
      "default": false,
      "description": "Also extract email addresses"
    }
  }
}
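As an illustration of what the input schema above implies for a caller, the following Python sketch checks the required `text` field and fills in the two documented defaults. The `normalize_input` helper is hypothetical and hand-rolled for clarity; it is not part of the Gene or of any Rotifer API.

```python
def normalize_input(payload: dict) -> dict:
    """Validate a payload against the input schema and apply its defaults."""
    if not isinstance(payload.get("text"), str):
        raise ValueError("'text' is required and must be a string")

    normalized = {"text": payload["text"]}
    # Defaults taken from the schema: deduplicate=true, includeEmails=false.
    for name, default in (("deduplicate", True), ("includeEmails", False)):
        value = payload.get(name, default)
        if not isinstance(value, bool):
            raise ValueError(f"'{name}' must be a boolean")
        normalized[name] = value
    return normalized


if __name__ == "__main__":
    print(normalize_input({"text": "Visit https://rotifer.dev for docs."}))
```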

outputSchema

{
  "type": "object",
  "properties": {
    "urls": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string"
          },
          "domain": {
            "type": "string"
          },
          "position": {
            "type": "number",
            "description": "Character offset in source text"
          },
          "protocol": {
            "type": "string",
            "description": "http, https, ftp, etc."
          }
        }
      },
      "description": "Extracted URLs with metadata"
    },
    "emails": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Extracted email addresses (if includeEmails is true)"
    },
    "totalFound": {
      "type": "number",
      "description": "Total URLs found"
    },
    "uniqueDomains": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of unique domains"
    }
  }
}

Arena history

| Date | Fitness | Safety score | Calls |
|------|---------|--------------|-------|
| March 17 | 0.7730 | 0.92 | 1 |