
url-extractor

Native text.extract

Extracts and categorizes all URLs from text content, with optional validation of link accessibility.

Version: 0.1.0
Score: 0.39
Downloads: 0
Created: March 17, 2026
Updated: March 18, 2026

Install

$ rotifer install url-extractor

Score Breakdown

Gene score: 0.39
Arena (50% weight): 0.77
Usage (30% weight): 0.00
Stability (20% weight): 0.01

Arena History

Date      Fitness  Safety score  Calls
March 17  0.7730   0.92          1

README

url-extractor

A Native Gene that extracts and categorizes all URLs from text content.

Usage

rotifer test url-extractor --input '{
  "text": "Visit https://rotifer.dev for docs. Contact [email protected] for help.",
  "includeEmails": true
}'

Features

  • Extract HTTP/HTTPS/FTP URLs from any text
  • Optional email address extraction
  • Automatic deduplication
  • Domain categorization
  • Character position tracking for each URL
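
The features above can be approximated in a few lines of Python. This is a minimal sketch, not the gene's actual implementation: the function name `extract`, both regular expressions, the trailing-punctuation trimming, and the choice to report `totalFound` as the number of URLs returned after deduplication are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed patterns; the gene's real matching rules are not documented here.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
TRAILING = ".,;:!?)]}"  # punctuation that usually belongs to the sentence, not the URL


def extract(text, include_emails=False, deduplicate=True):
    """Return URLs (with protocol, domain, position) and optionally emails."""
    urls, seen = [], set()
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING)
        if deduplicate and url in seen:
            continue  # keep only the first occurrence
        try:
            parts = urlsplit(url)
            domain = parts.hostname or ""
        except ValueError:
            continue  # skip malformed hosts such as unbalanced IPv6 brackets
        seen.add(url)
        urls.append({
            "url": url,
            "protocol": parts.scheme.lower(),
            "domain": domain,
            "position": match.start(),  # character offset in the source text
        })

    emails = []
    if include_emails:
        for address in EMAIL_RE.findall(text):
            if not deduplicate or address not in emails:
                emails.append(address)

    # dict.fromkeys keeps first-seen order while dropping repeated domains.
    unique_domains = list(dict.fromkeys(u["domain"] for u in urls))
    return {
        "urls": urls,
        "emails": emails,
        "totalFound": len(urls),  # assumption: counted after deduplication
        "uniqueDomains": unique_domains,
    }
```

Calling `extract("Visit https://rotifer.dev for docs.")` under these assumptions yields one entry with domain `rotifer.dev` at position 6. A URL with userinfo (for example `ftp://user@host.org`) would also match the email pattern in this sketch, so a stricter implementation would need to exclude URL spans before scanning for emails.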

Input

Field          Type     Required  Description
text           string   Yes       Text to extract URLs from
includeEmails  boolean  No        Also extract emails (default: false)
deduplicate    boolean  No        Remove duplicates (default: true)

Output

Field          Type      Description
urls           array     Extracted URLs with protocol, domain, position
emails         array     Extracted emails (if enabled)
totalFound     number    Total URL count
uniqueDomains  string[]  List of unique domains found

Phenotype

inputSchema

{
  "type": "object",
  "required": [
    "text"
  ],
  "properties": {
    "text": {
      "type": "string",
      "description": "Text content to extract URLs from"
    },
    "deduplicate": {
      "type": "boolean",
      "default": true,
      "description": "Remove duplicate URLs"
    },
    "includeEmails": {
      "type": "boolean",
      "default": false,
      "description": "Also extract email addresses"
    }
  }
}
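
A caller may want to check a payload against this schema and fill in the declared defaults before invoking the gene. The sketch below does that with the standard library only. The helper name `normalize_input` is hypothetical, and dropping unknown keys is a choice made here, since the schema itself does not forbid additional properties.

```python
# Defaults copied from the inputSchema above.
DEFAULTS = {"deduplicate": True, "includeEmails": False}


def normalize_input(payload):
    """Validate a payload against inputSchema and apply its defaults."""
    if not isinstance(payload, dict):
        raise TypeError("input must be an object")
    if "text" not in payload:
        raise ValueError("missing required field: text")
    if not isinstance(payload["text"], str):
        raise TypeError("text must be a string")

    result = {"text": payload["text"]}
    for key, default in DEFAULTS.items():
        value = payload.get(key, default)
        if not isinstance(value, bool):
            raise TypeError(f"{key} must be a boolean")
        result[key] = value
    return result  # unknown keys are dropped (a choice of this sketch)
```

For example, `normalize_input({"text": "x"})` returns the payload with `deduplicate` set to `True` and `includeEmails` set to `False`.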

outputSchema

{
  "type": "object",
  "properties": {
    "urls": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string"
          },
          "domain": {
            "type": "string"
          },
          "position": {
            "type": "number",
            "description": "Character offset in source text"
          },
          "protocol": {
            "type": "string",
            "description": "http, https, ftp, etc."
          }
        }
      },
      "description": "Extracted URLs with metadata"
    },
    "emails": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Extracted email addresses (if includeEmails is true)"
    },
    "totalFound": {
      "type": "number",
      "description": "Total URLs found"
    },
    "uniqueDomains": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of unique domains"
    }
  }
}
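
On the consuming side, a light structural check can catch a response that drifts from this schema. The following is a sketch under stated assumptions: the helper names `is_number` and `conforms` are hypothetical, and because the schema declares no required fields, every field is treated as optional.

```python
def is_number(value):
    """JSON 'number': int or float, but not bool (bool subclasses int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def conforms(result):
    """Return True if result matches the shape described by outputSchema."""
    if not isinstance(result, dict):
        return False

    urls = result.get("urls", [])
    if not isinstance(urls, list):
        return False
    for entry in urls:
        if not isinstance(entry, dict):
            return False
        for key in ("url", "domain", "protocol"):
            if key in entry and not isinstance(entry[key], str):
                return False
        if "position" in entry and not is_number(entry["position"]):
            return False

    # Both fields are arrays of strings in the schema.
    for key in ("emails", "uniqueDomains"):
        value = result.get(key, [])
        if not isinstance(value, list):
            return False
        if not all(isinstance(item, str) for item in value):
            return False

    if "totalFound" in result and not is_number(result["totalFound"]):
        return False
    return True
```

This checks types only. It does not verify that `totalFound` agrees with the length of `urls`, because the schema does not state that relationship.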