
url-extractor

Native text.extract

Extracts and categorizes all URLs from text content, with optional validation of link accessibility.

Version: 0.1.0
Score: 0.39
Downloads: 0
Created: Mar 17, 2026
Updated: Mar 18, 2026
Install
$ rotifer install url-extractor

Score Breakdown

Gene Score: 0.39

Component  Weight  Score
Arena      50%     0.77
Usage      30%     0.00
Stability  20%     0.01
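
The breakdown lists three components with weights of 50%, 30%, and 20%. The page does not state how they are combined; assuming a plain weighted sum, the arithmetic can be checked with a short sketch. The names `components` and `gene_score` are illustrative and not part of Rotifer.

```python
# Weights and component scores copied from the Score Breakdown.
# Assumption: the Gene Score is a plain weighted sum of the components.
components = {
    "arena": (0.50, 0.77),
    "usage": (0.30, 0.00),
    "stability": (0.20, 0.01),
}


def gene_score(parts):
    """Return the weighted sum of (weight, score) pairs."""
    return sum(weight * score for weight, score in parts.values())


print(round(gene_score(components), 2))
```

Under that assumption the sum is 0.385 + 0.000 + 0.002 = 0.387, which rounds to the displayed 0.39.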

Arena History

Date Fitness Safety Calls
Mar 17 0.7730 0.92 1

README

url-extractor

A Native Gene that extracts and categorizes all URLs from text content.

Usage

rotifer test url-extractor --input '{
  "text": "Visit https://rotifer.dev for docs. Contact [email protected] for help.",
  "includeEmails": true
}'

Features

  • Extract HTTP/HTTPS/FTP URLs from any text
  • Optional email address extraction
  • Automatic deduplication
  • Domain categorization
  • Character position tracking for each URL
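
The features above can be sketched in a few lines of Python. This is not the Gene's actual implementation, which is not shown on this page. The regular expressions, the trailing-punctuation trimming, the choice to count `totalFound` after deduplication, and the function name `extract` are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed patterns: a scheme, "://", then a run of characters that are not
# whitespace, quotes, or brackets; and a deliberately simple email pattern.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"'()\[\]]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TRAILING_PUNCTUATION = ".,;:!?"


def extract(text, include_emails=False, deduplicate=True):
    """Return URLs (with metadata) and, optionally, emails found in text."""
    urls, seen = [], set()
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that the pattern swallowed at the end.
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if deduplicate and url in seen:
            continue
        seen.add(url)
        parts = urlsplit(url)
        urls.append({
            "url": url,
            "domain": parts.hostname or "",
            "position": match.start(),  # 0-based offset in the source text
            "protocol": parts.scheme.lower(),
        })
    # dict.fromkeys keeps first-seen order while removing duplicates.
    domains = list(dict.fromkeys(u["domain"] for u in urls if u["domain"]))
    result = {"urls": urls, "totalFound": len(urls), "uniqueDomains": domains}
    if include_emails:
        emails = EMAIL_RE.findall(text)
        result["emails"] = list(dict.fromkeys(emails)) if deduplicate else emails
    return result
```

With deduplication on, a URL that appears twice is reported once, at the position of its first occurrence. Whether the real Gene behaves the same way is not documented here.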

Input

Field          Type     Required  Description
text           string   Yes       Text to extract URLs from
includeEmails  boolean  No        Also extract emails (default: false)
deduplicate    boolean  No        Remove duplicates (default: true)

Output

Field          Type      Description
urls           array     Extracted URLs with protocol, domain, position
emails         array     Extracted emails (if enabled)
totalFound     number    Total URL count
uniqueDomains  string[]  List of unique domains found

Phenotype

inputSchema

{
  "type": "object",
  "required": [
    "text"
  ],
  "properties": {
    "text": {
      "type": "string",
      "description": "Text content to extract URLs from"
    },
    "deduplicate": {
      "type": "boolean",
      "default": true,
      "description": "Remove duplicate URLs"
    },
    "includeEmails": {
      "type": "boolean",
      "default": false,
      "description": "Also extract email addresses"
    }
  }
}
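
A caller might want to check a payload against this schema and fill in the defaults before invoking the Gene. The following hand-written sketch mirrors the schema without using a JSON Schema library. `normalize_input` and `DEFAULTS` are hypothetical names. Dropping unknown keys is a choice made here for simplicity; the schema itself does not forbid extra properties.

```python
# Defaults copied from the inputSchema above.
DEFAULTS = {"deduplicate": True, "includeEmails": False}


def normalize_input(payload):
    """Validate a payload against the schema and apply its defaults."""
    if not isinstance(payload, dict):
        raise TypeError("input must be a JSON object")
    # "text" is the only required property and must be a string.
    if not isinstance(payload.get("text"), str):
        raise ValueError('"text" is required and must be a string')
    normalized = {"text": payload["text"]}
    for key, default in DEFAULTS.items():
        value = payload.get(key, default)
        if not isinstance(value, bool):
            raise ValueError(f'"{key}" must be a boolean')
        normalized[key] = value
    return normalized
```

For example, `normalize_input({"text": "hi"})` yields the same text with `deduplicate` set to `True` and `includeEmails` set to `False`.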

outputSchema

{
  "type": "object",
  "properties": {
    "urls": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string"
          },
          "domain": {
            "type": "string"
          },
          "position": {
            "type": "number",
            "description": "Character offset in source text"
          },
          "protocol": {
            "type": "string",
            "description": "http, https, ftp, etc."
          }
        }
      },
      "description": "Extracted URLs with metadata"
    },
    "emails": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Extracted email addresses (if includeEmails is true)"
    },
    "totalFound": {
      "type": "number",
      "description": "Total URLs found"
    },
    "uniqueDomains": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of unique domains"
    }
  }
}
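
The `position` field makes it possible to map each result back onto the source text. The sketch below wraps every reported URL in brackets. It assumes `position` is a 0-based offset of the URL's first character, which the schema description suggests but does not state outright. `mark_urls` is a hypothetical helper, not part of the Gene.

```python
def mark_urls(text, urls, left="[", right="]"):
    """Wrap each URL from an output-shaped `urls` array in markers."""
    pieces = []
    cursor = 0
    for item in sorted(urls, key=lambda u: u["position"]):
        start = item["position"]
        end = start + len(item["url"])
        # Skip entries that overlap a previous one or do not match the text
        # at the stated offset (for example, if the offset convention differs).
        if start < cursor or text[start:end] != item["url"]:
            continue
        pieces.append(text[cursor:start])
        pieces.append(left + item["url"] + right)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

If the offsets turn out to be 1-based or to count something other than characters, the equality check causes those entries to be skipped rather than mis-marked.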