Extracts and categorizes all URLs from text content, with optional validation of link accessibility.
| Date | Fitness | Safety | Calls |
|---|---|---|---|
| Mar 17 | 0.7730 | 0.92 | 1 |
A Native Gene that extracts and categorizes all URLs from text content.
rotifer test url-extractor --input '{
"text": "Visit https://rotifer.dev for docs. Contact [email protected] for help.",
"includeEmails": true
}'
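The same invocation can be assembled programmatically, which avoids hand-escaping the JSON payload. The sketch below only builds the argument list for the `rotifer test url-extractor --input` command shown above; `build_command` is a hypothetical helper name, and it assumes the `rotifer` CLI is installed and on `PATH` when the command is eventually run.

```python
import json
import shlex


def build_command(text, include_emails=False):
    """Return the argv list for the CLI call (hypothetical helper)."""
    # Field names follow the documented input: text, includeEmails.
    payload = {"text": text, "includeEmails": include_emails}
    return ["rotifer", "test", "url-extractor", "--input", json.dumps(payload)]


argv = build_command("Visit https://rotifer.dev for docs.", include_emails=True)
# shlex.join quotes the JSON argument so it can be pasted into a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would execute it without going through a shell.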
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to extract URLs from |
| includeEmails | boolean | No | Also extract emails (default: false) |
| deduplicate | boolean | No | Remove duplicates (default: true) |
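As a rough illustration of how the input fields and their defaults fit together, the sketch below checks the one required field and fills in the documented defaults. `normalize_input` is a hypothetical helper, not part of the gene; how the gene itself validates input is not described here.

```python
def normalize_input(payload):
    """Validate the required field and apply documented defaults (sketch)."""
    # 'text' is the only required field and must be a string.
    if not isinstance(payload.get("text"), str):
        raise ValueError("'text' is required and must be a string")
    return {
        "text": payload["text"],
        # Documented defaults: includeEmails=false, deduplicate=true.
        "includeEmails": bool(payload.get("includeEmails", False)),
        "deduplicate": bool(payload.get("deduplicate", True)),
    }


print(normalize_input({"text": "Visit https://rotifer.dev for docs."}))
```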
| Field | Type | Description |
|---|---|---|
| urls | array | Extracted URLs with protocol, domain, position |
| emails | array | Extracted emails (if enabled) |
| totalFound | number | Total URL count |
| uniqueDomains | string[] | List of unique domains found |
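To make the output fields concrete, here is a minimal sketch of an extractor that produces a result of this shape. It is not the gene's actual implementation: the regular expressions, the `extract_urls` name, the trailing-punctuation trimming, and the choice to count `totalFound` after deduplication are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns; the gene's real matching rules are not documented.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"')\]]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_urls(text, include_emails=False, deduplicate=True):
    urls, seen = [], set()
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that trails the URL (assumption).
        url = match.group(0).rstrip(".,;:!?")
        if deduplicate and url in seen:
            continue
        seen.add(url)
        parts = urlsplit(url)
        urls.append({
            "url": url,
            "domain": parts.hostname or "",
            "position": match.start(),  # character offset in source text
            "protocol": parts.scheme.lower(),
        })
    result = {
        "urls": urls,
        # Assumption: counts the URLs returned, i.e. after deduplication.
        "totalFound": len(urls),
        "uniqueDomains": sorted({u["domain"] for u in urls if u["domain"]}),
    }
    if include_emails:
        result["emails"] = EMAIL_RE.findall(text)
    return result


sample = ("Visit https://rotifer.dev for docs. See https://rotifer.dev again "
          "and ftp://files.example.org/a.txt.")
print(extract_urls(sample))
```

A production extractor would need more careful handling of edge cases such as URLs wrapped in parentheses or internationalized domain names, which this sketch does not attempt.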
{
"type": "object",
"required": [
"text"
],
"properties": {
"text": {
"type": "string",
"description": "Text content to extract URLs from"
},
"deduplicate": {
"type": "boolean",
"default": true,
"description": "Remove duplicate URLs"
},
"includeEmails": {
"type": "boolean",
"default": false,
"description": "Also extract email addresses"
}
}
}

{
"type": "object",
"properties": {
"urls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"url": {
"type": "string"
},
"domain": {
"type": "string"
},
"position": {
"type": "number",
"description": "Character offset in source text"
},
"protocol": {
"type": "string",
"description": "http, https, ftp, etc."
}
}
},
"description": "Extracted URLs with metadata"
},
"emails": {
"type": "array",
"items": {
"type": "string"
},
"description": "Extracted email addresses (if includeEmails is true)"
},
"totalFound": {
"type": "number",
"description": "Total URLs found"
},
"uniqueDomains": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of unique domains"
}
}
}