README
# url-extractor
A Native Gene that extracts and categorizes all URLs from text content.
## Usage
```bash
rotifer test url-extractor --input '{
"text": "Visit https://rotifer.dev for docs. Contact [email protected] for help.",
"includeEmails": true
}'
```
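When the input text contains quotes or newlines, hand-writing the JSON inside a shell string gets error-prone. The sketch below builds the same payload programmatically. Only the `rotifer test url-extractor --input` invocation comes from this README; `build_test_command` is a hypothetical helper, and the snippet only prints the command rather than running it.

```python
import json
import shlex


def build_test_command(text, include_emails=False, deduplicate=True):
    """Return an argv list for `rotifer test url-extractor --input <json>`.

    Hypothetical helper: it only assembles the command shown in Usage.
    """
    payload = {
        "text": text,
        "includeEmails": include_emails,
        "deduplicate": deduplicate,
    }
    # json.dumps handles escaping of quotes and newlines inside `text`.
    return ["rotifer", "test", "url-extractor", "--input", json.dumps(payload)]


argv = build_test_command("Visit https://rotifer.dev for docs.", include_emails=True)
# shlex.join produces a shell-quoted string that can be pasted into a terminal.
print(shlex.join(argv))
```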
## Features
- Extract HTTP/HTTPS/FTP URLs from any text
- Optional email address extraction
- Automatic deduplication
- Domain categorization
- Character position tracking for each URL
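To make the feature list concrete, here is a minimal sketch of how extraction with position tracking, deduplication, and domain collection could work. It is not the gene's actual implementation: the regular expressions, the trailing-punctuation rule, and the name `extract_urls` are all assumptions chosen for illustration, and the real gene may count or order results differently.

```python
import re
from urllib.parse import urlsplit

# Assumed patterns: deliberately simple, not the gene's own rules.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text, include_emails=False, deduplicate=True):
    """Return a dict shaped like the documented output (illustrative only)."""
    urls, seen = [], set()
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that the pattern swallowed.
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if deduplicate and url in seen:
            continue
        seen.add(url)
        try:
            parts = urlsplit(url)
        except ValueError:
            continue  # malformed host such as an unclosed IPv6 bracket
        urls.append({
            "url": url,
            "protocol": parts.scheme.lower(),
            "domain": parts.hostname or "",
            "position": match.start(),  # character offset in the source text
        })
    result = {
        "urls": urls,
        "totalFound": len(urls),
        "uniqueDomains": sorted({u["domain"] for u in urls if u["domain"]}),
    }
    if include_emails:
        emails = EMAIL_RE.findall(text)
        # dict.fromkeys removes duplicates while keeping first-seen order.
        result["emails"] = list(dict.fromkeys(emails)) if deduplicate else emails
    return result


sample_text = (
    "Visit https://rotifer.dev for docs. "
    "Mirror: https://rotifer.dev, ftp://files.example.org/pub."
)
result = extract_urls(sample_text)
```

In this sketch `totalFound` is the count after deduplication; whether the gene counts before or after deduplication is not stated in this README.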
## Input
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | Yes | Text to extract URLs from |
| `includeEmails` | boolean | No | Also extract emails (default: false) |
| `deduplicate` | boolean | No | Remove duplicates (default: true) |
## Output
| Field | Type | Description |
|-------|------|-------------|
| `urls` | array | Extracted URLs with protocol, domain, position |
| `emails` | array | Extracted emails (if enabled) |
| `totalFound` | number | Total URL count |
| `uniqueDomains` | string[] | List of unique domains found |
## Phenotype
### Input
| Property | Type | Req | Description |
|---|---|---|---|
| text | string | ✓ | Text content to extract URLs from |
| deduplicate | boolean = true | | Remove duplicate URLs |
| includeEmails | boolean = false | | Also extract email addresses |
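The defaults in the input table can be applied on the caller side before invoking the gene. The following is a minimal sketch, assuming a hand-rolled check rather than a JSON Schema library; `normalize_input` and `INPUT_DEFAULTS` are hypothetical names, and the values mirror the table above.

```python
# Defaults copied from the input table; the names are illustrative.
INPUT_DEFAULTS = {"deduplicate": True, "includeEmails": False}


def normalize_input(payload):
    """Check the required field and fill in the documented defaults."""
    if not isinstance(payload, dict):
        raise TypeError("input must be a JSON object")
    if not isinstance(payload.get("text"), str):
        raise ValueError("'text' is required and must be a string")
    normalized = {"text": payload["text"]}
    for name, default in INPUT_DEFAULTS.items():
        value = payload.get(name, default)
        if not isinstance(value, bool):
            raise ValueError(f"'{name}' must be a boolean")
        normalized[name] = value
    # Unknown keys are dropped here; that is a choice of this sketch,
    # not something the gene is documented to require.
    return normalized


normalized = normalize_input({"text": "See https://rotifer.dev"})
```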
### Output
| Property | Type | Description |
|---|---|---|
| urls | array | Extracted URLs with metadata |
| emails | array | Extracted email addresses (if includeEmails is true) |
| totalFound | number | Total URLs found |
| uniqueDomains | array | List of unique domains |
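A consumer of this output will often want the URLs bucketed by domain. The sketch below does that for any dict following the documented shape. `group_by_domain` is a hypothetical helper, and `sample_result` is a hand-written value used for illustration, not output captured from the gene.

```python
from collections import defaultdict


def group_by_domain(result):
    """Map each domain to the list of URLs found under it."""
    groups = defaultdict(list)
    for entry in result.get("urls", []):
        groups[entry["domain"]].append(entry["url"])
    return dict(groups)


# Hand-written sample following the documented fields (not real gene output).
sample_result = {
    "urls": [
        {"url": "https://rotifer.dev", "domain": "rotifer.dev",
         "position": 6, "protocol": "https"},
        {"url": "https://rotifer.dev/docs", "domain": "rotifer.dev",
         "position": 40, "protocol": "https"},
        {"url": "ftp://files.example.org/pub", "domain": "files.example.org",
         "position": 70, "protocol": "ftp"},
    ],
    "totalFound": 3,
    "uniqueDomains": ["files.example.org", "rotifer.dev"],
}

grouped = group_by_domain(sample_result)
```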
## Raw JSON Schema
### inputSchema
```json
{
  "type": "object",
  "required": [
    "text"
  ],
  "properties": {
    "text": {
      "type": "string",
      "description": "Text content to extract URLs from"
    },
    "deduplicate": {
      "type": "boolean",
      "default": true,
      "description": "Remove duplicate URLs"
    },
    "includeEmails": {
      "type": "boolean",
      "default": false,
      "description": "Also extract email addresses"
    }
  }
}
```
### outputSchema
```json
{
  "type": "object",
  "properties": {
    "urls": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string"
          },
          "domain": {
            "type": "string"
          },
          "position": {
            "type": "number",
            "description": "Character offset in source text"
          },
          "protocol": {
            "type": "string",
            "description": "http, https, ftp, etc."
          }
        }
      },
      "description": "Extracted URLs with metadata"
    },
    "emails": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Extracted email addresses (if includeEmails is true)"
    },
    "totalFound": {
      "type": "number",
      "description": "Total URLs found"
    },
    "uniqueDomains": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of unique domains"
    }
  }
}
```
## Arena History
| Date | Fitness | Safety | Calls |
|---|---|---|---|
| Mar 17 | 0.7730 | 0.92 | 1 |