DocStrange by Nanonets
Document extraction API: convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
Get your API key: https://docstrange.nanonets.com/app
Quick Start
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
"success": true,
"record_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"result": {
"markdown": {
"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
}
}
}
Setup
1. Get Your API Key
# Visit the dashboard
https://docstrange.nanonets.com/app
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
2. OpenClaw Configuration (Optional)
Recommended: Use environment variables (most secure):
{
skills: {
entries: {
"docstrange": {
enabled: true,
// API key loaded from environment variable DOCSTRANGE_API_KEY
},
},
},
}
Alternative: Store in config file (use with caution):
{
skills: {
entries: {
"docstrange": {
enabled: true,
env: {
DOCSTRANGE_API_KEY: "your_api_key_here",
},
},
},
},
}
Security Note: If storing API keys in ~/.openclaw/openclaw.json:
- Set file permissions: chmod 600 ~/.openclaw/openclaw.json
- Never commit this file to version control
- Prefer environment variables or your agent's secret store when possible
- Rotate keys regularly and limit API key permissions if supported
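The permission guidance above can be checked from a script. The following is a minimal Python sketch, assuming a POSIX file system; the helper name `is_owner_only` is hypothetical and not part of DocStrange or OpenClaw.

```python
import os
import stat

def is_owner_only(path: str) -> bool:
    """Return True when group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits; chmod 600 clears all of them.
    return mode & 0o077 == 0

if __name__ == "__main__":
    config_path = os.path.expanduser("~/.openclaw/openclaw.json")
    if os.path.exists(config_path) and not is_owner_only(config_path):
        print(f"warning: {config_path} is readable by other users; run chmod 600")
```

A check like this fits in a startup script or pre-commit hook, so a loosened permission is noticed before the key leaks.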
Common Tasks
Extract to Markdown
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
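A minimal Python sketch of reading that field from a parsed response, assuming the response has the shape shown in Quick Start. The helper name `markdown_content` is hypothetical, and treating any status other than "completed" as an error is an assumption rather than documented API behavior.

```python
import json

def markdown_content(response: dict) -> str:
    """Return the markdown text from a parsed sync response."""
    # Assumption: a finished extraction reports success=true and status="completed".
    if not response.get("success") or response.get("status") != "completed":
        raise ValueError(f"extraction not completed: status={response.get('status')!r}")
    return response["result"]["markdown"]["content"]

# Sample response copied from the Quick Start section.
sample = json.loads("""
{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {"markdown": {"content": "# Invoice\\n\\n**Invoice Number:** INV-2024-001..."}}
}
""")
print(markdown_content(sample))
```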
Extract JSON Fields
Simple field list:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
Response with confidence scores:
{
"result": {
"json": {
"content": {
"invoice_number": "INV-2024-001",
"total_amount": 500.00
},
"metadata": {
"confidence_score": {
"invoice_number": 98,
"total_amount": 99
}
}
}
}
}
Extract Tables to CSV
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
Async Extraction (Large Documents)
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
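The queue-then-poll flow can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: the function names, the 2-second interval, and the 30-attempt cap are hypothetical, and only the "processing" and "completed" statuses shown above are known.

```python
import json
import os
import time
import urllib.request
from typing import Callable

BASE_URL = "https://extraction-api.nanonets.com/api/v1"

def fetch_result(record_id: str) -> dict:
    """GET /extract/results/{record_id} using the key from DOCSTRANGE_API_KEY."""
    request = urllib.request.Request(
        f"{BASE_URL}/extract/results/{record_id}",
        headers={"Authorization": f"Bearer {os.environ['DOCSTRANGE_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

def poll_until_done(
    record_id: str,
    fetch: Callable[[str], dict] = fetch_result,
    interval_seconds: float = 2.0,   # assumed interval, tune to your workload
    max_attempts: int = 30,          # assumed cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is no longer "processing", or give up."""
    for _ in range(max_attempts):
        body = fetch(record_id)
        # Any status other than "processing" is handed back for the caller to interpret.
        if body.get("status") != "processing":
            return body
        sleep(interval_seconds)
    raise TimeoutError(f"record {record_id} still processing after {max_attempts} polls")
```

Calling `poll_until_done("12345")` with the `record_id` returned by the async endpoint would block until a result is available. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.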
Advanced Features
Bounding Boxes
Get element coordinates for layout analysis:
-F "include_metadata=bounding_boxes"
Hierarchy Output
Extract document structure (sections, tables, key-value pairs):
-F "json_options=hierarchy_output"
Financial Documents Mode
Enhanced table and number formatting:
-F "markdown_options=financial-docs"
Custom Instructions
Guide extraction with prompts:
-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Multiple Formats
Request multiple formats in one call:
-F "output_format=markdown,json"
When to Use
Use DocStrange For:
- Invoice and receipt processing
- Contract text extraction
- Bank statement parsing
- Form digitization
- Image OCR (scanned documents)
Don't Use For:
- Documents >5 pages with sync (use async)
- Video/audio transcription
- Non-document images
Best Practices
| Document Size | Endpoint | Notes |
|---|---|---|
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
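The page-count rule in the table can be expressed as a small helper. This is a sketch: `endpoint_for` is a hypothetical name, and the caller is assumed to know the page count already (for example from a PDF library).

```python
BASE_URL = "https://extraction-api.nanonets.com/api/v1"
SYNC_PAGE_LIMIT = 5  # documents up to this many pages go to the sync endpoint

def endpoint_for(page_count: int) -> str:
    """Pick the sync or async extraction URL from the document's page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    path = "/extract/sync" if page_count <= SYNC_PAGE_LIMIT else "/extract/async"
    return BASE_URL + path

print(endpoint_for(3))   # https://extraction-api.nanonets.com/api/v1/extract/sync
print(endpoint_for(12))  # https://extraction-api.nanonets.com/api/v1/extract/async
```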
JSON Extraction:
- Field list: ["field1", "field2"] for quick extractions
- JSON schema: {"type": "object", ...} for strict typing and nested data
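Both forms of json_options are JSON text, so generating them with a serializer avoids quoting mistakes on the command line. A minimal Python sketch, with hypothetical helper names; the schema helper only covers flat properties, and nested schemas would be written out as dictionaries by hand.

```python
import json

def json_options_from_fields(fields: list[str]) -> str:
    """Serialize a simple field list for the json_options form field."""
    return json.dumps(fields)

def json_options_from_schema(property_types: dict[str, str]) -> str:
    """Serialize a flat JSON schema from a mapping of field name to JSON type."""
    schema = {
        "type": "object",
        "properties": {name: {"type": json_type} for name, json_type in property_types.items()},
    }
    return json.dumps(schema)

print(json_options_from_fields(["invoice_number", "date", "total_amount", "vendor"]))
print(json_options_from_schema({"invoice_number": "string", "total_amount": "number"}))
```

The two printed strings match the json_options values used in the curl examples under Extract JSON Fields.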
Confidence Scores:
- Add include_metadata=confidence_score
- Scores are 0-100 per field
- Review fields <80 manually
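The review rule can be applied automatically to a parsed response. The sketch below assumes the response shape shown under Extract JSON Fields; `fields_needing_review` is a hypothetical helper name.

```python
def fields_needing_review(response: dict, threshold: int = 80) -> dict[str, int]:
    """Return the fields whose confidence score falls below the threshold."""
    scores = response["result"]["json"]["metadata"]["confidence_score"]
    return {field: score for field, score in scores.items() if score < threshold}

# Response copied from the confidence score example in this document.
sample = {
    "result": {
        "json": {
            "content": {"invoice_number": "INV-2024-001", "total_amount": 500.00},
            "metadata": {"confidence_score": {"invoice_number": 98, "total_amount": 99}},
        }
    }
}
print(fields_needing_review(sample))                # {}
print(fields_needing_review(sample, threshold=99))  # {'invoice_number': 98}
```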
Schema Templates
Invoice
{
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
Receipt
{
"type": "object",
"properties": {
"merchant": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"items": {
"type": "array",
"items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
}
}
}
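After extraction it can help to confirm that the returned values match the types a template declares. The following is a deliberately small Python sketch, not a full JSON Schema validator: it only understands the "string", "number", "array", and "object" types used in the templates above, and the extracted receipt is hypothetical sample data.

```python
def type_mismatches(value, schema: dict, path: str = "$") -> list[str]:
    """List places where a value disagrees with the schema's declared types."""
    checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "array": lambda v: isinstance(v, list),
        "object": lambda v: isinstance(v, dict),
    }
    expected = schema.get("type")
    if expected in checks and not checks[expected](value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems: list[str] = []
    if expected == "object":
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:  # missing fields are ignored in this sketch
                problems += type_mismatches(value[name], sub_schema, f"{path}.{name}")
    elif expected == "array" and "items" in schema:
        for index, item in enumerate(value):
            problems += type_mismatches(item, schema["items"], f"{path}[{index}]")
    return problems

receipt_schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "items": {
            "type": "array",
            "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}},
        },
    },
}

# Hypothetical extraction result where the total came back as text.
extracted = {"merchant": "Corner Cafe", "date": "2024-01-15", "total": "12.50",
             "items": [{"name": "Latte", "price": 4.5}]}
print(type_mismatches(extracted, receipt_schema))  # ['$.total: expected number, got str']
```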
Security & Privacy
Data Handling
Important: Documents uploaded to DocStrange are transmitted to https://extraction-api.nanonets.com and processed on external servers.
Before uploading sensitive documents:
- Review Nanonets' privacy policy and data retention policies: https://docstrange.nanonets.com/docs
- Verify encryption in transit (HTTPS) and at rest
- Confirm data deletion/retention timelines
- Test with non-sensitive sample documents first
Best practices:
- Do not upload highly sensitive PII (SSNs, medical records, financial account numbers) until you've confirmed the service's security and compliance posture
- Use API keys with limited permissions/scopes if available
- Rotate API keys regularly (every 90 days recommended)
- Monitor API usage logs for unauthorized access
- Never log or commit API keys to repositories or examples
File Size Limits
- Sync endpoint: Recommended for documents ≤5 pages
- Async endpoint: Use for documents >5 pages to avoid timeouts
- Large files: Consider using file_url with publicly accessible URLs instead of uploading large files directly
Operational Safeguards
- Always use environment variables or secure secret stores for API keys
- Never include real API keys in code examples or documentation
- Use placeholder values like "your_api_key_here" in examples
- Set appropriate file permissions on configuration files (600 for JSON configs)
- Enable API key rotation and monitor usage through the dashboard
Troubleshooting
400 Bad Request:
- Provide exactly one input: file, file_url, or file_base64
- Verify API key is valid
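The one-input rule can be enforced on the client before a request is sent. A minimal Python sketch, assuming the three input names listed above; `pick_input` is a hypothetical helper and performs no upload itself.

```python
from typing import Optional

def pick_input(
    file: Optional[str] = None,
    file_url: Optional[str] = None,
    file_base64: Optional[str] = None,
) -> tuple[str, str]:
    """Return the single (field name, value) pair, or raise if zero or several were given."""
    candidates = {"file": file, "file_url": file_url, "file_base64": file_base64}
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        given = ", ".join(provided) or "none"
        raise ValueError(f"provide exactly one of file, file_url, file_base64 (got: {given})")
    return next(iter(provided.items()))

print(pick_input(file_url="https://example.com/sample.pdf"))
# ('file_url', 'https://example.com/sample.pdf')
```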
Sync Timeout:
- Use async for documents >5 pages
- Poll /extract/results/{record_id}
Missing Confidence Scores:
- Requires json_options (field list or schema)
- Add include_metadata=confidence_score
Authentication Errors:
- Verify DOCSTRANGE_API_KEY environment variable is set
- Check API key hasn't expired or been revoked
- Ensure no extra whitespace in API key value
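The first and last checks can be automated when building the request header. This is a sketch with hypothetical helper names; it cannot detect an expired or revoked key, which only the API can report.

```python
import os
from collections.abc import Mapping

def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read DOCSTRANGE_API_KEY, trimming stray whitespace and failing early if absent."""
    raw = env.get("DOCSTRANGE_API_KEY")
    if raw is None:
        raise RuntimeError("DOCSTRANGE_API_KEY is not set")
    key = raw.strip()  # a trailing newline or space is a common copy-paste mistake
    if not key:
        raise RuntimeError("DOCSTRANGE_API_KEY is empty")
    return key

def auth_header(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Authorization header used by the curl examples in this document."""
    return {"Authorization": f"Bearer {load_api_key(env)}"}

print(auth_header({"DOCSTRANGE_API_KEY": " your_api_key_here \n"}))
# {'Authorization': 'Bearer your_api_key_here'}
```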
Pre-Publish Security Checklist
Before publishing or updating this skill, verify:
- package.json declares requiredEnv and primaryEnv for DOCSTRANGE_API_KEY
- package.json lists API endpoints in endpoints array
- All code examples use placeholder values ("your_api_key_here") not real keys
- No API keys or secrets are embedded in SKILL.md or package.json
- Security & Privacy section documents data handling and risks
- Configuration examples include security warnings for plaintext storage
- File permission guidance is included for config files
References
- API Docs: https://docstrange.nanonets.com/docs
- Get API Key: https://docstrange.nanonets.com/app
- Privacy Policy: https://docstrange.nanonets.com/docs (check for privacy/data policy links)