
Extraction

LLMs can extract structured data from unstructured text. Give the model a document and tell it which fields you need, and it returns clean, formatted data.

Prompt
Extract the following fields from this email and return
them as JSON: sender_name, company, meeting_date,
meeting_time, location.

"Hi Alex, this is Maria from Acme Corp. I'd like to
schedule a meeting for March 15th at 2:30 PM at our
downtown office on 5th Avenue. Let me know if that works.
Best, Maria Chen"

The model parses the unstructured text and returns clean JSON with the requested fields.

Response
{
  "sender_name": "Maria Chen",
  "company": "Acme Corp",
  "meeting_date": "March 15th",
  "meeting_time": "2:30 PM",
  "location": "Downtown office, 5th Avenue"
}
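
Once the response is JSON, your application can load it into a typed object. The following is a minimal sketch, assuming the response text is exactly the JSON shown above; the `MeetingRequest` class and the `parse_meeting_request` function are hypothetical names introduced for illustration, not part of any library.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MeetingRequest:  # hypothetical container for the five requested fields
    sender_name: str
    company: str
    meeting_date: str
    meeting_time: str
    location: str


def parse_meeting_request(raw: str) -> MeetingRequest:
    """Parse the model's JSON text and require exactly the requested keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    expected = {f.name for f in fields(MeetingRequest)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected a JSON object with keys {sorted(expected)}")
    return MeetingRequest(**data)


# The response text from the example above, copied verbatim.
response_text = """
{
  "sender_name": "Maria Chen",
  "company": "Acme Corp",
  "meeting_date": "March 15th",
  "meeting_time": "2:30 PM",
  "location": "Downtown office, 5th Avenue"
}
"""

meeting = parse_meeting_request(response_text)
print(meeting.company, meeting.meeting_time)
```

The strict key check is a design choice: it rejects responses that drop or rename a field instead of silently passing incomplete data downstream.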

The same approach works for other kinds of documents and other output formats. Here we extract ingredients from a recipe and ask for a table instead of JSON.

Prompt
Extract all ingredients and their quantities from this
recipe text. Return as a markdown table.

"For the cake, you'll need 2 cups of flour, 1.5 cups of
sugar, 3 large eggs, 1 cup of buttermilk, half a cup of
vegetable oil, 2 teaspoons of vanilla extract, and a
pinch of salt."

The model structures the extracted data in exactly the format you requested.

Response
| Ingredient       | Quantity              |
|------------------|-----------------------|
| Flour            | 2 cups                |
| Sugar            | 1.5 cups              |
| Eggs             | 3 large               |
| Buttermilk       | 1 cup                 |
| Vegetable oil    | 0.5 cup               |
| Vanilla extract  | 2 teaspoons           |
| Salt             | A pinch               |
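
If your application needs the table as data rather than as display text, it can be split back into rows. This is a rough sketch, assuming a simple pipe-delimited table like the one above with a header row, a separator row, and no escaped `|` characters inside cells; `parse_markdown_table` is a hypothetical helper, not a library function.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Convert a simple pipe-delimited markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and a separator row")

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    # The second line should be the |---|---| separator.
    if not set(lines[1]) <= set("|-: "):
        raise ValueError("second line is not a table separator")

    rows = []
    for line in lines[2:]:
        cells = split_row(line)
        if len(cells) != len(header):
            raise ValueError(f"row has {len(cells)} cells, expected {len(header)}")
        rows.append(dict(zip(header, cells)))
    return rows


# The response table from the example above.
table_text = """
| Ingredient       | Quantity              |
|------------------|-----------------------|
| Flour            | 2 cups                |
| Sugar            | 1.5 cups              |
| Eggs             | 3 large               |
| Buttermilk       | 1 cup                 |
| Vegetable oil    | 0.5 cup               |
| Vanilla extract  | 2 teaspoons           |
| Salt             | A pinch               |
"""

rows = parse_markdown_table(table_text)
for row in rows:
    print(row["Ingredient"], "->", row["Quantity"])
```

If the rows are going straight into code, asking the model for JSON or CSV is usually simpler to parse than a markdown table.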

Key takeaway: Specify the exact fields and output format (JSON, table, CSV) for reliable extraction. This pattern is powerful for automating data entry, parsing emails, processing invoices, and more.

Tips:
- Define the exact schema you want
- Specify output format: JSON, CSV, table
- Handle missing data: "use null if not found"
- Validate output in your application code
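
The tips above can be combined in a small helper pair: one function builds the prompt from a field list, the other checks the reply. This is only a sketch under stated assumptions: `build_extraction_prompt` and `validate_extraction` are hypothetical names, the call to the model itself is left out, and `sample_response` is a hand-written stand-in for a model reply, not real output.

```python
import json
from typing import Optional


def build_extraction_prompt(field_names: list[str], document: str) -> str:
    """Build a prompt that names the schema, the format, and the missing-data rule."""
    field_list = ", ".join(field_names)
    return (
        "Extract the following fields from the text below and return them "
        f"as a JSON object with exactly these keys: {field_list}.\n"
        "Use null for any field that is not found in the text.\n\n"
        f'"{document}"'
    )


def validate_extraction(raw: str, field_names: list[str]) -> dict[str, Optional[str]]:
    """Check that every requested field is present and is a string or null."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result: dict[str, Optional[str]] = {}
    for name in field_names:
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        if value is not None and not isinstance(value, str):
            raise ValueError(f"field {name} should be a string or null")
        result[name] = value
    return result


field_names = ["sender_name", "company", "phone"]
prompt = build_extraction_prompt(
    field_names, "Hi Alex, this is Maria from Acme Corp."
)

# Hand-written stand-in for a model reply; the email contains no phone number.
sample_response = '{"sender_name": "Maria", "company": "Acme Corp", "phone": null}'
record = validate_extraction(sample_response, field_names)
print(record)
```

Treating null as a valid value while rejecting an absent key lets your code tell "the model looked and found nothing" apart from "the model ignored the field".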