Extraction
LLMs can extract structured data from unstructured text. Give the model a document and tell it which fields you need, and it returns the data in the format you specify.
Prompt
Extract the following fields from this email and return
them as JSON: sender_name, company, meeting_date,
meeting_time, location.
"Hi Alex, this is Maria from Acme Corp. I'd like to
schedule a meeting for March 15th at 2:30 PM at our
downtown office on 5th Avenue. Let me know if that works.
Best, Maria Chen"
The model parses the unstructured text and returns clean JSON with the requested fields.
Response
{
  "sender_name": "Maria Chen",
  "company": "Acme Corp",
  "meeting_date": "March 15th",
  "meeting_time": "2:30 PM",
  "location": "Downtown office, 5th Avenue"
}
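Once the reply is JSON, application code can load it directly. The following is a minimal sketch, assuming the reply text is exactly the JSON shown above; the `reply` string is hard-coded here for illustration, and in practice it would come from whatever model call you use.

```python
import json
from datetime import datetime

# Reply text copied from the example response above (hard-coded for illustration).
reply = """{
  "sender_name": "Maria Chen",
  "company": "Acme Corp",
  "meeting_date": "March 15th",
  "meeting_time": "2:30 PM",
  "location": "Downtown office, 5th Avenue"
}"""

# Turn the JSON text into a Python dict.
record = json.loads(reply)

# The model returned the time as free-form text; convert it to a time object
# so downstream code can compare or schedule with it.
meeting_time = datetime.strptime(record["meeting_time"], "%I:%M %p").time()

print(record["sender_name"], record["company"], meeting_time)
```

The date stays a string here because "March 15th" carries no year; asking for a specific date format in the prompt would make it easier to parse.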
The same approach works for other kinds of text. Here we extract ingredients and quantities from a recipe.
Prompt
Extract all ingredients and their quantities from this
recipe text. Return as a markdown table.
"For the cake, you'll need 2 cups of flour, 1.5 cups of
sugar, 3 large eggs, 1 cup of buttermilk, half a cup of
vegetable oil, 2 teaspoons of vanilla extract, and a
pinch of salt."
The model returns the extracted data in the requested format, here a markdown table.
Response
| Ingredient | Quantity |
|------------------|-----------------------|
| Flour | 2 cups |
| Sugar | 1.5 cups |
| Eggs | 3 large |
| Buttermilk | 1 cup |
| Vegetable oil | 0.5 cup |
| Vanilla extract | 2 teaspoons |
| Salt | A pinch |
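If the table needs to feed into other code, it can be converted into rows. This is a rough sketch, assuming the reply is a simple pipe-delimited markdown table with one header row and one separator row, as in the response above; `parse_markdown_table` is a hypothetical helper, not part of any library.

```python
def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Convert a simple markdown table into a list of row dicts keyed by header."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    # First line holds the column names.
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    # Skip the header (index 0) and the |---|---| separator (index 1).
    for ln in lines[2:]:
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# Table text copied from the example response above.
reply = """
| Ingredient | Quantity |
|------------------|-----------------------|
| Flour | 2 cups |
| Sugar | 1.5 cups |
| Eggs | 3 large |
| Buttermilk | 1 cup |
| Vegetable oil | 0.5 cup |
| Vanilla extract | 2 teaspoons |
| Salt | A pinch |
"""

ingredients = parse_markdown_table(reply)
for row in ingredients:
    print(row["Ingredient"], "->", row["Quantity"])
```

This does not handle cells that contain a literal `|`. When the data is meant for code rather than for reading, requesting JSON or CSV is usually simpler to parse.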
Key takeaway: Specify the exact fields and the output format (JSON, table, CSV) to make extraction more reliable. This pattern is useful for automating data entry, parsing emails, and processing invoices.
Tips:
- Define the exact schema you want
- Specify output format: JSON, CSV, table
- Handle missing data: "use null if not found"
- Validate output in your application code
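The tips above can be combined in a short sketch: one function builds a prompt from a field list and includes the null instruction, and another validates the reply. The names `FIELDS`, `build_prompt`, and `validate` are hypothetical, and the assumption that every value is a string or null applies only to this email example.

```python
import json

# The schema: exactly the fields the prompt asks for.
FIELDS = ["sender_name", "company", "meeting_date", "meeting_time", "location"]


def build_prompt(email_text: str, fields: list[str]) -> str:
    """Build an extraction prompt that names the fields, format, and null rule."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from this email and return them as JSON: "
        f"{field_list}. Use null for any field that is not found.\n\n"
        f'"{email_text}"'
    )


def validate(reply: str, fields: list[str]) -> dict:
    """Check that the reply is a JSON object with exactly the expected fields."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in fields if f not in data]
    extra = [k for k in data if k not in fields]
    if missing or extra:
        raise ValueError(f"missing fields: {missing}, unexpected fields: {extra}")
    for key, value in data.items():
        # Assumption for this example: each value is a string or null.
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{key} should be a string or null")
    return data


# A reply where the model could not find a location and returned null.
reply = (
    '{"sender_name": "Maria Chen", "company": "Acme Corp", '
    '"meeting_date": "March 15th", "meeting_time": "2:30 PM", "location": null}'
)
record = validate(reply, FIELDS)
print(record["location"])  # None
```

Rejecting unexpected fields is a design choice; a more lenient validator could ignore extras and check only that the required fields are present.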