Getting Data In and Out of Your Unstructured Workflows

Overview:
Data lives everywhere: S3 buckets, SharePoint sites, Salesforce records, Slack channels, and dozens of other systems. Before you can partition, chunk, embed, or do anything useful with AI, you need connectors to bring that data together, and afterward you need connectors to write the processed data back out. But building connectors that actually work in production is far more complex than it appears: authentication schemes differ wildly across platforms, APIs change without warning, and edge cases multiply at scale.
In this session, we’ll walk through how we build connectors at Unstructured, following the four critical phases of data movement: indexing and downloading at the source, then staging and uploading at the destination. You’ll see the internal architecture that powers these phases, understand how they integrate into the broader processing pipeline, and learn the end-to-end testing strategies required to maintain reliability across 50+ integrations.
Technical Details:
Specifically, we’ll cover:
- The four-phase architecture: Index → Download → Stage → Upload
- How source connectors discover and retrieve documents from external systems
- How destination connectors prepare and deliver data to its final home
- End-to-end testing strategies for maintaining reliability across 50+ integrations
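To make the Index → Download → Stage → Upload flow concrete ahead of the session, here is a minimal sketch in Python. It is not Unstructured's actual connector API: `FileRecord`, `InMemorySource`, `InMemoryDestination`, and `run_pipeline` are hypothetical names invented for illustration, and in-memory dictionaries and lists stand in for real external systems. It only shows how the four phases might hand data to one another.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FileRecord:
    """Metadata emitted by the index phase (hypothetical shape)."""
    identifier: str
    size: int


class InMemorySource:
    """Hypothetical stand-in for an external source system."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def index(self) -> Iterator[FileRecord]:
        # Phase 1 (Index): discover what exists without fetching content.
        for name, body in sorted(self._documents.items()):
            yield FileRecord(identifier=name, size=len(body))

    def download(self, record: FileRecord) -> str:
        # Phase 2 (Download): retrieve the content for one indexed record.
        return self._documents[record.identifier]


class InMemoryDestination:
    """Hypothetical stand-in for a destination system."""

    def __init__(self) -> None:
        self.rows: list[str] = []

    def stage(self, record: FileRecord, text: str) -> str:
        # Phase 3 (Stage): reshape data into the form the destination expects.
        return json.dumps({"id": record.identifier, "text": text.strip()})

    def upload(self, staged: str) -> None:
        # Phase 4 (Upload): deliver the staged payload.
        self.rows.append(staged)


def run_pipeline(source: InMemorySource, destination: InMemoryDestination) -> int:
    """Move every document through all four phases and return the count."""
    count = 0
    for record in source.index():
        text = source.download(record)
        # Assumption: processing steps such as partitioning, chunking, and
        # embedding would run here, between the source and destination phases.
        staged = destination.stage(record, text)
        destination.upload(staged)
        count += 1
    return count


source = InMemorySource({"b.txt": " second ", "a.txt": "first"})
destination = InMemoryDestination()
moved = run_pipeline(source, destination)
print(moved, destination.rows)
```

Keeping each phase behind its own method means a sketch like this can be exercised end to end with fake systems, which is one plausible way to approach the kind of round-trip testing the session covers.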
This session includes a practical walkthrough and live Q&A. Can’t make it live? Register anyway and we’ll send you the recording.