Unstructured

SEC Pipelines

10-K, 10-Q, and S-1 filings give investors a vital source of information about the risks and opportunities associated with publicly traded companies. Before analysts can assess how these filings bear on investment decisions, however, they first need to extract information from complex and variable XML documents. Working by hand, an analyst would need over 300 hours to extract and structure the content of the risk factors section for each of the 4,000+ publicly traded companies in the US. Developing custom parsing code that handles all of the corner cases in real-world filings could take equally long.

<XBRL>
<?xml version='1.0' encoding='UTF-8'?>
<!-- iXBRL document created with: Toppan Merrill Bridge iXBRL 9.6.7811.37134 -->
      <!-- Based on: iXBRL 1.1 -->
      <!-- Created on: 8/11/2021 10:45:07 PM -->
      <!-- iXBRL Library version: 1.0.7811.37150 -->
      <!-- iXBRL Service Job ID: 19f8db26-9ac2-4427-9c71-ed8734c57db7 -->
<html xmlns:us-gaap="http://fasb.org/us-gaap/2020-01-31" 
xmlns:link="http://www.xbrl.org/2003/linkbase" 
xmlns:country="http://xbrl.sec.gov/country/2020-01-31" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:rgld="http://www.royalgold.com/20210630" 
xmlns:xbrldt="http://xbrl.org/2005/xbrldt" 
xmlns:ixt-sec=
"http://www.sec.gov/inlineXBRL/transformation/20contextRef=
"Duration_7_1_2020_To_6_30_2021_srt_TitleOfIndividualAxis_rgld_
OfficersAndCertainEmployeesMember_us-gaap

The start of a messy XBRL file from EDGAR

Fortunately, Unstructured is here to help! We are excited to announce the release of our open-source preprocessing pipeline for select SEC filings, which you can find on GitHub here. The SEC filings API lets users extract narrative text from one or more sections of a 10-K, 10-Q, or S-1 filing in iXBRL format.
curl -X 'POST' \
  'https://api.unstructured.io/sec-filings/v0.2.0/section' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'text_files=@<your-file-name>.xbrl' \
  -F 'section=RISK_FACTORS'
See this file for a list of valid inputs for the section parameter. To fetch an iXBRL document from EDGAR, use the following helper function from the pipeline repo.
from prepline_sec_filings.fetch import get_form_by_ticker
text = get_form_by_ticker(
    'rgld', 
    '10-K', 
    company='<your-name-or-org>', 
    email='<your-email>'
)
After fetching the document, save it locally and pass it to the API using the text_files parameter. The API knows which sections are valid for each filing type, and you can request one or more of them using the section parameter. The valid section values are listed in this file under the SECSection enum in the GitHub repo, e.g. “PROSPECTUS_SUMMARY” or “RISK_FACTORS”. Alternatively, use section=_ALL to retrieve all sections.
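
The save-and-submit step can also be done from Python. The sketch below is a rough equivalent of the curl request shown earlier, not code from the pipeline repo. The endpoint URL and the text_files and section field names come from that curl example. The helper names (save_filing, build_form_fields, extract_sections) are hypothetical, the third-party requests library is assumed to be installed, and sending several sections by repeating the section field is an assumption about how the API accepts more than one value.

```python
from pathlib import Path

# Endpoint taken from the curl example earlier in this post.
API_URL = "https://api.unstructured.io/sec-filings/v0.2.0/section"


def save_filing(text: str, path: str) -> Path:
    """Write the fetched iXBRL text to a local file and return its path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


def build_form_fields(sections):
    """Build the form fields for the request.

    Assumption: more than one section is sent by repeating the
    'section' field, one entry per requested section.
    """
    return [("section", name) for name in sections]


def extract_sections(path, sections):
    """POST a saved filing to the API and return the decoded JSON body.

    Hypothetical helper. The shape of the returned JSON is not
    described in this post, so it is passed back to the caller as-is.
    """
    import requests  # third-party: pip install requests

    with open(path, "rb") as filing:
        response = requests.post(
            API_URL,
            headers={"accept": "application/json"},
            # 'text_files' is the multipart field name used in the curl example.
            files={"text_files": (Path(path).name, filing)},
            data=build_form_fields(sections),
        )
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.json()
```

With text holding the result of get_form_by_ticker, a call might look like extract_sections(save_filing(text, "rgld-10k.xbrl"), ["RISK_FACTORS"]). The file name here is only an illustration.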

This API is open and free to use. Enjoy!

Have questions or need help? Use this invite to join us on our community Slack channel.
