A new website, Rank and Filed, gathers data from the Securities and Exchange Commission’s EDGAR database, indexes it, and publishes it online in open formats that investors can use to research and discover companies. I’ve included a screenshot of Tesla’s SEC filings below.
The site currently has over 25 million files indexed.
I heard about the new website directly from its creator, Maris Jensen, a former SEC analyst who built the site independently. According to Maris, she proposed the project internally in March 2013 but was immediately turned down.
A month later, she was terminated for threatening the Commission’s mission with a “lack of respect for senior management,” an issue she holds was unrelated to the proposal. Maris then decided to make the idea real on her own and started building. She has since offered to give the site and its code to the SEC but has not yet heard back.
Our interview, lightly edited for content and clarity, follows.
Where did the idea for this originate?
The breaking point was realizing that the guy in the cubicle across from me had spent a week writing the same parser as me — a Python program to parse the EDGAR FTP index for specific filings. This is nearly two decades after Carl Malamud set everything up; the FTP index is exactly as he left it. We were in the division responsible for the SEC’s data analytics and interactive data initiatives. The division literally rewrites this program each time they need SEC filings data. There’s no version control. There’s just no excuse! Hilariously, that guy also left the SEC and built an SEC filings website, though his is for-profit: http://legalai.com/
What does this do that the SEC needed?
In 2008, the SEC set up a task force (the ‘21st Century Disclosure Initiative‘) to rethink the way they were making data available to the public. A year later, they published this report, with their conclusion and proposal for a new, modernized disclosure system. I basically just tried to build the system they described. I also did lots of googling — ‘SEC EDGAR tool terrible‘, ‘how to find SEC data‘, etc — and then tried to address the problems people were having.
The problems have been the same for decades. In 1994, people wanted an SEC CIK-to-ticker mapping. Twenty years later, this question still pops up on forums monthly.
There are over 600 different forms on EDGAR but the SEC’s form lists are basically no help at all. I went through and googled each form individually. I tried to group them into understandable categories.
The comment at the bottom of this post describes the SEC’s current problem better than I ever could:
Has anyone out there ever tried to use SEC.GOV to search for information about a company? The problem is very easy to articulate. If you search for something, you get 5000 results. At about 10 results per page, you have 500 pages to sift through to find what you want. Once you find what you want, there is ZERO ability to navigate from what you found into related documents!
What if you want to research a particular company’s board of directors? What other companies is each director associated with? Have there been any problems in any of those companies? You can’t investigate these types of things using the technology sec.gov has fielded. You want a needle. The SEC gives you a haystack.
Why not allow for better discovery of all of the SEC data and let investors perform their own investigations of markets & companies?
So instead of focusing on this obvious improvement to the public service the SEC provides, the emphasis apparently is on improving investigative actions. Great. Why not just shut off the sec.gov website completely and let the SEC do all of the investigating and researching of SEC data?
How does RankAndFiled.com compare to other sources of SEC data online?
I unfortunately haven’t added that much ‘value’ yet. I’m a total amateur. I’m just trying to make the data available and understandable! The website doesn’t do any analysis: it just collects, links and presents data from different SEC filings.
Looks like you got some great help from the folks you thanked. Did you build this all yourself with these tools?
Yes, open source tools these days are amazing!! I started this project with no web or software development experience at all.
I actually feel really lucky to have fallen into all of this. Everything I know I learned on Google, mostly through tutorials written by the developers listed there.
I also didn’t know anyone in the dataviz or open source community, so I reached out to some of them with things like etiquette questions. Their response and support were just incredible, especially from the D3 community. They’re just wonderful.
Can you tell me more about where the data on this site comes from and what you’ve done to it?
Basically, the system watches the SEC’s RSS feeds. It reads and indexes data from SEC filings as they come in. Not all the filings show up on the feeds — I’m not sure why — so it also scans the FTP index for any missed filings.
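(Editor’s sketch, not Maris’s code: the “scan the index for anything the feeds missed” step might look roughly like the Python below. It assumes a pipe-delimited index with the columns CIK, company name, form type, date filed and filename, and it assumes the feed reader has already collected a set of accession numbers. The sample rows, the function names and the `seen_from_feed` parameter are all hypothetical.)

```python
from typing import List, NamedTuple, Set


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_index(text: str) -> List[IndexEntry]:
    """Parse pipe-delimited index text, skipping header and separator lines."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have exactly five fields and a numeric CIK;
        # the column header and dashed separator fail this check.
        if len(parts) == 5 and parts[0].isdigit():
            entries.append(IndexEntry(*parts))
    return entries


def accession_number(filename: str) -> str:
    """Take the last path component and drop its extension."""
    return filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]


def missed_filings(index_text: str, seen_from_feed: Set[str]) -> List[IndexEntry]:
    """Return index entries whose accession number never appeared on the feed."""
    return [
        entry
        for entry in parse_index(index_text)
        if accession_number(entry.filename) not in seen_from_feed
    ]


# Made-up sample data for illustration only.
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------
1000001|EXAMPLE CORP|10-K|2014-02-26|edgar/data/1000001/0000000001-14-000001.txt
1000002|SAMPLE HOLDINGS INC|8-K|2014-02-26|edgar/data/1000002/0000000002-14-000002.txt
"""

if __name__ == "__main__":
    # Pretend the feed only delivered the first filing.
    for entry in missed_filings(SAMPLE_INDEX, {"0000000001-14-000001"}):
        print(entry.company, entry.form_type, entry.filename)
```

Comparing on accession numbers rather than whole rows keeps the check stable even if a company name is formatted differently in the two sources.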
About 25 million SEC documents have been parsed and incorporated so far, which is everything that’s publicly available on EDGAR. So companies and people are tracked and connected over time — who’s raising money where, who owns whom, who moved companies or got promoted, who sold a ton of shares. I also realign all the financial data from quarterly and annual reports so you can see a company’s financial history and so the data is comparable between companies.
It actually feels silly even talking about it, because it’s just so basic. This is stuff the SEC should have been doing years and years and years ago.
But it’s not a perfect science because, one, only a few SEC forms are machine-readable and, two, the SEC doesn’t even try to standardize names. SEC registrants are given distinct identifiers, but anything goes when companies or names are listed inside a filing. Middle names, middle initials, nicknames, suffixes, titles…
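(Editor’s sketch, again not the site’s actual method: one crude way to cope with titles, suffixes and middle names is to reduce each name to a “first last” key and group on that. The title and suffix lists and the `name_key` and `group_names` helpers are hypothetical. The sketch assumes names are written in “First Middle Last” order, it does nothing about nicknames, and it will wrongly merge different people who share a first and last name.)

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Illustrative, deliberately incomplete lists.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "md", "phd", "esq"}


def name_key(raw: str) -> str:
    """Reduce a personal name to a lowercase 'first last' matching key."""
    # Remove periods and apostrophes outright so "Jr." becomes "jr".
    cleaned = raw.lower().replace(".", "").replace("'", "")
    # Treat any remaining punctuation (commas, hyphens) as a separator.
    cleaned = re.sub(r"[^\w\s]", " ", cleaned)
    tokens = [t for t in cleaned.split() if t not in TITLES and t not in SUFFIXES]
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]
    # Drop middle names and initials by keeping only first and last tokens.
    return f"{tokens[0]} {tokens[-1]}"


def group_names(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw name strings that share the same matching key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    variants = ["Dr. Jane Q. Public, Jr.", "Jane Quincy Public", "JANE PUBLIC", "John Doe"]
    for key, members in group_names(variants).items():
        print(key, "<-", members)
```

A key like this is only a starting point for review. Real matching would also need the registrant identifiers wherever a filing provides them.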
I spent November and December trying to give all my code to the SEC. I received no response, not even a polite no. That’s still the goal — I want them to take over and open source it, or at the very least host the underlying API. It’s their job to make this data available and accessible. They NEED a team over there doing hands-on work with SEC filings, a team struggling to make sense of this data with just the tools available to retail investors, especially now that they’re talking about disclosure reform. Right now, they have almost no incentive to change things over to structured data — they buy all the structured EDGAR data they need.
The SEC keeps saying that it’s the private sector’s job to build tools like this, not theirs, but in the past 20 years nobody has come up with a really great, really affordable option. It doesn’t make sense for any of us to even try — I’ve heard that Bloomberg and Thomson Reuters hire legions of Indian professionals to go through each SEC filing by hand. We just can’t compete.
The SEC will have to make a lot more of their data machine-readable before any ‘disruptive’ innovation can happen, but they won’t do that until they’re forced to (by Congress), unless they have people there who realize how unfair the situation has become.
There are actually a heartbreaking number of SEC employees who also want this to happen, self-described worker bees who’ve reached out to me from personal email to say they’ve been trying to convince their bosses to give this thing a chance. So far, no luck! I would open source it myself, but unfortunately I can’t afford to host the project indefinitely.