Finding complete and well-structured data sets on real-world issues can be challenging. We recently ran into this problem with the PPP data released by the SBA. The release was split across multiple files, and it was missing critical information that data analysts need to understand the impact the loans had on specific industries.
The original release from the SBA split the data into files by loan amount first and by state second. There was one national data set for all loans greater than $150k, which included business names and addresses but gave only a loan range rather than an exact amount. Loans below $150k were released in separate files, one for each state. These files didn’t include business names or addresses, but did include exact loan amounts. All data sets contained a 6-digit NAICS code. NAICS is the North American Industry Classification System, which is used to classify businesses by type of economic activity. The data didn’t include what each code stood for.
To save time for others, we’ve released two compiled data sets of the PPP data. The first set is Combined PPP Data. This data set is a single CSV (downloadable as a ZIP) and has all the data for loans both below and above $150k. This is the data in its raw form from the SBA website.
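For readers who want to reproduce this kind of combination themselves, here is a minimal sketch of one way to stack files whose columns differ. It is not the exact process we used. The column names (LoanRange, BusinessName, LoanAmount, State, NAICSCode) and the sample rows are made up for illustration, and the helper name combine_csv is hypothetical.

```python
import csv
import io


def combine_csv(sources):
    """Stack several CSV sources into one table, keeping the union of columns."""
    fieldnames = []
    rows = []
    for source in sources:
        reader = csv.DictReader(source)
        # Remember each column the first time it appears, in order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)
    # Columns a file does not have are filled with an empty string.
    combined = [{name: row.get(name, "") for name in fieldnames} for row in rows]
    return fieldnames, combined


# In-memory stand-ins for the real files (illustrative values only).
over_150k = io.StringIO(
    "LoanRange,BusinessName,State,NAICSCode\n"
    "$1-2 million,EXAMPLE CO,CA,722511\n"
)
under_150k_ca = io.StringIO(
    "LoanAmount,State,NAICSCode\n"
    "20000.00,CA,541110\n"
)

fieldnames, combined = combine_csv([over_150k, under_150k_ca])

# Write the combined table out as a single CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(combined)
```

With real files, each io.StringIO stand-in would be replaced by an opened file handle, for example open(path, newline="").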
The second set of data is All PPP Data with NAICS Categories. To create this data set, we joined the PPP data with the corresponding NAICS categories and added supplemental information. This means that every loan carries a human-readable business classification rather than just a code. In addition to joining the PPP data with NAICS data on the 6-digit NAICS code, we also provided the Sector, Subsector, Industry group, and Industry for each loan. This allows analyses to be done at both the highest and lowest levels of business classification.
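The join relies on the fact that NAICS codes are hierarchical: the first 2 digits identify the sector, the first 3 the subsector, the first 4 the industry group, and the first 5 the industry. The sketch below shows the idea under a few assumptions. The function names, the output column names, and the NAICSCode column are hypothetical, the titles dictionary is a tiny hand-typed excerpt rather than the full NAICS reference table, and treating "Industry" as the 5-digit level is our reading for this example.

```python
def naics_levels(code):
    """Split a 6-digit NAICS code into its higher-level prefixes."""
    code = str(code).strip()
    if len(code) != 6 or not code.isdigit():
        return None  # missing or malformed code
    return {
        "sector": code[:2],
        "subsector": code[:3],
        "industry_group": code[:4],
        "industry": code[:5],
    }


def annotate(loan, titles):
    """Return a copy of a loan row with NAICS codes and titles added."""
    result = dict(loan)
    code = str(loan.get("NAICSCode", "")).strip()
    levels = naics_levels(code)
    if levels is None:
        return result
    result["naics_title"] = titles.get(code, "")
    for level, prefix in levels.items():
        result[level + "_code"] = prefix
        result[level + "_title"] = titles.get(prefix, "")
    return result


# Small illustrative excerpt of a NAICS title lookup (not the full table).
# Note: a few sectors span a range of 2-digit codes (for example 31-33),
# which a real lookup would need to handle.
titles = {
    "72": "Accommodation and Food Services",
    "722": "Food Services and Drinking Places",
    "7225": "Restaurants and Other Eating Places",
    "72251": "Restaurants and Other Eating Places",
    "722511": "Full-Service Restaurants",
}

loan = {"State": "CA", "NAICSCode": "722511"}
annotated = annotate(loan, titles)
```

Because every level is stored on the row, loans can be grouped by sector for a broad view or by the full 6-digit code for a narrow one without any further lookups.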
We hope our visitors find these data sets useful. We know that a lot of time is needed to clean and supplement data sets and want to assist analysts in that effort. In the future, we’ll be releasing additional free data sets, as well as premium data sets. Free data sets are usually simpler and provide less economic value than our premium data sets. Our premium data sets require much more effort to produce and also provide an edge in a competitive business environment.
MergeYourData.com Builds Analytics For Your Business To Grow
We hope you’re able to find some uses for our data sets in the near future. If you have any data projects that could use automated data pipelines, analyses, or visualization, we’re here to get that done. Whether it’s working alongside your existing data team or running a data project from start to finish, MergeYourData.com can get you where you want to be.