NAICS Download
A Simple Crawler for NAICS Codes
1 Update (2024-01-15)
I found that there’s a NAICS API on GitHub with 91 stars. If you’re comfortable with an API, then this may be for you. Still, I feel my solution offers the following benefits:
- I offer a ready-to-download table that you can use even if you don’t know what an API is.
- The GitHub repository seems to be stale (its last commit was 11 years ago), and the 2022 update is missing. My solution covers all three versions (2012, 2017, and 2022).
Let me know if there are other good NAICS solutions!
2 TL;DR
While NAICS (North American Industry Classification System) data is publicly available on its own website and on the US Census website, using it is tedious: it is presented as HTML pages, so users have to convert it into a 2D table themselves.
I wrote a simple Scrapy crawler to collect all the NAICS classification results for the three versions of NAICS: 2012, 2017, and 2022. The source code is hosted on my GitHub.
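To give a feel for what "converting HTML into a 2D table" involves, here is a minimal sketch using only Python's standard-library html.parser. It is not my crawler's code (the crawler uses Scrapy), and it assumes a hypothetical page where each industry is a table row with the code in the first cell and the title in the second. The real NAICS and Census pages are laid out differently, so treat the names RowCollector and html_to_table as illustrative only.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (e.g. <th> headers, whitespace) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def html_to_table(html, year):
    """Flatten a hypothetical code/title HTML table into (year, code, desc) tuples."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [(year, row[0], row[1]) for row in parser.rows if len(row) >= 2]


# Toy page in the assumed layout (not a real NAICS page).
page = """
<table>
  <tr><th>Code</th><th>Title</th></tr>
  <tr><td>53</td><td>Real Estate and Rental and Leasing</td></tr>
  <tr><td>5311</td><td><a href="#">Lessors of Real Estate</a></td></tr>
</table>
"""
print(html_to_table(page, 2022))
# [(2022, '53', 'Real Estate and Rental and Leasing'), (2022, '5311', 'Lessors of Real Estate')]
```

Doing this by hand for every page and all three versions is exactly the chore the crawler takes off your plate.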
How to download:
- Go to the GitHub repository
- Find the results folder and download the naics_complete.feather file.
3 How to Read the Data with R/Python?
Feather is a handy file format from the Apache Arrow project that both R and Python can read directly, with no conversion needed.
If you use R (first install arrow):
library(arrow)
df = read_feather('results/naics_complete.feather')
If you use Python (first install pyarrow):
from pyarrow.feather import read_feather
df = read_feather('results/naics_complete.feather')
4 How I Organize the Results

The table has four columns:
- year: 2012, 2017, or 2022
- code: the NAICS code
- desc: description of the industry
- level:
  - “2 digits”: the highest level (about 20 industries)
  - “4 digits”: the next level (about 300 industries)
  - “6 digits”: the finest level (about 1000 industries)
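Because the table is in "long" format (one row per version and code), a common first step is to pick one NAICS version and level and turn it into a code-to-description lookup. Below is a minimal sketch in plain Python. It assumes the rows arrive as dicts, e.g. via pandas' df.to_dict("records") on the DataFrame returned by read_feather, and that level holds exactly the strings listed above; the helper names code_lookup and count_by_level are hypothetical. Since I am not asserting whether year and code are stored as numbers or strings, the sketch normalizes them.

```python
from collections import Counter


def code_lookup(records, year, level=None):
    """Map NAICS code -> description for one NAICS version, optionally one level.

    `records` is an iterable of dicts with the keys year, code, desc and level
    (assumed shape, e.g. df.to_dict("records") on the downloaded table).
    """
    return {
        str(r["code"]): r["desc"]
        for r in records
        if int(r["year"]) == int(year) and (level is None or r["level"] == level)
    }


def count_by_level(records):
    """Count rows per (year, level) as a quick sanity check of the download."""
    return Counter((int(r["year"]), r["level"]) for r in records)


# Toy records in the assumed shape (not read from the real file).
records = [
    {"year": 2022, "code": "53", "desc": "Real Estate and Rental and Leasing", "level": "2 digits"},
    {"year": 2022, "code": "5311", "desc": "Lessors of Real Estate", "level": "4 digits"},
    {"year": 2017, "code": "5311", "desc": "Lessors of Real Estate", "level": "4 digits"},
]
print(code_lookup(records, 2022, level="4 digits"))  # {'5311': 'Lessors of Real Estate'}
print(count_by_level(records)[(2022, "2 digits")])    # 1
```

On the full file, count_by_level should roughly reproduce the sizes above (about 20, 300, and 1000 rows per version). If you prefer to stay in pandas, the same filter is df[(df["year"] == 2022) & (df["level"] == "4 digits")], again assuming year is stored as an integer.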
5 How to Run the Crawler
From the repository’s root directory, run scrapy crawl naics. You need to install Scrapy first; see its documentation.
6 Funny Facts about NAICS and WRDS
NAICS is one of the most popular industry classification systems for North American companies and, in my view, the go-to one. Quoting WRDS:
NAICS is designed to replace the old SIC system. It was developed jointly by the U.S. Economic Classification Policy Committee (ECPC), Statistics Canada, and Mexico’s Instituto Nacional de Estadistica y Geografia.
Many databases in WRDS do offer NAICS codes, but to my knowledge, they don’t offer textual descriptions. So what’s the use of a NAICS code, say 5311, if nobody tells you that it means “Lessors of Real Estate”? To get these textual descriptions, we have to go to the official NAICS website, which only shows the data as HTML pages, not as a downloadable spreadsheet.
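Once you have the descriptions, attaching them to firm-level codes is mostly a matter of prefix matching: NAICS codes nest by prefix, so a six-digit code such as 531110 falls under the four-digit industry group 5311. Here is a minimal sketch of that idea in Python. The function describe, the toy lookup dict, and the firm names are all hypothetical; the code is converted with str() because it may arrive as either a number or a string.

```python
def describe(naics_code, lookup, digits=4):
    """Return the description of the `digits`-level industry containing naics_code.

    NAICS codes nest by prefix, so the first four digits of a six-digit code
    give its four-digit parent (e.g. 531110 -> 5311). Returns None when the
    code is shorter than `digits` or the prefix is missing from `lookup`.
    """
    code = str(naics_code).strip()
    if len(code) < digits:
        return None
    return lookup.get(code[:digits])


# Toy code -> description lookup (a tiny hand-made subset, not the real table).
lookup = {"53": "Real Estate and Rental and Leasing", "5311": "Lessors of Real Estate"}
firm_codes = {"Firm A": 531110, "Firm B": "531"}  # hypothetical firms
for firm, code in firm_codes.items():
    print(firm, describe(code, lookup))
# Firm A Lessors of Real Estate
# Firm B None
```

One caveat: some two-digit sectors are code ranges (for example, 31-33 for manufacturing), so plain prefix truncation at the two-digit level may not line up with how those sectors are keyed in the table.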
