This web scraper, built with the Scrapy framework, collects text from documents related to Federal Open Market Committee (FOMC) meetings on the Federal Reserve website.
This Scrapy project contains the following spiders:
- beige_book_archive: Scrapes text from Beige Books found at https://www.federalreserve.gov/monetarypolicy/beige-book-archive.htm
- beige_book_current: Scrapes text from recent Beige Books found at https://www.federalreserve.gov/monetarypolicy/publications/beige-book-default.htm
- fomc_calendar: Scrapes text from recent documents, starting at https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm
  - Takes $\approx$ 20 seconds to complete a crawl
- historical_materials: Scrapes text from documents five or more years old, starting at https://www.federalreserve.gov/monetarypolicy/fomc_historical_year.htm
  - Takes $\approx$ 45 minutes to complete a crawl
To run a spider, use the Scrapy command-line tool from the Scrapy project directory: run scrapy crawl followed by the spider name, e.g. scrapy crawl fomc_calendar. I recommend running the spiders in the order listed above.
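Because the recommended order matters, a small driver script can run the spiders one after another. The sketch below is a hypothetical helper, not part of this project: it shells out to scrapy crawl for each spider name listed above and assumes Scrapy is installed and the script is launched from the Scrapy project directory.

```python
import subprocess
import sys

# Spider names from this README, in the recommended run order.
SPIDER_ORDER = (
    "beige_book_archive",
    "beige_book_current",
    "fomc_calendar",
    "historical_materials",
)


def build_commands(spiders=SPIDER_ORDER):
    """Return one `scrapy crawl <name>` command per spider, preserving order."""
    return [["scrapy", "crawl", name] for name in spiders]


def run_all(spiders=SPIDER_ORDER):
    """Run each spider in turn, stopping if a crawl exits non-zero."""
    for cmd in build_commands(spiders):
        print("Running:", " ".join(cmd), file=sys.stderr)
        # check=True raises CalledProcessError when scrapy exits with a
        # non-zero status (e.g. an unknown spider name), so the remaining
        # spiders are not started.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumes the current working directory is the Scrapy project directory.
    run_all()
```

Running the spiders as separate processes keeps each crawl independent and avoids restarting Twisted's reactor within a single Python process.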
Alternatively, the data will be made available on Kaggle at https://www.kaggle.com/datasets/edwardbickerton/fomc-text-data.
The spiders save each document as a row in the CSV file data/fomc_documents.csv, which has the following columns:
- document_kind: The kind of document. A list of document kinds in the dataset can be found here.
- meeting_date: The date of the FOMC meeting associated with the document.
  - For Beige Books scraped by the beige_book spiders, release_date is available but meeting_date is not. For these documents, I set the meeting date to the closest subsequent meeting via the Scrapy pipeline MeetingDatesPipeline.
- release_date: The release date of the document.
  - When the spider cannot find this on the Federal Reserve website, it is inferred according to the release schedules described here, via the Scrapy pipeline ReleaseDatesPipeline.
- url: The web address of the document.
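The "closest subsequent meeting" rule can be sketched as a binary search over a sorted list of meeting dates. The class below is a hypothetical illustration, not the project's actual MeetingDatesPipeline: it assumes items are dicts whose release_date and meeting_date fields hold datetime.date values (ISO date strings would also sort correctly), that the meeting dates are supplied from elsewhere, and that "subsequent" means strictly after the release date.

```python
from bisect import bisect_right
from datetime import date


class NextMeetingPipeline:
    """Hypothetical stand-in for MeetingDatesPipeline.

    Fills a missing meeting_date with the first meeting date that falls
    strictly after the item's release_date.
    """

    def __init__(self, meeting_dates):
        # Sorting lets bisect locate the next meeting in O(log n).
        self.meeting_dates = sorted(meeting_dates)

    def next_meeting(self, release_date):
        # bisect_right skips meetings on or before release_date.
        i = bisect_right(self.meeting_dates, release_date)
        return self.meeting_dates[i] if i < len(self.meeting_dates) else None

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) on each enabled pipeline.
        if item.get("meeting_date") is None and item.get("release_date"):
            item["meeting_date"] = self.next_meeting(item["release_date"])
        return item
```

Scrapy instantiates the pipelines listed in the ITEM_PIPELINES setting without arguments unless the class defines a from_crawler classmethod, so a real pipeline would need some way to load the meeting dates itself.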
The documents are then grouped by document_kind and split into separate CSV files in the data/documents_by_type directory.
Details of these CSV files can be found in this table.
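The grouping step can be sketched with Python's standard csv module. This is a hypothetical illustration rather than the project's code: the function name split_by_kind and the file-naming scheme (runs of non-word characters in the document kind replaced by underscores) are assumptions, and the actual file names in data/documents_by_type may differ.

```python
import csv
import os
import re
from collections import defaultdict


def split_by_kind(input_csv, output_dir):
    """Write one CSV per document_kind and return {kind: row_count}."""
    groups = defaultdict(list)
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row["document_kind"]].append(row)

    os.makedirs(output_dir, exist_ok=True)
    for kind, rows in groups.items():
        # Hypothetical naming scheme, e.g. "Beige Book" -> "Beige_Book.csv".
        filename = re.sub(r"[^\w-]+", "_", kind).strip("_") + ".csv"
        path = os.path.join(output_dir, filename)
        with open(path, "w", newline="", encoding="utf-8") as out:
            # Keep the same columns as the combined file.
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return {kind: len(rows) for kind, rows in groups.items()}
```

For example, split_by_kind("data/fomc_documents.csv", "data/documents_by_type") would write one file per document kind found in the combined CSV.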