Fed Scraper

This web scraper, built with the Scrapy framework, collects text from documents associated with Federal Open Market Committee (FOMC) meetings published on the Federal Reserve website.

Spiders

This Scrapy project consists of the following spiders:

beige_book_archive
- Scrapes text from Beige Books found on: https://www.federalreserve.gov/monetarypolicy/beige-book-archive.htm
beige_book_current
- Scrapes text from recent Beige Books found on: https://www.federalreserve.gov/monetarypolicy/publications/beige-book-default.htm
fomc_calendar
- Scrapes text from recent documents starting at: https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm
- Takes approximately 20 seconds to complete the crawl
historical_materials
- Scrapes text from documents five or more years old starting at: https://www.federalreserve.gov/monetarypolicy/fomc_historical_year.htm
- Takes approximately 45 minutes to complete the crawl

Usage

Run each spider with the scrapy crawl command of the Scrapy command-line tool, from inside the Scrapy project directory. I recommend running the spiders in the order listed above.
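
As a rough sketch of the recommended order, the snippet below runs scrapy crawl once per spider using Python's subprocess module. The helper names crawl_command and run_all and the project_dir parameter are illustrative assumptions, not part of this repository; project_dir is assumed to be the Scrapy project directory (the one containing scrapy.cfg).

```python
import subprocess

# Spider names in the order they are listed in the Spiders section.
SPIDERS = [
    "beige_book_archive",
    "beige_book_current",
    "fomc_calendar",
    "historical_materials",
]


def crawl_command(spider):
    """Build the `scrapy crawl <spider>` command as an argument list."""
    return ["scrapy", "crawl", spider]


def run_all(project_dir="."):
    """Run every spider in order; check=True stops at the first failure.

    project_dir is a hypothetical parameter: the directory from which
    `scrapy crawl` should be invoked.
    """
    for spider in SPIDERS:
        subprocess.run(crawl_command(spider), cwd=project_dir, check=True)
```

Calling run_all() from the project directory is equivalent to typing the four scrapy crawl commands one after another.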

Alternatively, the data will be made available on Kaggle at https://www.kaggle.com/datasets/edwardbickerton/fomc-text-data.

Output

The spiders save each document as a row of the CSV file data/fomc_documents.csv, which has the following columns:

document_kind
- A list of document kinds in the dataset can be found here.
meeting_date
- The date of the FOMC meeting associated with the document.
- For Beige Books scraped by the beige_book spiders, only release_date is available, not meeting_date. For these documents I set meeting_date to the closest subsequent meeting via the Scrapy pipeline MeetingDatesPipeline.
release_date
- The release date of the document.
- When this is not found on the Federal Reserve website by the spider, it is inferred according to the release schedules described here via the Scrapy pipeline ReleaseDatesPipeline.
url
- The web address of the document.
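
The "closest subsequent meeting" rule used for Beige Books can be illustrated with a short sketch. This is not the code of MeetingDatesPipeline; the function name next_meeting_date is hypothetical, the dates are illustrative only, and the sketch assumes "subsequent" means strictly later than the release date.

```python
from bisect import bisect_right
from datetime import date


def next_meeting_date(release_date, meeting_dates):
    """Return the earliest meeting date strictly after release_date.

    meeting_dates must be sorted in ascending order. Returns None when
    no later meeting is known. (Hypothetical helper, not the pipeline.)
    """
    i = bisect_right(meeting_dates, release_date)
    return meeting_dates[i] if i < len(meeting_dates) else None


# Illustrative dates only, not read from the project's data.
meetings = [date(2023, 2, 1), date(2023, 3, 22), date(2023, 5, 3)]
assigned = next_meeting_date(date(2023, 3, 8), meetings)  # date(2023, 3, 22)
```

A binary search keeps the lookup cheap even with many years of meeting dates, provided the list is sorted once up front.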

The documents are then grouped by document_kind and split into separate CSV files in the data/documents_by_type directory.
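
A minimal sketch of this grouping step, using only the standard library, might look like the following. The function name split_by_kind and the output file naming (one file called <document_kind>.csv per kind) are assumptions for illustration; the project's own script may name and structure things differently.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_kind(source_csv, out_dir):
    """Write one CSV per document_kind and return the paths written.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    groups = defaultdict(list)
    with open(source_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row["document_kind"]].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for kind, rows in groups.items():
        path = out_dir / f"{kind}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            # Keep the same columns, in the same order, as the source file.
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

For example, split_by_kind("data/fomc_documents.csv", "data/documents_by_type") would mirror the layout described above. Very long documents may exceed the csv module's default field size limit, which csv.field_size_limit can raise.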

Details of the CSV files can be found in this table.
