Download historical data from Yahoo Finance
Introduction
Most people use Yahoo Finance to obtain up-to-date quotes for securities traded all around the world. However, it can also be a useful source of financial data for research purposes: in particular, historical price series can be obtained easily. The service was suspended for a few years, but it now seems to work again, albeit with a slightly different procedure.
Tickers
The first problem is to identify the ticker symbols of the desired securities. A ticker symbol is a unique string that identifies each security. A global selection of tickers is available as an Excel file from investexcel.
For the New York Stock Exchange (NYSE), the updated list of tickers can be downloaded automatically in CSV format from the screening interface. Replace 'nyse' with 'nasdaq' to obtain the list of securities traded on NASDAQ. The resulting file lists:
- the ticker symbol
- the name of the company
- the price of the last transaction
- the market capitalization
- the IPO year
- the Sector as defined by the stock exchange
- the Industry, a further division of the Sectors
- the link to the company page on the stock exchange website
A useful subset of stocks is formed by the components of the S&P 500 index: they are liquid, highly capitalized securities. The list of tickers composing the index can be obtained, for instance, from datahub.
Obtain historical data
Once the ticker of the security of interest is known, the data can be downloaded using a special URL, as in the following example:
https://query1.finance.yahoo.com/v7/finance/download/[TICKER]?period1=[EPOCHSTART]&period2=[EPOCHEND]&interval=1d&events=history&includeAdjustedClose=true
[TICKER] is mandatory and is the ticker of the desired security. [EPOCHSTART] and [EPOCHEND] define the range of dates for which data are downloaded. They are expressed in "epoch" time, the Unix standard that represents a date as the number of seconds elapsed since 00:00:00 UTC on 1 January 1970. Conversion between common date formats and the epoch format can be done with an online epoch converter. For instance, 1 AM UTC on 1 January 1990 is expressed as "631155600".
The interval parameter sets the frequency of the data: in the example it is 1d (one day); it can be 1wk for weekly data and 1mo for monthly data. events=history selects historical prices. The last flag, includeAdjustedClose=true, includes adjusted closing prices. Prices are already adjusted for splits; the "Adjusted Close" price is further adjusted for dividends.
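The epoch conversion and the URL construction can also be scripted instead of using an online converter. The sketch below assumes GNU date (its -d option is not available in the BSD/macOS version); to_epoch and yahoo_url are hypothetical helper names, and the query parameters simply mirror the example URL above.

```shell
#!/usr/bin/env bash
# Convert a date/time, interpreted as UTC, to Unix epoch seconds.
# Assumes GNU date: the -d option is not available on BSD/macOS.
to_epoch() {
    date -u -d "$1" +%s
}

# Hypothetical helper: print the download URL for one ticker.
# Arguments: TICKER START END [INTERVAL]; START and END are any dates
# understood by GNU date, INTERVAL defaults to 1d (daily data).
yahoo_url() {
    local ticker=$1
    local start end
    start=$(to_epoch "$2")
    end=$(to_epoch "$3")
    local interval=${4:-1d}
    printf 'https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=%s&events=history&includeAdjustedClose=true\n' \
        "$ticker" "$start" "$end" "$interval"
}
```

For example, to_epoch "1990-01-01 01:00" prints 631155600, and yahoo_url AAPL "1990-01-01 01:00" "2021-01-01" 1wk prints the URL for weekly data over that range. Passing -u makes the conversion independent of the local time zone.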
If you paste the URL above into a browser, the data are saved in a file named after the ticker symbol. The file contains the following comma-separated fields:
- Date (yyyy-mm-dd)
- Open
- High
- Low
- Close
- Adj Close
- Volume
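Since the file is plain CSV, standard command-line tools are enough to process it. As an illustration only, the hypothetical function daily_returns below uses awk to print, for each date, the simple return computed from the Adj Close column (the sixth field). It assumes the field order listed above and, as a defensive measure, skips rows whose Adj Close is not a plain number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "Date,return" for each row of a file laid out as
# Date,Open,High,Low,Close,Adj Close,Volume, where
# return = AdjClose(t) / AdjClose(previous valid row) - 1.
daily_returns() {
    awk -F',' '
        NR == 1 { next }                  # skip the header row
        $6 !~ /^[0-9.]+$/ { next }        # skip rows without a numeric Adj Close
        prev != "" { printf "%s,%.6f\n", $1, $6 / prev - 1 }
        { prev = $6 }                     # remember the last valid price
    ' "$1"
}
```

A possible usage is daily_returns AAPL.csv > AAPL_returns.csv. When a row is skipped, the next return is computed against the last valid price, so it spans the gap.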
Scripting
Usually we want to download data for several securities at once. In this case, scripting is necessary.
For instance, to download daily price data for all NYSE-traded companies in the Technology sector, we can proceed as follows. From the screener interface we select NYSE and the Technology sector and download the CSV file. Then we collect all the tickers in a variable:
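```shell
# Skip the header row (NR>1), keep the first field (the ticker) and strip any double quotes
ticks=$(awk -F',' 'NR>1 {gsub(/"/, "", $1); print $1}' nasdaq_screener.csv)
```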
where nasdaq_screener.csv is the name of the previously downloaded file. Finally, we loop over all tickers and download the price data, saving each series in an appropriately named file:
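```shell
# A browser-like user agent (see the note below)
USERAGENT="Mozilla/5.0"
# 1990-01-01 01:00 UTC and 2021-01-01 00:00 UTC in epoch seconds
EPOCHSTART="631155600"
EPOCHEND="1609459200"
BASEURL="https://query1.finance.yahoo.com/v7/finance/download/"

for ticker in $ticks; do
    URL="${BASEURL}${ticker}?period1=${EPOCHSTART}&period2=${EPOCHEND}&interval=1d&events=history&includeAdjustedClose=true"
    # Save each series as <TICKER>.csv
    wget --user-agent="${USERAGENT}" "$URL" -O "${ticker}.csv"
done
```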
In the example, dates range from 01-01-1990 to 31-12-2020. A well-known user agent must be specified because, I guess, Yahoo implements some generic protection against scrapers.
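Since requests may be rejected, it can be worth checking the downloaded files before using them. The sketch below is a hypothetical check, not part of the original procedure: it assumes that every valid file starts with exactly the header Date,Open,High,Low,Close,Adj Close,Volume (the field list above), and it reports and deletes anything else, for instance an empty file, which wget -O may leave behind when a request fails.

```shell
#!/usr/bin/env bash
# Hypothetical check: delete (and report on stderr) every file whose first
# line is not the expected CSV header.
check_downloads() {
    local expected="Date,Open,High,Low,Close,Adj Close,Volume"
    local f header
    for f in "$@"; do
        header=""
        IFS= read -r header < "$f" || true   # empty files leave header empty
        header=${header%$'\r'}               # tolerate Windows line endings
        if [ "$header" != "$expected" ]; then
            echo "invalid: $f" >&2
            rm -f "$f"
        fi
    done
}
```

Running check_downloads *.csv after the loop leaves only files that look like valid price series; the missing tickers can then be downloaded again.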
S&P 500
Using the technique above, I have downloaded historical prices for the stocks composing the S&P 500 index from 01-01-1990 to 31-12-2020. They are collected in this archive. Files are named after the respective tickers.