How to Use Yahoo Finance Library and Other APIs For Collecting Data
A Step-by-Step Guide to Using Yahoo Finance Python Library and Other APIs to Gather Financial Data
Interested in predicting the stock market but have no idea where to collect your data? In this post, we will discuss how to use the Yahoo Finance library and other APIs such as NASDAQ, Alpha Vantage, and Federal Reserve Economic Data (FRED) to collect stock market and economic data for your project.
Without further ado, let’s get into it!
Yahoo Finance Library
1. Import All Necessary Libraries
First, let’s import all of the Python libraries that we need for this exercise:
The Yahoo Finance library is called “yfinance”. The other libraries are used for calling the APIs and gathering the data they return.
If this is your first time using the Yahoo Finance library, then you will need to install the library before using it. To install, use pip:
$ pip install yfinance --upgrade --no-cache-dir
More detailed documentation on how to install the library can be found on the Python Package Index (PyPI).
2. Define A Helper Function
The next step is to define a helper function that automates downloading the data.
This function takes a stock’s ticker symbol (covered in the next step) and returns the stock’s historical record up to the current day. You can easily modify this function to select a custom time period.
3. Find a Ticker Symbol For Your Stock of Interest
Suppose you are interested in gathering historical data for some of the world’s largest stock market indices, for example, the S&P 500, Nikkei 225, FTSE 100 (Financial Times Stock Exchange 100, London), and Hang Seng indices.
To do this, you first need to find the ticker symbol for each index. You can search for these on the Yahoo Finance website.
Here are the ticker symbols for our indices of interest:
- S&P 500: ^GSPC
- Nikkei: ^N225
- FTSE 100 (London): ^FTSE
- Hang Seng Index: ^HSI
4. Collect the Data Using the Helper Function
When you call the helper function, it will return a dataframe that looks like this:
get_yf_data("^GSPC")
The column names are self-explanatory but for those who are unfamiliar with stock market data, here is a simple definition for each column:
- OPEN: The opening price of a stock, i.e. the first price it trades at when the market opens (9:30 a.m. Eastern Time on the major U.S. exchanges).
- CLOSE: The closing price of a stock, i.e. the last price it trades at when the market closes (4:00 p.m. Eastern Time on the major U.S. exchanges).
- HIGH: A stock’s intraday highest trading price. It is represented by the highest point on a day’s stock chart.
- LOW: A stock’s intraday lowest trading price. It is represented by the lowest point on a day’s stock chart.
- VOLUME: The number of shares of a stock that was traded between its daily open and close.
- DIVIDENDS: Distribution of a company’s earnings to its shareholders, determined by the company’s board of directors. Dividends are typically distributed quarterly.
- STOCK SPLITS: An increase in the number of a company’s shares to boost the stock’s liquidity. This does not change the total dollar value of all shares outstanding since the company’s value remains the same.
In our example, the dividends and stock splits columns are all zeros because market indices neither pay dividends nor undergo stock splits.
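If you want to confirm this for a dataframe you have downloaded, a small check along these lines may help. The rows below are made-up numbers shaped like yfinance output (not real quotes), and `has_no_corporate_actions` is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows shaped like the output of yfinance's history(); not real quotes
index_df = pd.DataFrame(
    {
        "Open": [100.0, 101.5],
        "High": [102.0, 103.0],
        "Low": [99.5, 100.5],
        "Close": [101.5, 102.5],
        "Volume": [1_000_000, 1_200_000],
        "Dividends": [0.0, 0.0],
        "Stock Splits": [0.0, 0.0],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)


def has_no_corporate_actions(df):
    """True when every Dividends and Stock Splits entry is zero."""
    return bool((df[["Dividends", "Stock Splits"]] == 0).all().all())


print(has_no_corporate_actions(index_df))  # prints True for the toy rows above
```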
Since we want data for several indices, we can use a loop to gather all the data and store it in a single dictionary keyed by ticker symbol:
data = {}
stock_list = ["^GSPC", "^N225", "^FTSE", "^HSI"]
for stock in stock_list:
    data[stock] = get_yf_data(stock)
Or you could store them individually as separate dataframes:
sp500 = get_yf_data("^GSPC")
nikkei = get_yf_data("^N225")
ftse = get_yf_data("^FTSE")
hsi = get_yf_data("^HSI")
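Once the dataframes are in a dictionary, a common next step is to line up one column from each index side by side. The sketch below uses a hypothetical helper, `combine_closes`, and made-up toy frames. It assumes the date indexes are directly comparable; with timezone-aware indexes from different exchanges you may need to normalise them to plain dates first.

```python
import pandas as pd


def combine_closes(data):
    """Join the Close column of each dataframe into one table, one column per ticker."""
    return pd.concat({ticker: df["Close"] for ticker, df in data.items()}, axis=1)


# Toy stand-ins for the downloaded dataframes (made-up values)
toy = {
    "^GSPC": pd.DataFrame(
        {"Close": [1.0, 2.0]}, index=pd.to_datetime(["2020-01-02", "2020-01-03"])
    ),
    "^N225": pd.DataFrame({"Close": [3.0]}, index=pd.to_datetime(["2020-01-03"])),
}
closes = combine_closes(toy)
```

Dates on which one market was closed show up as NaN in that market's column, because the rows are aligned on the union of all dates.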
APIs
While the Yahoo Finance library is excellent at providing historical data for securities, it does have its limitations. For example, it is not a source for economic indicators such as Gross Domestic Product (GDP), the unemployment rate, the Consumer Price Index (CPI), or median income, and its commodity coverage is largely limited to futures contracts rather than benchmark price series for gold or crude oil. For these, we need to make use of other APIs, such as NASDAQ, Alpha Vantage, and FRED.
1. Get API Keys
To access these APIs, you first need an API key for each one. Visit each provider’s website, create an account, and request a key.
Here are the links for each API: NASDAQ, Alpha Vantage, and FRED.
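As an alternative to pasting the keys directly into your script, you could read them from environment variables so they never end up in shared code. This is only a sketch: the variable names `NASDAQ_API_KEY`, `ALPHAVANTAGE_API_KEY`, and `FRED_API_KEY` and the helper `load_api_keys` are hypothetical choices, not names required by any of the three services.

```python
import os


def load_api_keys(env=os.environ):
    """Read the three API keys from environment variables (empty string if unset)."""
    return {
        "nasdaq": env.get("NASDAQ_API_KEY", ""),
        "alpha": env.get("ALPHAVANTAGE_API_KEY", ""),
        "fred": env.get("FRED_API_KEY", ""),
    }


keys = load_api_keys()
nasdaq_key, alpha_key, fred_key = keys["nasdaq"], keys["alpha"], keys["fred"]
```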
2. Define URLs For Calling APIs
For this exercise, we will collect data for the following commodities and economic indicators.
From NASDAQ API:
- Gold
- Silver
- GDP
- Unemployment Rate
- Median Income
From Alpha Vantage:
- Consumer Price Index (CPI)
From FRED:
- Consumer Sentiment Index
- Crude Oil
And here are the API keys, dataset codes, and URLs for each item:
## API keys (fill in your own API keys)
nasdaq_key = ""
alpha_key = ""
fred_key = ""
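## Dataset codes for commodities and economic indicators (from Nasdaq)
url_list = ["WGC/GOLD_DAILY_USD",
            "LBMA/SILVER",
            "FRED/GDP",
            "FRED/UNEMPLOY",  # note: FRED's UNEMPLOY series is the unemployment level; the rate is UNRATE
            "FRED/MEHOINUSA672N"]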
## URL for Alpha Vantage
cpi_url = f"https://www.alphavantage.co/query?function=CPI&interval=monthly&apikey={alpha_key}"
## URL for FRED
con_sentiment_url = f"https://api.stlouisfed.org/fred/series/observations?series_id=UMCSENT&api_key={fred_key}&file_type=json"
oil_url = f"https://api.stlouisfed.org/fred/series/observations?series_id=DCOILBRENTEU&api_key={fred_key}&file_type=json"
## Names of the datasets to be obtained from Nasdaq API
data_names = ["gold", "silver", "gdp", "unemploy", "med_income"]
3. Define Helper Functions
We will now define helper functions for each API, which will automate the process of converting JSON-format data collected from an API into a dataframe.
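A possible shape for these helpers is sketched below. The function names match the calls used in the next step, but the bodies are assumptions: the Nasdaq base URL and the JSON layouts (`dataset.data` for Nasdaq, `data` for Alpha Vantage, `observations` for FRED) reflect each API's documented response format as I understand it, and `_to_frame` is a hypothetical shared helper. Check them against the responses you actually receive.

```python
import pandas as pd
import requests

nasdaq_key = ""  # placeholder: fill in your own key

# Assumed base URL of the Nasdaq Data Link time-series API
NASDAQ_BASE = "https://data.nasdaq.com/api/v3/datasets"


def connect_nasdaq(code):
    """Request one dataset (e.g. "FRED/GDP") and return the raw response."""
    return requests.get(
        f"{NASDAQ_BASE}/{code}.json", params={"api_key": nasdaq_key}, timeout=30
    )


def _to_frame(records, name):
    """Build a two-column frame: a datetime "date" and a numeric value column."""
    df = pd.DataFrame(records, columns=["date", name])
    df["date"] = pd.to_datetime(df["date"])
    # Non-numeric placeholders (FRED uses "." for missing values) become NaN
    df[name] = pd.to_numeric(df[name], errors="coerce")
    return df


def getNasdaqData(r, name):
    """Nasdaq rows are lists: the date first, then one or more value columns."""
    rows = r.json()["dataset"]["data"]
    # Keep only the first value column (e.g. the USD price for LBMA/SILVER)
    return _to_frame([(row[0], row[1]) for row in rows], name)


def getAlphaData(r, name):
    """Alpha Vantage economic indicators arrive as a list of date/value dicts."""
    items = r.json()["data"]
    return _to_frame([(item["date"], item["value"]) for item in items], name)


def getFREDData(r, name):
    """FRED observations are dicts; only the date and value fields are kept."""
    observations = r.json()["observations"]
    return _to_frame([(o["date"], o["value"]) for o in observations], name)
```

Each helper returns a dataframe with a `date` column and one value column named after the dataset, so the results can be indexed by date and concatenated.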
4. Call APIs Using the Helper Functions
Now, let’s put our helper functions to use.
We will create an empty list, and store our data in this list, which we will eventually turn into a single dataframe.
df_list = []
for i, url in enumerate(url_list):
    r = connect_nasdaq(url)
    df = getNasdaqData(r, data_names[i])
    df_list.append(df)
## Connect to Alpha Vantage CPI for CPI data
cpi_r = requests.get(cpi_url)
cpi_df = getAlphaData(cpi_r, "cpi")
df_list.append(cpi_df)
## Connect to FRED API for University of Michigan Consumer Sentiment Index & Crude Oil data
con_sentiment_r = requests.get(con_sentiment_url)
con_sentiment_df = getFREDData(con_sentiment_r, "con_sentiment")
df_list.append(con_sentiment_df)
oil_r = requests.get(oil_url)
oil_df = getFREDData(oil_r, "oil")
df_list.append(oil_df)
## Set the date as index
for df in df_list:
    df.set_index("date", inplace = True)
## Concatenate All DFs
api_df = pd.concat(df_list, axis = 1)
This results in the following dataframe:
Don’t be alarmed that there are null (NaN) values. It is simply because some economic indicators such as GDP and unemployment rate are only given monthly, quarterly, or annually. Also, the data for CPI dates back all the way to January 1st, 1913 while the rest of the data are simply not available for that time period. We can slice the dataframe to fit within the time period we are interested in, for example:
start_date = "1990-01-01"
end_date = "2019-12-31"
mask = (api_df.index >= start_date) & (api_df.index <= end_date)
api_df = api_df.loc[mask]
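To see where the NaN values come from, here is a small sketch with made-up numbers: a daily series and a quarterly series concatenated column-wise. The forward fill at the end is an optional extra that is not part of the steps above; whether carrying the last known value forward is appropriate depends on what you plan to do with the data.

```python
import pandas as pd

# Made-up values: a daily series and a quarterly series
daily = pd.DataFrame(
    {"oil": [60.0, 61.0, 62.0]},
    index=pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
)
quarterly = pd.DataFrame({"gdp": [21000.0]}, index=pd.to_datetime(["2019-01-01"]))

# Column-wise concat aligns on the union of dates,
# so gdp is NaN on every date without a quarterly observation
combined = pd.concat([daily, quarterly], axis=1)

# Optional: carry the last known value forward to fill the gaps
filled = combined.ffill()
```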
And there you have it. I hope the code in this post will be helpful when you work on your own project. Please leave any comments, feedback, or questions, and feel free to connect with me via LinkedIn. 😃