import pandas as pd
import numpy as np
import requests
import warnings
warnings.filterwarnings("ignore")
Here I'll build up some vocabulary around using REST APIs.
I find it easier to understand larger concepts when I have an idea of the smaller pieces that make them up and have the technical vocabulary to speak and read about them. By now, you know that there is almost never one term for one thing in the tech field, and honestly many other fields, so when researching APIs be prepared for more of the same. Keeping that in mind, I will try to use the most common terms below as well as include their common synonyms.
Below is more than you need to know to start playing with the requests Python library, but it still barely scratches the surface! When you're ready for more, here is a crash course in REST API design that helps in understanding how to use APIs to request data.
API stands for Application Programming Interface, and it is a set of rules that allows programs to communicate with each other. For my purposes, it will allow for communication between a client (my computer) and a server. I'm interested in using an API to acquire resources or data sets using requests, which are URLs combined with HTTP methods.
REST, REpresentational State Transfer, is an architectural style and approach to communications that uses HTTP requests to access and use data. It has a specific set of guiding constraints or rules that it follows, but those are deeper than I want to go here because I'm not building an API; I just want a solid foundation for acquiring data from an API.
JSON stands for JavaScript Object Notation and is a language-independent text format that represents data as objects of name/value pairs (think Python dictionary objects), arrays (think Python list objects), strings, and numbers.
For example:
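Here is a minimal sketch of what that looks like in practice. The JSON document below is made up for illustration (its keys and values are hypothetical, not taken from any real API); it is parsed with Python's built-in json module to show how JSON objects and arrays map onto dictionaries and lists.

```python
import json

# A hypothetical JSON document: an object holding a string, a number,
# an array, and a nested object.
raw = '{"name": "example-repo", "stars": 3, "topics": ["api", "json"], "owner": {"login": "someuser"}}'

parsed = json.loads(raw)  # JSON text -> Python objects

print(type(parsed))              # the JSON object becomes a dict
print(type(parsed["topics"]))    # the JSON array becomes a list
print(parsed["owner"]["login"])  # nested objects become nested dicts
```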
Putting It All Together
A RESTful JSON API uses requests with HTTP methods to GET (retrieve a resource), PUT (update or replace a resource), POST (create a resource), and DELETE (remove a resource). It allows users to connect to, manage, and interact with cloud services.
The requests module allows me to send HTTP requests to a REST API using Python. An HTTP request to a RESTful JSON API returns a response object with the data in JSON format. I can get a variety of information from this response using different properties and methods, some of which are explained in the table below in Now What.
HTTP method == Action --> Endpoint == Where --> Resource == What
Requests are made up of four different parts, but this notebook will only dig into HTTP methods and endpoints.
The HTTP method I use indicates the action or type of interaction my request will have with the resource -> CRUD (Create, Read, Update, Delete).
GET - A GET request performs a READ operation. A GET request to a server sends you back the data you requested.
POST - A POST request performs a CREATE operation.
PUT - A PUT request performs an UPDATE operation, typically replacing the whole resource.
PATCH - A PATCH request performs a partial UPDATE operation, changing only the fields you send.
DELETE - A DELETE request performs a DELETE operation.
In this notebook, I will only be using the HTTP GET method.
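The method-to-CRUD pairing listed above can be sketched as a small lookup. This is only an illustration of the convention; the names CRUD_BY_METHOD and crud_operation are my own and are not part of requests or any API.

```python
# Conventional mapping from HTTP methods to CRUD operations.
CRUD_BY_METHOD = {
    "POST": "create",
    "GET": "read",
    "PUT": "update",
    "PATCH": "update",
    "DELETE": "delete",
}


def crud_operation(method: str) -> str:
    """Return the CRUD operation conventionally associated with an HTTP method."""
    return CRUD_BY_METHOD[method.upper()]


print(crud_operation("get"))
```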
Here, I will break down an example API url or endpoint into its individual components.
root-endpoint + the path + query parameters (optional)
The root-endpoint is the starting point of the API you are requesting from.
# the root-endpoint
https://api.github.com
The path directs your request and determines the resource you receive as a response to your GET request. The example below gets a list of repositories by a certain user, in this case faithkane3. The API documentation lets me know what paths are available to me.
# the root-endpoint + the path
https://api.github.com/users/faithkane3/repos
The query parameters allow you to modify (sort and filter) your request with key-value pairs. They begin with a ? and are separated with & when you chain more than one. I am using the sort parameter to return my repositories in order of my most recent pushes. This will not return any information about the repos I own or push to in other organizations besides faithkane3.
# the root-endpoint + the path + query parameters
https://api.github.com/users/faithkane3/repos?sort=pushed
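To see the three components programmatically, here is a small sketch that takes the full URL above apart with Python's standard urllib.parse module. The variable names are just my labels for the pieces described in this section.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.github.com/users/faithkane3/repos?sort=pushed"
parts = urlsplit(url)

# scheme + host together make up the root-endpoint
root_endpoint = f"{parts.scheme}://{parts.netloc}"
# the path picks the resource
path = parts.path
# the query string becomes a dict mapping each key to a list of values
query_params = parse_qs(parts.query)

print(root_endpoint)
print(path)
print(query_params)
```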
Property/Method | Description |
---|---|
.content | Returns the response in bytes |
.json() | Returns the parsed JSON body if the response is in JSON format, otherwise raises an error |
.ok | Returns True if status_code is less than 400, otherwise False |
.request | Returns the request object that requested this response |
.status_code | Returns a number that indicates the status |
.text | Returns the content of the response, in unicode |
.url | Returns the URL of the response |
This article is housed in the Time Series Acquire Codeup Curriculum under Further Reading and was really helpful in helping me go through the following steps. Check it out.
Reading the documentation provided by a site's API will tell you how to use it.
For example, what if I want to get information from GitHub's API? I can check out its API documentation at the URL below:
https://docs.github.com/en/free-pro-team@latest/rest
What if I wanted to get a list of repositories from a specific GitHub user?
url = 'https://api.github.com/users/faithkane3/repos'
response = requests.get(url)
.ok returns a boolean communicating if the request was successful.
response.ok
.status_code returns the HTTP response status code.
response.status_code
HTTP Status Codes and Error Messages
200+ (200-299) means the request has succeeded.
300+ (300-399) means the request is redirected to another URL.
400+ (400-499) means an error that originates from the client has occurred.
500+ (500-599) means an error that originates from the server has occurred.
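The ranges above can be turned into a tiny helper. This is a sketch, not part of requests: describe_status is a name I made up, and it relies on the standard library's http.HTTPStatus enum to look up the standard reason phrase for a code.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Classify an HTTP status code by its leading digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    category = classes.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "OK", "Not Found"
    except ValueError:
        # The code is not one of the registered status codes.
        phrase = "unrecognized code"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 503):
    print(describe_status(code))
```

With the response from above, I could call describe_status(response.status_code) to get a readable summary.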
.text
The .text property returns the raw response text from my response. It's a string holding a JSON array of key/value pairs. In Python, I would recognize these as a list object with nested dictionary objects.
# I'm limiting the text string to 500 characters bc it's one long string!
print(type(response.text))
response.text[:500]
.json()
I can use the .json() method to return my response as Python objects and access a list of dictionaries for all my GitHub repos. The response will not always be a list of dictionaries; it might be a dictionary. Just make sure you know what type of structure you are working with.
For example, in this case I find that there are 30 dictionaries in my list, which I have stored in the variable data.
data = response.json()
# My variable data is a list holding 30 elements.
print(type(data))
len(data)
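Since the parsed body might be a list or a single dictionary, one way to guard against surprises is a small helper like the sketch below. The name to_records is hypothetical; it only assumes its argument behaves like a requests response object, meaning it has the raise_for_status() and .json() methods.

```python
def to_records(response):
    """Return the JSON body of a response as a list of dictionaries.

    `response` is expected to behave like a requests.Response.
    """
    # Raises an HTTPError if the status code is in the 4xx or 5xx range.
    response.raise_for_status()
    payload = response.json()
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        # Wrap a single object so downstream code always sees a list.
        return [payload]
    raise TypeError(f"Unexpected JSON structure: {type(payload).__name__}")


# Usage with the response requested above:
# data = to_records(response)
```

Either way, the result is a list of dictionaries, which is convenient for list comprehensions or for building a DataFrame.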
# I can take a look at the first element in my list; I see it's a dictionary.
data[:1]
# I can access the keys of the first dictionary to understand the format of each dictionary in my list
print(f'There are {len(data[0])} key:item pairs in each dictionary in my list.\n')
data[0].keys()
# I can get a list of the names of all of my repos on GitHub by returning the value for the key 'name' for each dictionary
repo_list = [repo['name'] for repo in data]
repo_list
OR I could just make it a pandas DataFrame! When I see a list of dictionaries, I see a DataFrame ready to happen.
repos_df = pd.DataFrame(data)
repos_df.head(2)
repos_df.name.tolist()
How about a list of urls for my repos? Handy for scraping repos!
repos_df.html_url.tolist()
Remember above I stated that some APIs allow for the addition of query parameters to modify my request? GitHub does, and here is a simple example. I want to request the repo names for faithkane3, but this time I want my response sorted from the repo most recently pushed to down to the one least recently pushed to. Check it out. This changes the order of my repo list from alphabetical, with names that start with numbers first, to order of pushes starting with the most recent.
url = 'https://api.github.com/users/faithkane3/repos?sort=pushed'
response = requests.get(url)
data = response.json()
repo_list = [repo['name'] for repo in data]
repo_list
repos_df = pd.DataFrame(data)
repos_df.head(2)
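Instead of typing the query string by hand, the parameters can be kept in a dictionary and encoded. The sketch below uses the standard library's urlencode to build the same URL as above; requests can also do this encoding itself when the dictionary is passed through the params argument of requests.get, as the commented line suggests.

```python
from urllib.parse import urlencode

base_url = "https://api.github.com/users/faithkane3/repos"
params = {"sort": "pushed"}

# Join the endpoint and the encoded key-value pairs with a ?
url = f"{base_url}?{urlencode(params)}"
print(url)

# Equivalent request, letting requests build the query string:
# response = requests.get(base_url, params=params)
```

Keeping parameters in a dictionary makes it easier to add or change one without editing the URL string.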
Here is a link to a collection of free APIs you can practice with if you like. Remember that the key to getting the data you want from an API is carefully reading the documentation it provides, so you can navigate efficiently.