In [1]:
import pandas as pd
import numpy as np

import requests

import warnings
warnings.filterwarnings("ignore")

Important Terms


Here I'll build up some vocabulary around using REST APIs.

I find it easier to understand larger concepts when I know the smaller pieces that make them up and have the technical vocabulary to read and talk about them. By now, you know that there is almost never just one term for one thing in tech (or, honestly, in many other fields), so expect more of the same when researching APIs. With that in mind, I'll use the most common terms below and include their common synonyms.

Below is more than you need to know to start playing with the requests Python library, but it still barely scratches the surface! When you're ready for more, here is a crash course in REST API design that will help you understand how to use REST APIs to request data.


What Is a REST API?

API stands for Application Programming Interface: a set of rules that allows programs to communicate with each other. For my purposes, it allows communication between a client (my computer) and a server. I'm interested in using an API to acquire resources, or data sets, by sending requests: URLs combined with HTTP methods.

REST, REpresentational State Transfer, is an architectural style and approach to communications that uses HTTP requests to access and use data. It has a specific set of guiding constraints or rules that it follows, but those are deeper than I want to go here because I'm not building an API; I just want a solid foundation for acquiring data from an API.

REST API Architecture Image

JSON stands for JavaScript Object Notation. It is a language-independent text format that represents data as objects made of name/value pairs (think Python dictionaries), arrays (think Python lists), strings, numbers, booleans, and null.

For example:

JSON Object Map

JSON
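To see that mapping in code, here is a minimal sketch using Python's built-in json module on a small, made-up JSON string (the field names only mimic the repo data used later; this is not real API output). JSON objects become dicts, arrays become lists, true/false become True/False, and null becomes None.

```python
import json

# A small, made-up JSON document (illustrative only)
raw = '{"name": "101-exercises", "private": false, "stars": 0, "topics": ["python", "kaggle"], "license": null}'

repo = json.loads(raw)  # JSON object -> Python dict

print(type(repo))                        # <class 'dict'>
print(type(repo["topics"]))              # <class 'list'>
print(repo["private"], repo["license"])  # False None
```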


So What?

Putting It All Together

A RESTful JSON API uses HTTP methods to GET (retrieve), POST (create), PUT (update or replace), and DELETE (remove) resources, exchanging the data as JSON. It allows users to connect to, manage, and interact with cloud services.

The requests module lets me send HTTP requests to a REST API using Python. When I send a request to a RESTful JSON API, requests gives me back a Response object whose body holds the resource's data in JSON format. I can get a variety of information from this Response object using its properties and methods, some of which are explained in the table in the Now What section below.


TLDR - Requests

HTTP method == Action --> Endpoint == Where --> Resource == What

Anatomy of a Request:

Anatomy of a REST API Query


Requests are made up of four different parts, but this notebook will only dig into HTTP methods and endpoints.

  • The Endpoint
  • The Method
  • The Headers (not required by all APIs)
  • The Body (only used with POST, PUT, PATCH, or DELETE)

Parts of a Request Chart
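The parts of a request can be inspected without sending anything. This sketch uses the standard library's urllib.request.Request only as a convenient container (with requests, a PreparedRequest object exposes the same pieces as .method, .url, .headers, and .body). The Accept header value is GitHub's documented JSON media type, included here as an illustrative assumption; many APIs need no headers at all.

```python
from urllib.request import Request

# Build (but do not send) a GET request to show its parts
req = Request(
    "https://api.github.com/users/faithkane3/repos?sort=pushed",  # the endpoint
    headers={"Accept": "application/vnd.github+json"},           # the headers (assumed value)
    method="GET",                                                # the method
)

print(req.get_method())          # GET
print(req.full_url)              # the endpoint, including the query string
print(req.get_header("Accept"))  # application/vnd.github+json
print(req.data)                  # None: this GET request carries no body
```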


The Method Options

The HTTP method I use indicates the action, or type of interaction, my request will have with the resource. The methods map onto the CRUD operations (Create, Read, Update, Delete).

GET - A GET request performs a READ operation. A GET request to a server sends you back the data you requested.

POST - A POST request performs a CREATE operation.

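PUT - A PUT request performs an UPDATE operation by replacing the resource with the data you send.

PATCH - A PATCH request performs an UPDATE operation by changing only the fields you send (a partial update).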

DELETE - A DELETE request performs a DELETE operation.

In this notebook, I will only be using the HTTP GET method.


The Components of an Endpoint

Here, I will break down an example API URL, or endpoint, into its individual components.

root-endpoint + the path + query parameters (optional)

The root-endpoint is the starting point of the API you are requesting from.

# the root-endpoint

https://api.github.com

The path directs your request and determines the resource you receive as a response to your GET request. The example below gets a list of repositories owned by a specific user, in this case faithkane3. The API documentation tells me which paths are available.

# the root-endpoint + the path

https://api.github.com/users/faithkane3/repos

The query parameters let you modify (for example, sort and filter) your request with key-value pairs. The query string begins with a ? and the pairs are separated with & when you chain more than one. I am using the sort parameter to return my repositories in order of my most recent pushes. This endpoint only returns repositories owned by the faithkane3 account, so it will not include repos I own or push to in other organizations.

# the root-endpoint + the path + query parameters

https://api.github.com/users/faithkane3/repos?sort=pushed
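The three components can also be assembled and taken apart in code. This is a minimal sketch using the standard library's urllib.parse; the extra per_page pair is only there to show how & chains a second parameter.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

root_endpoint = "https://api.github.com"
path = "/users/faithkane3/repos"
params = {"sort": "pushed", "per_page": 100}  # per_page added for illustration

# "?" starts the query string; urlencode joins key=value pairs with "&"
endpoint = root_endpoint + path + "?" + urlencode(params)
print(endpoint)  # https://api.github.com/users/faithkane3/repos?sort=pushed&per_page=100

# Splitting the URL back apart recovers the same components
parts = urlsplit(endpoint)
print(parts.netloc, parts.path)  # api.github.com /users/faithkane3/repos
print(parse_qs(parts.query))     # {'sort': ['pushed'], 'per_page': ['100']}
```

With requests you can usually skip the manual step: requests.get(root_endpoint + path, params=params) builds the same query string for you.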


Now What Do I Do With a Response Object?

Properties and Methods of Response Objects (not an exhaustive list)

Property/Method   Description
.content          Returns the response body as bytes
.json()           Parses a JSON response body into Python objects (usually a dict or list); raises an error if the body is not valid JSON
.ok               Returns True if status_code is less than 400, otherwise False
.request          Returns the PreparedRequest object that produced this response
.status_code      Returns the integer HTTP status code (e.g., 200 or 404)
.text             Returns the response body decoded as a (Unicode) string
.url              Returns the final URL of the response (after any redirects)

Acquire


This article, housed in the Time Series Acquire Codeup Curriculum under Further Reading, was really helpful in walking me through the following steps. Check it out.

Reading the documentation provided by a site's API will tell you how to use it.

For example, what if I want to get information from GitHub's API? I can check out its API documentation at the URL below:

https://docs.github.com/en/free-pro-team@latest/rest

What if I wanted to get a list of repositories from a specific GitHub user?


In [2]:
url = 'https://api.github.com/users/faithkane3/repos'
response = requests.get(url)

.ok returns a boolean indicating whether the request was successful (True for any status code below 400).

In [3]:
response.ok
Out[3]:
True

.status_code returns the HTTP response status code.

In [4]:
response.status_code
Out[4]:
200

HTTP Status Codes and Error Messages

1xx means the request was received and is still being processed (informational).

2xx means the request has succeeded.

3xx means the request is being redirected to another URL.

4xx means an error that originates from the client has occurred.

5xx means an error that originates from the server has occurred.
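Because the first digit carries the meaning, classifying a code is just integer division. Here is a small sketch using the standard library's http.HTTPStatus for the reason phrases; describe_status and is_ok are hypothetical helper names, and is_ok mirrors the "below 400" rule that requests documents for Response.ok.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return the standard reason phrase plus the status-code class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return f"{code} {HTTPStatus(code).phrase}: {classes[code // 100]}"

def is_ok(code: int) -> bool:
    """Mirror requests' Response.ok: True for any status code below 400."""
    return code < 400

for code in (200, 301, 404, 500):
    print(describe_status(code), "| ok:", is_ok(code))
# 200 OK: success | ok: True
# 301 Moved Permanently: redirection | ok: True
# 404 Not Found: client error | ok: False
# 500 Internal Server Error: server error | ok: False
```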


.text

The .text property returns the raw response body as a string. Here, that string holds a JSON array of objects made up of key/value pairs. In Python, I would recognize these as a list object with nested dictionary objects.

In [5]:
# I'm limiting the text string to 500 characters bc it's one long string!

print(type(response.text))
response.text[:500]
<class 'str'>
Out[5]:
'[{"id":206128554,"node_id":"MDEwOlJlcG9zaXRvcnkyMDYxMjg1NTQ=","name":"101-exercises","full_name":"faithkane3/101-exercises","private":false,"owner":{"login":"faithkane3","id":43799876,"node_id":"MDQ6VXNlcjQzNzk5ODc2","avatar_url":"https://avatars0.githubusercontent.com/u/43799876?v=4","gravatar_id":"","url":"https://api.github.com/users/faithkane3","html_url":"https://github.com/faithkane3","followers_url":"https://api.github.com/users/faithkane3/followers","following_url":"https://api.github.co'

.json()

I can use the .json() method to parse the response body into Python objects; here, that gives me a list of dictionaries, one for each of my GitHub repos. The response will not always be a list of dictionaries; it might be a single dictionary. Just make sure you know what type of structure you are working with.

For example:

List of Dictionaries Image
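On GitHub, for instance, /users/{username}/repos returns a list, while /users/{username} returns a single user object. A hypothetical helper like the one below (a sketch, not part of requests) checks the shape with isinstance so both cases can be looped over the same way.

```python
def as_records(payload):
    """Normalize a parsed JSON payload into a list of dicts.

    A list is returned as-is; a lone dict is wrapped in a list.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        return [payload]
    raise TypeError(f"Expected a list or dict, got {type(payload).__name__}")

print(len(as_records([{"name": "a"}, {"name": "b"}])))  # 2
print(as_records({"login": "faithkane3"}))               # [{'login': 'faithkane3'}]
```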

In this case, I find that there are 30 dictionaries in my list, which I have stored in the variable data.

In [6]:
data = response.json()
In [7]:
# My variable data is a list holding 30 elements.

print(type(data))
len(data)
<class 'list'>
Out[7]:
30
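The 30 is worth a second look: GitHub's list endpoints return 30 items per page by default (up to 100 with the per_page parameter, with later pages selected by page), so a user with more than 30 public repos would need several requests. Below is a minimal sketch of a pagination loop. get_page is a hypothetical callable you supply, and the loop assumes a short or empty page means there is nothing left; the offline demo uses a fake endpoint so it runs without a network connection.

```python
from typing import Callable, Dict, List

def fetch_all_pages(
    url: str,
    get_page: Callable[[str, Dict[str, int]], List[dict]],
    per_page: int = 100,
) -> List[dict]:
    """Collect every item from a paginated list endpoint.

    get_page(url, params) must return one page of results as a list.
    """
    results: List[dict] = []
    page = 1
    while True:
        batch = get_page(url, {"per_page": per_page, "page": page})
        results.extend(batch)
        if len(batch) < per_page:  # short (or empty) page -> last page
            return results
        page += 1

# Offline demo: a fake endpoint holding 250 items
fake_items = [{"id": i} for i in range(250)]

def fake_get_page(url, params):
    start = (params["page"] - 1) * params["per_page"]
    return fake_items[start:start + params["per_page"]]

print(len(fetch_all_pages("https://example.invalid", fake_get_page)))  # 250

# With requests (not run here), get_page could be:
# get_page = lambda url, params: requests.get(url, params=params, timeout=10).json()
```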
In [8]:
# I can take a look at the first element in my list; I see it's a dictionary.

data[:1]
Out[8]:
[{'id': 206128554,
  'node_id': 'MDEwOlJlcG9zaXRvcnkyMDYxMjg1NTQ=',
  'name': '101-exercises',
  'full_name': 'faithkane3/101-exercises',
  'private': False,
  'owner': {'login': 'faithkane3',
   'id': 43799876,
   'node_id': 'MDQ6VXNlcjQzNzk5ODc2',
   'avatar_url': 'https://avatars0.githubusercontent.com/u/43799876?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/faithkane3',
   'html_url': 'https://github.com/faithkane3',
   'followers_url': 'https://api.github.com/users/faithkane3/followers',
   'following_url': 'https://api.github.com/users/faithkane3/following{/other_user}',
   'gists_url': 'https://api.github.com/users/faithkane3/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/faithkane3/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/faithkane3/subscriptions',
   'organizations_url': 'https://api.github.com/users/faithkane3/orgs',
   'repos_url': 'https://api.github.com/users/faithkane3/repos',
   'events_url': 'https://api.github.com/users/faithkane3/events{/privacy}',
   'received_events_url': 'https://api.github.com/users/faithkane3/received_events',
   'type': 'User',
   'site_admin': False},
  'html_url': 'https://github.com/faithkane3/101-exercises',
  'description': 'This is the repo for my 101-exercises from Kaggle',
  'fork': False,
  'url': 'https://api.github.com/repos/faithkane3/101-exercises',
  'forks_url': 'https://api.github.com/repos/faithkane3/101-exercises/forks',
  'keys_url': 'https://api.github.com/repos/faithkane3/101-exercises/keys{/key_id}',
  'collaborators_url': 'https://api.github.com/repos/faithkane3/101-exercises/collaborators{/collaborator}',
  'teams_url': 'https://api.github.com/repos/faithkane3/101-exercises/teams',
  'hooks_url': 'https://api.github.com/repos/faithkane3/101-exercises/hooks',
  'issue_events_url': 'https://api.github.com/repos/faithkane3/101-exercises/issues/events{/number}',
  'events_url': 'https://api.github.com/repos/faithkane3/101-exercises/events',
  'assignees_url': 'https://api.github.com/repos/faithkane3/101-exercises/assignees{/user}',
  'branches_url': 'https://api.github.com/repos/faithkane3/101-exercises/branches{/branch}',
  'tags_url': 'https://api.github.com/repos/faithkane3/101-exercises/tags',
  'blobs_url': 'https://api.github.com/repos/faithkane3/101-exercises/git/blobs{/sha}',
  'git_tags_url': 'https://api.github.com/repos/faithkane3/101-exercises/git/tags{/sha}',
  'git_refs_url': 'https://api.github.com/repos/faithkane3/101-exercises/git/refs{/sha}',
  'trees_url': 'https://api.github.com/repos/faithkane3/101-exercises/git/trees{/sha}',
  'statuses_url': 'https://api.github.com/repos/faithkane3/101-exercises/statuses/{sha}',
  'languages_url': 'https://api.github.com/repos/faithkane3/101-exercises/languages',
  'stargazers_url': 'https://api.github.com/repos/faithkane3/101-exercises/stargazers',
  'contributors_url': 'https://api.github.com/repos/faithkane3/101-exercises/contributors',
  'subscribers_url': 'https://api.github.com/repos/faithkane3/101-exercises/subscribers',
  'subscription_url': 'https://api.github.com/repos/faithkane3/101-exercises/subscription',
  'commits_url': 'https://api.github.com/repos/faithkane3/101-exercises/commits{/sha}',
  'git_commits_url': 'https://api.github.com/repos/faithkane3/101-exercises/git/commits{/sha}',
  'comments_url': 'https://api.github.com/repos/faithkane3/101-exercises/comments{/number}',
  'issue_comment_url': 'https://api.github.com/repos/faithkane3/101-exercises/issues/comments{/number}',
  'contents_url': 'https://api.github.com/repos/faithkane3/101-exercises/contents/{+path}',
  'compare_url': 'https://api.github.com/repos/faithkane3/101-exercises/compare/{base}...{head}',
  'merges_url': 'https://api.github.com/repos/faithkane3/101-exercises/merges',
  'archive_url': 'https://api.github.com/repos/faithkane3/101-exercises/{archive_format}{/ref}',
  'downloads_url': 'https://api.github.com/repos/faithkane3/101-exercises/downloads',
  'issues_url': 'https://api.github.com/repos/faithkane3/101-exercises/issues{/number}',
  'pulls_url': 'https://api.github.com/repos/faithkane3/101-exercises/pulls{/number}',
  'milestones_url': 'https://api.github.com/repos/faithkane3/101-exercises/milestones{/number}',
  'notifications_url': 'https://api.github.com/repos/faithkane3/101-exercises/notifications{?since,all,participating}',
  'labels_url': 'https://api.github.com/repos/faithkane3/101-exercises/labels{/name}',
  'releases_url': 'https://api.github.com/repos/faithkane3/101-exercises/releases{/id}',
  'deployments_url': 'https://api.github.com/repos/faithkane3/101-exercises/deployments',
  'created_at': '2019-09-03T16:59:57Z',
  'updated_at': '2019-09-03T17:13:33Z',
  'pushed_at': '2019-09-03T17:13:31Z',
  'git_url': 'git://github.com/faithkane3/101-exercises.git',
  'ssh_url': 'git@github.com:faithkane3/101-exercises.git',
  'clone_url': 'https://github.com/faithkane3/101-exercises.git',
  'svn_url': 'https://github.com/faithkane3/101-exercises',
  'homepage': None,
  'size': 12,
  'stargazers_count': 0,
  'watchers_count': 0,
  'language': 'Jupyter Notebook',
  'has_issues': True,
  'has_projects': True,
  'has_downloads': True,
  'has_wiki': True,
  'has_pages': False,
  'forks_count': 0,
  'mirror_url': None,
  'archived': False,
  'disabled': False,
  'open_issues_count': 0,
  'license': None,
  'forks': 0,
  'open_issues': 0,
  'watchers': 0,
  'default_branch': 'master'}]
In [9]:
# I can access the keys of the first dictionary to understand the format of each dictionary in my list

print(f'There are {len(data[0])} key:value pairs in each dictionary in my list.\n')
data[0].keys()
There are 73 key:value pairs in each dictionary in my list.

Out[9]:
dict_keys(['id', 'node_id', 'name', 'full_name', 'private', 'owner', 'html_url', 'description', 'fork', 'url', 'forks_url', 'keys_url', 'collaborators_url', 'teams_url', 'hooks_url', 'issue_events_url', 'events_url', 'assignees_url', 'branches_url', 'tags_url', 'blobs_url', 'git_tags_url', 'git_refs_url', 'trees_url', 'statuses_url', 'languages_url', 'stargazers_url', 'contributors_url', 'subscribers_url', 'subscription_url', 'commits_url', 'git_commits_url', 'comments_url', 'issue_comment_url', 'contents_url', 'compare_url', 'merges_url', 'archive_url', 'downloads_url', 'issues_url', 'pulls_url', 'milestones_url', 'notifications_url', 'labels_url', 'releases_url', 'deployments_url', 'created_at', 'updated_at', 'pushed_at', 'git_url', 'ssh_url', 'clone_url', 'svn_url', 'homepage', 'size', 'stargazers_count', 'watchers_count', 'language', 'has_issues', 'has_projects', 'has_downloads', 'has_wiki', 'has_pages', 'forks_count', 'mirror_url', 'archived', 'disabled', 'open_issues_count', 'license', 'forks', 'open_issues', 'watchers', 'default_branch'])
In [10]:
# I can get a list of the names of all of my repos on GitHub by returning the value for the key 'name' from each dictionary

repo_list = [repo['name'] for repo in data]
repo_list
Out[10]:
['101-exercises',
 'bayes-methodologies-exercises',
 'checkbook_application',
 'classification',
 'codeup_review',
 'database-exercises',
 'ds-methodologies-exercises',
 'faithkane3.github.io',
 'flask_intro',
 'git_warmup',
 'intro-to-deep-learning-with-keras',
 'karma_atm',
 'makeovermonday',
 'natural_language_processing',
 'nlp',
 'numpy-100',
 'pandas',
 'pandas_practice',
 'python',
 'python-exercises',
 'python_101_ds',
 'python_fun',
 'regression',
 'resources',
 'side_projects',
 'sql',
 'sql_practice',
 'statistics-exercises',
 'time_series',
 'zillow_project']

OR I could just make it a pandas DataFrame! When I see a list of dictionaries, I see a DataFrame ready to happen.

In [11]:
repos_df = pd.DataFrame(data)
repos_df.head(2)
Out[11]:
id node_id name full_name private owner html_url description fork url ... forks_count mirror_url archived disabled open_issues_count license forks open_issues watchers default_branch
0 206128554 MDEwOlJlcG9zaXRvcnkyMDYxMjg1NTQ= 101-exercises faithkane3/101-exercises False {'login': 'faithkane3', 'id': 43799876, 'node_... https://github.com/faithkane3/101-exercises This is the repo for my 101-exercises from Kaggle False https://api.github.com/repos/faithkane3/101-ex... ... 0 None False False 0 None 0 0 0 master
1 213958222 MDEwOlJlcG9zaXRvcnkyMTM5NTgyMjI= bayes-methodologies-exercises faithkane3/bayes-methodologies-exercises False {'login': 'faithkane3', 'id': 43799876, 'node_... https://github.com/faithkane3/bayes-methodolog... Bayes exercises on methodologies True https://api.github.com/repos/faithkane3/bayes-... ... 0 None False False 0 None 0 0 0 master

2 rows × 73 columns
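Notice that the owner column holds whole dictionaries. If you want fields like the owner's login as their own columns, pandas' json_normalize can flatten nested dictionaries. Here is a sketch on two trimmed, made-up records shaped like the repo data (the variable is named sample_repos so it doesn't overwrite data).

```python
import pandas as pd

# Two made-up records shaped like the GitHub repo dicts above, trimmed to a few keys
sample_repos = [
    {"name": "101-exercises", "owner": {"login": "faithkane3", "id": 43799876}, "fork": False},
    {"name": "bayes-methodologies-exercises", "owner": {"login": "faithkane3", "id": 43799876}, "fork": True},
]

# Nested dicts become columns named "<parent><sep><child>", e.g. owner_login
flat = pd.json_normalize(sample_repos, sep="_")
print(sorted(flat.columns))  # ['fork', 'name', 'owner_id', 'owner_login']
```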

In [12]:
repos_df.name.tolist()
Out[12]:
['101-exercises',
 'bayes-methodologies-exercises',
 'checkbook_application',
 'classification',
 'codeup_review',
 'database-exercises',
 'ds-methodologies-exercises',
 'faithkane3.github.io',
 'flask_intro',
 'git_warmup',
 'intro-to-deep-learning-with-keras',
 'karma_atm',
 'makeovermonday',
 'natural_language_processing',
 'nlp',
 'numpy-100',
 'pandas',
 'pandas_practice',
 'python',
 'python-exercises',
 'python_101_ds',
 'python_fun',
 'regression',
 'resources',
 'side_projects',
 'sql',
 'sql_practice',
 'statistics-exercises',
 'time_series',
 'zillow_project']

How about a list of URLs for my repos? Handy for scraping repos!

In [13]:
repos_df.html_url.tolist()
Out[13]:
['https://github.com/faithkane3/101-exercises',
 'https://github.com/faithkane3/bayes-methodologies-exercises',
 'https://github.com/faithkane3/checkbook_application',
 'https://github.com/faithkane3/classification',
 'https://github.com/faithkane3/codeup_review',
 'https://github.com/faithkane3/database-exercises',
 'https://github.com/faithkane3/ds-methodologies-exercises',
 'https://github.com/faithkane3/faithkane3.github.io',
 'https://github.com/faithkane3/flask_intro',
 'https://github.com/faithkane3/git_warmup',
 'https://github.com/faithkane3/intro-to-deep-learning-with-keras',
 'https://github.com/faithkane3/karma_atm',
 'https://github.com/faithkane3/makeovermonday',
 'https://github.com/faithkane3/natural_language_processing',
 'https://github.com/faithkane3/nlp',
 'https://github.com/faithkane3/numpy-100',
 'https://github.com/faithkane3/pandas',
 'https://github.com/faithkane3/pandas_practice',
 'https://github.com/faithkane3/python',
 'https://github.com/faithkane3/python-exercises',
 'https://github.com/faithkane3/python_101_ds',
 'https://github.com/faithkane3/python_fun',
 'https://github.com/faithkane3/regression',
 'https://github.com/faithkane3/resources',
 'https://github.com/faithkane3/side_projects',
 'https://github.com/faithkane3/sql',
 'https://github.com/faithkane3/sql_practice',
 'https://github.com/faithkane3/statistics-exercises',
 'https://github.com/faithkane3/time_series',
 'https://github.com/faithkane3/zillow_project']

Using Query Parameters

Remember how I mentioned above that some APIs let you add query parameters to modify a request? GitHub does, and here is a simple example. I want to request the repo names for faithkane3, but this time I want the response sorted to start with the repo I most recently pushed to and end with the one I least recently pushed to. Check it out. This changes the order of my repo list from alphabetical (with names starting with numbers first) to push order, most recent first.

In [14]:
url = 'https://api.github.com/users/faithkane3/repos?sort=pushed'
response = requests.get(url)
In [15]:
data = response.json()
In [16]:
repo_list = [repo['name'] for repo in data]
repo_list
Out[16]:
['time_series',
 'nlp',
 'python',
 'regression',
 'classification',
 'pandas',
 'ds-methodologies-exercises',
 'faithkane3.github.io',
 'sql',
 'pandas_practice',
 'zillow_project',
 'codeup_review',
 'makeovermonday',
 'python-exercises',
 'side_projects',
 'flask_intro',
 'intro-to-deep-learning-with-keras',
 'natural_language_processing',
 'python_fun',
 'git_warmup',
 'bayes-methodologies-exercises',
 'statistics-exercises',
 'resources',
 'numpy-100',
 'karma_atm',
 'checkbook_application',
 'database-exercises',
 'sql_practice',
 '101-exercises',
 'python_101_ds']
In [17]:
repos_df = pd.DataFrame(data)
repos_df.head(2)
Out[17]:
id node_id name full_name private owner html_url description fork url ... forks_count mirror_url archived disabled open_issues_count license forks open_issues watchers default_branch
0 305362805 MDEwOlJlcG9zaXRvcnkzMDUzNjI4MDU= time_series faithkane3/time_series False {'login': 'faithkane3', 'id': 43799876, 'node_... https://github.com/faithkane3/time_series None False https://api.github.com/repos/faithkane3/time_s... ... 0 None False False 0 None 0 0 0 main
1 305364528 MDEwOlJlcG9zaXRvcnkzMDUzNjQ1Mjg= nlp faithkane3/nlp False {'login': 'faithkane3', 'id': 43799876, 'node_... https://github.com/faithkane3/nlp None False https://api.github.com/repos/faithkane3/nlp ... 0 None False False 0 None 0 0 1 main

2 rows × 73 columns
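GitHub does this sorting on the server. If you ever need the same ordering locally, say after combining several pages of results, here is a minimal sketch on made-up records (hypothetical names and timestamps). It assumes pushed_at holds ISO 8601 UTC strings in one format, like the ones in the output earlier, which sort correctly as plain strings.

```python
# Made-up records shaped like GitHub's repo dicts (hypothetical values)
repos = [
    {"name": "repo_old", "pushed_at": "2019-09-03T17:13:31Z"},
    {"name": "repo_new", "pushed_at": "2020-10-19T14:05:00Z"},
    {"name": "repo_mid", "pushed_at": "2020-10-19T13:40:12Z"},
]

# reverse=True puts the most recent push first, mirroring sort=pushed
newest_first = sorted(repos, key=lambda repo: repo["pushed_at"], reverse=True)
print([repo["name"] for repo in newest_first])  # ['repo_new', 'repo_mid', 'repo_old']
```

With a DataFrame, repos_df.sort_values('pushed_at', ascending=False) gives the same ordering.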


Want More Practice?

Here is a link to a collection of free APIs you can practice with if you like. Remember that the key to getting the data you want from an API is carefully reading the documentation it provides, so you can navigate it efficiently.