How do I read a CSV file from Google Drive using Python Colab?
Seven steps to read a CSV file using PyDrive
Tired of the same old routine: download a CSV file, upload it to Colab, load it into a data frame, and then, a while later, repeat everything because the uploaded file was not stored there?
Don’t worry, your problems are over!
I will show you a very useful technique that I have used in my Data Science projects on Google Colab (Python 3). Just as you are doing now, I turned to the community and found many colleagues sharing their knowledge, so I decided to do the same.
In this article, I will show you how to use PyDrive to read a file in CSV format directly from your Google Drive using Python3 in the Google Colab environment.
Let’s go step by step.
1) Install PyDrive
The first step is to install PyDrive and import the modules used in the next steps. Because we are working in a notebook environment, the pip command must be prefixed with an exclamation mark (!).
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
2) Authenticate
The second step is to authenticate and create a PyDrive client:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
3) Authorizing
As soon as you execute this part of the code, the authenticator will ask you to click on the link that appears in your notebook. This is the third step: click on the link, authenticate with your Google account and copy the generated code. Return to your notebook and paste this code into the requested area. Press Enter and you’re done: you’re authenticated!
4) Generating a shareable link
Now comes a trickier part, the fourth step. Go to your Google Drive, find your file and share it as you normally would, generating a shareable link:
1) find your file and click on it;
2) click on the “share” button;
3) generate a shareable link (“Get link”).
5) Getting the file_id
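For the fifth step, look at the shareable link you just generated. It usually has the form https://drive.google.com/file/d/XXXXXXXXXXXXXX/view?usp=sharing (this one is fake, don’t worry :). Extract only the code between /d/ and /view. This is your file_id. Pass it to drive.CreateFile: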
fileDownloaded = drive.CreateFile({'id': 'XXXXXXXXXXXXXX'})
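If you prefer not to copy the file_id by hand, the extraction can be sketched in a few lines of Python. This is only a minimal sketch: the function name extract_file_id is hypothetical (it is not part of PyDrive), and it assumes the link looks like one of the two common Drive formats, either .../file/d/<file_id>/view or ...?id=<file_id>.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_file_id(share_url):
    """Return the file_id embedded in a Google Drive shareable link.

    Hypothetical helper, not part of PyDrive. It only understands
    links of the form .../d/<file_id>/... or ...?id=<file_id>.
    """
    # Format 1: https://drive.google.com/file/d/<file_id>/view?usp=sharing
    match = re.search(r"/d/([A-Za-z0-9_-]+)", share_url)
    if match:
        return match.group(1)

    # Format 2: https://drive.google.com/open?id=<file_id>
    query = parse_qs(urlparse(share_url).query)
    if "id" in query:
        return query["id"][0]

    raise ValueError("No file_id found in: " + share_url)
```

You could then write drive.CreateFile({'id': extract_file_id(link)}), where link is the shareable link you copied in the fourth step.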
6) Load the CSV
For the sixth step, download the file’s contents into your notebook’s local storage, giving the name you want the local CSV file to have.
fileDownloaded.GetContentFile('example.csv')
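If you re-run this cell often, you may want to skip the download when the local copy already exists in the current session. The sketch below is one possible way to do that. The name download_once is hypothetical, and it assumes you do not need to pick up changes made to the file on Drive in the meantime.

```python
import os


def download_once(local_path, download):
    """Call download(local_path) only if local_path does not exist yet.

    Hypothetical helper: `download` is any function that takes a file
    name and writes the file there, such as a PyDrive file's
    GetContentFile method.
    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(local_path):
        return False
    download(local_path)
    return True
```

With the objects from the steps above, the call would be download_once('example.csv', fileDownloaded.GetContentFile).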
7) Showing the Results
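For the seventh and last step, just use good old Pandas: load the file into a DataFrame and display its first rows. Note the delimiter argument: this example file uses semicolons, so change it to match your own file. o/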
import pandas as pd
df = pd.read_csv('example.csv', delimiter=';')
df.head()
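If you are not sure which delimiter your file uses, you can peek at its first line before calling read_csv. The following is a rough sketch, not a robust parser: guess_delimiter is a hypothetical name, the list of candidates is an assumption, and the function simply picks the candidate that appears most often in the header line, so it can be fooled by quoted fields that contain separators.

```python
def guess_delimiter(path, candidates=(";", ",", "\t", "|")):
    """Guess the delimiter of a CSV file from its first line.

    Hypothetical helper: counts each candidate character in the
    header line and returns the most frequent one.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        header = handle.readline()

    counts = {sep: header.count(sep) for sep in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("No candidate delimiter found in the first line")
    return best
```

The result can then be passed along, for example pd.read_csv('example.csv', delimiter=guess_delimiter('example.csv')).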
That’s it folks!!!
I hope I have helped.
If you need anything, leave a comment.