Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update proj1 readme #86

Merged
merged 6 commits into from
Mar 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
297 changes: 94 additions & 203 deletions completed/filter_csv_notebook_complete.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -11,25 +11,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to use built-in Python modules - programs really - to download a csv file from the Internet and save it locally.\n",
"So we know how to [download a CSV from the Interent and read the data](read_csv_notebook_complete.ipynb) using the `csv` library.\n",
"\n",
"CSV stands for comma-separated values. It's a common file format a file format that resembles a spreadsheet or database table in a text file.\n",
"Now we're ready to do something a bit more useful using these libraries, combined with the basics we learned on Day 1.\n",
"\n",
"So first, let's import two built-in Python modules: urllib and csv. \n",
"\n",
"* ```urllib``` is a module that allows Python to make http requests to URLs on the web to fetch HTML. It contains a submodule called request. And inside there we want a specific method called urlretrieve\n",
"\n",
"* ```csv``` is a module that helps Python work with tabular data extracted from spreadsheets and databases"
"Once again, we'll start by importing the library code for downloading a file and processing a CSV:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"import csv"
"import csv\n",
"from urllib.request import urlretrieve"
]
},
{
Expand All @@ -41,7 +37,7 @@
},
{
"cell_type": "code",
"execution_count": 88,
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -52,31 +48,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we need a URL to a CSV file out on the Internet.\n",
"\n",
"For this project we're going to download a CSV file that the [FDIC](https://www.fdic.gov/bank/individual/failed/banklist.html) compiles of all the banks that have failed since October 1, 2000.\n",
"\n",
"The file we want is at https://s3.amazonaws.com/datanicar/banklist.csv.\n",
"Now we can download the file using `urlretrieve`. Don't forget that it takes two arguments:\n",
"\n",
"If the internet is uncooperative, we can also use the local version of the file in the ```project1/data/``` directory, and structure out code a little differently.\n",
"\n",
"To do this, we use that program within the ```urllib``` module to download the file and save it to our project folder. It's called ```urlretrieve``` and for our purposes starting out think of it as a way to download a file from the Internet.\n",
"\n",
"`urlretrieve` takes two arguments to download a file. First specify our target URL, and then we give it a name for the file we want to create."
"- the URL\n",
"- the name we'll give the file when we save it locally"
]
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('banklist.csv', <http.client.HTTPMessage at 0x110c740f0>)"
"('banklist.csv', <http.client.HTTPMessage at 0x114bb5910>)"
]
},
"execution_count": 89,
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -89,216 +78,118 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output shows we successfully downloaded the file and saved it\n",
"The output shows we successfully downloaded the file and saved it.\n",
"\n",
"For this exercise, we'll:\n",
"\n",
"Let's open a new file so we can filter just the data we want. We add the newline parameter when we open the file to write so it doesn't add [additional, blank rows on Windows machines](https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row/3348664)."
"- read the data\n",
"- filter the data for banks in California and save them in a list\n",
"- save the California banks list to a new CSV file\n",
"\n",
"> NOTE: Below, we use the newline argument when we open the file to write so it doesn't add [additional, blank rows on Windows machines](https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row/3348664)."
]
},
{
"cell_type": "code",
"execution_count": 90,
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"filtered_file = open('california_banks.csv', 'w', newline='')"
"ca_banks = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the writer method to write data to a file by passing in the name of the new file as the first argument and delimiter as the the second.\n",
"We'll use the `csv` library's [writer](https://docs.python.org/3/library/csv.html#csv.writer) functionality to...well...write data to a file by passing in the name of the new file as the first argument and delimiter as the second.\n",
"\n",
"Then we will go ahead and use python's csv reader to open the file and see what is inside.\n",
"Then we'll use `reader` to open the file and see what is inside.\n",
"\n",
"We specify the name of the file we just created, and we add a setting so we can open and read almost any CSV file."
]
},
{
"cell_type": "code",
"execution_count": 91,
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Bank Name', 'City', 'ST', 'CERT', 'Acquiring Institution', 'Closing Date', 'Updated Date']\n",
"['Frontier Bank, FSB D/B/A El Paseo Bank', 'Palm Desert', 'CA', '34738', 'Bank of Southern California, N.A.', '7-Nov-14', '10-Nov-16']\n",
"<class 'list'>\n",
"7\n",
"['Palm Desert National Bank', 'Palm Desert', 'CA', '23632', 'Pacific Premier Bank', '27-Apr-12', '7-Dec-15']\n",
"<class 'list'>\n",
"7\n",
"['Citizens Bank of Northern California', 'Nevada City', 'CA', '33983', 'Tri Counties Bank', '23-Sep-11', '7-Jan-18']\n",
"<class 'list'>\n",
"7\n",
"['San Luis Trust Bank, FSB', 'San Luis Obispo', 'CA', '34783', 'First California Bank', '18-Feb-11', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['Charter Oak Bank', 'Napa', 'CA', '57855', 'Bank of Marin', '18-Feb-11', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Canyon National Bank', 'Palm Springs', 'CA', '34692', 'Pacific Premier Bank', '11-Feb-11', '19-Aug-14']\n",
"<class 'list'>\n",
"7\n",
"['First Vietnamese American Bank', 'Westminster', 'CA', '57885', 'Grandpoint Bank', '5-Nov-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Western Commercial Bank', 'Woodland Hills', 'CA', '58087', 'First California Bank', '5-Nov-10', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['Sonoma Valley Bank', 'Sonoma', 'CA', '27259', 'Westamerica Bank', '20-Aug-10', '8-Aug-18']\n",
"<class 'list'>\n",
"7\n",
"['Los Padres Bank', 'Solvang', 'CA', '32165', 'Pacific Western Bank', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Butte Community Bank', 'Chico', 'CA', '33219', 'Rabobank, N.A.', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific State Bank', 'Stockton', 'CA', '27090', 'Rabobank, N.A.', '20-Aug-10', '29-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Granite Community Bank, NA', 'Granite Bay', 'CA', '57315', 'Tri Counties Bank', '28-May-10', '7-Sep-17']\n",
"<class 'list'>\n",
"7\n",
"['1st Pacific Bank of California', 'San Diego', 'CA', '35517', 'City National Bank', '7-May-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Tamalpais Bank', 'San Rafael', 'CA', '33493', 'Union Bank, N.A.', '16-Apr-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Innovative Bank', 'Oakland', 'CA', '23876', 'Center Bank', '16-Apr-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', 'OneWest Bank, FSB', '19-Feb-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['First Regional Bank', 'Los Angeles', 'CA', '23011', 'First-Citizens Bank & Trust Company', '29-Jan-10', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['First Federal Bank of California, F.S.B.', 'Santa Monica', 'CA', '28536', 'OneWest Bank, FSB', '18-Dec-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Imperial Capital Bank', 'La Jolla', 'CA', '26348', 'City National Bank', '18-Dec-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific Coast National Bank', 'San Clemente', 'CA', '57914', 'Sunwest Bank', '13-Nov-09', '10-Apr-17']\n",
"<class 'list'>\n",
"7\n",
"['United Commercial Bank', 'San Francisco', 'CA', '32469', 'East West Bank', '6-Nov-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Pacific National Bank', 'San Francisco', 'CA', '30006', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['California National Bank', 'Los Angeles', 'CA', '34659', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['San Diego National Bank', 'San Diego', 'CA', '23594', 'U.S. Bank N.A.', '30-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['San Joaquin Bank', 'Bakersfield', 'CA', '23266', 'Citizens Business Bank', '16-Oct-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Affinity Bank', 'Ventura', 'CA', '27197', 'Pacific Western Bank', '28-Aug-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Temecula Valley Bank', 'Temecula', 'CA', '34341', 'First-Citizens Bank & Trust Company', '17-Jul-09', '20-Oct-16']\n",
"<class 'list'>\n",
"7\n",
"['Vineyard Bank', 'Rancho Cucamonga', 'CA', '23556', 'California Bank & Trust', '17-Jul-09', '14-Sep-18']\n",
"<class 'list'>\n",
"7\n",
"['Mirae Bank', 'Los Angeles', 'CA', '57332', 'Wilshire State Bank', '26-Jun-09', '21-Feb-18']\n",
"<class 'list'>\n",
"7\n",
"['MetroPacific Bank', 'Irvine', 'CA', '57893', 'Sunwest Bank', '26-Jun-09', '5-Feb-15']\n",
"<class 'list'>\n",
"7\n",
"['First Bank of Beverly Hills', 'Calabasas', 'CA', '32069', 'No Acquirer', '24-Apr-09', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['County Bank', 'Merced', 'CA', '22574', 'Westamerica Bank', '6-Feb-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Alliance Bank', 'Culver City', 'CA', '23124', 'California Bank & Trust', '6-Feb-09', '8-Aug-18']\n",
"<class 'list'>\n",
"7\n",
"['1st Centennial Bank', 'Redlands', 'CA', '33025', 'First California Bank', '23-Jan-09', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['PFF Bank & Trust', 'Pomona', 'CA', '28344', 'U.S. Bank, N.A.', '21-Nov-08', '31-Jan-19']\n",
"<class 'list'>\n",
"7\n",
"['Downey Savings & Loan', 'Newport Beach', 'CA', '30968', 'U.S. Bank, N.A.', '21-Nov-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Security Pacific Bank', 'Los Angeles', 'CA', '23595', 'Pacific Western Bank', '7-Nov-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['First Heritage Bank, NA', 'Newport Beach', 'CA', '57961', 'Mutual of Omaha Bank', '25-Jul-08', '12-Sep-16']\n",
"<class 'list'>\n",
"7\n",
"['IndyMac Bank', 'Pasadena', 'CA', '29730', 'OneWest Bank, FSB', '11-Jul-08', '1-Feb-19']\n",
"<class 'list'>\n",
"7\n",
"['Southern Pacific Bank', 'Torrance', 'CA', '27094', 'Beal Bank', '7-Feb-03', '20-Oct-08']\n",
"<class 'list'>\n",
"7\n"
]
}
],
"outputs": [],
"source": [
"# create our output\n",
"output = csv.writer(filtered_file, delimiter=',')\n",
"\n",
"# open our downloaded file\n",
"with open(downloaded_file, 'r') as file:\n",
" \n",
" # use python's csv reader to access the contents\n",
" # and create an object that represents the data\n",
" csv_data = csv.reader(file)\n",
" \n",
" # write our header row to the output csv\n",
" header_row = next(csv_data)\n",
" print(header_row) \n",
" output.writerow(header_row)\n",
" \n",
" # loop through each row of the csv\n",
" for row in csv_data:\n",
"\n",
" # now we're going to use an IF statement\n",
" # to find items where the state field\n",
" # is equal to California\n",
" if row[2] == 'CA':\n",
" \n",
" # write the row to the new csv file\n",
" output.writerow(row) \n",
" \n",
" # and print the row to the terminal\n",
" print(row)\n",
" # Create a counter\n",
" count = 0\n",
"\n",
" # print the data type to the terminal\n",
" print(type(row))\n",
"\n",
" # print the length of the row to the terminal\n",
" print(len(row)) \n",
" \n",
" # otherwise continue on\n",
" else:\n",
" continue\n",
" \n",
"# close the output file\n",
"filtered_file.close()"
" # Read the data\n",
" reader = csv.reader(file)\n",
" \n",
" import pdb\n",
" # loop through each row of the csv\n",
" for row in reader:\n",
" # Save the header row\n",
" if count == 0:\n",
" ca_banks.append(row)\n",
"\n",
" # Save the row if it's a CA bank\n",
" if row[2].strip().upper() == 'CA':\n",
" # Increment the count\n",
" count += 1\n",
" # Save the row to a our list\n",
" ca_banks.append(row)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How do we find out how many banks failed in California?"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"65"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(ca_banks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can create a new CSV with just the CA banks."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"with open('ca_banks.csv', 'w') as output:\n",
" writer = csv.writer(output)\n",
" for row in ca_banks:\n",
" writer.writerow(row)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -312,9 +203,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}
Loading
Loading