Chicago Indicators¶
Objective: Analyze real-world datasets from the city of Chicago using SQLite statements.
Import libraries¶
In [1]:
import pandas as pd
import sqlite3
import folium
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
Load the dataset¶
In [2]:
con = sqlite3.connect(":memory:")
cur = con.cursor()
In [3]:
census_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv"
schools_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv"
crime_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv"
df_census = pd.read_csv(census_url)
df_schools = pd.read_csv(schools_url)
df_crime = pd.read_csv(crime_url)
df_census.to_sql("CENSUS_DATA", con, if_exists='replace', index=False, method="multi")
df_schools.to_sql("CHICAGO_PUBLIC_SCHOOLS", con, if_exists='replace', index=False, method="multi")
df_crime.to_sql("CHICAGO_CRIME_DATA", con, if_exists='replace', index=False, method="multi");
In [4]:
df_census.head()
Out[4]:
| COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME | PERCENT_OF_HOUSING_CROWDED | PERCENT_HOUSEHOLDS_BELOW_POVERTY | PERCENT_AGED_16__UNEMPLOYED | PERCENT_AGED_25__WITHOUT_HIGH_SCHOOL_DIPLOMA | PERCENT_AGED_UNDER_18_OR_OVER_64 | PER_CAPITA_INCOME | HARDSHIP_INDEX | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Rogers Park | 7.7 | 23.6 | 8.7 | 18.2 | 27.5 | 23939 | 39.0 |
| 1 | 2.0 | West Ridge | 7.8 | 17.2 | 8.8 | 20.8 | 38.5 | 23040 | 46.0 |
| 2 | 3.0 | Uptown | 3.8 | 24.0 | 8.9 | 11.8 | 22.2 | 35787 | 20.0 |
| 3 | 4.0 | Lincoln Square | 3.4 | 10.9 | 8.2 | 13.4 | 25.5 | 37524 | 17.0 |
| 4 | 5.0 | North Center | 0.3 | 7.5 | 5.2 | 4.5 | 26.2 | 57123 | 6.0 |
In [5]:
df_schools.head()
Out[5]:
| School_ID | NAME_OF_SCHOOL | Elementary, Middle, or High School | Street_Address | City | State | ZIP_Code | Phone_Number | Link | Network_Manager | ... | Freshman_on_Track_Rate__ | X_COORDINATE | Y_COORDINATE | Latitude | Longitude | COMMUNITY_AREA_NUMBER | COMMUNITY_AREA_NAME | Ward | Police_District | Location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 610038 | Abraham Lincoln Elementary School | ES | 615 W Kemper Pl | Chicago | IL | 60614 | (773) 534-5720 | http://schoolreports.cps.edu/SchoolProgressRep... | Fullerton Elementary Network | ... | NDA | 1171699.458 | 1915829.428 | 41.924497 | -87.644522 | 7 | LINCOLN PARK | 43 | 18 | (41.92449696, -87.64452163) |
| 1 | 610281 | Adam Clayton Powell Paideia Community Academy ... | ES | 7511 S South Shore Dr | Chicago | IL | 60649 | (773) 535-6650 | http://schoolreports.cps.edu/SchoolProgressRep... | Skyway Elementary Network | ... | NDA | 1196129.985 | 1856209.466 | 41.760324 | -87.556736 | 43 | SOUTH SHORE | 7 | 4 | (41.76032435, -87.55673627) |
| 2 | 610185 | Adlai E Stevenson Elementary School | ES | 8010 S Kostner Ave | Chicago | IL | 60652 | (773) 535-2280 | http://schoolreports.cps.edu/SchoolProgressRep... | Midway Elementary Network | ... | NDA | 1148427.165 | 1851012.215 | 41.747111 | -87.731702 | 70 | ASHBURN | 13 | 8 | (41.74711093, -87.73170248) |
| 3 | 609993 | Agustin Lara Elementary Academy | ES | 4619 S Wolcott Ave | Chicago | IL | 60609 | (773) 535-4389 | http://schoolreports.cps.edu/SchoolProgressRep... | Pershing Elementary Network | ... | NDA | 1164504.290 | 1873959.199 | 41.809757 | -87.672145 | 61 | NEW CITY | 20 | 9 | (41.8097569, -87.6721446) |
| 4 | 610513 | Air Force Academy High School | HS | 3630 S Wells St | Chicago | IL | 60609 | (773) 535-1590 | http://schoolreports.cps.edu/SchoolProgressRep... | Southwest Side High School Network | ... | 91.8 | 1175177.622 | 1880745.126 | 41.828146 | -87.632794 | 34 | ARMOUR SQUARE | 11 | 9 | (41.82814609, -87.63279369) |
5 rows × 78 columns
In [6]:
df_crime.head()
Out[6]:
| ID | CASE_NUMBER | DATE | BLOCK | IUCR | PRIMARY_TYPE | DESCRIPTION | LOCATION_DESCRIPTION | ARREST | DOMESTIC | ... | DISTRICT | WARD | COMMUNITY_AREA_NUMBER | FBICODE | X_COORDINATE | Y_COORDINATE | YEAR | LATITUDE | LONGITUDE | LOCATION | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3512276 | HK587712 | 2004-08-28 | 047XX S KEDZIE AVE | 890 | THEFT | FROM BUILDING | SMALL RETAIL STORE | False | False | ... | 9 | 14.0 | 58.0 | 6 | 1155838.0 | 1873050.0 | 2004 | 41.807440 | -87.703956 | (41.8074405, -87.703955849) |
| 1 | 3406613 | HK456306 | 2004-06-26 | 009XX N CENTRAL PARK AVE | 820 | THEFT | $500 AND UNDER | OTHER | False | False | ... | 11 | 27.0 | 23.0 | 6 | 1152206.0 | 1906127.0 | 2004 | 41.898280 | -87.716406 | (41.898279962, -87.716405505) |
| 2 | 8002131 | HT233595 | 2011-04-04 | 043XX S WABASH AVE | 820 | THEFT | $500 AND UNDER | NURSING HOME/RETIREMENT HOME | False | False | ... | 2 | 3.0 | 38.0 | 6 | 1177436.0 | 1876313.0 | 2011 | 41.815933 | -87.624642 | (41.815933131, -87.624642127) |
| 3 | 7903289 | HT133522 | 2010-12-30 | 083XX S KINGSTON AVE | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | False | False | ... | 4 | 7.0 | 46.0 | 6 | 1194622.0 | 1850125.0 | 2010 | 41.743665 | -87.562463 | (41.743665322, -87.562462756) |
| 4 | 10402076 | HZ138551 | 2016-02-02 | 033XX W 66TH ST | 820 | THEFT | $500 AND UNDER | ALLEY | False | False | ... | 8 | 15.0 | 66.0 | 6 | 1155240.0 | 1860661.0 | 2016 | 41.773455 | -87.706480 | (41.773455295, -87.706480471) |
5 rows × 21 columns
Understand the dataset¶
There are three datasets available on the Chicago Data Portal:
1. Socioeconomic Indicators in Chicago¶
This dataset contains a selection of six socioeconomic indicators of public health importance and a "hardship index" for each community area in Chicago, covering the years 2008 to 2012.
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at this link.
In [7]:
df_census.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 78 entries, 0 to 77 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 COMMUNITY_AREA_NUMBER 77 non-null float64 1 COMMUNITY_AREA_NAME 78 non-null object 2 PERCENT_OF_HOUSING_CROWDED 78 non-null float64 3 PERCENT_HOUSEHOLDS_BELOW_POVERTY 78 non-null float64 4 PERCENT_AGED_16__UNEMPLOYED 78 non-null float64 5 PERCENT_AGED_25__WITHOUT_HIGH_SCHOOL_DIPLOMA 78 non-null float64 6 PERCENT_AGED_UNDER_18_OR_OVER_64 78 non-null float64 7 PER_CAPITA_INCOME 78 non-null int64 8 HARDSHIP_INDEX 77 non-null float64 dtypes: float64(7), int64(1), object(1) memory usage: 5.6+ KB
In [8]:
df_schools.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 566 entries, 0 to 565 Data columns (total 78 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 School_ID 566 non-null int64 1 NAME_OF_SCHOOL 566 non-null object 2 Elementary, Middle, or High School 566 non-null object 3 Street_Address 566 non-null object 4 City 566 non-null object 5 State 566 non-null object 6 ZIP_Code 566 non-null int64 7 Phone_Number 566 non-null object 8 Link 565 non-null object 9 Network_Manager 566 non-null object 10 Collaborative_Name 566 non-null object 11 Adequate_Yearly_Progress_Made_ 566 non-null object 12 Track_Schedule 566 non-null object 13 CPS_Performance_Policy_Status 566 non-null object 14 CPS_Performance_Policy_Level 566 non-null object 15 HEALTHY_SCHOOL_CERTIFIED 566 non-null object 16 Safety_Icon 566 non-null object 17 SAFETY_SCORE 513 non-null float64 18 Family_Involvement_Icon 566 non-null object 19 Family_Involvement_Score 566 non-null object 20 Environment_Icon 566 non-null object 21 Environment_Score 513 non-null float64 22 Instruction_Icon 566 non-null object 23 Instruction_Score 513 non-null float64 24 Leaders_Icon 566 non-null object 25 Leaders_Score 566 non-null object 26 Teachers_Icon 566 non-null object 27 Teachers_Score 566 non-null object 28 Parent_Engagement_Icon 566 non-null object 29 Parent_Engagement_Score 566 non-null object 30 Parent_Environment_Icon 566 non-null object 31 Parent_Environment_Score 566 non-null object 32 AVERAGE_STUDENT_ATTENDANCE 565 non-null object 33 Rate_of_Misconducts__per_100_students_ 566 non-null float64 34 Average_Teacher_Attendance 566 non-null object 35 Individualized_Education_Program_Compliance_Rate 566 non-null object 36 Pk_2_Literacy__ 566 non-null object 37 Pk_2_Math__ 566 non-null object 38 Gr3_5_Grade_Level_Math__ 566 non-null object 39 Gr3_5_Grade_Level_Read__ 566 non-null object 40 Gr3_5_Keep_Pace_Read__ 566 non-null object 41 Gr3_5_Keep_Pace_Math__ 566 non-null object 42 Gr6_8_Grade_Level_Math__ 566 non-null object 43 Gr6_8_Grade_Level_Read__ 566 non-null object 44 Gr6_8_Keep_Pace_Math_ 566 non-null object 45 Gr6_8_Keep_Pace_Read__ 566 non-null object 46 Gr_8_Explore_Math__ 566 non-null object 47 Gr_8_Explore_Read__ 566 non-null object 48 ISAT_Exceeding_Math__ 476 non-null float64 49 ISAT_Exceeding_Reading__ 476 non-null float64 50 ISAT_Value_Add_Math 468 non-null float64 51 ISAT_Value_Add_Read 468 non-null float64 52 ISAT_Value_Add_Color_Math 566 non-null object 53 ISAT_Value_Add_Color_Read 566 non-null object 54 Students_Taking__Algebra__ 566 non-null object 55 Students_Passing__Algebra__ 566 non-null object 56 9th Grade EXPLORE (2009) 566 non-null object 57 9th Grade EXPLORE (2010) 566 non-null object 58 10th Grade PLAN (2009) 566 non-null object 59 10th Grade PLAN (2010) 566 non-null object 60 Net_Change_EXPLORE_and_PLAN 566 non-null object 61 11th Grade Average ACT (2011) 566 non-null object 62 Net_Change_PLAN_and_ACT 566 non-null object 63 College_Eligibility__ 566 non-null object 64 Graduation_Rate__ 566 non-null object 65 College_Enrollment_Rate__ 566 non-null object 66 COLLEGE_ENROLLMENT 566 non-null int64 67 General_Services_Route 566 non-null int64 68 Freshman_on_Track_Rate__ 566 non-null object 69 X_COORDINATE 566 non-null float64 70 Y_COORDINATE 566 non-null float64 71 Latitude 566 non-null float64 72 Longitude 566 non-null float64 73 COMMUNITY_AREA_NUMBER 566 non-null int64 74 COMMUNITY_AREA_NAME 566 non-null object 75 Ward 566 non-null int64 76 Police_District 566 non-null int64 77 Location 566 non-null object dtypes: float64(12), int64(7), object(59) memory usage: 345.0+ KB
In [9]:
df_crime.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 533 entries, 0 to 532 Data columns (total 21 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 533 non-null int64 1 CASE_NUMBER 533 non-null object 2 DATE 533 non-null object 3 BLOCK 533 non-null object 4 IUCR 533 non-null object 5 PRIMARY_TYPE 533 non-null object 6 DESCRIPTION 533 non-null object 7 LOCATION_DESCRIPTION 533 non-null object 8 ARREST 533 non-null bool 9 DOMESTIC 533 non-null bool 10 BEAT 533 non-null int64 11 DISTRICT 533 non-null int64 12 WARD 490 non-null float64 13 COMMUNITY_AREA_NUMBER 490 non-null float64 14 FBICODE 533 non-null object 15 X_COORDINATE 529 non-null float64 16 Y_COORDINATE 529 non-null float64 17 YEAR 533 non-null int64 18 LATITUDE 529 non-null float64 19 LONGITUDE 529 non-null float64 20 LOCATION 529 non-null object dtypes: bool(2), float64(6), int64(4), object(9) memory usage: 80.3+ KB
Visualize the location of public schools and where crimes occurred¶
The red markers indicate where crimes occurred, while the blue markers indicate the location of public schools.
In [10]:
map = folium.Map(location=[41.88, -87.62], zoom_start=12)
feature_group = folium.FeatureGroup()
for point in list(df_schools[["Latitude", "Longitude"]].dropna().to_numpy()):
feature_group.add_child(folium.Marker(point, popup='School', icon=folium.Icon(color='blue')))
for point in list(df_crime[["LATITUDE", "LONGITUDE"]].dropna().to_numpy()):
feature_group.add_child(folium.Marker(point, popup='Crime', icon=folium.Icon(color='red')))
map.add_child(feature_group)
map
Out[10]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Analysis¶
What percentage of community areas in Chicago have a hardship index greater than 50?¶
In [11]:
query = """
SELECT (COUNT(CASE WHEN HARDSHIP_INDEX > 50 THEN 1 END) * 100 / COUNT(*)) || '%' AS Percentage
FROM CENSUS_DATA
"""
pd.read_sql_query(query, con)
Out[11]:
| Percentage | |
|---|---|
| 0 | 48% |
What are the community areas with the highest and lowest hardship index?¶
In [12]:
query = """
SELECT COMMUNITY_AREA_NAME, HARDSHIP_INDEX
FROM CENSUS_DATA
WHERE HARDSHIP_INDEX IN ((SELECT MAX(HARDSHIP_INDEX) FROM CENSUS_DATA),
(SELECT MIN(HARDSHIP_INDEX) FROM CENSUS_DATA))
"""
pd.read_sql_query(query, con)
Out[12]:
| COMMUNITY_AREA_NAME | HARDSHIP_INDEX | |
|---|---|---|
| 0 | Near North Side | 1.0 |
| 1 | Riverdale | 98.0 |
Which Chicago community areas have per-capita incomes greater than $50,000?¶
In [13]:
query = """
SELECT COMMUNITY_AREA_NAME, PER_CAPITA_INCOME
FROM CENSUS_DATA
WHERE PER_CAPITA_INCOME > 50000
ORDER BY PER_CAPITA_INCOME DESC
"""
pd.read_sql_query(query, con)
Out[13]:
| COMMUNITY_AREA_NAME | PER_CAPITA_INCOME | |
|---|---|---|
| 0 | Near North Side | 88669 |
| 1 | Lincoln Park | 71551 |
| 2 | Loop | 65526 |
| 3 | Lake View | 60058 |
| 4 | Near South Side | 59077 |
| 5 | North Center | 57123 |
What is the relationship between the PER_CAPITA_INCOME and HARDSHIP_INDEX features?¶
In [ ]:
query = """
SELECT PER_CAPITA_INCOME, HARDSHIP_INDEX
FROM CENSUS_DATA
"""
plt.figure()
sns.scatterplot(x="PER_CAPITA_INCOME", y="HARDSHIP_INDEX", data=pd.read_sql_query(query, con))
plt.show()
How many schools of each type are in the dataset?¶
In [15]:
query = """
SELECT "Elementary, Middle, or High School", COUNT(*)
FROM CHICAGO_PUBLIC_SCHOOLS
GROUP BY "Elementary, Middle, or High School"
"""
pd.read_sql_query(query, con)
Out[15]:
| Elementary, Middle, or High School | COUNT(*) | |
|---|---|---|
| 0 | ES | 462 |
| 1 | HS | 93 |
| 2 | MS | 11 |
Which schools have the highest and lowest safety scores?¶
In [16]:
query = """
SELECT NAME_OF_SCHOOL, SAFETY_SCORE
FROM CHICAGO_PUBLIC_SCHOOLS
WHERE SAFETY_SCORE IN ((SELECT MAX(SAFETY_SCORE) FROM CHICAGO_PUBLIC_SCHOOLS),
(SELECT MIN(SAFETY_SCORE) FROM CHICAGO_PUBLIC_SCHOOLS))
ORDER BY SAFETY_SCORE ASC
"""
pd.read_sql_query(query, con)
Out[16]:
| NAME_OF_SCHOOL | SAFETY_SCORE | |
|---|---|---|
| 0 | Edmond Burke Elementary School | 1.0 |
| 1 | Abraham Lincoln Elementary School | 99.0 |
| 2 | Alexander Graham Bell Elementary School | 99.0 |
| 3 | Annie Keller Elementary Gifted Magnet School | 99.0 |
| 4 | Augustus H Burley Elementary School | 99.0 |
| 5 | Edgar Allan Poe Elementary Classical School | 99.0 |
| 6 | Edgebrook Elementary School | 99.0 |
| 7 | Ellen Mitchell Elementary School | 99.0 |
| 8 | James E McDade Elementary Classical School | 99.0 |
| 9 | James G Blaine Elementary School | 99.0 |
| 10 | LaSalle Elementary Language Academy | 99.0 |
| 11 | Mary E Courtenay Elementary Language Arts Center | 99.0 |
| 12 | Northside College Preparatory High School | 99.0 |
| 13 | Northside Learning Center High School | 99.0 |
| 14 | Norwood Park Elementary School | 99.0 |
| 15 | Oriole Park Elementary School | 99.0 |
| 16 | Sauganash Elementary School | 99.0 |
| 17 | Stephen Decatur Classical Elementary School | 99.0 |
| 18 | Talman Elementary School | 99.0 |
| 19 | Wildwood Elementary School | 99.0 |
What are the five schools with the lowest safety scores?¶
In [17]:
query = """
SELECT NAME_OF_SCHOOL, SAFETY_SCORE
FROM CHICAGO_PUBLIC_SCHOOLS
WHERE SAFETY_SCORE IS NOT NULL
ORDER BY SAFETY_SCORE ASC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[17]:
| NAME_OF_SCHOOL | SAFETY_SCORE | |
|---|---|---|
| 0 | Edmond Burke Elementary School | 1.0 |
| 1 | Luke O'Toole Elementary School | 5.0 |
| 2 | George W Tilton Elementary School | 6.0 |
| 3 | Foster Park Elementary School | 11.0 |
| 4 | Emil G Hirsch Metropolitan High School | 13.0 |
What are the five schools with the highest average student attendance?¶
In [18]:
query = """
SELECT NAME_OF_SCHOOL, AVERAGE_STUDENT_ATTENDANCE
FROM CHICAGO_PUBLIC_SCHOOLS
ORDER BY AVERAGE_STUDENT_ATTENDANCE DESC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[18]:
| NAME_OF_SCHOOL | AVERAGE_STUDENT_ATTENDANCE | |
|---|---|---|
| 0 | John Charles Haines Elementary School | 98.40% |
| 1 | James Ward Elementary School | 97.80% |
| 2 | Edgar Allan Poe Elementary Classical School | 97.60% |
| 3 | Orozco Fine Arts & Sciences Elementary School | 97.60% |
| 4 | Rachel Carson Elementary School | 97.60% |
Which schools have average student attendance lower than 70%?¶
In [19]:
query = """
SELECT NAME_OF_SCHOOL, AVERAGE_STUDENT_ATTENDANCE
FROM CHICAGO_PUBLIC_SCHOOLS
WHERE AVERAGE_STUDENT_ATTENDANCE < 70
ORDER BY AVERAGE_STUDENT_ATTENDANCE
"""
pd.read_sql_query(query, con)
Out[19]:
| NAME_OF_SCHOOL | AVERAGE_STUDENT_ATTENDANCE | |
|---|---|---|
| 0 | Richard T Crane Technical Preparatory High School | 57.90% |
| 1 | Barbara Vick Early Childhood & Family Center | 60.90% |
| 2 | Dyett High School | 62.50% |
| 3 | Wendell Phillips Academy High School | 63.00% |
| 4 | Orr Academy High School | 66.30% |
| 5 | Manley Career Academy High School | 66.80% |
| 6 | Chicago Vocational Career Academy High School | 68.80% |
| 7 | Roberto Clemente Community Academy High School | 69.60% |
What is the hardship index for the five community areas with the lowest college enrollment?¶
In [20]:
query = """
SELECT CENSUS_DATA.COMMUNITY_AREA_NAME, HARDSHIP_INDEX, SUM(COLLEGE_ENROLLMENT)
FROM CENSUS_DATA, CHICAGO_PUBLIC_SCHOOLS
WHERE CENSUS_DATA.COMMUNITY_AREA_NUMBER = CHICAGO_PUBLIC_SCHOOLS.COMMUNITY_AREA_NUMBER
GROUP BY CENSUS_DATA.COMMUNITY_AREA_NAME
ORDER BY SUM(COLLEGE_ENROLLMENT) ASC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[20]:
| COMMUNITY_AREA_NAME | HARDSHIP_INDEX | SUM(COLLEGE_ENROLLMENT) | |
|---|---|---|---|
| 0 | Oakland | 78.0 | 140 |
| 1 | Fuller Park | 97.0 | 531 |
| 2 | Burnside | 79.0 | 549 |
| 3 | O'Hare | 24.0 | 786 |
| 4 | Loop | 3.0 | 871 |
What is the hardship index for the five community areas with the highest total college enrollment?¶
In [21]:
query = """
SELECT CENSUS_DATA.COMMUNITY_AREA_NAME, HARDSHIP_INDEX, SUM(COLLEGE_ENROLLMENT)
FROM CENSUS_DATA, CHICAGO_PUBLIC_SCHOOLS
WHERE CENSUS_DATA.COMMUNITY_AREA_NUMBER = CHICAGO_PUBLIC_SCHOOLS.COMMUNITY_AREA_NUMBER
GROUP BY CENSUS_DATA.COMMUNITY_AREA_NAME
ORDER BY SUM(COLLEGE_ENROLLMENT) DESC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[21]:
| COMMUNITY_AREA_NAME | HARDSHIP_INDEX | SUM(COLLEGE_ENROLLMENT) | |
|---|---|---|---|
| 0 | South Lawndale | 96.0 | 14793 |
| 1 | Belmont Cragin | 70.0 | 14386 |
| 2 | Austin | 73.0 | 10933 |
| 3 | Gage Park | 93.0 | 9915 |
| 4 | Brighton Park | 84.0 | 9647 |
What are the community areas with per-capita income less than $11,000?¶
In [22]:
query = """
SELECT COMMUNITY_AREA_NAME, PER_CAPITA_INCOME
FROM CENSUS_DATA
WHERE PER_CAPITA_INCOME < 11000
"""
pd.read_sql_query(query, con)
Out[22]:
| COMMUNITY_AREA_NAME | PER_CAPITA_INCOME | |
|---|---|---|
| 0 | West Garfield Park | 10934 |
| 1 | South Lawndale | 10402 |
| 2 | Fuller Park | 10432 |
| 3 | Riverdale | 8201 |
What are the case numbers for crimes involving minors?¶
In [23]:
query = """
SELECT CASE_NUMBER, DESCRIPTION
FROM CHICAGO_CRIME_DATA
WHERE DESCRIPTION LIKE '%MINOR%'
"""
pd.read_sql_query(query, con)
Out[23]:
| CASE_NUMBER | DESCRIPTION | |
|---|---|---|
| 0 | HL266884 | SELL/GIVE/DEL LIQUOR TO MINOR |
| 1 | HK238408 | ILLEGAL CONSUMPTION BY MINOR |
What are the kidnapping crimes involving a child?¶
In [24]:
query = """
SELECT CASE_NUMBER, PRIMARY_TYPE, DESCRIPTION
FROM CHICAGO_CRIME_DATA
WHERE PRIMARY_TYPE == "KIDNAPPING" AND DESCRIPTION LIKE '%CHILD%'
"""
pd.read_sql_query(query, con)
Out[24]:
| CASE_NUMBER | PRIMARY_TYPE | DESCRIPTION | |
|---|---|---|---|
| 0 | HN144152 | KIDNAPPING | CHILD ABDUCTION/STRANGER |
What kind of crimes were recorded in schools?¶
In [25]:
query = """
SELECT PRIMARY_TYPE, LOCATION_DESCRIPTION
FROM CHICAGO_CRIME_DATA
WHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%'
"""
pd.read_sql_query(query, con)
Out[25]:
| PRIMARY_TYPE | LOCATION_DESCRIPTION | |
|---|---|---|
| 0 | BATTERY | SCHOOL, PUBLIC, GROUNDS |
| 1 | BATTERY | SCHOOL, PUBLIC, BUILDING |
| 2 | BATTERY | SCHOOL, PUBLIC, BUILDING |
| 3 | BATTERY | SCHOOL, PUBLIC, BUILDING |
| 4 | BATTERY | SCHOOL, PUBLIC, GROUNDS |
| 5 | CRIMINAL DAMAGE | SCHOOL, PUBLIC, GROUNDS |
| 6 | NARCOTICS | SCHOOL, PUBLIC, GROUNDS |
| 7 | NARCOTICS | SCHOOL, PUBLIC, BUILDING |
| 8 | ASSAULT | SCHOOL, PUBLIC, GROUNDS |
| 9 | CRIMINAL TRESPASS | SCHOOL, PUBLIC, GROUNDS |
| 10 | PUBLIC PEACE VIOLATION | SCHOOL, PRIVATE, BUILDING |
| 11 | PUBLIC PEACE VIOLATION | SCHOOL, PUBLIC, BUILDING |
What is the average safety score for each type of school?¶
In [26]:
query = """
SELECT "Elementary, Middle, or High School", AVG(SAFETY_SCORE)
FROM CHICAGO_PUBLIC_SCHOOLS
GROUP BY "Elementary, Middle, or High School"
"""
pd.read_sql_query(query, con)
Out[26]:
| Elementary, Middle, or High School | AVG(SAFETY_SCORE) | |
|---|---|---|
| 0 | ES | 49.520384 |
| 1 | HS | 49.623529 |
| 2 | MS | 48.000000 |
What are the five community areas with the highest percentage of households below the poverty line?¶
In [27]:
query = """
SELECT COMMUNITY_AREA_NAME, PERCENT_HOUSEHOLDS_BELOW_POVERTY
FROM CENSUS_DATA
ORDER BY PERCENT_HOUSEHOLDS_BELOW_POVERTY DESC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[27]:
| COMMUNITY_AREA_NAME | PERCENT_HOUSEHOLDS_BELOW_POVERTY | |
|---|---|---|
| 0 | Riverdale | 56.5 |
| 1 | Fuller Park | 51.2 |
| 2 | Englewood | 46.6 |
| 3 | North Lawndale | 43.1 |
| 4 | East Garfield Park | 42.4 |
Which community areas are the most crime prone?¶
In [28]:
query = """
SELECT CENSUS_DATA.COMMUNITY_AREA_NAME, COUNT(CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER) AS FREQUENCY
FROM CHICAGO_CRIME_DATA, CENSUS_DATA
WHERE CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER == CENSUS_DATA.COMMUNITY_AREA_NUMBER
GROUP BY CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER
ORDER BY FREQUENCY DESC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[28]:
| COMMUNITY_AREA_NAME | FREQUENCY | |
|---|---|---|
| 0 | Austin | 43 |
| 1 | Humboldt park | 22 |
| 2 | Englewood | 21 |
| 3 | North Lawndale | 16 |
| 4 | Near West Side | 16 |
Which community areas have the lowest number of crimes?¶
In [29]:
query = """
SELECT CENSUS_DATA.COMMUNITY_AREA_NAME, COUNT(CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER) AS FREQUENCY
FROM CHICAGO_CRIME_DATA, CENSUS_DATA
WHERE CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER == CENSUS_DATA.COMMUNITY_AREA_NUMBER
GROUP BY CHICAGO_CRIME_DATA.COMMUNITY_AREA_NUMBER
ORDER BY FREQUENCY ASC
LIMIT 5
"""
pd.read_sql_query(query, con)
Out[29]:
| COMMUNITY_AREA_NAME | FREQUENCY | |
|---|---|---|
| 0 | Forest Glen | 1 |
| 1 | Near South Side | 1 |
| 2 | Burnside | 1 |
| 3 | Hegewisch | 1 |
| 4 | Bridgeport | 1 |
Run in Google Colab