Lesson 8 – The search ‘for’ w’if’i

As I was taking my usual Broadway stroll this morning, I noticed some people immersed in their smartphones, texting or browsing while walking. Guilty as charged. It reminded me of my blog post from a few years ago, my trip down phone memory lane.

I started paying attention to my surroundings and noticed this kiosk, a free wifi hotspot provided by the City of New York.

So, I used their wifi to look up LinkNYC. Their description tells me that “LinkNYC is a first-of-its-kind communications network that will replace over 7,500 pay phones across the five boroughs with new structures called Links. Each Link provides superfast, free public Wi-Fi, phone calls, device charging and a tablet for access to city services, maps and directions.”

They have been around since January 2016. I want to know how many such kiosks there are in the City and in each borough. Their website has a map for locating the kiosks, but it is hard for me to count points on a map. My City at my service again: I found the complete wifi hotspot location data on NYC Open Data.

Let us learn some tricks in R while trying to answer this question.

Usual chores.

Step 1: Get the data
I downloaded the hotspot data file from here.

Step 2: Create a new folder on your computer
Let us call this folder “lesson8”. The downloaded data file “Free_WiFi_Hotspots_09042005.csv” is saved in this folder.

Step 3: Create a new code file in R
Create a new code file for this lesson: “lesson8_code.R”.

Step 4: Choose your working directory
In this lesson, we have a folder named “lesson8”. So we should tell R that “lesson8” is the folder where the data files are stored. You can do this using the “setwd” command. If you followed the above steps and if you type list.files() in the console, you would see “lesson8_code.R” and “Free_WiFi_Hotspots_09042005.csv” on the screen as the listed files in your folder.
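Here is a minimal sketch of this step. The folder location is an assumption: so that the snippet runs on any computer, it creates a throwaway “lesson8” folder inside R's temporary directory. On your own computer, you would give setwd() the real path of your “lesson8” folder instead.

```r
# Hypothetical location: a scratch "lesson8" folder under R's temp directory.
# On your computer, use the real path instead, e.g. setwd("C:/Users/you/lesson8")
# (that example path is also hypothetical).
lesson_dir <- file.path(tempdir(), "lesson8")
dir.create(lesson_dir, showWarnings = FALSE)

# Tell R that this folder is the working directory
setwd(lesson_dir)

# Confirm where R is looking, then list the files it can see there
print(getwd())
print(list.files())
```

In your real “lesson8” folder, list.files() should show the two files from Steps 2 and 3; in the scratch folder it is empty.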

Step 5: Read the data into R workspace
Since we have a comma separated values file (.csv), we can use the “read.csv” command to read the data file into R workspace. Type the following line in your code and execute it. Why am I giving header=TRUE in the command?

# Read the data file #
wifi_data = read.csv("Free_WiFi_Hotspots_09042005.csv",header=TRUE)
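If you want to check your answer to the header question, the sketch below reads a tiny made-up CSV (typed in as text, not the real hotspot file) both ways. With header=TRUE the first line becomes the column names; with header=FALSE R treats it as an ordinary row of data and makes up names of its own.

```r
# A tiny made-up CSV, for illustration only
csv_text <- "BORO,PROVIDER
MN,LinkNYC
BK,Time Warner"

# header = TRUE: the first line supplies the column names
with_header <- read.csv(text = csv_text, header = TRUE)
print(names(with_header))   # "BORO" "PROVIDER"
print(nrow(with_header))    # 2 data rows

# header = FALSE: the first line is read as data; columns get default names
no_header <- read.csv(text = csv_text, header = FALSE)
print(names(no_header))     # "V1" "V2"
print(nrow(no_header))      # 3 rows, the first one being "BORO","PROVIDER"
```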

Step 6: Use the data to answer the questions
It will be helpful to learn some new concepts in R coding before we address the actual kiosk problem: the for loop, storing values in matrices, and if else statements.

Loops: Do you remember your fun trips to batting cages? Just like the pitching machine is set up to pitch the baseball again and again, you can instruct R to run commands again and again. For example, if you want to print NYC ten times, you can use a for loop instead of typing it ten times.

# print NYC 10 times
for(i in 1:10) 
{
 print("NYC")
}

You are instructing R to print NYC ten times by using the command for (i in 1:10). The lines within the {} will be repeated ten times. Select the lines and hit the “Run” button to execute. You will see this in the console screen.

[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
[1] "NYC"
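Inside the loop, i is more than a counter: it takes the values 1, 2, 3, ... in turn, and the commands inside {} can use it. Here is a small sketch; the paste() labels and the running total are just illustrations, not part of the kiosk problem.

```r
# i takes the values 1, 2, 3 in turn, and paste() joins it to the text
for (i in 1:3)
{
  print(paste("NYC", i))
}

# The same idea keeps a running total: add each i to what we have so far
total <- 0
for (i in 1:10)
{
  total <- total + i
}
print(total)   # 55, the same as sum(1:10)
```

The first loop prints "NYC 1", "NYC 2" and "NYC 3". Counting things with a loop, as we will do with the kiosks, works the same way as the running total.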

Matrices: I am sure you have been to a college or public library to check out books. Can you remember the bookcases that store the books? Just like a bookshelf with many horizontal and vertical dividers, in R you can create matrices with many rows and columns and use them to store values (numbers or text). Type the following lines in your code.

# create an empty matrix of NA in 10 rows and 1 column
x = matrix(NA,nrow=10,ncol=1)
print(x)

This creates an empty matrix called x with ten rows and one column. NA marks an empty (missing) slot; we can fill these slots with numbers or text later.
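To see how a single slot on the shelf is addressed, here is a small sketch with a hypothetical 3-by-2 matrix called shelf. You pick a slot with [row, column], and is.na() tells you which slots are still empty.

```r
# An empty 3-row, 2-column matrix: every slot starts as NA
shelf <- matrix(NA, nrow = 3, ncol = 2)
print(dim(shelf))          # 3 2  (rows, columns)

# Put a value into one specific slot with [row, column]
shelf[2, 1] <- 42
print(shelf)

# is.na() marks the slots that are still empty
print(sum(is.na(shelf)))   # 5 of the 6 slots are still NA
```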

Now imagine we combine the above instructions: we want to store NYC in a matrix, or shelf, like printing NYC ten times on paper and placing one copy in each row of a bookshelf. Type the following lines and see the results for yourself. The for loop fills the empty matrix x with NYC.

# store NYC in a matrix of 10 rows and 1 column
x <- matrix(NA,nrow=10,ncol=1)
for (i in 1:10)
{
 x[i,1] = "NYC"
}
print(x)
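Loops are great for learning, but R can also fill every slot at once. As a side note (not the method this lesson teaches), the sketch below checks that the loop gives exactly the same matrix as handing “NYC” straight to matrix(), which repeats it into every slot.

```r
# The loop version from above
x <- matrix(NA, nrow = 10, ncol = 1)
for (i in 1:10)
{
  x[i, 1] = "NYC"
}

# The same shelf filled in one step: matrix() recycles "NYC" into every slot
y <- matrix("NYC", nrow = 10, ncol = 1)

print(identical(x, y))   # TRUE
```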

If Else Statement: Do you remember cleaning your room on a Sunday? Recall those terrible times for a minute and think about what you did. If you found a book lying around, you would have put it in the bookshelf. If you found a toy, you may have put it in your toy bag. If you found a shirt or a top, it went into the closet. The if else statement works exactly like this. You are instructing R to perform one action if some condition is true, and a different action otherwise.

if (condition) {statement} else {statement}
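The room-cleaning routine has more than two outcomes, so it chains several conditions with else if. Here is a sketch of it in R; put_away() is a hypothetical helper made up for this example, and the last branch is an invented fallback for anything that is not a book, toy, shirt or top.

```r
# Hypothetical helper: decide where a found item goes
put_away <- function(item) {
  if (item == "book") {
    "bookshelf"
  } else if (item == "toy") {
    "toy bag"
  } else if (item %in% c("shirt", "top")) {
    "closet"
  } else {
    "leave it for next Sunday"
  }
}

print(put_away("book"))   # "bookshelf"
print(put_away("top"))    # "closet"
print(put_away("sock"))   # "leave it for next Sunday"
```

R checks the conditions from top to bottom and runs only the first branch whose condition is true.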

For example, let us say we want to print the number 1 if the 10th row of the matrix x is “NYC”, and the number 0 otherwise. We can do this using the following lines.

# if else statement #
if (x[10,1] == "NYC") {print(1)} else {print (0)}

The condition is x[10,1] == “NYC”; the action is printing 1 if the condition is true, or 0 if it is false.
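Since every row of x holds “NYC”, the statement above always prints 1. To watch the else branch run, change the 10th row to something else (“Boston” here is just an arbitrary example) and run the same statement again.

```r
# Ten rows of "NYC", the same shelf we filled with the loop
x <- matrix("NYC", nrow = 10, ncol = 1)

# Condition is TRUE, so the first branch runs: prints 1
if (x[10, 1] == "NYC") {print(1)} else {print(0)}

# Put something else in the 10th row; now the condition is FALSE: prints 0
x[10, 1] <- "Boston"
if (x[10, 1] == "NYC") {print(1)} else {print(0)}
```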

Okay, let us get back to the wifi hotspot business. If you look at the Free_WiFi_Hotspots_09042005.csv file, you will notice that the 4th column indicates which borough the kiosk is in, and the 5th column indicates the service provider. LinkNYC is not the only free wifi provider in the city; Time Warner Cable, AT&T, and Cablevision are some of the others.

So here is a strategy, a set of instructions we can give R to find out how many LinkNYC kiosks there are in Manhattan. We first check the total number of hotspots: there are 2061 of them in the City (the number of rows in the table). Wow.
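Before counting, it helps to poke at the table with nrow(), names() and unique(). Since the real file is not bundled here, the sketch below uses a tiny made-up data frame, toy_wifi, laid out like the lesson's file with the borough in the 4th column and the provider in the 5th. Everything in it is an assumption for illustration: the column names, the borough codes other than MN, and the provider names other than “BETA LinkNYC - Citybridge”. On the real data, nrow(wifi_data) is what gives the 2061 above.

```r
# Made-up stand-in for wifi_data (NOT the real NYC file)
toy_wifi <- data.frame(
  ID       = 1:6,
  TYPE     = rep("Free", 6),
  NAME     = letters[1:6],
  BORO     = c("MN", "MN", "BK", "MN", "BX", "BK"),
  PROVIDER = c("BETA LinkNYC - Citybridge", "Time Warner Cable",
               "BETA LinkNYC - Citybridge", "BETA LinkNYC - Citybridge",
               "Time Warner Cable", "AT&T"),
  stringsAsFactors = FALSE
)

n <- nrow(toy_wifi)            # number of rows = number of hotspots
print(n)                       # 6
print(names(toy_wifi))         # column names, in order
print(head(toy_wifi, 3))       # first three rows
print(unique(toy_wifi[, 5]))   # every distinct provider in the 5th column
```

Looking at unique() on the real file is also the easiest way to copy the exact provider spelling, which has to match character for character in the comparisons that follow.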

We look through each row (kiosk) and check whether the provider is LinkNYC and the borough is Manhattan. If both are true, we assign a 1; otherwise, we assign a 0. Finally, we count the total number of ones. Here is the code.

## finding LinkNYC kiosks in Manhattan ##
n = nrow(wifi_data)

linknyc_hotspots = matrix(NA,nrow=n,ncol=1)

for (i in 1:n)
{
 if((wifi_data[i,4]=="MN") & (wifi_data[i,5]=="BETA LinkNYC - Citybridge"))
 {linknyc_hotspots[i,1] = 1} else {linknyc_hotspots[i,1] = 0}
}

sum(linknyc_hotspots)

Notice what I did.

I first created an empty matrix with 2061 rows and 1 column.

Using the for loop, I am looking through all the rows.

For each row, I have two conditions to check: the provider is LinkNYC, and the borough is Manhattan. Notice how I combine them using (condition 1) & (condition 2).
If both conditions are true, I fill that row of the matrix with 1; otherwise, I fill it with 0.

In the end, I will count the total number of ones.
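As readers point out in the comments, R can also compare whole columns at once, and referring to columns by name (BORO and PROVIDER, as in the Oct 2017 version of the file mentioned there) keeps the code working if the column order changes. The sketch below runs the lesson's loop and a vectorized version side by side on the same made-up toy_wifi data frame (hypothetical rows and names, not the real file) to show they agree.

```r
# Made-up stand-in for wifi_data (NOT the real NYC file)
toy_wifi <- data.frame(
  ID       = 1:6,
  TYPE     = rep("Free", 6),
  NAME     = letters[1:6],
  BORO     = c("MN", "MN", "BK", "MN", "BX", "BK"),
  PROVIDER = c("BETA LinkNYC - Citybridge", "Time Warner Cable",
               "BETA LinkNYC - Citybridge", "BETA LinkNYC - Citybridge",
               "Time Warner Cable", "AT&T"),
  stringsAsFactors = FALSE
)

# The lesson's loop, unchanged apart from the data frame name
n <- nrow(toy_wifi)
linknyc_hotspots <- matrix(NA, nrow = n, ncol = 1)
for (i in 1:n)
{
  if ((toy_wifi[i, 4] == "MN") & (toy_wifi[i, 5] == "BETA LinkNYC - Citybridge"))
  {linknyc_hotspots[i, 1] = 1} else {linknyc_hotspots[i, 1] = 0}
}
loop_count <- sum(linknyc_hotspots)

# Vectorized: each comparison gives a whole column of TRUE/FALSE,
# & combines them row by row, and sum() counts TRUE as 1
vector_count <- sum(toy_wifi$BORO == "MN" &
                    toy_wifi$PROVIDER == "BETA LinkNYC - Citybridge")

print(c(loop = loop_count, vectorized = vector_count))   # both 2
```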

There are 521 LinkNYC kiosks in Manhattan. Still wow.

Can you now tell me how many LinkNYC kiosks there are in Brooklyn and how many Time Warner Cable kiosks there are in the Bronx?
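Once you have your loop answers, one way to double-check them (a different technique from the loop, swapped in here only for checking) is table(), which counts every borough and provider combination at once. The sketch uses the same made-up toy_wifi data frame; its borough codes BK and BX and the provider names are assumptions, so check the real spellings with unique() before looking up cells in wifi_data.

```r
# Made-up stand-in for wifi_data (NOT the real NYC file)
toy_wifi <- data.frame(
  ID       = 1:6,
  TYPE     = rep("Free", 6),
  NAME     = letters[1:6],
  BORO     = c("MN", "MN", "BK", "MN", "BX", "BK"),
  PROVIDER = c("BETA LinkNYC - Citybridge", "Time Warner Cable",
               "BETA LinkNYC - Citybridge", "BETA LinkNYC - Citybridge",
               "Time Warner Cable", "AT&T"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are boroughs, columns are providers, cells are counts
counts <- table(toy_wifi$BORO, toy_wifi$PROVIDER)
print(counts)

# Read off a single cell by its row and column names
print(counts["MN", "BETA LinkNYC - Citybridge"])   # 2
```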

While you tell me the answer, I want to sing Frank Sinatra for the rest of the day. I love my City. I can take long walks; I can be car free; I can get free wifi access; I can even get free data to analyze.

Wait, maybe I should not use “free” in my expression of the love for the City.

April 18th is fast approaching.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

3 thoughts on “Lesson 8 – The search ‘for’ w’if’i”

  1. Just in case anyone gets stuck: it seems the wifi data gets updated. At the time of writing (Oct 2017), the location (borough) is in column 3, so use if (wifi_data[i,3]=="MN") instead of what is written.

    There are currently 711 hotspots.

  2. I know the point is to illustrate the “for loop” functionality, but R is pretty good at handling vectors. The same result can be obtained without looping, with the following code:

    sum((wifi_data[,3] == "MN") & (wifi_data[,5] == "LinkNYC - Citybridge"))

  3. First, I want to congratulate you on your wonderful blog.
    I’m a medical student who is getting into data science, and I’m finding it very helpful and instructive.
    I have a question on the for loop you built.

    Isn’t this a better way to count how many “LinkNYC - Citybridge” hotspots we have in Manhattan?
    nrow(wifi_data[wifi_data$PROVIDER == "LinkNYC - Citybridge" & wifi_data$BORO == "MN",])

    From what I’ve learned about R, looping through an array isn’t the most efficient way to analyze your data.
    Please correct me if I’m wrong.
