Description
How could the content be improved?
The following section introduces how data can be processed using loops:
Automating data processing using For Loops
I believe it would also be advantageous to have a similar section in the following
Here we could briefly introduce Python generators as well. For example, consider a CSV file whose entries are name, age, and location. We can parse this data into a DataFrame using a generator. Imagine that location is a comma-separated string field and we want to read latitude and longitude separately.
| name | age | location |
|---|---|---|
| John | 50 | 123341,123321 |
| Emily | 25 | 321321,123321 |
| Wick | 35 | 123341,654789 |
| Raj | 40 | 987789,123321 |
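```python
import csv

import pandas as pd


def transform_lines(csv_path):
    """Yield one [name, age, latitude, longitude] row per CSV record."""
    # The location field must be quoted in the file (e.g. "123341,123321"),
    # otherwise csv.reader would split it into two separate columns.
    with open(csv_path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for name, age, location in reader:
            lat, lng = location.split(",")
            yield [name, int(age), float(lat), float(lng)]


rows = transform_lines("./data.csv")
df = pd.DataFrame(rows, columns=["Name", "Age", "Latitude", "Longitude"])
print(df.head())
```

This is especially useful for large datasets: the generator reads and converts one line at a time, so the raw text of the whole file never has to be held in memory at once.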
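To make the memory point more concrete, here is a minimal sketch of how the same generator idea could feed the rows to downstream code in small batches instead of all at once. The helper names `parse_rows` and `in_chunks`, the chunk size, and the in-memory sample standing in for `data.csv` are all assumptions made for illustration; they are not part of the lesson.

```python
import csv
import io
from itertools import islice


def parse_rows(lines):
    """Lazily yield (name, age, latitude, longitude) tuples from CSV lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for name, age, location in reader:
        lat, lng = location.split(",")
        yield name, int(age), float(lat), float(lng)


def in_chunks(rows, size):
    """Group an iterator of rows into lists of at most `size` rows."""
    rows = iter(rows)
    while chunk := list(islice(rows, size)):
        yield chunk


# Hypothetical in-memory stand-in for data.csv, with the location field quoted.
sample = io.StringIO(
    'name,age,location\n'
    'John,50,"123341,123321"\n'
    'Emily,25,"321321,123321"\n'
    'Wick,35,"123341,654789"\n'
    'Raj,40,"987789,123321"\n'
)

chunk_sizes = []
total_age = 0
for chunk in in_chunks(parse_rows(sample), size=3):
    # Only one chunk of parsed rows is held in memory at a time.
    chunk_sizes.append(len(chunk))
    total_age += sum(age for _, age, _, _ in chunk)

print(chunk_sizes, total_age)
```

Because both functions are generators, nothing is read or parsed until the `for` loop asks for the next chunk, so peak memory use depends on the chunk size rather than on the size of the file.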