# Source: hackacity2019/accessing_the_open_data.py at master, DataSciencePortugal/hackacity2019 (GitHub)

# -*- coding: utf-8 -*-
"""accessing_the_open_data.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/10HxfmOg4smr8_0jv0iOYZuEZQrPObmA7

# PORTO CITY DATA


All the city data from Porto is indexed at Porto's CKAN platform: https://opendata.porto.hackacity.eu/.

All the data is downloadable from the CKAN platform directly in various formats (CSV, images, etc.). The exception is the IoT data, which is indexed by CKAN but served from a separate API endpoint.

# The CKAN Platform

CKAN is a tool for making open data websites. (Think of a content management system like WordPress - but for data, instead of pages and blog posts). It helps you manage and publish collections of data.

Once your data is published, users can use its faceted search features to browse and find the data they need, and preview it using maps, graphs and tables - whether they are developers, journalists, researchers, NGOs, citizens, or even your own staff.

**Datasets and resources**

For CKAN purposes, data is published in units called “datasets”. A dataset is a parcel of data - for example, it could be the crime statistics for a region, the spending figures for a government department, or temperature readings from various weather stations. When users search for data, the search results they see will be individual datasets.

A dataset contains two things:

* ***Metadata***: Information about the data.
For example, the title and publisher, date, what formats it is available in, what license it is released under, etc.
* ***Resources***: The data itself. CKAN does not mind what format the data is in. A resource can be a CSV or Excel spreadsheet, XML file, PDF document, image file, linked data in RDF format, etc. CKAN can store the resource internally, or store it simply as a link, the resource itself being elsewhere on the web.

Note: In early CKAN versions, datasets were called “packages”, and this name has stuck in some places, especially internally and in API calls. “Package” means exactly the same as “dataset”.

**API documentation**
* CKAN: https://docs.ckan.org/en/2.8/api/

**API endpoints**
* CKAN: https://opendata.porto.hackacity.eu/api/3/

## Available Data Sets & Resources

Below you can find the API call that lists all available datasets.
"""

## Porto's CKAN Platform examples
## Show all the datasets available (aka packages)

import pprint
import requests
url = "https://opendata.porto.hackacity.eu/api/3/action/package_list"
r = requests.get(url)
pprint.pprint(r.json())
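
"""CKAN action calls wrap their payload in a JSON envelope with a `success` flag and a `result` field (plus an `error` field on failure). A minimal sketch of unwrapping that envelope follows. The helper name `unwrap_ckan` and the sample envelope are illustrative assumptions, not part of the original notebook.
"""

def unwrap_ckan(payload):
  """Return payload['result'], or raise if CKAN reported a failure."""
  if not payload.get("success"):
    raise RuntimeError("CKAN call failed: {}".format(payload.get("error")))
  return payload["result"]

# Made-up envelope shaped like a package_list response (dataset ids taken from this notebook)
sample = {"success": True, "result": ["porto-meteorologia", "transportes-publicos-stcp"]}
datasets = unwrap_ckan(sample)
print(datasets)
# With a live response this would be: unwrap_ckan(r.json())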

"""## Resource

You can find all kinds of information about a dataset. Each dataset can have multiple resources, each of which can be accessed through its own URL.
"""

## Porto's CKAN Platform examples
## In this example we look at the public transport (STCP) dataset

import pprint
import requests
url = "https://opendata.porto.hackacity.eu/api/3/action/package_show?id=transportes-publicos-stcp"
r = requests.get(url)
j = r.json()
print("Resources: ")
for resource in j['result']['resources']:
  print("{}({}):".format(resource['name'], resource['format']))
  print(resource['url'] + "\n")
print("Full dataset information: ")
pprint.pprint(j)

## Example of an API endpoint indexed by CKAN

import requests
url = "https://opendata.porto.hackacity.eu/api/3/action/package_show?id=porto-meteorologia"
r = requests.get(url)
j = r.json()
for resource in j['result']['resources']:
  print("{}({}):".format(resource['name'], resource['format']))
  print(resource['url'] + "\n")
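
"""Since each dataset can hold several resources in different formats, it can be handy to pick out a single format. This sketch filters the `resources` list of a `package_show` result. `resources_by_format` and the sample dataset are hypothetical and only mirror the fields used above (`name`, `format`, `url`).
"""

def resources_by_format(dataset, fmt):
  """Return the resources of a dataset whose format matches fmt (case-insensitive)."""
  wanted = fmt.lower()
  return [res for res in dataset.get("resources", [])
          if (res.get("format") or "").lower() == wanted]

# Made-up dataset for illustration; a real one would come from j['result']
sample_dataset = {
  "resources": [
    {"name": "stops", "format": "CSV", "url": "https://example.org/stops.csv"},
    {"name": "map", "format": "PNG", "url": "https://example.org/map.png"},
  ]
}
csv_resources = resources_by_format(sample_dataset, "csv")
for res in csv_resources:
  print("{} -> {}".format(res["name"], res["url"]))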

"""# IoT Data

A special resource is the data from the IoT platform, which is accessible through the NGSI API. The historical data for each entity is available at a separate endpoint. The data structure for each API access point can be discovered through the NGSI API (see below).

**Types**: Are types of devices or data \\
**Entities**: Each entity is a device of a certain type

**API documentation**
* NGSI: https://fiware-orion.readthedocs.io/en/2.0.0/user/walkthrough_apiv2/#query-entity
* Data format: https://gitlab.com/synchronicity-iot/synchronicity-data-models/
* Historical data: http://history-data.urbanplatform.portodigital.pt/v2/ui/

**API endpoints**
* Live Data:  https://broker.fiware.urbanplatform.portodigital.pt/v2/
* Historical Data: http://history-data.urbanplatform.portodigital.pt/v2/
"""

## List of all IoT resources with the respective data type

import requests

url = "https://broker.fiware.urbanplatform.portodigital.pt/v2/types"
r = requests.get(url)
j = r.json()

for item in j:
  print("{} ({}):".format(item["type"], item["count"]))
  attrs = item['attrs']
  print("{:<30} {:<15}".format('Name','Types'))
  for k, v in attrs.items():
    print("{:<30} {:<15}".format(k, ", ".join(v['types'])))
  print("")

"""## Accessing Live Value
All live values are available through the FIWARE broker, as described in the documentation linked above. The example below illustrates how to access all entities (an entity is a single sensor) of a certain type.
"""

## You can access live values through the FIWARE broker

import requests
import pprint
url = "https://broker.fiware.urbanplatform.portodigital.pt/v2/entities?type=AirQualityObserved"
r = requests.get(url)
j = r.json()
pprint.pprint(j)
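
"""In the broker's default (normalized) representation, every attribute of an entity is an object with a `type` and a `value`, which is why the examples in this notebook read fields such as `poi['name']['value']`. The sketch below flattens one entity into plain key/value pairs. `flatten_entity` and the sample entity are illustrative assumptions. NGSI v2 also defines an `options=keyValues` query option that returns a similar simplified form from the server side.
"""

def flatten_entity(entity):
  """Map each attribute to its 'value', keeping 'id' and 'type' unchanged."""
  flat = {}
  for key, attr in entity.items():
    if key in ("id", "type"):
      flat[key] = attr
    elif isinstance(attr, dict) and "value" in attr:
      flat[key] = attr["value"]
    else:
      flat[key] = attr
  return flat

# Made-up entity for illustration; real ones come from the list returned above
sample_entity = {
  "id": "urn:ngsi-ld:AirQualityObserved:example",
  "type": "AirQualityObserved",
  "NO2": {"type": "Number", "value": 22.0, "metadata": {}},
}
flat_entity = flatten_entity(sample_entity)
print(flat_entity)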

"""### Georeferenced Queries
You can also run georeferenced queries (see the documentation here: http://telefonicaid.github.io/fiware-orion/api/v2/stable/)
"""

## We are going to find all the points of interest within a 150-meter radius of Alfandega.

import requests
import pprint
url = "https://broker.fiware.urbanplatform.portodigital.pt/v2/entities?type=PointOfInterest&georel=near;maxDistance:150&geometry=point&coords=41.143347472409914,-8.621363679260412"
r = requests.get(url)
j = r.json()
#pprint.pprint(j)
for poi in j:
  print(poi['name']['value'])

  if poi['description']['value'] is not None:
    print(poi['description']['value'])

  # Check the multimedia attribute itself (it may be missing or empty), not the description
  if poi.get('multimedia', {}).get('value'):
    print("Multimedia:")
    for mm in poi['multimedia']['value']:
      print(mm['url'])
  print("")
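
"""The query string above combines four parameters: `type`, `georel`, `geometry` and `coords`. As a sketch, the same kind of URL can be assembled from a latitude, a longitude and a radius. `near_query_url` is a hypothetical helper, and only the standard library is used so that nothing is sent over the network.
"""

from urllib.parse import urlencode

def near_query_url(base_url, entity_type, lat, lon, max_distance_m):
  """Build an NGSI v2 'near a point' query URL (latitude first, then longitude)."""
  params = {
    "type": entity_type,
    "georel": "near;maxDistance:{}".format(max_distance_m),
    "geometry": "point",
    "coords": "{},{}".format(lat, lon),
  }
  # Keep ';', ':' and ',' unescaped so the URL reads like the hand-written one above
  return base_url + "?" + urlencode(params, safe=";:,")

geo_url = near_query_url(
  "https://broker.fiware.urbanplatform.portodigital.pt/v2/entities",
  "PointOfInterest", 41.1433, -8.6214, 150)
print(geo_url)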

"""## Accessing historical values
The historical values are accessible through the historical data API documented above. Because the query is quite heavy, you can only access one entity at a time. \\
In the example below we took the id of the first entity of the query above. \\
Please note: The API automatically limits the number of records; to retrieve more, query using the start and end dates (see the full documentation linked above).
"""

## Example of IoT Resource - Air Quality

import requests
import pprint
# First, let's get the IDs from the FIWARE broker
url = "https://broker.fiware.urbanplatform.portodigital.pt/v2/entities?type=AirQualityObserved"
r = requests.get(url)
j = r.json()
print("Available sensors:")
for sensor in j:
  print(sensor['id'])
print("")

# Get the historical data of one of the sensors
url = "http://history-data.urbanplatform.portodigital.pt/v2/entities/urn:ngsi-ld:AirQualityObserved:porto:environment:ubiwhere:5adf39366f555a4514e7ea54?limit=20"
r = requests.get(url)
j = r.json()
pprint.pprint(j)
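
"""Because the API caps the number of records per call, longer periods have to be requested window by window using a start and an end date. The sketch below only builds the query parameters for one window. The parameter names `fromDate` and `toDate` are an assumption (they are not confirmed by this notebook) and should be checked against the historical data documentation linked above. `history_params` is a hypothetical helper.
"""

from datetime import datetime, timezone

def history_params(start, end, limit=20, start_key="fromDate", end_key="toDate"):
  """Build query parameters for one time window; the key names are assumptions."""
  if start >= end:
    raise ValueError("start must be before end")
  return {
    start_key: start.isoformat(),
    end_key: end.isoformat(),
    "limit": limit,
  }

window = history_params(
  datetime(2019, 5, 1, tzinfo=timezone.utc),
  datetime(2019, 5, 2, tzinfo=timezone.utc))
print(window)
# The dictionary could then be passed to requests.get(..., params=window)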

"""# Georeferenced Data

The final resource is the georeferenced data, served by the city hall's API for geographical data. Each dataset has a query builder that you can experiment with: open the dataset URL and click “query” at the bottom of the page. Every query must set at least the “where” parameter, which works like the WHERE clause of a PostgreSQL query.

Example: [https://servsig.cm-porto.pt/arcgis/rest/services/OpenData_APD/OpenData_APD/MapServer/13](https://servsig.cm-porto.pt/arcgis/rest/services/OpenData_APD/OpenData_APD/MapServer/13)
"""

import requests
import pprint


req_params = {
    'f': 'json',
    'where': "1=1",                        # the 'where' clause is mandatory; it takes a Postgres-like condition
    #'where': "n_o > 10",                  # example: the number of reported events is over 10
    #'where': "freguesia = 'Bonfim'",      # example for all the reports in 'Bonfim'
    #'where': "ano > 2000",                # Caveat: For example in this dataset, this will not work as 'ano' is defined as "esriFieldTypeString"
    'returnGeometry': 'true',
    'outFields': '*',                      # the fields that you want returned
    'orderByFields': 'objectid ASC',
    #'resultOffset': '4000',
    'resultRecordCount': '10',             #for the purpose of the demonstration we are limiting to 10 results
    'outSR': '4326',
    #'token': str(TOKEN)
}

url = 'https://servsig.cm-porto.pt/arcgis/rest/services/OpenData_APD/OpenData_APD/MapServer/13/query'
r = requests.get(url, params = req_params)
data = r.json()

for myItem in data['features']:
  myItemAttributes = myItem['attributes']
  pprint.pprint(myItemAttributes)
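
"""The commented-out `resultOffset` parameter, together with `resultRecordCount`, allows a large layer to be read page by page. The sketch below shows such a loop. `iter_features` is hypothetical, it assumes the layer supports pagination, and the `fetch_page` callable stands in for the `requests.get(...).json()` call so that the logic can be tried offline with made-up rows.
"""

def iter_features(fetch_page, page_size):
  """Yield features page by page until the service reports no more records."""
  offset = 0
  while True:
    page = fetch_page(offset, page_size)
    features = page.get("features", [])
    for feature in features:
      yield feature
    # ArcGIS sets 'exceededTransferLimit' when more records remain;
    # if the flag is absent, fall back to "the page came back full"
    more = page.get("exceededTransferLimit", len(features) == page_size)
    if not features or not more:
      break
    offset += len(features)

# Made-up rows and a fake fetcher for illustration only
fake_rows = [{"attributes": {"objectid": i}} for i in range(1, 6)]

def fake_fetch(offset, count):
  chunk = fake_rows[offset:offset + count]
  return {"features": chunk, "exceededTransferLimit": offset + count < len(fake_rows)}

all_ids = [f["attributes"]["objectid"] for f in iter_features(fake_fetch, 2)]
print(all_ids)
# A real fetcher could call requests.get(url, params=dict(req_params,
# resultOffset=offset, resultRecordCount=count)).json()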