Python-basic(8) Web Data Handling

Web Data Handling


### Make a Request

```python
# urllib.request sends a request to a server and receives the response.
# It can be used to fetch HTML, JSON, or XML data from an API.
from urllib import request

# Opens a connection to google.com and returns an http.client.HTTPResponse object.
webData = request.urlopen("http://www.google.com")
```
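
A request can fail (unreachable host, unsupported URL scheme, HTTP error status), so a slightly more defensive version may be useful. The sketch below is one possible shape; the helper name `fetch` and the 10-second default timeout are assumptions for illustration, not part of the original example.

```python
from urllib import request
from urllib.error import URLError


def fetch(url, timeout=10):
    """Return the raw response body as bytes, or None if the request fails.

    `fetch` is a hypothetical helper name used only for this sketch.
    """
    try:
        # The with-block closes the connection even if read() raises.
        with request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as exc:
        # HTTPError is a subclass of URLError, so 4xx/5xx responses land here too.
        print(f"Request to {url} failed: {exc.reason}")
        return None
```

Calling `fetch("http://www.google.com")` would return the page body as bytes when the request succeeds, and `None` otherwise.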

### Read

`.read()` returns the HTML data of the webpage as bytes.
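
Because `read()` on the response object gives back bytes rather than a string, the data usually needs decoding before it is handled as text. A minimal sketch, assuming the server announces its charset in the `Content-Type` header and falling back to UTF-8 otherwise; `read_text` and `default_encoding` are hypothetical names.

```python
from urllib import request


def read_text(url, default_encoding="utf-8"):
    """Fetch `url` and return the body decoded to str (hypothetical helper)."""
    with request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Use the charset from the Content-Type header when the server sends one.
        charset = response.headers.get_content_charset() or default_encoding
    # errors="replace" avoids a crash on the occasional badly encoded byte.
    return raw.decode(charset, errors="replace")
```

For example, `html = read_text("http://www.google.com")` would leave `html` as an ordinary Python string.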

### Get Code
`.getcode()` returns the HTTP status code of the response, for example 200 when the request succeeded.
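
One detail worth seeing in action: `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so `getcode()` is only reached for successful ones, and the error object carries the status in its `code` attribute. The sketch below starts a throwaway local server so it can run offline; `DemoHandler` and `status_of` are hypothetical names invented for this illustration. Since Python 3.9 the `status` attribute is the preferred spelling of `getcode()`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request
from urllib.error import HTTPError


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers 200 for "/" and 404 for any other path."""

    def do_GET(self):
        if self.path == "/":
            body = b"<h1>ok</h1>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_of(url):
    """Return the HTTP status code for `url`, including error statuses."""
    try:
        with request.urlopen(url) as response:
            return response.getcode()
    except HTTPError as err:
        return err.code  # 4xx/5xx arrive as exceptions


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

ok_status = status_of(base + "/")
missing_status = status_of(base + "/missing")

server.shutdown()
server.server_close()
```

With this demo server, `ok_status` holds 200 and `missing_status` holds 404.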


### HTML Parsing

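```python
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def error(self, message):
        # Silently ignore parse errors (only older Python versions call this hook).
        pass

parser = MyHTMLParser()
# open() raises an exception on failure, so no mode check is needed;
# the with-block closes the file afterwards.
with open("check.html") as f:
    contents = f.read()
parser.feed(contents)
```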
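
A parser subclass only becomes useful once it overrides one of the `handle_*` callbacks that `feed()` triggers. As a small illustration, the sketch below collects the `href` of every `<a>` tag; the class name `LinkCollector` and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p><a href="/a">A</a> and <a href="https://example.com/b">B</a></p>')
collector.close()
links = collector.links
```

After the call to `feed()`, `links` contains `"/a"` and `"https://example.com/b"` in document order.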

### JSON Parsing

```python
import json

# `response` must hold JSON text (str or bytes), e.g. the body returned by read().
json_data = json.loads(response)
```
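
To make the result of `json.loads` concrete, here is a small self-contained sketch. The payload is invented for illustration and stands in for the bytes an API might return; it also shows that malformed input raises `json.JSONDecodeError`.

```python
import json

# Made-up payload standing in for the bytes an API response might contain.
raw = b'{"name": "Ada", "langs": ["Python", "C"], "active": true}'

# JSON objects become dicts, arrays become lists, true/false become True/False.
data = json.loads(raw.decode("utf-8"))
first_lang = data["langs"][0]

# Malformed input raises json.JSONDecodeError, which can be caught and reported.
try:
    json.loads("{not valid json}")
except json.JSONDecodeError as err:
    error_text = f"Invalid JSON at position {err.pos}: {err.msg}"
```

Here `data` is an ordinary dictionary, so `data["name"]` is `"Ada"` and `first_lang` is `"Python"`.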

### XML Parsing

```python
from xml.dom import minidom

# parse() reads the file and returns a Document object.
doc = minidom.parse("check.xml")
```
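
Once a document is parsed, elements can be looked up by tag name and their attributes and text read out. The sketch below uses `minidom.parseString` on an inline string so it runs without a file; the `<catalog>` markup and the variable names are assumptions made for this example.

```python
from xml.dom import minidom

# Made-up XML standing in for the contents of a file such as check.xml.
xml_text = """<catalog>
  <book id="b1"><title>Dive Into Python</title></book>
  <book id="b2"><title>Fluent Python</title></book>
</catalog>"""

doc = minidom.parseString(xml_text)

# Map each book's id attribute to the text inside its <title> element.
books = {}
for book in doc.getElementsByTagName("book"):
    title = book.getElementsByTagName("title")[0]
    books[book.getAttribute("id")] = title.firstChild.data
```

After the loop, `books` maps `"b1"` to `"Dive Into Python"` and `"b2"` to `"Fluent Python"`.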