Parse HTML File Using Python Without External Module
I am trying to parse an HTML file using Python without any external module. The reason is that I am triggering a Jenkins job and running into some import issues with lxml and Beau
Solution 1:
For a single element you could use the re module or even plain string functions.
data = '''<tr class="test"><td class="test"><a href="no.html">track</a></td><td class="duration">0.390s</td><td class="zero number">0</td><td class="zero number">0</td><td class="zero number">0</td><td class="passRate">N/A</td></tr><tr class="suite"><td colspan="2" class="totalLabel">Total</td><td class="passed number">271</td><td class="zero number">0</td><td class="failed number">3</td><td class="passRate suite">98%</td></tr>'''
# re module
import re
print(re.search(r'suite">(\d+)%', data).group(1))
# string functions
before = 'passRate suite">'
after = '%'
start = data.find(before) + len(before)
stop = data.find(after, start)
print(data[start:stop])
EDIT: to get the other values with re
import re
print('passed:', re.search(r'passed number">(\d+)', data).group(1))
print('zero:', re.search(r'zero number">(\d+)', data).group(1))
print('failed:', re.search(r'failed number">(\d+)', data).group(1))
print('Rate:', re.search(r'suite">(\d+)', data).group(1))
Output:
passed: 271
zero: 0
failed: 3
Rate: 98
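The separate searches above can also be gathered into one helper. This is only a minimal sketch, assuming the summary row uses the same class names as the sample data; `extract_counts` and `sample` are hypothetical names introduced for illustration. It returns None for a value that is missing instead of failing with AttributeError on `.group`.

```python
import re

def extract_counts(html):
    """Return the first number after each known class marker, or None if absent."""
    patterns = {
        'passed': r'passed number">\s*(\d+)',
        'failed': r'failed number">\s*(\d+)',
        'rate': r'passRate suite">\s*(\d+)%',
    }
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, html)
        # re.search returns None when nothing matches, so guard before .group()
        result[name] = int(match.group(1)) if match else None
    return result

# Sample row modelled on the data shown above (assumption: real reports look like this)
sample = ('<tr class="suite"><td colspan="2" class="totalLabel">Total</td>'
          '<td class="passed number">271</td><td class="zero number">0</td>'
          '<td class="failed number">3</td><td class="passRate suite">98%</td></tr>')

print(extract_counts(sample))
```

Because `\s*` allows optional whitespace after the opening tag, the same patterns should also cope with reports where the number sits on its own line.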
Solution 2:
import re

# HTML_FILE is assumed to hold the path to the HTML report
with open(HTML_FILE) as f:
    data = f.read()

# Narrow the search to the summary row that follows the "Total" label
before = '<td colspan="2" class="totalLabel">Total</td>'
after = '%<'
start = data.find(before) + len(before)
stop = data.find(after, start)
suite_filter = data[start:stop].strip()

RATE_PASS = re.search(r'suite">[ \n]+(\d+)', suite_filter).group(1)
PASS_COUNT = re.search(r'passed number">(\d+)', suite_filter).group(1)
SKIPPED_COUNT = re.search(r'zero number">(\d+)', suite_filter).group(1)
FAIL_COUNT = re.search(r'failed number">(\d+)', suite_filter).group(1)
TESTS_TOTAL = int(PASS_COUNT) + int(SKIPPED_COUNT) + int(FAIL_COUNT)
print(RATE_PASS, PASS_COUNT, SKIPPED_COUNT, TESTS_TOTAL)
Here is my solution, based on the suggestions from @furas. Any improvements or suggestions are welcome.
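One further option that stays within the standard library, and is not used in either solution above, is the `html.parser` module. The following is a rough sketch rather than a drop-in replacement: it assumes Python 3, and it assumes the summary row is marked `<tr class="suite">` as in the sample data. The `SuiteRowParser` class and the `sample` string are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

class SuiteRowParser(HTMLParser):
    """Collect the text of each <td> inside <tr class="suite">, keyed by the cell's class."""

    def __init__(self):
        super().__init__()
        self.in_suite_row = False
        self.current_class = None
        self.cells = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'tr':
            # Only cells of the summary row are of interest
            self.in_suite_row = attrs.get('class') == 'suite'
        elif tag == 'td' and self.in_suite_row:
            self.current_class = attrs.get('class')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.current_class = None
        elif tag == 'tr':
            self.in_suite_row = False

    def handle_data(self, data):
        text = data.strip()
        if self.current_class and text:
            self.cells[self.current_class] = text

# Sample row modelled on the data in Solution 1 (assumption: real reports look like this)
sample = ('<tr class="suite"><td colspan="2" class="totalLabel">Total</td>'
          '<td class="passed number">271</td><td class="zero number">0</td>'
          '<td class="failed number">3</td><td class="passRate suite">98%</td></tr>')

parser = SuiteRowParser()
parser.feed(sample)
print(parser.cells)
```

Compared with the regex approach, this does not depend on the exact spacing or attribute order inside the tags, at the cost of a little more code.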