Issue
I am trying to get the text or data from a table using Scrapy, but the table doesn't have a class. Part of the HTML structure looks like this:
<div class="content_e">
<div class="content-ranklist">
<div class="rank-title"><span><h1><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Beijing gourmet restaurant
</font></font></h1></span><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Updated on November 20th</font></font>
</div>
<section class="ranklist-table">
<table>
<tbody>
<tr>
<th class="th-label-0">
<div><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Ranking</font></font>
</div>
</th>
</tr>
<tr>
<td class="td-rank">
<div class="td-div-1"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">1</font></font>
</div>
</td>
I tried to solve the problem in several different ways, but I always get None or [].
This is what I tried:
response.css('div.content-ranklist section.ranklist-table table').extract()
response.css('div.content-ranklist section.ranklist-table table tr td.td-shopName').extract()
response.css('//td[contains(@class, "td-shopName")]/text()').extract()
response.xpath("//table/tbody/tr//td[@class='td-shopName']//a[@class='J_shopName']").extract()
The result is always None or [].
These are the results:
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
I am trying to get this class:
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/40x4o.png
Solution
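The table is loaded by an XHR request after the initial page load, so it is not in the HTML that Scrapy downloads and the selectors have nothing to match. One way around this is to render the page in a real browser with Selenium and parse the resulting HTML: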
from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Firefox()
url = 'http://www.dianping.com/shoplist/shopRank/pcChannelRankingV2?rankId=83f473b08cba2af53642a889d8802c50'
browser.get(url)
time.sleep(3)  # crude wait: give the XHR-loaded table 3 seconds to render
html = browser.page_source
browser.quit()  # close the browser once the rendered HTML is captured

soup = BeautifulSoup(html, features='html.parser')
# each shop link is an <a class="J_shopName"> element
links = soup.find_all('a', attrs={'class': 'J_shopName'})
for link in links:
    print(link.get('href'))
Output:
http://www.dianping.com/shop/68193557
http://www.dianping.com/shop/112393652
http://www.dianping.com/shop/93227192
http://www.dianping.com/shop/132799437
http://www.dianping.com/shop/67917756
http://www.dianping.com/shop/17637181
http://www.dianping.com/shop/102198900
http://www.dianping.com/shop/130316435
http://www.dianping.com/shop/121684828
http://www.dianping.com/shop/130834244
http://www.dianping.com/shop/129948761
http://www.dianping.com/shop/73410505
http://www.dianping.com/shop/129320981
http://www.dianping.com/shop/111876029
http://www.dianping.com/shop/93659299
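If you would rather not depend on BeautifulSoup for the extraction step, the same filtering can be sketched with the standard library alone. This is only an illustration, not part of the original answer: ShopLinkParser, extract_shop_links and SAMPLE_HTML are hypothetical names, and the sample markup is an assumption based on the selectors in the question (td.td-shopName containing a.J_shopName). In real use, the string passed in would be browser.page_source from the Selenium code above.

```python
from html.parser import HTMLParser


class ShopLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list includes J_shopName."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # class may hold several space-separated names, or be missing entirely
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "J_shopName" in classes and href:
            self.links.append(href)


def extract_shop_links(html):
    """Return the hrefs of all a.J_shopName links in document order."""
    parser = ShopLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical stand-in for browser.page_source (assumed markup, not the real page)
SAMPLE_HTML = """
<table><tbody>
<tr><td class="td-shopName"><a class="J_shopName" href="http://www.dianping.com/shop/68193557">Shop A</a></td></tr>
<tr><td class="td-shopName"><a class="other" href="http://example.com/ignored">Ignored</a></td></tr>
<tr><td class="td-shopName"><a class="J_shopName extra" href="http://www.dianping.com/shop/112393652">Shop B</a></td></tr>
</tbody></table>
"""

if __name__ == "__main__":
    for link in extract_shop_links(SAMPLE_HTML):
        print(link)
```

Note that this only replaces the parsing step. The page still has to be rendered first, because the table is not present in the raw HTML response.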
Answered By - αԋɱҽԃ αмєяιcαη