Issue
I am having trouble building a CSV file from scraped data. I have managed to scrape the data from the table, but I have been stuck for days on writing it out. I am using Scrapy items, collecting them in lists, and trying to load those lists into a pandas DataFrame.
import scrapy
from wiki.items import WikiItem
import pandas as pd

class Spider(scrapy.Spider):
    name = "wiki"
    start_urls = ['https://datatables.net/']

    def parse(self, response):
        items = {'Name':[], 'Position':[], 'Office':[], 'Age':[],
                 'Start_Date':[],'Salary':[]}
        trs = response.xpath('//table[@id="example"]//tr')
        name = WikiItem()
        pos = WikiItem()
        office = WikiItem()
        age = WikiItem()
        start_data = WikiItem()
        salary = WikiItem()
        name['name'] = trs.xpath('//td[1]//text()').extract()
        pos['position'] = trs.xpath('//td[2]//text()').extract()
        office['office'] = trs.xpath('//td[3]//text()').extract()
        age['age'] = trs.xpath('//td[4]//text()').extract()
        start_data['start_data'] = trs.xpath('//td[5]//text()').extract()
        salary['salary'] = trs.xpath('td[6]//text()').extract()
        items['Name'].append(name)
        items['Position'].append(pos)
        items['Office'].append(office)
        items['Age'].append(age)
        items['Start_Date'].append(start_data)
        items['Salary'].append(salary)
        x = pd.DataFrame(items, columns=['Name','Position','Office','Age',
                                         'Start_Date','Salary'])
        yield x.to_csv("r",sep=",")
This code produces output like this:
,Name,Position,Office,Age,Start_Date,Salary
0,"{'name': [u'Tiger Nixon',
u'Garrett Winters',
u'Ashton Cox',
u'Cedric Kelly',
u'Airi Satou',
u'Brielle Williamson',
u'Herrod Chandler',
I do get the names column, but I get it 59 times. For instance, the first row, 'Tiger Nixon', appears 59 times. The position column is repeated 59 times as well, and so on. The scraped data is also not in good shape. I am new to Scrapy and open to any help or suggestions. Thanks in advance!
EDIT: My items.py looks like this:
import scrapy

class WikiItem(scrapy.Item):
    name = scrapy.Field()
    position = scrapy.Field()
    office = scrapy.Field()
    age = scrapy.Field()
    start_data = scrapy.Field()
    salary = scrapy.Field()
Solution
OK, I can't comment, and I can't test your code because I don't have the definition of WikiItem. But let's iterate over the rows of this response instead. Can you check what you get with this code?
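import scrapy
import pandas as pd

class Spider(scrapy.Spider):
    name = "wiki"
    start_urls = ['https://datatables.net/']

    def parse(self, response):
        trs = response.xpath('//table[@id="example"]//tr')
        if trs:
            items = []
            for tr in trs:
                # Header and footer rows contain <th>, not <td>; skip them
                if not tr.xpath('td'):
                    continue
                # Debug aid: a relative path (no leading //) stays inside this row
                print(tr.xpath('td[2]//text()').extract())
                # extract_first() returns one string per cell instead of a list
                item = {
                    "Name": tr.xpath('td[1]//text()').extract_first(),
                    "Position": tr.xpath('td[2]//text()').extract_first(),
                    "Office": tr.xpath('td[3]//text()').extract_first(),
                    "Age": tr.xpath('td[4]//text()').extract_first(),
                    "Start_Date": tr.xpath('td[5]//text()').extract_first(),
                    "Salary": tr.xpath('td[6]//text()').extract_first()
                }
                items.append(item)
            # Build the frame once, after all rows have been collected
            x = pd.DataFrame(items, columns=['Name','Position','Office','Age',
                                             'Start_Date','Salary'])
            # to_csv writes the file "r" and returns None, so there is nothing to yield
            x.to_csv("r", sep=",")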
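
The repetition described in the question most likely comes from the leading `//` in expressions such as `//td[1]//text()`. In XPath, a path that starts with `//` searches the whole document even when it is called on a single row, so every row returns the full column. The sketch below reproduces the effect with the standard library only. The three-row table is made-up sample data (the names come from the output in the question, the positions are placeholders), and because `xml.etree.ElementTree` does not accept absolute paths on an element, the document-wide search is simulated by querying the table root once per row.

```python
import xml.etree.ElementTree as ET

# Made-up three-row sample; the real page has more rows and columns.
SAMPLE = """
<table id="example">
  <tr><td>Tiger Nixon</td><td>position-1</td></tr>
  <tr><td>Garrett Winters</td><td>position-2</td></tr>
  <tr><td>Ashton Cox</td><td>position-3</td></tr>
</table>
""".strip()

table = ET.fromstring(SAMPLE)
rows = table.findall(".//tr")

# Simulates calling '//td[1]' on every row: each call searches the whole
# table, so the full first column comes back once per row.
repeated = [td.text for _ in rows for td in table.findall(".//td[1]")]

# Row-relative lookup: 'td[1]' is resolved inside the current row only.
per_row = [tr.findtext("td[1]") for tr in rows]

print(len(repeated))  # 3 rows x 3 names
print(per_row)
```

In Scrapy, the row-relative form corresponds to `tr.xpath('td[1]//text()')` (or `./td[1]//text()`) called on one row at a time, which is what the loop above does.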
Answered By - Esteban Martinena Guerrero