Issue
I am new to Scrapy. I followed a course, wrote the code below, and understand most of it. The problem is that I am only getting the first row of the table's data.
Here is the code I tried:
from ast import parse
from fileinput import filename
import scrapy


class PostsSpider(scrapy.Spider):
    name = "posts"
    start_urls = [
        'https://publicholidays.com.bd/2022-dates/'
    ]

    def parse(self, response):
        for post in response.css('table'):
            yield {
                'date': post.css('td::text').getall()[0],
                'day': post.css('td::text').getall()[1],
                'event': post.css('tr td a::text').getall()[0]
            }
When I run the crawl, this is the only item I get:
{"date": "21 Feb", "day": "Mon", "event": "Shaheed Day"}
How can I get all of the table's data?
Solution
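The problem was in the CSS element selection. The original loop iterates over table elements, so it yields one item per table, and getall()[0] always picks the first matching cell. Iterating over the table rows with '.publicholidays tbody tr' and selecting each cell with td:nth-child(n) yields one item per row. It is working fine now, and the code below can be run directly as a script.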
import scrapy
from scrapy.crawler import CrawlerProcess


class PostsSpider(scrapy.Spider):
    name = "posts"
    start_urls = ['https://publicholidays.com.bd/2022-dates']

    def parse(self, response):
        # Iterate over the table rows, not the table itself,
        # so that one item is yielded per holiday.
        for post in response.css('.publicholidays tbody tr'):
            yield {
                'date': post.css('td:nth-child(1)::text').get(),
                'day': post.css('td:nth-child(2)::text').get(),
                # The event name sits inside either an <a> or a <span>.
                'event': (
                    post.css('td:nth-child(3) a::text').get()
                    or post.css('td:nth-child(3) span::text').get()
                )
            }


if __name__ == "__main__":
    # Run the spider as a plain Python script, without the scrapy CLI.
    process = CrawlerProcess()
    process.crawl(PostsSpider)
    process.start()
Output:
{'date': '21 Feb', 'day': 'Mon', 'event': 'Shaheed Day'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '17 Mar', 'day': 'Thu', 'event': "Sheikh Mujibur Rahman's Birthday"}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '18 Mar', 'day': 'Fri', 'event': 'Shab e-Barat'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '26 Mar', 'day': 'Sat', 'event': 'Independence Day'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '14 Apr', 'day': 'Thu', 'event': 'Bengali New Year'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '28 Apr', 'day': 'Thu', 'event': 'Laylat al-Qadr'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '29 Apr', 'day': 'Fri', 'event': 'Jumatul Bidah'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '1 May', 'day': 'Sun', 'event': 'May Day'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '2 May', 'day': 'Mon', 'event': 'Eid ul-Fitr Holiday'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '3 May', 'day': 'Tue', 'event': 'Eid ul-Fitr'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '4 May', 'day': 'Wed', 'event': 'Eid ul-Fitr Holiday'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '16 May', 'day': 'Mon', 'event': 'Buddha Purnima'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '9 Jul', 'day': 'Sat', 'event': 'Eid ul-Adha Holiday'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '\n', 'day': None, 'event': None}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '10 Jul', 'day': 'Sun', 'event': 'Eid ul-Adha'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '11 Jul', 'day': 'Mon', 'event': 'Eid ul-Adha Holiday'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '9 Aug', 'day': 'Tue', 'event': 'Ashura'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '15 Aug', 'day': 'Mon', 'event': 'National Mourning Day'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '19 Aug', 'day': 'Fri', 'event': 'Shuba Janmashtami'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '5 Oct', 'day': 'Wed', 'event': 'Vijaya Dashami'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '9 Oct', 'day': 'Sun', 'event': 'Eid-e-Milad un-Nabi'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
{'date': '16 Dec', 'day': 'Fri', 'event': 'Victory Day'}
2022-04-01 16:30:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://publicholidays.com.bd/2022-dates/>
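The output above contains one item whose date is '\n' and whose other fields are None. That suggests the table has a row without the usual three cells, although this is an assumption since the page markup is not shown here. A minimal sketch of one way to drop such rows follows. The helper clean_item is hypothetical and not part of Scrapy; it takes the three values returned by the .get() calls.

```python
def clean_item(date, day, event):
    """Return a dict for a complete row, or None if any field is blank.

    Hypothetical helper, not part of Scrapy.
    """
    fields = {'date': date, 'day': day, 'event': event}
    # Treat None as an empty string and strip surrounding whitespace.
    cleaned = {key: (value or '').strip() for key, value in fields.items()}
    if not all(cleaned.values()):
        return None
    return cleaned


# Sample values copied from the output above, including the blank row.
rows = [
    ('9 Jul', 'Sat', 'Eid ul-Adha Holiday'),
    ('\n', None, None),
    ('10 Jul', 'Sun', 'Eid ul-Adha'),
]
items = [item for item in (clean_item(*row) for row in rows) if item is not None]
print(items)
```

Inside parse(), the spider could build each item with clean_item(...) and yield it only when the result is not None.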
Answered By - F.Hoque