Issue
I am new to Python. I have to loop through a CSV file containing article links and write the body of each article to a text file. I can't loop through the links, and I am getting the error "string indices must be integers".
with open('/content/Input.csv', mode='r') as file:
    csvfile = csv.reader(file)
    for i in csvfile:
        blogurls = i[1]
        # print(blogurls)
        for rows in blogurls:
            url = blogurls[rows]
            headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"}
            page = requests.get(url, headers=headers)
            soup = BeautifulSoup(page.content, 'html.parser')
The input CSV has a header row with the columns URL_ID and URL.
Help would be appreciated.
Solution
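The error occurs because blogurls is a single string (one URL): the inner for loop iterates over its characters, and blogurls[rows] then tries to index the string with a character instead of an integer. As your CSV file contains a header row, use csv.DictReader instead of csv.reader. Each row then becomes a dictionary keyed by column name, which is more intuitive.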
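To make the difference concrete, here is a minimal sketch that reproduces the error on a single URL string and then reads the same kind of data with csv.DictReader. The in-memory sample, the example.com URLs, and the helper name reproduce_error are assumptions for illustration only; the real Input.csv is not shown here.

```python
import csv
import io

# Hypothetical sample standing in for Input.csv (assumed columns: URL_ID, URL).
sample = io.StringIO(
    "URL_ID,URL\n"
    "37,https://example.com/article-a\n"
    "38,https://example.com/article-b\n"
)


def reproduce_error(url):
    """Mimic the inner loop from the question on one URL string."""
    try:
        for ch in url:  # iterating a str yields its characters
            url[ch]     # indexing a str with a str raises TypeError
    except TypeError as exc:
        return str(exc)
    return None


error_message = reproduce_error("https://example.com/article-a")

# DictReader consumes the header row and yields one dict per data row.
rows = list(csv.DictReader(sample))
ids = [row["URL_ID"] for row in rows]
urls = [row["URL"] for row in rows]

print(error_message)
print(ids)
print(urls)
```

Because DictReader consumes the header row itself, the loop never sees the column names as data, and every field is accessed by name rather than by position.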
Full code:
import csv

import requests
from bs4 import BeautifulSoup

# The User-Agent header is the same for every request, so define it once.
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"}

with open('/content/Input.csv', mode='r', newline='') as file:
    csvfile = csv.DictReader(file)  # each row is a dict keyed by the header row
    for row in csvfile:
        page = requests.get(row['URL'], headers=headers)
        soup = BeautifulSoup(page.content, 'html.parser')
        with open(f"article{row['URL_ID']}.txt", 'w', encoding='utf-8') as out:
            # Extract article title
            title = soup.find('h1', {'class': 'entry-title'}).text
            print(title, file=out, end='\n\n')
            # Extract related content <p>...</p>. select() takes a CSS selector,
            # so the class filter belongs in the selector itself.
            for p in soup.select('div.td-post-content p'):
                if not p.has_attr('class'):
                    print(p.text.strip(), file=out, end='\n\n')
Output (article37.txt):
AI in healthcare to Improve Patient Outcomes
Introduction
“If anything kills over 10 million people in the next few decades, it will be a highly infectious virus rather than a war. Not missiles but microbes.” Bill Gates’s remarks at a TED conference in 2014, right after the world had avoided the Ebola outbreak. When the new, unprecedented, invisible virus hit us, it met an overwhelmed and unprepared healthcare system and oblivious population. This public health emergency demonstrated our lack of scientific consideration and underlined the alarming need for robust innovations in our health and medical facilities. For the past few years, artificial intelligence has proven to be of tangible potential in the healthcare sectors, clinical practices, translational medical and biomedical research.
After the first case was detected in China on December 31st 2019, it was an AI program developed by BlueDot that alerted the world about the pandemic. It was quick to realise AI’s ability to analyse large chunks of data could help in detecting patterns and identifying and tracking the possible carriers of the virus.
Many tracing apps use AI to keep tabs on the people who have been infected and prevent the risk of cross-infection by using AI algorithms that can track patterns and extract some features to classify or categorise them.
Answered By - Corralien