Issue
I am trying to scrape images with BeautifulSoup in Python. I can extract each image link and save a file to a folder, but the saved images cannot be opened because their format is reported as unsupported.
import os
import requests
from bs4 import BeautifulSoup

res = requests.get('https://books.toscrape.com/')
res.raise_for_status()
file = open('op.html', 'wb')
for i in res.iter_content(10000):
    file.write(i)
os.makedirs('images', exist_ok=True)
newfile = open("op.html", 'rb')
data = newfile.read()
soup = BeautifulSoup(data, 'html.parser')
for link in soup.find_all('img'):
    ll = link.get('src')
    ima = open(os.path.join('images', os.path.basename(ll)), 'wb')
    for down in res.iter_content(1000):
        ima.write(down)
Opening a saved image gives a "file format not supported" error, even though the files in the output folder are named as JPEG images.
Solution
The problem is that after you find the URL of an image you never request it. Instead, you write out the body of the initial response, which is just the HTML of the page, so every "image" file actually contains HTML. Request each image URL and save that response instead, something like this:
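import os
import shutil
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = 'https://books.toscrape.com/'
res = requests.get(base_url)
res.raise_for_status()
# Save the page; the with block closes the file before it is read back
with open('op.html', 'wb') as file:
    for i in res.iter_content(10000):
        file.write(i)
os.makedirs('images', exist_ok=True)
with open('op.html', 'rb') as newfile:
    data = newfile.read()
soup = BeautifulSoup(data, 'html.parser')
for link in soup.find_all('img'):
    ll = link.get('src')
    ima = os.path.join('images', os.path.basename(ll))
    # urljoin resolves the relative src against the page URL
    # (os.path.join would insert backslashes on Windows)
    current_img = urljoin(base_url, ll)
    # Request the image itself instead of reusing the HTML response
    img_res = requests.get(current_img, stream=True)
    img_res.raise_for_status()
    with open(ima, 'wb') as f:
        shutil.copyfileobj(img_res.raw, f)
    del img_res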
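Two points from this answer can be checked without touching the network: how a relative src resolves against the page URL, and whether the bytes written to disk are really image data. The following is a minimal sketch, not part of the original answer; the helper names resolve_image_url and looks_like_jpeg are hypothetical, and the sample src values are made up for illustration.

```python
from urllib.parse import urljoin

# JPEG files begin with these three bytes (the start-of-image marker).
JPEG_MAGIC = b"\xff\xd8\xff"


def resolve_image_url(page_url, src):
    """Hypothetical helper: turn an <img> src into an absolute URL."""
    return urljoin(page_url, src)


def looks_like_jpeg(data):
    """Hypothetical helper: True if the bytes start with the JPEG marker."""
    return data.startswith(JPEG_MAGIC)


if __name__ == "__main__":
    # Made-up src values, used only to show how resolution behaves.
    print(resolve_image_url("https://books.toscrape.com/", "media/cache/a.jpg"))
    print(resolve_image_url("https://books.toscrape.com/catalogue/page-2.html",
                            "../media/a.jpg"))
    # HTML saved under a .jpg name fails the check, which is the
    # situation described in the question.
    print(looks_like_jpeg(b"<!DOCTYPE html>"))
```

Running looks_like_jpeg on the first few bytes of a downloaded file is a quick way to confirm whether the page HTML was saved by mistake.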
Answered By - WholesomeGhost