Issue
I have the HTML code below:
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31121 status-publish first instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31301 status-publish instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-28416 status-publish last instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
I need to extract the ID of each product from its class attribute using Beautiful Soup (the IDs here are 31121, 31301, and 28416). How can I do that?
Solution
Iterate over your selection, read each element's class attribute, then iterate over its classes and pick the one that starts with post-:
[c.split('-')[-1] for e in soup.select('div.type-product') for c in e['class'] if c.startswith('post-')]
or
[c.split('-')[-1] for e in soup.select('div[class*="post-"]') for c in e['class'] if c.startswith('post-')]
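The inner step of both expressions, picking the post- token out of an element's class list, can be pulled out into a small helper. This is only a sketch: post_id is a hypothetical name, and it assumes the ID is always the run of digits after post-, as in the markup from the question.

```python
import re

# Matches a class token such as "post-31121" and captures the digits.
POST_CLASS = re.compile(r"post-(\d+)")


def post_id(classes):
    """Return the ID from the first 'post-<digits>' token, or None."""
    for token in classes:
        match = POST_CLASS.fullmatch(token)
        if match:
            return match.group(1)
    return None


# Beautiful Soup returns the class attribute as a list of tokens like this one.
classes = ["col-12", "product", "type-product", "post-31121", "status-publish"]
print(post_id(classes))  # 31121
```

Unlike startswith('post-'), the full match skips tokens that share the prefix but carry no numeric ID, which may or may not matter for your pages.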
Example
html = '''
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31121 status-publish first instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31301 status-publish instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-28416 status-publish last instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
'''
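from bs4 import BeautifulSoup

# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')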
[c.split('-')[-1] for e in soup.select('div.type-product') for c in e['class'] if c.startswith('post-')]
Output
['31121', '31301', '28416']
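If the IDs are needed as integers, or some matched elements might not carry a post- class at all, a slightly more defensive variant could look like the following. It is a sketch under assumptions: extract_post_ids is a hypothetical helper name, the built-in html.parser is assumed as the parser, and the sample markup is abbreviated from the question.

```python
from bs4 import BeautifulSoup


def extract_post_ids(html):
    """Collect numeric IDs from 'post-<digits>' classes on product divs."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for element in soup.select("div.type-product"):
        # .get() returns the class list, or [] if the attribute is missing
        for token in element.get("class", []):
            prefix, _, suffix = token.partition("-")
            if prefix == "post" and suffix.isdigit():
                ids.append(int(suffix))
    return ids


sample = '''
<div class="product type-product post-31121 first instock"></div>
<div class="product type-product post-31301 instock"></div>
<div class="product type-product instock"></div>
'''
print(extract_post_ids(sample))  # [31121, 31301]
```

The third div has no post- class, so it contributes nothing instead of raising an error.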
Answered By - HedgeHog