Issue
I'm using Beautiful Soup together with Flask to scrape elements from a page and show them. However, I'm having trouble understanding how to get multiple items from a single div and how to loop over them so they show up properly in an HTML template. The code below explains it better.
This is the HTML layout I'm attempting to scrape:
<div class="card card-job">
<div class="container">
<div class="row">
<div class="col-12">
<div class="card-body">
<h2 class="card-title"><a class="stretched-link js-view-job" href="#">Associate Account Manager
[MedTech]</a></h2>
<div class="card-job-actions js-job" data-id="2306150237w"
data-jobtitle="Associate Account Manager [MedTech]">
<button class="btn-add-job " aria-label="Save Associate Account Manager [MedTech]" title="Save">
<svg class="icon-sprite">
<use xlink:href="/images/sprite.svg"></use>
</svg>
<span class="sr-only">Save</span>
</button>
<button class="btn-remove-job d-none" aria-label="Remove Associate Account Manager [MedTech]"
hidden="" title="Remove">
<svg class="icon-sprite">
<use xlink:href="/images/sprite.svg"></use>
</svg>
<span class="sr-only">Saved</span>
</button>
</div>
<ul class="list-inline job-meta">
<li class="list-inline-item">Sales - Selling MDD</li>
<li class="list-inline-item">Mongkok, China</li>
</ul>
</div>
</div>
</div>
</div>
</div>
And I need two things from it:
- The card title text, inside the h2 with the card-title class, and
- The description text, inside the list with the job-meta class.
I can get them individually without any issues. For example, for the card title:
job_titles = soup.find_all("h2", {"class": "card-title"})
jobs = [i.get_text() for i in job_titles]
@app.route('/')
def home():
    return render_template('home.html', jobs=jobs)
Then, in my HTML template, a for loop goes over the list of job titles:
<div class="jobs">
{% for job in jobs %}
<h3>{{ job }}</h3>
{% endfor %}
</div>
However, if I scrape the job descriptions the same way, I have no way of adding them to the loop in the template so that they show up in the proper order:
- Job Title 1, Job Description 1,
- Job Title 2, Job Description 2
This leads me to believe I need a for loop in my .py file as well. This is the best I could come up with, but it raises TypeError: 'ResultSet' object is not callable:
soup = BeautifulSoup(source, 'lxml')
jobs = soup.find_all("div", {"class": "card-job"})
for job, desc in jobs(soup.find_all("h2", {"class": "card-title"}),
                      soup.find_all("ul", {"class": "job-meta"})):
    print(job, desc)
What did I do wrong here? And how do I pass the result into home() and use it in the template?
Thank you!
Solution
Just to give you a hint on selection strategy: select each card and scrape all of its related information in one go:
data = []
for e in soup.select('.card'):
    data.append({
        'title': e.h2.text,
        'descr': e.ul.get_text('\n', strip=True)
    })
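As for the TypeError: find_all() returns a ResultSet, which is a list subclass, so jobs(...) tries to call a list; the loop presumably meant zip(). A minimal sketch comparing the zip() fix with the per-card loop above, assuming a trimmed two-card sample (the second card is made up) and the built-in html.parser instead of lxml so no extra parser is needed:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the card markup from the question; the second card is
# hypothetical sample data for illustration only.
SAMPLE_HTML = """
<div class="card card-job"><div class="card-body">
  <h2 class="card-title"><a href="#">Associate Account Manager [MedTech]</a></h2>
  <ul class="list-inline job-meta">
    <li class="list-inline-item">Sales - Selling MDD</li>
    <li class="list-inline-item">Mongkok, China</li>
  </ul>
</div></div>
<div class="card card-job"><div class="card-body">
  <h2 class="card-title"><a href="#">Example Job Title</a></h2>
  <ul class="list-inline job-meta">
    <li class="list-inline-item">Example Category</li>
    <li class="list-inline-item">Example City</li>
  </ul>
</div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's loop, fixed: zip() pairs the i-th title with the i-th <ul>.
pairs = []
for title, desc in zip(soup.find_all("h2", {"class": "card-title"}),
                       soup.find_all("ul", {"class": "job-meta"})):
    pairs.append((title.get_text(strip=True), desc.get_text("\n", strip=True)))

# The per-card loop: each title is read from the same card as its description.
data = []
for card in soup.select(".card"):
    data.append({
        "title": card.h2.get_text(strip=True),
        "descr": card.ul.get_text("\n", strip=True),
    })

for job in data:
    print(job["title"], "|", job["descr"].replace("\n", ", "))
```

Both give the same pairs here, but zip() silently shifts every later pair if one card is missing its h2 or ul, while the per-card loop never mixes data from two different cards (a card without a ul would raise an AttributeError instead).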
Now you should be able to iterate over the list of dicts to glue everything together.
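For the second part of the question, a minimal sketch assuming Flask is installed: the list of dicts is passed to the template as jobs, just like the list of titles was, and a single template loop reads both keys, so each title stays next to its own description. render_template_string and the inline TEMPLATE are stand-ins for render_template('home.html', ...) so the sketch runs without a templates folder, and the data value is hypothetical sample output of the loop above.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical scraped result, in the shape built by the per-card loop.
data = [
    {"title": "Associate Account Manager [MedTech]",
     "descr": "Sales - Selling MDD\nMongkok, China"},
]

# Inline stand-in for home.html; Jinja resolves job.title on a dict
# by falling back to item lookup (job['title']).
TEMPLATE = """
<div class="jobs">
{% for job in jobs %}
  <h3>{{ job.title }}</h3>
  <p>{{ job.descr }}</p>
{% endfor %}
</div>
"""

@app.route('/')
def home():
    # With a real templates/home.html: render_template('home.html', jobs=data)
    return render_template_string(TEMPLATE, jobs=data)

if __name__ == '__main__':
    # Render the page once through Flask's test client and print the HTML.
    with app.test_client() as client:
        print(client.get('/').get_data(as_text=True))
```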
Answered By - HedgeHog