python - requesting different paragraphs of the body part of different articles with bs4

asked Jan 24, 2021 in Technique [Technology] by 深蓝 (71.8m points)

I need the body text of several articles on one page. Each article sits in a section tag that contains one p tag per paragraph, like this:

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

If I use the code below:

import requests
from bs4 import BeautifulSoup

r = requests.get('url')
# Parse the downloaded HTML so that find_all can be called on it
soup = BeautifulSoup(r.text, 'lxml')

all_bodies = soup.find_all('section')
for body in all_bodies:
    print(body)

It returns the complete content of each section. If I pass 'p' to find_all instead, it returns each p tag as a separate list element, but I want all the p tags of one section together in a single list element.

1 Answer

深蓝 · Answer 1 · 2021-01-24T02:50:16+0000

Add an inner loop that finds all <p> tags within each section:

for i in all_bodies:
    for p in i.find_all('p'):
        print(p)

Or, as an alternative, use CSS selectors to avoid the additional loop:

for p in soup.select('section p'):
    print(p)
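
To see what the selector returns, here is a minimal sketch on made-up markup. The ids, the text, and the choice of the built-in 'html.parser' are assumptions for illustration and do not come from the question. Note that select('section p') gives one flat list of every <p> under any <section>, so the section boundaries are not kept:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<section><div>x</div><p id="a1">A1</p><p id="a2">A2</p></section>
<section><figure>y</figure><p id="b1">B1</p></section>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every <p> nested inside any <section>, in document order.
flat = soup.select("section p")
flat_ids = [p["id"] for p in flat]
print(flat_ids)  # ['a1', 'a2', 'b1']
```

This is convenient when only the paragraphs matter and it is not important which section each one came from.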

Example with additional for loop

from bs4 import BeautifulSoup

html = '''
<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>
'''
soup = BeautifulSoup(html, 'lxml')

all_bodies = soup.find_all('section')

for i in all_bodies:
    for p in i.find_all('p'):
        print(p)

Output

<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
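
The loops above print every paragraph one by one. If the goal is one list element per section, as the question describes, a possible sketch is to collect the result of find_all('p') for each section instead of printing it. The sample markup, the variable names, and the 'html.parser' choice below are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two sections with two and three paragraphs.
html = """
<section><div>intro</div><p>First A</p><p>Second A</p></section>
<section><figure>img</figure><p>First B</p><p>Second B</p><p>Third B</p></section>
"""
soup = BeautifulSoup(html, "html.parser")

# One inner list per <section>, each holding that section's <p> tags.
paragraphs_by_section = [
    section.find_all("p") for section in soup.find_all("section")
]

# The same grouping reduced to plain text, which is often what an article body needs.
text_by_section = [
    [p.get_text(strip=True) for p in paragraphs]
    for paragraphs in paragraphs_by_section
]
print(text_by_section)
# [['First A', 'Second A'], ['First B', 'Second B', 'Third B']]
```

Each element of paragraphs_by_section then corresponds to exactly one article, and '\n'.join(texts) on an inner list of text_by_section would give that article's body as a single string.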
