Issue
I want to scrape all the elements from WhatsApp Web, but I'm having a problem with Selenium in Python. This is my code:
from selenium import webdriver
from selenium.webdriver.common import keys
import time
driver = webdriver.Chrome("path/to/chromedriver")
driver.get("https://web.whatsapp.com/")
time.sleep(10)
input("Qr Code: ")
driver.implicitly_wait(10)
numbers = driver.find_elements_by_class_name('_ccCW')
for n in numbers:
    print(n.text)
My script scrapes only 17 items. WhatsApp has three classes (_ccCW, FqYAR and i0jNr), and each class returns 17 items. How can I scrape all three classes?
Solution
The same WhatsApp element can be reached through two or three different XPaths, so using a single XPath for an element will not always return its text.
Some Chrome extensions help identify XPaths. I used the following four:
- HTML DOM Navigation
- TruePath
- XPath Helper
- xPath Finder
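Since one element may answer to several XPaths, a small fallback helper can try each candidate in turn. This is only a sketch: `first_text` is a hypothetical name, and it passes the literal string "xpath" (the value of Selenium's `By.XPATH` constant) so the snippet has no imports of its own.

```python
def first_text(driver, xpaths):
    """Return the first non-empty text found via any of the XPaths, else None."""
    for xpath in xpaths:
        # find_elements returns an empty list (no exception) when nothing matches.
        # "xpath" is the string value of selenium's By.XPATH.
        for element in driver.find_elements("xpath", xpath):
            text = element.text.strip()
            if text:
                return text
    return None
```

Usage would look like `first_text(driver, [xpath_a, xpath_b])`. With `implicitly_wait(10)` set as in the question, every candidate that matches nothing can wait for the full timeout, so a shorter implicit wait may be worth trying.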
The following are the fixed elements of WhatsApp:
chat_search_box = "/html/body/div[1]/div[1]/div[1]/div[3]/div/div[1]/div/label/div/div[2]"
selected_profile_header_name = "//HEADER[@class='_23P3O']//SPAN[@class='_ccCW FqYAR i0jNr']"
chat_name = "/html/body/div[1]/div[1]/div[1]/div[4]/div[1]/header/div[2]/div[1]/div/span"
footer = "/html/body/div[1]/div[1]/div[1]/div[4]/div[1]/footer/div[1]"
footer_textbox = "/html/body/div[1]/div[1]/div[1]/div[4]/div[1]/footer/div[1]/div[2]/div/div[1]/div/div[2]"
msg_nav_arrow = "/html/body/div[1]/div[1]/div[1]/div[4]/div[1]/div[3]/div/div[1]/span/div/span[2]"
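The `selected_profile_header_name` XPath above matches `@class='_ccCW FqYAR i0jNr'`, which suggests the three names from the question are classes on one and the same span, not three separate groups of elements. If that holds, searching by any one of them returns the same items, which would explain the identical count of 17. A minimal sketch with hypothetical helper names, passing the literal "css selector" (the value of Selenium's `By.CSS_SELECTOR`):

```python
def css_all(*classes):
    """Selector for elements that carry every one of the given classes."""
    return "".join("." + c for c in classes)


def css_any(*classes):
    """Selector for elements that carry at least one of the given classes."""
    return ", ".join("." + c for c in classes)


def texts(driver, selector):
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    return [e.text for e in driver.find_elements("css selector", selector)]


print(css_all("_ccCW", "FqYAR", "i0jNr"))  # ._ccCW.FqYAR.i0jNr
print(css_any("_ccCW", "FqYAR", "i0jNr"))  # ._ccCW, .FqYAR, .i0jNr
```

Unlike an XPath test such as `@class='_ccCW FqYAR i0jNr'`, which needs the attribute string to match exactly, CSS class selectors match regardless of class order or extra classes.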
The XPaths below access the chat names in the left pane. Their structure is fixed, but the div number is dynamic: i is the index of the chat or group div.
pane_base = "(//div[@id='pane-side']//div[@class='_3OvU8'])"
pane_search_parent = lambda x : pane_base + "[" + str(x) + "]"
pane_group_last_sender = lambda i: pane_search_parent(i) + "//span[@class='FqYAR i0jNr']"
pane_user_sms = lambda i : pane_search_parent(i) + "//span[@class='_ccCW FqYAR i0jNr']"
pane_sms_date = lambda i: pane_search_parent(i) + "//div[@class='_3vPI2']/div[@class='_1i_wG']"
pane_notif = lambda i: pane_search_parent(i) + "//span[@class='_23LrM']"
pane_username = lambda i: pane_search_parent(i) + "//div[@class='zoWT4']"
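To show how the index i is meant to be used, here is a rough sketch that counts the rows matched by the pane base XPath and reads the name from each one. The function names are hypothetical, the class names are the ones quoted above (WhatsApp may change them at any time), and "xpath" is the string value of Selenium's `By.XPATH`.

```python
PANE_BASE = "(//div[@id='pane-side']//div[@class='_3OvU8'])"


def pane_row(i):
    """XPath of the i-th chat/group row in the left pane."""
    return f"{PANE_BASE}[{i}]"


def pane_username(i):
    return pane_row(i) + "//div[@class='zoWT4']"


def visible_chat_names(driver):
    """Collect the chat names of every row currently present in the DOM."""
    count = len(driver.find_elements("xpath", PANE_BASE))
    names = []
    for i in range(1, count + 1):  # XPath positions start at 1, not 0
        found = driver.find_elements("xpath", pane_username(i))
        if found:
            names.append(found[0].text)
    return names
```

This only sees rows that exist in the page at that moment. My assumption is that WhatsApp Web renders only the chats in view, so the pane probably has to be scrolled and re-read to get past the first batch of items.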
Accessing messages in a chat is the hardest part, because the same element can have multiple XPaths.
msg_base = "(//div[@id='main']//div[@class='y8WcF']/div)[2]"  # replace [2] with [n] to read a specific message
# SENDER can be found through any of these 3 XPaths, depending on the message type: the message may be quoted text, a first post, a forwarded message, or a message with an image.
SENDER = [msg_base + "/div/div/div/div[1]/span[1]", msg_base + "//span[@class='a71At _3xSVM i0jNr']", msg_base + "//span[@class='_1BUvv']"]
# The following lists work the same way as SENDER; prepend msg_base to each list item.
SENDER_NAME = ['/div/div/div/div[1]/span[2]']
SENDER_TEXT = ['/descendant::div/span/span[1]','//span[@class="i0jNr selectable-text copyable-text"]']
QUOTED_TEXT = ['/div/div/div/div[2]/div[1]/div/div/div/div/div[2]',"//span[@class='quoted-mention i0jNr']",'/div/div/div/div[1]/div[1]/div/div/div/div/div[2]']
QUOTED_SENDER = ['/div/div/div/div[2]/div[1]/div/div/div/div/div[1]/span[1]','/div/div/div/div[1]/div[1]/div/div/div/div/div[1]/span',"//span[@class='a71At i0jNr']"]
TIME = ["//span[@class='kOrB_']",'/descendant::div[last()]']
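Prepending the message base to each list item for the n-th message could look roughly like this. It is a sketch only: `msg_base_for`, `candidates` and `read_field` are hypothetical names, only two of the lists are repeated here, and "xpath" is the string value of Selenium's `By.XPATH`.

```python
MSG_BASE_TEMPLATE = "(//div[@id='main']//div[@class='y8WcF']/div)[{n}]"

SENDER_TEXT = ['/descendant::div/span/span[1]',
               '//span[@class="i0jNr selectable-text copyable-text"]']
TIME = ["//span[@class='kOrB_']", '/descendant::div[last()]']


def msg_base_for(n):
    """Base XPath of the n-th message (1-based)."""
    return MSG_BASE_TEMPLATE.format(n=n)


def candidates(n, suffixes):
    """Full XPaths for message n: the base followed by each suffix."""
    base = msg_base_for(n)
    return [base + suffix for suffix in suffixes]


def read_field(driver, n, suffixes):
    """Return the text from the first candidate XPath that matches, else None."""
    for xpath in candidates(n, suffixes):
        found = driver.find_elements("xpath", xpath)
        if found:
            return found[0].text
    return None
```

For example, `read_field(driver, 2, SENDER_TEXT)` would try both SENDER_TEXT XPaths against the second message, and `read_field(driver, 2, TIME)` would do the same for its timestamp.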
Please feel free to ask any questions about scraping WhatsApp with Python and Selenium.
Answered By - ahmedul_Kabir_Omi