Scraping a jobsite. To get a job...I want another bike
Author: aussie wantok
Uploaded: 2022-07-01
Views: 216
Instead of looking for work, I've set up a bot to give me a list of jobs that I can apply for
I'm excluding jobs whose titles contain certain buzzwords like SALES, RESTAURANT, etc.
Tools I'm using: PyCharm, and Python with the BeautifulSoup and requests packages
Source code at end of this description
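
A minimal sketch of the buzzword exclusion idea on its own, separate from the scraper. This is not the code from the video: the function name is_wanted, the shortened BLOCKED_WORDS set, and the sample titles are made up for illustration. It also assumes you want to catch titles like "Sales/Admin" or "Nurse," where a plain split() would leave the punctuation attached.

```python
import re

# Hypothetical, shortened block list for illustration only.
BLOCKED_WORDS = {"SALES", "RESTAURANT", "NURSE"}


def is_wanted(title, blocked=BLOCKED_WORDS):
    """Return True if no word in the job title is on the block list."""
    # Extract alphabetic words so punctuation does not hide a blocked word.
    words = re.findall(r"[A-Za-z]+", title.upper())
    return not any(word in blocked for word in words)


# Made-up sample titles.
titles = ["Python Developer", "Sales Assistant", "Restaurant Manager", "Bike Mechanic"]
wanted = [t for t in titles if is_wanted(t)]
print(wanted)
```

Whole-word matching means a title like "Salesforce Developer" is still kept, since SALESFORCE is not on the list.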
0:00 intro
0:27 Indeed.com jobsite layout
1:04 Indeed.com URL
1:47 PyCharm and basic setup of packages
2:06 requests.get() call
2:52 BeautifulSoup
3:11 finding HTML tags in the page
3:25 inspecting tags
4:21 getting first a tag
5:02 getting text and hyperlink from a tag
6:09 get all a tags with class from page
6:46 loop through list
7:57 drop unwanted jobs in list
8:55 sift through jobs - split(), loop, list.append()
10:23 first run
10:34 exclude more jobs
10:58 scrape the first 5 pages
12:12 following generated hyperlinks
12:46 outro
#python #beautifulsoup #indeed #lookingforwork
Source Code
---------------------
import requests
from bs4 import BeautifulSoup
# words that rule a job out when they appear in its title (duplicates removed)
no_no_jobs = ['SALE', 'RECEPTIONIST', 'RETAIL', 'COLES', 'SALES', 'JUNIOR', 'NURSE', 'BAR', 'NURSES', 'MEDICAL',
              'SECURITY', 'ADMIN', 'PHYSIOTHERAPIST', 'OCCUPATIONAL', 'ENROLLED', 'TRAINEE', 'TABLELANDS', 'PODIATRIST',
              'CARER', 'MASSAGE', 'THERAPIST', 'TOWNSVILLE', 'CHEF', 'BEAUTY', 'PATHOLOGIST',
              'AGED', 'RECEPTION', 'MAREEBA', 'CONSTRUCTION', 'BEVERAGE', 'LIBRARY', 'PHARMACY', 'ATTENDANT',
              'HOSPITALITY', 'CLEANER', 'HOUSEKEEPING', 'CASINO', 'BARISTA', 'LINGERIE',
              'MERCHANDISER', 'MERCHANDISERS', 'WAIT', 'WAITING', 'YOUTH', 'RESTAURANT', 'BWS', 'HEALTH', 'WOOLWORTHS',
              'WEIPA', 'HOTEL', 'OFFICER']
final_list = []
for z in range(0, 100, 20):  # five result offsets: 0, 20, 40, 60, 80
    response = requests.get(f"https://au.indeed.com/jobs?l=Cairns%2...{z}&vjk=087c224d4fda7669")
    n = response.text
    soup = BeautifulSoup(n, "html.parser")
    # each job title on the page is an a tag with class "jcs-JobTitle"
    my_a_tags = soup.find_all('a', class_="jcs-JobTitle")
    print(len(my_a_tags))
    temp_list = []
    for tag in my_a_tags:
        title = tag.getText()
        link = 'https://au.indeed.com' + str(tag['href'])
        print(title)
        print(link)
        temp_list.append([title, link])
    print(temp_list)
    # keep only the jobs whose title contains no word from no_no_jobs
    for i in temp_list:
        split_title = i[0].split()
        keep = True
        for word in split_title:
            if word.upper() in no_no_jobs:
                keep = False
        if keep:
            final_list.append(i)
for i in final_list:
    print(i)
print(len(final_list))
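
The script above builds each job link by gluing 'https://au.indeed.com' onto the tag's href. A slightly more forgiving way to do the same thing, sketched here with the standard library's urljoin, also copes with an href that is already a full URL. The helper name absolute_link and the sample href values are made up for illustration and are not taken from the real page.

```python
from urllib.parse import urljoin

BASE_URL = "https://au.indeed.com"


def absolute_link(href, base=BASE_URL):
    """Turn a relative href into a full URL; leave absolute URLs untouched."""
    return urljoin(base, href)


# Hypothetical href values, only to show the behaviour.
relative = absolute_link("/rc/clk?jk=abc123")
already_full = absolute_link("https://example.com/job/1")
print(relative)
print(already_full)
```

In the loop this would replace the string concatenation, e.g. link = absolute_link(tag['href']).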