So im trying to parse out all of the href links on a page here: https://data-wake.opendata.arcgis.com/datasets but Ive noticed that none of the links im looking for are returning from my python code which is here:
driver = webdriver.PhantomJS("C:\Users\Jlong\Desktop\phantomjs.exe")
driver.get(r"https://data-wake.opendata.arcgis.com/datasets")
pagesource = driver.page_source
bsobj = BeautifulSoup(pagesource,'lxml')
for line in bsobj.find_all('a'):
print(line.get('href'))
Here is a snipit of the html from chromes inspect: Html Inspect
the expected result would be to return something like the following:
"/datasets/wakeforestnc::state-system-streets"
I have also noticed that there is something called Ember application.js running on the page and I think that maybe preventing me from accessing the href attributes that are deeply nested in the main ember tag. IM not familair with ember or how to parse complex pages like this, any help would be greatly appreciated!
Aucun commentaire:
Enregistrer un commentaire