Python Pubmed
Web Scraping PubMed with a Python Script
#!/usr/bin/env python
##################
# PYTHON SCRIPT
# PERFORM WEBSITE SCRAPE OF PUBMED
# PULL RELEVANT ARTICLE INFO FROM WEBPAGE
# FORMAT CONTENT FOR WIKI TEMPLATE
# REQUIRES 'BeautifulSoup'
# AUTHOR: BRADLEY MONK
# LICENSE: GNU
#################
import re
# re.compile('<title>(.*)</title>')
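# urlopen lives in urllib.request on Python 3 and in urllib2 on Python 2
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
from bs4 import BeautifulSoup
# Download the PubMed record page and parse it; naming the parser avoids bs4's "no parser specified" warning
soup = BeautifulSoup(urlopen('http://www.ncbi.nlm.nih.gov/pubmed/10731148').read(), 'html.parser')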
print("#################------------------#################")
#------- pubmed authors ---------#
print("{{Article|")
# Each author name is a link inside <div class="auths">
for div_tag in soup.find_all('div', attrs={"class": "auths"}):
    for author_link in div_tag.find_all('a'):
        print(author_link.get_text())
#------- pubmed authors ---------#
#------- pubmed year ------------#
print("|")
jouryear = soup.find_all(attrs={"class": "cit"})
year = jouryear[0].get_text()
# The citation text starts with the journal abbreviation and a period;
# the four-digit year begins two characters after that period
year_start = year.find(".") + 2
print(year[year_start:year_start + 4])
#------- pubmed year ------------#
#------- pubmed journal ---------#
journal = soup.find_all(attrs={"class": "cit"})
print("|")
print(journal[0].a.string)
#------- pubmed journal ---------#
print("- [http://domain.com/linktofile.pdf PDF]")
#--------- pubmed PMID -----------#
PMID = soup.find_all(attrs={"class": "rprtid"})
print("|")
print(PMID[0].dd.string)
#--------- pubmed PMID -----------#
#------- pubmed title ---------#
title = soup.find_all(attrs={"class": "rprt abstract"})
print("|")
print(title[0].h1.string)
#------- pubmed title ---------#
print("}}")
print("{{ExpandBox|Expand to view experiment summary|")
#------- pubmed abstract ---------#
abstract = soup.find_all(attrs={"class": "abstr"})
# get_text() also works when the paragraph contains nested tags, where .string returns None
print(abstract[0].p.get_text())
#------- pubmed abstract ---------#
print("}}")
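The year lookup in the script above depends on character positions counted from the first period of the citation line. A minimal sketch of a pattern-based alternative follows. The function name `extract_year` is hypothetical, and the sample citation string is an assumed example of what the `cit` element's text might look like, not output captured from the page.

```python
import re


def extract_year(citation_text):
    """Return the first four-digit year (1800-2099) in citation_text, or None."""
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", citation_text)
    return match.group(1) if match else None


# Assumed sample of the citation line (journal, date, volume, pages)
sample = "Science. 2000 Mar 24;287(5461):2262-7."
print(extract_year(sample))
```

Searching for a year pattern instead of slicing at fixed offsets keeps working if the text before the year changes length, at the cost of possibly picking up a year-like number that appears earlier in the line.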
Result
Hayashi Y Shi SH Esteban JA Piccini A Poncer JC Malinow R • 2000 • Science PDF
Expand to view experiment summary
To elucidate mechanisms that control and execute activity-dependent synaptic plasticity, alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionate receptors (AMPA-Rs) with an electrophysiological tag were expressed in rat hippocampal neurons. Long-term potentiation (LTP) or increased activity of the calcium/calmodulin-dependent protein kinase II (CaMKII) induced delivery of tagged AMPA-Rs into synapses. This effect was not diminished by mutating the CaMKII phosphorylation site on the GluR1 AMPA-R subunit, but was blocked by mutating a predicted PDZ domain interaction site. These results show that LTP and CaMKII activity drive AMPA-Rs to synapses by a mechanism that requires the association between GluR1 and a PDZ domain protein.
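The script emits the template one print() call at a time, so every field lands on its own line. As a sketch only, the same fields could be collected and joined into a single template string. The function name `build_article_template` is hypothetical, the field order is taken from the order of the print() calls in the script, joining authors with spaces is an assumption based on the rendered result, and the title passed in below is a placeholder rather than the article's real title.

```python
def build_article_template(authors, year, journal, pmid, title,
                           pdf_url="http://domain.com/linktofile.pdf"):
    """Join scraped fields into one {{Article|...}} wiki template string."""
    fields = [
        " ".join(authors),                           # authors, space separated
        str(year),                                   # publication year
        "{0} - [{1} PDF]".format(journal, pdf_url),  # journal plus PDF link
        str(pmid),                                   # PubMed ID
        title,                                       # article title
    ]
    return "{{Article|" + "|".join(fields) + "}}"


# Placeholder title; the other values come from the result shown above
print(build_article_template(
    ["Hayashi Y", "Shi SH"], 2000, "Science", 10731148, "Placeholder title"))
```

Returning a string rather than printing keeps the formatting step separate from the scraping step, so the template can be checked without fetching the page.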