Python Pubmed

From bradwiki

Web Scrape Pubmed Using Python Script

#!/usr/bin/env python
##################
#  PYTHON SCRIPT
#  PERFORM WEBSITE SCRAPE OF PUBMED
#  PULL RELEVANT ARTICLE INFO FROM WEBPAGE
#  FORMAT CONTENT FOR WIKI TEMPLATE
#  REQUIRES 'BeautifulSoup'
#  AUTHOR: BRADLEY MONK
#  LICENSE: GNU
#################

import re
# re.compile('<title>(.*)</title>')
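from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

# Download the PubMed record page and parse it. Naming the parser
# explicitly avoids BeautifulSoup's "no parser specified" warning.
html = urlopen('http://www.ncbi.nlm.nih.gov/pubmed/10731148').read()
soup = BeautifulSoup(html, 'html.parser')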


print("#################------------------#################")
#------- pubmed authors ---------#
print("{{Article|")
# The first <div class="auths"> holds one <a> link per author.
authors_div = soup.find('div', attrs={"class": "auths"})
authors = [a.get_text() for a in authors_div.find_all('a')]

#------- pubmed authors ---------#
# Print all author names on one line, separated by spaces.
print(" ".join(authors))

#------- pubmed year ------------#
print("|")
jouryear = soup.find_all(attrs={"class": "cit"})
year = jouryear[0].get_text()
# The citation text reads "<Journal>. <Year> ...", so the year is the
# four characters that start two positions after the first period.
year_start = year.find(".") + 2
print(year[year_start:year_start + 4])
#------- pubmed year ------------#

#------- pubmed journal ---------#
journal = soup.find_all(attrs={"class": "cit"})
print("|")
print(journal[0].a.string)
#------- pubmed journal ---------#

print("- [http://domain.com/linktofile.pdf PDF]")

#--------- pubmed PMID -----------#
PMID = soup.find_all(attrs={"class": "rprtid"})
print("|")
print(PMID[0].dd.string)
#--------- pubmed PMID -----------#

#------- pubmed title ---------#
title = soup.find_all(attrs={"class": "rprt abstract"})
print("|")
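# get_text() also works when the heading contains nested tags
# (e.g. italics), where .string would return None.
print(title[0].h1.get_text())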
#------- pubmed title ---------#
print("}}")
print("{{ExpandBox|Expand to view experiment summary|")

#------- pubmed abstract ---------#
abstract = soup.find_all(attrs={"class": "abstr"})
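# Print the first abstract paragraph. get_text() also works when the
# paragraph contains nested tags, where .string would return None.
print(abstract[0].p.get_text())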
#------- pubmed abstract ---------#
print("}}")


Result

Hayashi Y Shi SH Esteban JA Piccini A Poncer JC Malinow R • 2000 • Science PDF

Expand to view experiment summary


To elucidate mechanisms that control and execute activity-dependent synaptic plasticity, alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionate receptors (AMPA-Rs) with an electrophysiological tag were expressed in rat hippocampal neurons. Long-term potentiation (LTP) or increased activity of the calcium/calmodulin-dependent protein kinase II (CaMKII) induced delivery of tagged AMPA-Rs into synapses. This effect was not diminished by mutating the CaMKII phosphorylation site on the GluR1 AMPA-R subunit, but was blocked by mutating a predicted PDZ domain interaction site. These results show that LTP and CaMKII activity drive AMPA-Rs to synapses by a mechanism that requires the association between GluR1 and a PDZ domain protein.
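
Formatting Sketch

The script above mixes scraping and printing. As a rough sketch only, the formatting step could be separated into a function so the template output can be checked without fetching the page. The names extract_year and format_article are hypothetical and not part of the original script. The regular expression is a swapped-in alternative to the script's fixed-position slicing, and the citation string in the usage lines is an illustrative input in the "Journal. Year ..." layout that the slicing assumes.

```python
import re


def extract_year(citation):
    """Return the first four-digit year (1800-2099) in a citation string, or None."""
    match = re.search(r"\b(?:18|19|20)\d{2}\b", citation)
    return match.group(0) if match else None


def format_article(authors, year, journal, pmid, title, abstract,
                   pdf_url="http://domain.com/linktofile.pdf"):
    """Build the {{Article}} and {{ExpandBox}} wiki text, one field per line,
    in the same order the script prints them."""
    lines = [
        "{{Article|",
        " ".join(authors),          # author names separated by spaces
        "|",
        year,
        "|",
        journal,
        "- [%s PDF]" % pdf_url,     # placeholder link, as in the script
        "|",
        pmid,
        "|",
        title,
        "}}",
        "{{ExpandBox|Expand to view experiment summary|",
        abstract,
        "}}",
    ]
    return "\n".join(lines)


# Usage with values taken from the Result section; the title and abstract
# are placeholders here, not scraped text.
year = extract_year("Science. 2000 Mar 24;287(5461):2262-7.")
print(format_article(
    ["Hayashi Y", "Shi SH", "Esteban JA", "Piccini A", "Poncer JC", "Malinow R"],
    year, "Science", "10731148", "(title)", "(abstract)"))
```

Keeping the formatting in a function that takes plain strings means the scraping selectors can change without touching the template layout.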