What are the libraries that support XPath? Is there a full implementation? How is the library used? Where is its website?
This question is tagged with
~ Asked on 2008-08-12 11:28:36
libxml2 has a number of advantages:
If you are doing simple path selection, stick with ElementTree ( which is included in Python 2.5 ). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.
Sample of libxml2 XPath Use
import libxml2 doc = libxml2.parseFile("tst.xml") ctxt = doc.xpathNewContext() res = ctxt.xpathEval("//*") if len(res) != 2: print "xpath query: wrong node set size" sys.exit(1) if res.name != "doc" or res.name != "foo": print "xpath query: wrong node set value" sys.exit(1) doc.freeDoc() ctxt.xpathFreeContext()
Sample of ElementTree XPath Use
from elementtree.ElementTree import ElementTree mydoc = ElementTree(file='tst.xml') for e in mydoc.findall('/foo/bar'): print e.get('title').text
~ Answered on 2008-08-26 13:06:39
Sounds like an lxml advertisement in here. ;) ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ much improved:
import xml.etree.ElementTree as ET root = ET.parse(filename) result = '' for elem in root.findall('.//child/grandchild'): # How to make decisions based on attributes even in 2.6: if elem.attrib.get('name') == 'foo': result = elem.text break
~ Answered on 2012-11-22 01:05:52
Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more "Pythonic" bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElemenTree supports a limited subset of XPath, although it may be good enough for your needs.
~ Answered on 2009-11-13 23:11:17
Another option is py-dom-xpath, it works seamlessly with minidom and is pure Python so works on appengine.
import xpath xpath.find('//item', doc)
~ Answered on 2010-01-23 09:30:19
You can use:
from xml.dom.ext.reader import Sax2 from xml import xpath doc = Sax2.FromXmlFile('foo.xml').documentElement for url in xpath.Evaluate('//@Url', doc): print url.value
import libxml2 doc = libxml2.parseFile('foo.xml') for url in doc.xpathEval('//@Url'): print url.content
~ Answered on 2010-08-23 13:00:01
The latest version of elementtree supports XPath pretty well. Not being an XPath expert I can't say for sure if the implementation is full but it has satisfied most of my needs when working in Python. I've also use lxml and PyXML and I find etree nice because it's a standard module.
NOTE: I've since found lxml and for me it's definitely the best XML lib out there for Python. It does XPath nicely as well (though again perhaps not a full implementation).
~ Answered on 2008-08-14 09:48:59
You can use the simple
from lxml.html.soupparser import fromstring tree = fromstring("<a>Find me!</a>") print tree.xpath("//a/text()")
~ Answered on 2015-11-15 05:31:21
If you want to have the power of XPATH combined with the ability to also use CSS at any point you can use
>>> from parsel import Selector >>> sel = Selector(text=u"""<html> <body> <h1>Hello, Parsel!</h1> <ul> <li><a href="http://example.com">Link 1</a></li> <li><a href="http://scrapy.org">Link 2</a></li> </ul </body> </html>""") >>> >>> sel.css('h1::text').extract_first() 'Hello, Parsel!' >>> sel.xpath('//h1/text()').extract_first() 'Hello, Parsel!'
~ Answered on 2017-12-16 22:16:20
Another library is 4Suite: http://sourceforge.net/projects/foursuite/
I do not know how spec-compliant it is. But it has worked very well for my use. It looks abandoned.
~ Answered on 2010-08-23 12:57:47
PyXML works well.
You didn't say what platform you're using, however if you're on Ubuntu you can get it with
sudo apt-get install python-xml. I'm sure other Linux distros have it as well.
If you're on a Mac, xpath is already installed but not immediately accessible. You can set
PY_USE_XMLPLUS in your environment or do it the Python way before you import xml.xpath:
if sys.platform.startswith('darwin'): os.environ['PY_USE_XMLPLUS'] = '1'
In the worst case you may have to build it yourself. This package is no longer maintained but still builds fine and works with modern 2.x Pythons. Basic docs are here.
~ Answered on 2008-08-12 19:34:44
If you are going to need it for html:
import lxml.html as html root = html.fromstring(string) root.xpath('//meta')
~ Answered on 2019-05-29 13:48:30