Jacks_Python-libxml2
Beware the Python bindings for libxml2. I was using them in a server application that kept crashing: it would fill up all available memory and then throw memory errors. Below is the test case I wrote to demonstrate it. If you see something wrong, let me know.
import random
import sys
import time

import libxml2

alphabit = 'abcdefghijklmnopqrstuvwxyz'
rand = random.Random()


def main():
    # Generate and parse documents forever so memory growth can be watched.
    while True:
        randXml = makeRandomXml()
        resultChar = parseForResult(randXml)
        sys.stdout.write(resultChar)
        sys.stdout.flush()
        try:
            time.sleep(0.1)
        except KeyboardInterrupt:
            print('-')
            break


def parseForResult(xmlStr):
    # Read the key attribute, then the text of the element named by it.
    rootElement = libxml2.parseDoc(xmlStr)
    childKey = getText(rootElement, '*/@key')
    finalChar = getText(rootElement, '*/%s' % childKey)
    return finalChar


def makeRandomXml():
    # Fixed structure, random names: 8-char key tag, 10-char sibling tag.
    child = randWord(8)
    other = randWord(10)
    result = randChar()
    return '<randomXml key="%s"><%s/><%s>%s</%s></randomXml>' % (child, other, child, result, child)


def randWord(length):
    output = ''
    for x in range(length):
        output = output + randChar()
    return output


def randChar():
    return rand.choice(alphabit)


def getText(element, expression=None):
    # With an expression, return the content of the first XPath match.
    if expression:
        returnVal = None
        node = element.xpathEval2(expression)
        if node:
            returnVal = node[0].content
        del node
        return returnVal
    else:
        return element.content


if __name__ == "__main__":
    main()
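What the code does is simple. It creates an XML string whose structure never changes but whose tag names are random. For example, something like:
<randomXml key="qhxmtplw"><zbdkrnvyoe/><qhxmtplw>c</qhxmtplw></randomXml>
This is just something to parse. The document is parsed, then one XPath search reads the key attribute and a second one finds the element named by that key. This is similar to my real-world case.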
I used the 'top' command to watch the memory usage. I noticed it growing irregularly, at about 4 bytes a second. For something that will be running for months or years without being restarted, this is a real problem. Of course, this does not prove that the problem is with libxml2. So, let's replace the parseForResult function with one that does not touch libxml2 at all.
def parseForResult(xmlStr):
    # No parsing: the result character always sits at index 49.
    finalChar = xmlStr[49]
    return finalChar
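The index 49 is not arbitrary. It follows from the fixed lengths used in makeRandomXml: an 8-character key and a 10-character sibling tag. A small check of that arithmetic, where the helper name resultIndex and the sample string are made up for illustration and are not part of the original script:

```python
def resultIndex(keyLength=8, otherLength=10):
    """Offset of the result character in a string built like makeRandomXml's."""
    prefix = '<randomXml key="'           # 16 characters
    offset = len(prefix) + keyLength      # the key attribute value
    offset += len('"><') + otherLength    # close the root tag, open the empty sibling
    offset += len('/><') + keyLength      # close the sibling, open the key-named tag
    offset += len('>')                    # end of that opening tag
    return offset


# Illustrative document with the same shape as the generated ones.
sample = '<randomXml key="qhxmtplw"><zbdkrnvyoe/><qhxmtplw>c</qhxmtplw></randomXml>'
print(resultIndex(), sample[resultIndex()])
```

If the word lengths passed to randWord ever change, the hard-coded 49 has to change with them.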
Run it again and observe for a while. I did not see it grow by a single byte the whole time I watched it.
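Watching top by eye is enough to spot the trend, but the comparison between two versions of parseForResult can also be logged from inside the process. The sketch below is one way that might be done, not something the original test did. It assumes a Unix system (the standard resource module is not available on Windows), and the helper names peakRss and growthOver are hypothetical. Note that ru_maxrss is the peak resident set size, reported in kilobytes on Linux and bytes on macOS, so it can show growth but never shrinkage.

```python
import resource


def peakRss():
    """Peak resident set size of this process, in the platform's ru_maxrss units."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def growthOver(work, iterations=1000):
    """Run work() repeatedly and return how much the peak RSS rose meanwhile."""
    before = peakRss()
    for _ in range(iterations):
        work()
    return peakRss() - before
```

With the script's functions in scope, a call such as growthOver(lambda: parseForResult(makeRandomXml())) could be run once per variant and the numbers compared.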
For good measure, let's make sure all the bases are covered and explicitly delete these variables.
def parseForResult(xmlStr):
    rootElement = libxml2.parseDoc(xmlStr)
    childKey = getText(rootElement, '*/@key')
    finalChar = getText(rootElement, '*/%s' % childKey)
    del rootElement
    return finalChar

def getText(element, expression=None):
    if expression:
        returnVal = None
        node = element.xpathEval2(expression)
        if node:
            returnVal = node[0].content
        del node
        return returnVal
    else:
        return element.content
It looks to me like the memory grows a little more slowly, but it is still growing nonetheless.
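One thing worth ruling out at this point: del only drops the Python wrapper object. The libxml2 binding documentation states that a parsed document is not released automatically and has to be freed with freeDoc(), and that an XPath context has to be freed with xpathFreeContext(). The sketch below does the same lookup with explicit frees. Whether this accounts for the growth seen above is an assumption that has not been measured here. The name parseForResultFreed is hypothetical, and the import is guarded only so the snippet loads on machines without the bindings.

```python
try:
    import libxml2  # third-party binding; may not be installed
except ImportError:
    libxml2 = None


def parseForResultFreed(xmlStr):
    """Same lookup as parseForResult, releasing the C-level objects explicitly."""
    if libxml2 is None:
        raise RuntimeError('libxml2 bindings are not installed')
    doc = libxml2.parseDoc(xmlStr)
    ctxt = doc.xpathNewContext()
    try:
        keyNodes = ctxt.xpathEval('*/@key')
        if not keyNodes:
            return None
        resultNodes = ctxt.xpathEval('*/%s' % keyNodes[0].content)
        # .content is copied into a Python string before the frees below run.
        return resultNodes[0].content if resultNodes else None
    finally:
        ctxt.xpathFreeContext()
        doc.freeDoc()
```

Running the test loop with this variant in place of parseForResult would show whether any growth remains once the document and context are freed.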
And this is what I did to fix the problem...
from xml.etree import ElementTree

def parseForResult(xmlStr):
    rootElement = ElementTree.fromstring(xmlStr)
    childKey = 'None'
    if 'key' in rootElement.attrib:
        childKey = rootElement.attrib['key']
    focusNode = rootElement.find('./%s' % childKey)
    finalChar = 'X'
    if focusNode is not None:
        finalChar = focusNode.text
    return finalChar
python-lxml seems to be better when it comes to memory, but it still leaks. I had heard of ElementTree before the libxml2 bindings, but I did not use it because of its poor XPath support. It would seem there is not a single native Python library with full XPath support.
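To give a rough idea of what the limited XPath support means in practice, here is a small sketch of what ElementTree's find() and findall() can do. The sample document and its flag attribute are made up for illustration. Child paths, wildcards and simple attribute predicates work, but these methods only ever return elements, so an attribute value such as the key has to be read through the element API rather than selected with an expression like '*/@key'.

```python
from xml.etree import ElementTree

# Illustrative document; the 'flag' attribute is not in the original test data.
SAMPLE = ('<randomXml key="abcdefgh">'
          '<ijklmnopqr/><abcdefgh flag="1">z</abcdefgh>'
          '</randomXml>')
root = ElementTree.fromstring(SAMPLE)

# Supported: a child path, a wildcard, and an attribute predicate.
byPath = root.find('./abcdefgh')
byPredicate = root.find("./*[@flag='1']")
allChildren = root.findall('./*')

# Attribute values come from the element itself, not from a path expression.
keyValue = root.get('key')
```

For a lookup as simple as the one in this test, that subset is enough, which is why the ElementTree version above works.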
