Monday, June 29, 2009

Memory Leak while using Mechanize

I was working on a task that scraped several web pages. After running this task for a while, I found that my process's memory usage kept growing until it was about to consume all the available memory on the server.

After some investigation, I realized that the problem was in my understanding of how the Mechanize agent works.

Let me explain with an example:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
loop do
  page = agent.get("http://www.example.com")
end

In this example, memory keeps getting consumed because Mechanize stores every page it fetches in the agent's history. I looked in the documentation and found a setting called max_history that limits how many pages the agent keeps; I think setting it would fix this issue, but I haven't tried it.
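If max_history behaves as documented, the fix would look something like this (an untested sketch; here max_history caps the agent's history at a single page):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.max_history = 1  # keep at most one page in the agent's history

loop do
  page = agent.get("http://www.example.com")
  # ... process the page ...
end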

Another fix, if you don't need the history at all, is to write your code like this:

loop do
  # create a fresh agent on every iteration
  agent = WWW::Mechanize.new
  page = agent.get("http://www.example.com")
end
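This works because each iteration drops the reference to the previous agent, so Ruby's garbage collector can reclaim it along with its accumulated page history. The trade-off is that you also lose the agent's cookies and open connections between requests.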

That's it. Maybe this piece of information will be useful to someone facing the same issue I did.
