I was working on a task that scrapes several web pages. After running it for a while, I found that the memory taken by my process kept rising until it was about to eat all the memory available on the server.
After some investigation, I realized the problem was in my understanding of how the Mechanize agent works.
Let me explain with an example:
agent = WWW::Mechanize.new
while true
  # every page fetched is appended to the agent's internal history
  page = agent.get('http://www.example.com')
end
In this example, memory keeps growing because Mechanize stores every fetched page in a history inside the agent. I looked in its documentation and found a parameter called "max_history" that limits the size of this history; setting it should fix the issue, I think, though I haven't tried it myself.
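Based on the documentation, the fix would look something like this (just a sketch, I haven't tested it):

agent = WWW::Mechanize.new
agent.max_history = 1 # keep at most one page in history
while true
  page = agent.get('http://www.example.com')
end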
Another fix, if you don't need the history at all, is to write your code like this:
while true
  # a fresh agent each iteration carries no history, so old pages
  # can be garbage collected
  agent = WWW::Mechanize.new
  page = agent.get('http://www.example.com')
end
That's it. Maybe this piece of information will be useful for someone facing the same issue.