BioNuc Technical: February 2009

Thursday, February 19, 2009

MySQL 5.1 Error with Rails 2.2

If you find yourself stuck with an error like this

ArgumentError (NULL pointer given):
(eval):3:in `each_hash'
(eval):3:in `all_hashes'

then this might be useful for you.

I don't actually know why it happens, but I found that it only occurs with the combination of Rails 2.2 and MySQL 5.1.
When I downgraded to MySQL 5.0, everything worked fine.

Of course, it would be better to dig in and find out why this happens, but if you have no time, as was the case with me, this information can help you survive.

Tuesday, February 17, 2009

Which Stemming Algorithm to Use?

Suppose you want to search for something in a large database of words. Exact matching is one solution, but it will miss some data you may need.

For example, if you search for fishes, it would be nice to also retrieve documents that contain fish. That's why it is suggested to use a stemmer, so that the words in your documents are reduced to their roots, making retrieval much wider than before.

A stemmer will take fishes and reduce it to fish. It will also stem your search keyword, and the search then returns the documents you want.
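To make the idea concrete, here is a minimal sketch in JavaScript of searching through stems instead of exact words. The `naiveStem` function is a toy suffix stripper written only for this illustration (it is not Porter or Snowball), and the names `buildStemIndex`, `search`, and the sample documents are all assumptions, not anything from my projects.

```javascript
// Toy suffix stripper for illustration only. NOT Porter or Snowball.
function naiveStem(word) {
  var w = word.toLowerCase();
  // fishes -> fish, boxes -> box
  if (w.length > 4 && /(s|x|z|ch|sh)es$/.test(w)) return w.slice(0, -2);
  // cats -> cat, but leave words ending in "ss" alone (glass stays glass)
  if (w.length > 3 && /[^s]s$/.test(w)) return w.slice(0, -1);
  return w;
}

// Build an index from stem -> list of ids of the documents containing it.
function buildStemIndex(docs) {
  var index = Object.create(null); // no inherited keys such as "constructor"
  docs.forEach(function (doc) {
    doc.text.split(/\W+/).filter(Boolean).forEach(function (word) {
      var stem = naiveStem(word);
      if (!index[stem]) index[stem] = [];
      if (index[stem].indexOf(doc.id) === -1) index[stem].push(doc.id);
    });
  });
  return index;
}

// Stem the keyword the same way as the documents, then look it up.
function search(index, keyword) {
  return index[naiveStem(keyword)] || [];
}

// Hypothetical sample documents.
var docs = [
  { id: 1, text: "A fish swims in the pond" },
  { id: 2, text: "Many fishes live in the sea" },
  { id: 3, text: "Birds fly over the sea" }
];
var index = buildStemIndex(docs);
console.log(search(index, "fishes")); // finds documents 1 and 2
```

Because both the documents and the keyword go through the same stemmer, searching for either fish or fishes reaches the same entry in the index.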

There are many types of automatic stemming techniques. The ones I found are:

Brute Force
Suffix Stripping
Lemmatisation
Stochastic
Hybrid
Affix
Matching

This is a long list of algorithms, but the question I couldn't find an answer to was which one to choose, in which cases, and why.
I didn't find a quick answer on the internet, and I think that to get one I will have to dig into the papers published in the field.

Anyway, in my two projects I was using Ruby, and in both I used a stemmer belonging to the Affix class.
I found two solutions, one called porter and another called snowball.

I tried both on a small list of 116 terms that I have in my DB, and the results were the same except in these cases:

emotionality -- emotion -- emot
joy -- joi -- joy
negativity -- neg -- negat

In each line, the first is the actual word, the second is the porter result, and the third is the snowball result.
My opinion is that the score on this small sample is 2 for snowball and 1 for porter.
But I still can't say which one I should use, which is why I have shown these cases in this post, to get your opinions on the matter.

One last point: porter overstemmed negativity and snowball overstemmed emotionality.
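A comparison like this can be automated with a few lines of code. The sketch below is in JavaScript rather than the Ruby I used, and the two "stemmers" in it are only lookup tables holding the three outputs reported above, so they are stand-ins, not the real porter and snowball libraries. The helper names and the extra term fish are assumptions added for illustration.

```javascript
// Stand-in stemmers: lookup tables with only the outputs reported in this post.
// A real comparison would call the actual porter and snowball implementations.
var porterResults = { emotionality: "emotion", joy: "joi", negativity: "neg" };
var snowballResults = { emotionality: "emot", joy: "joy", negativity: "negat" };

// Wrap a table as a function; unknown words are returned unchanged.
function makeLookupStemmer(table) {
  return function (word) {
    return Object.prototype.hasOwnProperty.call(table, word) ? table[word] : word;
  };
}

// Return only the terms on which the two stemmers disagree.
function diffStemmers(terms, stemA, stemB) {
  return terms
    .map(function (term) {
      return { term: term, a: stemA(term), b: stemB(term) };
    })
    .filter(function (row) {
      return row.a !== row.b;
    });
}

// "fish" is a made-up extra term on which both stand-ins agree.
var terms = ["emotionality", "joy", "negativity", "fish"];
var rows = diffStemmers(
  terms,
  makeLookupStemmer(porterResults),
  makeLookupStemmer(snowballResults)
);
rows.forEach(function (row) {
  console.log(row.term + " -- " + row.a + " -- " + row.b);
});
```

Swapping the lookup tables for real stemmer functions would let the same `diffStemmers` helper run over the whole term list.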

I am waiting for your comments, as I need to make a decision and choose one to use in my new project.

Wednesday, February 11, 2009

HTML Scraping Using JavaScript (for Google Gadgets)

Some friends at my company were working on some Google gadgets. Many of the gadgets depended on data gathered from other websites that lack any XML or RSS service providing this data directly.

Since this is a problem we will face every now and then, we started thinking about a more generic solution that could be used in any gadget depending on such a data source.

We came up with three possible solutions:

Use a scraping service such as Dapper or Yahoo Pipes to do the scraping on our behalf and return a well-formed XML file to use in any gadget
Create a Google App Engine application that we call, which scrapes the data and returns XML to us
Use JS to scrape the HTML pages

The first and second solutions may seem the same, and actually they are, except that Dapper isn't that reliable, as it sometimes fails under extra load, while Google App Engine has proven to survive high request rates.

Anyway, I liked the third solution and said to myself, let's give it a try and see whether it is performant enough. I thought scraping HTML using JS would be an easy matter in any Google gadget, but it proved not to be like that at all. I will summarize the trials I made here, from the ones that failed to the last one, which worked.

Using the Google API method "_IG_FetchXmlContent". This failed immediately because it expects an XML document and was given an HTML page. It gave me a parse error on the Doctype line. The result is FAILURE
Using the Google API method "_IG_FetchContent". This gave us the HTML as is, and it was time to parse it using the DOM parsers already built into browsers. I tried this in Firefox but also got a parse error, because this is an HTML document, not an XML one, and the available parsers expect only XML. The result is FAILURE
Repeating the second trial, but after using a regular expression to take only the inner HTML of the body tag. The DOM parser failed on one of the comment lines in the HTML page, and such comments may appear in many pages, so this isn't a generic enough solution to accept. The result is FAILURE
Using a regular expression to get the body's inner HTML, adding it to a hidden div, and then using normal JS methods to traverse the DOM nodes with this div as the root. The result is SUCCESS

Since the fourth trial was successful, I made a generic method that anyone can use in their gadget. This simple method just gets the HTML and scrapes it using your scraping function. To understand what I mean, have a look at the function definition first:

scrapeHTMLBody = function(url, dataHolderId, scrapeFunction){}

As the definition shows, the function takes these parameters:

url: the URL to retrieve the HTML from
dataHolderId: the id of the hidden div that the retrieved HTML will be added to
scrapeFunction: a function that receives the id of the hidden div, treats that div as the root element, and uses JS to get the desired data (everyone should write their own according to what they want to retrieve)

And this is its implementation:

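// Fetch the page at url, then hand the response text to operate().
scrapeHTMLBody = function(url, dataHolderId, scrapeFunction){
  _IG_FetchContent(url, function(responseText){
    operate(responseText, dataHolderId, scrapeFunction);
  });
}

// Copy the inner HTML of <body> into the hidden div, then run the scrape function.
operate = function(responseText, dataHolderId, scrapeFunction){
  // [\s\S] matches any character, newlines included; "i" also accepts <BODY>.
  var body = /<body[^>]*>([\s\S]*)<\/body>/i.exec(responseText);
  if (!body) return; // no body tag found, nothing to scrape
  _gel(dataHolderId).innerHTML = body[1];
  scrapeFunction(dataHolderId);
}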

These two functions get the HTML page, retrieve the body's inner HTML, and then call the scrape function, passing it the id of the hidden div containing the HTML body.
It is now your responsibility to write the scraping function you need, given that this div is the root of your DOM tree.
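The body-extraction step can be tried on its own, outside a gadget. The sketch below isolates it as a plain function so it runs without `_IG_FetchContent` or `_gel`. The function name `extractBodyInnerHtml` and the sample page are assumptions made up for this illustration, and the regular expression is a slightly stricter variant (case-insensitive, `[\s\S]` for any character), not necessarily the exact pattern used in the gadget.

```javascript
// Return the inner HTML of <body>, or null when no body element is found.
function extractBodyInnerHtml(html) {
  // [\s\S] matches any character including newlines; "i" also accepts <BODY>.
  var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

// Hypothetical page with a doctype and a comment, the two things that
// made the XML parsers fail in the earlier trials.
var page =
  '<!DOCTYPE html>\n<html><head><title>t</title></head>\n' +
  '<BODY class="x">\n<!-- a comment -->\n<p class="main">hello</p>\n</BODY></html>';

var inner = extractBodyInnerHtml(page);
console.log(inner);

// A fragment without a body tag yields null instead of throwing.
var missing = extractBodyInnerHtml("<p>fragment only</p>");
console.log(missing);
```

Returning null for a page without a body tag lets the caller decide what to do, instead of failing on `body[1]`.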

This is an example of a scraping function I defined:

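scrape = function(dataHolderId){
  // Collect the first child of every second element with class "main".
  var elements = _gel(dataHolderId).getElementsByClassName('main');
  var noktas = [];
  var num = elements.length;
  var i; // declared locally so the loops don't create a global
  for(i = 0; i < num; i += 2)
    noktas.push(elements[i].childNodes[0].innerHTML);
  // Append each collected item to the page as a paragraph.
  for(i = 0; i < noktas.length; i++){
    var e = document.createElement('p');
    e.innerHTML = noktas[i];
    document.body.appendChild(e);
  }
}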

That's it. I think you are now ready to use these functions in any gadget whose data source has to be scraped.
This method should be better, since all the processing is done on the client machine rather than on any other server.

Wednesday, February 4, 2009

Ubuntu No Audio Bug After Hibernating

Well, this was a very annoying bug in the latest version of Ubuntu,

but, thank God, it was fixed on 22 Jan.

This is just an announcement; you can follow the ticket here

Monday, February 2, 2009

[>30] Mounting my N80 mobile on Ubuntu Machine

This is the first episode of my long series called ">30", which simply means that it took more than 30 minutes. I am collecting here some of the problems that I fought with for more than 30 minutes, not even to solve them but just to find a resource on the internet that could relieve me of the pain.

The first episode of this series is about how to mount your Bluetooth device on your Ubuntu machine. I will explain it using my N80 mobile and my Ubuntu machine, but I think anyone can apply these points to other devices and Linux machines.

Before getting to the main point, I would like to state that this post may or may not get you to the solution. I am writing it while doing the task, and as I write this introduction I am still working toward the steps needed to get out of this miserable journey. But anyway, whether we reach a solution together or not, I think this post will put you on a track to continue from, so you won't waste your time reading it.

Note: points highlighted in bold are specific to my case, so they will change from one case to another.

The journey starts with the post Mounting a Nokia Phone a Little Bit Easier, which is a really nice starting point. It lists these steps:

Find out your phone’s Bluetooth MAC address if you don’t know it already:
hcitool scan
Find out the OBEX FTP channel it uses
sdptool search FTP
Load the fuse kernel module:
sudo modprobe fuse
Make a suitable mount point for your phone:
mkdir /media/n80
Mount
obexfs -bXX:XX:XX:XX:XX:XX -BYY /media/n80
(where XX:XX:XX:XX:XX:XX is your phone’s MAC and YY is the OBEX channel)
Unmount when you’re done with your file transfers:
fusermount -u /media/n80
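The MAC address and channel from the first two steps have to be copied by hand into the obexfs command. As a rough sketch of how that could be automated, the JavaScript below pulls both values out of text shaped like `sdptool search FTP` output and assembles the mount command. The sample output, the MAC address, the channel number, and the helper name `buildObexfsCommand` are all assumptions made up for illustration; check them against what your own phone reports.

```javascript
// Sample text shaped like typical `sdptool search FTP` output.
// This is an assumption: the MAC address and channel are made up.
var sampleOutput = [
  "Searching for FTP on 00:11:22:33:44:55 ...",
  "Service Name: OBEX File Transfer",
  "Protocol Descriptor List:",
  '  "L2CAP" (0x0100)',
  '  "RFCOMM" (0x0003)',
  "    Channel: 11",
  '  "OBEX" (0x0008)'
].join("\n");

// Build the obexfs mount command from the scan text, or return null
// when either the MAC address or the channel cannot be found.
function buildObexfsCommand(sdpOutput, mountPoint) {
  var mac = /([0-9A-F]{2}(?::[0-9A-F]{2}){5})/i.exec(sdpOutput);
  var channel = /Channel:\s*(\d+)/.exec(sdpOutput);
  if (!mac || !channel) return null;
  return "obexfs -b" + mac[1] + " -B" + channel[1] + " " + mountPoint;
}

var command = buildObexfsCommand(sampleOutput, "/media/n80");
console.log(command);
```

The sketch only builds the command string. Running it, and unmounting afterwards with fusermount, is still done in the shell as in the steps above.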

The steps above depend on having some packages installed on your PC, namely obexftp and obexfs. I got them easily using apt-get. If everything went well after doing the steps above, then you are done. Yes, that easy. If not, then continue with me to see what I did after this step.

As for me, I was able to enter the mounted folder and even list files, and not only that, I removed a file. I thought it was time to copy some files from my PC to the device. When I used the cp command, it took no time at all to finish. I was sure something had gone wrong, because I was connecting over Bluetooth and things shouldn't finish that fast. I used the ls command to check that the file had been added, but it hung for some time and then reported that the folder I was in wasn't valid. Simply put, the mount had failed. I left the folder, entered it again, and ran ls, but nothing was listed. I unmounted and remounted, but it sometimes showed everything and sometimes nothing.

In some cases, I ran "hcitool scan" again and couldn't even find my device. I had to disable Bluetooth on the device and enable it again to get it working.

Anyway, it was time to search for a solution to this problem. I found the ticket "can't add files to mounted volume, no free space sony Z610i". I dug into it, read for a while, and learned that they had the problem of copying files because the mount point reports no free space on the mounted device. They stated that this is a bug and pointed to its ticket here. It stated that the current version of obexfs doesn't handle S60 3rd edition mobiles properly, and I found the ticket closed as invalid. Looking at the comments, I found someone stating that this should be fixed by using "ObexFTP 0.22 / ObexFS 0.11" instead of the installed versions. These have to be compiled, as the ones from apt-get are one step behind. OK, no problem, let's compile.

Prepare an empty folder, then wget these files one by one:

http://downloads.sourceforge.net/openobex/obexfs-0.11.tar.gz?modtime=1213568386&big_mirror=0
http://downloads.sourceforge.net/openobex/obexftp-0.22.tar.bz2?modtime=1213568417&big_mirror=0
http://downloads.sourceforge.net/openobex/openobex-1.3.tar.gz?modtime=1150294112&big_mirror=0

Extract these files using tar -xzf for the .gz files and tar -xjf for the .bz2 file. Now let's install them in this order: openobex, then obexftp, then obexfs. This order is important because each one depends on the previous one. But before doing so, here are some libraries I needed while compiling, so make sure you have them installed:

python-dev
libfuse-dev
libusb-dev
tcl8.4-dev
tcl-dev

Now, in each folder, run "./configure", then "make", then "make install", and you should be ready to redo the steps I listed at the beginning of my journey.

I did so, but I am sorry to say that all of this was useless for me. If everything went OK for you at this step, then you are ready to have fun. If not, then continue with me; I still have some points to state.

I have now reached this link: http://www.thinkwiki.org/wiki/How_to_setup_Bluetooth. I noticed that it has a section for Symbian mobiles that is different from the one for other mobiles. It seems I have to continue searching, but I think that is enough for me tonight. Maybe tomorrow. If you still have problems by then, wait for my next post, which I hope will settle things and put an end to this misery.
