Thursday 23rd of February 2012


CodeFire Blog
Data scraping using cURL in PHP
Technology
Written by Pranjal   
Friday, 01 July 2011 15:23

(Please ensure that you are a genuine user of the site, and that the site allows this kind of automation and does not treat it as a hacking attempt.)

Data scraping

Let’s assume that the task for the day is to pull some data out of a site (e.g. download a CSV file) programmatically, where the data is behind a login. We shall use PHP/cURL for this automation. I am not going to talk about cURL itself or the basic technical details associated with it, so please do not treat this as a tutorial for learning cURL :). I am only going to cover the high-level process to follow and some gotchas that could make your life difficult along the way.

First step: You will need to log in to the site using cURL. Inspect the login form using a tool such as Firebug, or view the page source, to see which fields are sent and what the endpoint of the request is. You will need all of this information to send the login request with cURL. In fact, make a small HTML version of the form (using code from the site) on your local machine and try to log in with that. If it works, part of the job is done.
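As a rough illustration of the inspection step, the sketch below lists a form's endpoint and the fields it would submit. The sample markup, the field names and the describeLoginForm helper are all made up for this example; a real login page will look different.

```php
<?php
// Hypothetical helper: returns the first form's action, method and named inputs.
function describeLoginForm(string $html): array
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return ['action' => null, 'method' => null, 'fields' => []];
    }

    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            // Hidden inputs (tokens, redirect targets) are captured here too.
            $fields[$name] = $input->getAttribute('value');
        }
    }

    return [
        'action' => $form->getAttribute('action'),
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'fields' => $fields,
    ];
}

// Made-up login page, used only to demonstrate the helper.
$sampleHtml = '<html><body><form action="/login.php" method="post">'
    . '<input type="text" name="username" value="">'
    . '<input type="password" name="password" value="">'
    . '<input type="hidden" name="token" value="abc123">'
    . '<input type="submit" value="Login">'
    . '</form></body></html>';

$loginForm = describeLoginForm($sampleHtml);
print_r($loginForm);
```

This only reads plain input elements; select boxes, textareas and fields added by JavaScript would need extra handling.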

Another trick that can be very helpful here: if the form uses a POST request, change it to GET in your local copy and point it at a local URL to see all the parameters that are passed. Sometimes there are hidden variables which are not easy to track. Inspect the query string from the GET request and build the POST fields for cURL from it. One important point while writing the login request is not to forget to save the cookies. Set the option CURLOPT_COOKIEJAR to a filename so that cURL writes the cookies it receives there, and set CURLOPT_COOKIEFILE to the same filename so that later requests send them back. You can get the filename using $tmp_fname = tempnam(sys_get_temp_dir(), "COOKIE"); to make it platform independent (Windows, Linux, Mac).
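A minimal sketch of such a login request could look like the following. The helper names, the URL and the field names in the usage comment are placeholders, not taken from any real site.

```php
<?php
// Create a cookie file in the system temp directory (Windows, Linux, Mac).
function newCookieFile(): string
{
    $path = tempnam(sys_get_temp_dir(), 'COOKIE');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary cookie file');
    }
    return $path;
}

// Hypothetical helper: POST the login fields and keep the session cookies.
// Returns the response body, or false if the request failed.
function loginWithCurl(string $loginUrl, array $fields, string $cookieFile)
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // many sites redirect after login
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies received are written here
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies sent are read from here
    ]);
    $body = curl_exec($ch);
    unset($ch); // releasing the handle flushes the cookies to the jar file
    return $body;
}

// Usage with placeholder values (not run here):
// $cookieFile = newCookieFile();
// $html = loginWithCurl(
//     'https://example.com/login.php',
//     ['username' => 'me', 'password' => 'secret', 'token' => 'abc123'],
//     $cookieFile
// );
```

It is worth checking the returned page for something that only appears after a successful login, since a failed login usually still comes back as a normal page.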

Every site comes with its own set of rules for login, but broadly, keeping the above points in mind, you should be able to log in to most sites using cURL.

Second Step: After login, the next step is to get the file and save it on disk. If the file URL is simple (non-dynamic) there is no problem: just invoke a simple GET request for that URL with cURL and save the response to disk. However, if the URL for the file is dynamic, you need to fetch the page that has the link, search the page text for the link (knowledge of regular expressions comes in handy here) and extract the dynamic URL. Then invoke cURL again on the dynamic URL to get the data. One point to keep in mind with dynamic URLs: when you read the URL string out of the page into a PHP variable, any & in it appears in its HTML-escaped form, &amp;, so if you invoke that URL directly it will not work. Use htmlspecialchars_decode to get the actual URL and you should be able to save the data.
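The second step can be sketched roughly as below. The link text, the sample markup and both helper names are assumptions made for the example; the regular expression is deliberately simple and would need adjusting for a real page.

```php
<?php
// Hypothetical helper: find the href of the link whose text is $linkText.
function findDownloadUrl(string $html, string $linkText): ?string
{
    $pattern = '/<a\s[^>]*href="([^"]+)"[^>]*>\s*'
        . preg_quote($linkText, '/') . '\s*<\/a>/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // The href is HTML-escaped in the page source; turn &amp; back into &.
    return htmlspecialchars_decode($m[1]);
}

// Hypothetical helper: download $url to $target, reusing the login cookies.
function downloadToFile(string $url, string $cookieFile, string $target): bool
{
    $fp = fopen($target, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FILE           => $fp,         // stream the response straight to disk
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // send the session cookies from login
    ]);
    $ok = curl_exec($ch);
    unset($ch);
    fclose($fp);
    return $ok === true;
}

// Made-up page fragment, used only to demonstrate the link extraction.
$samplePage = '<p><a class="dl" href="/export.php?type=csv&amp;id=42&amp;sig=9f">'
    . 'Download CSV</a></p>';

$downloadUrl = findDownloadUrl($samplePage, 'Download CSV');
echo $downloadUrl, PHP_EOL;
```

If the extracted link is relative, as in this sample, it has to be prefixed with the site's base URL before it is passed to downloadToFile.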

Open Source option (Gammu) to connect your Phone
Technology
Written by Pranjal   
Sunday, 26 June 2011 12:25
I started this experiment to test the integration of a web-based (PHP) application with SMS, using a phone connected to the machine on which the web server is running.
Gammu seems to be a good open source option for setting up a PC-to-phone connection. Gammu comes with multiple utilities such as Wammu (which adds a UI-based interface to it). Since I was looking specifically for SMS-based options, I used gammu-smsd, a utility bundled with Gammu that runs as a daemon and can be used to integrate incoming and outgoing SMS. It comes with multiple options for integration, such as file-based integration and SQL-based integration (MySQL, PostgreSQL, SQLite).
While trying Gammu or gammu-smsd, one of the first things to do is set up a configuration file so that these applications can connect to the phone and, if needed, to the DB (which was true in our case). The config file follows the ini file pattern and all the options are very well explained at gammu-config; for gammu-smsd config options look at gammu-smsd config. After going through these links I created a simple config file:
 
---------------------------------
[gammu]
device = com3:
connection = at115200
[smsd]
Service = sql
Driver = native_mysql
PIN = 1234
LogFile = c:\syslog
User = root
Password = password
PC = localhost
Database = sms  
----------------------------------------
 
I connected my Nokia 5800 to my Windows machine using Bluetooth. COM3 was the port on which my phone was connected. After setting up the config file I ran the information command from the command line and got the following output
 
 
So that means the connection was established. The next step I tried was connecting gammu-smsd. Gammu comes with a DB schema for creating the database that gammu-smsd connects to. The schema is located in the source code under the folder
Gammu-1.29.0-Windows\Gammu-1.29.0-Windows\share\doc\gammu\examples\sql
 
I created a DB named smsd, imported the tables, ran the command to connect gammu-smsd, and got an error
 
 
 

I thought I had set up the DB correctly, but it was somehow looking for a database named sms and not smsd (initially I had set ‘smsd’ as the database name in the config file). So it seems there is a bug in version 1.29.0 that makes it always look for a database named sms. Realizing this, I renamed the DB, updated the config file, and then gammu-smsd ran successfully.
It seems that Gammu is not able to send SMS using the Nokia 5800 XpressMusic, since I got the following output
 
 
 

The same is reported on the Gammu site as well (Nokia 5800). I think I will have to get hold of a supported phone to test outgoing and incoming messages.
Gammu also comes with a PHP class, gammu, which wraps sending and the other functions exposed by Gammu, so it can easily be integrated with PHP.
So it seems very much possible to create a web-based application that sends SMS using a GSM phone connected via Gammu.
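One possible way to wire this up from PHP is sketched below. It does not use the bundled PHP class; it assumes a running gammu-smsd and shells out to the gammu-smsd-inject command line tool to queue a message. The helper names, the config path and the phone number are placeholders.

```php
<?php
// Hypothetical helper: builds the gammu-smsd-inject command line for one text SMS.
// Every user-supplied value is shell-escaped before it reaches the command.
function buildInjectCommand(string $number, string $text, string $configPath): string
{
    return sprintf(
        'gammu-smsd-inject -c %s TEXT %s -text %s',
        escapeshellarg($configPath),
        escapeshellarg($number),
        escapeshellarg($text)
    );
}

// Hypothetical helper: queue the message and report whether the tool exited cleanly.
function queueSms(string $number, string $text, string $configPath): bool
{
    exec(buildInjectCommand($number, $text, $configPath), $output, $exitCode);
    return $exitCode === 0;
}

// Placeholder values, used only to show the generated command.
$command = buildInjectCommand('+10000000000', 'Hello from PHP', '/etc/gammu-smsdrc');
echo $command, PHP_EOL;
```

A zero exit code only means the message was queued for gammu-smsd; whether it is actually sent still depends on the phone being supported.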

 
Adding custom Facebook tab
Technology
Written by Pranjal   
Friday, 24 June 2011 12:34
CodeFire Facebook page

 

Facebook lets you add Facebook applications as tabs on your existing pages. There are also (programmatic) ways to find out whether a user has liked your page or not. This functionality can very well be used to 'entice' users to like your pages. We have added a "Welcome" tab to our Facebook page. Do take a look at it and let us know what you think. The application is completely CMS driven, so you can change what is displayed on the pages without knowing any HTML. If you like the application and want to know more about it, mail us at fbapp (at) codefire.in
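As a hedged sketch of one such programmatic check: it assumes the tab receives a signed_request parameter (two base64url parts, an HMAC-SHA256 signature and a JSON payload, signed with the application secret) and that the payload carries a page.liked flag. Both are assumptions about the platform at the time of writing and may not hold today; the helper names are made up.

```php
<?php
// Decode the URL-safe base64 variant used in signed_request.
function base64UrlDecode(string $data): string
{
    return base64_decode(strtr($data, '-_', '+/'));
}

// Hypothetical helper: verify the signature and return the decoded payload,
// or null if the request is malformed or was not signed with $appSecret.
function parseSignedRequest(string $signedRequest, string $appSecret): ?array
{
    $parts = explode('.', $signedRequest, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$encodedSig, $payload] = $parts;

    $expected = hash_hmac('sha256', $payload, $appSecret, true);
    if (!hash_equals($expected, base64UrlDecode($encodedSig))) {
        return null;
    }

    $data = json_decode(base64UrlDecode($payload), true);
    return is_array($data) ? $data : null;
}

// Assumed payload shape: ['page' => ['liked' => true|false, ...], ...]
function userLikesPage(array $data): bool
{
    return !empty($data['page']['liked']);
}

// Usage inside the tab script (placeholder secret, not run here):
// $data = parseSignedRequest($_POST['signed_request'] ?? '', 'your-app-secret');
// $showFanContent = $data !== null && userLikesPage($data);
```

The signature check matters here, because otherwise anyone could post a forged payload that claims the page is liked.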

 
Google Panda Update 2.2 is live (16 June 2011)
Search Engine Optimization
Written by Pranjal   
Friday, 17 June 2011 00:00
What is Google Panda
 
Google Panda is a new algorithm announced by Google. Its aim is to fight content farms (websites with duplicate content).
Google started rolling out its Panda update in the US in February, and in April it reached a few other regions. The Panda update was made to reduce the amount of spam within Google search results.

Google Panda is not a complete overhaul of the ranking algorithm but a value that feeds into the overall Google algorithm. If it helps, think of it as if every site is given a PandaRank score. Those low in Panda come through OK; those high get hammered.
 
The Panda Ranking Factor
 
Rather than being a change to the overall ranking algorithm, Panda is a new ranking factor that has been added to the algorithm.

Panda is a filter that Google has designed to spot what it believes are low-quality pages. Have too many low-quality pages, and Panda effectively flags your entire site. A high Panda rank doesn’t mean that your entire site is out of Google. Instead, it adds a penalty that helps ensure only the better pages make it into Google’s top results.


Conclusion
 
Google keeps making small algorithm changes all the time, which can cause sites to fall (and rise) in rankings independently of Panda. It also keeps updating factors that feed into the overall algorithm, such as PageRank scores, on an irregular basis. Those updates can impact rankings independently of Panda. So far, Google has confirmed when major Panda factor updates have been released. If you saw a traffic drop during one of these times, there’s a good chance you have a Panda-related problem.


 
CodeFire Technologies Blog
CodeFire News
Written by Pranjal   
Wednesday, 15 June 2011 00:00
This is the place where we will keep track of the latest in technology, post the latest updates from the company, and more. So stay tuned...
 



 Copyright © 2010 CodeFire Technologies Pvt Ltd All Rights Reserved.