Tuesday, July 22, 2014

Introduction to Hadoop

Hadoop is a framework: a Linux-based set of tools. It is not a single piece of software you download onto your computer so you can say, "Hey, I have downloaded Hadoop." It is an open source project, freely distributed under the Apache license, which means no single company controls Hadoop; it is maintained by Apache. The concept behind Hadoop is big data.

Hadoop supports storing and processing big data. When we hear the words "big data", what comes to mind is data that is big. There is no precise definition of big data, but broadly it means a large and constantly growing collection of files, generated daily and measured in terabytes (10^12 bytes) and petabytes (10^15 bytes). Yes, we are talking about a very large amount of data.

A key attribute of big data is that it is unstructured: it is not organized in a relational database, in neatly designed tables where each column has a known data type. This data comes from users like you and me, from applications like Facebook and Twitter, from systems like ticket booking systems, and from sensors in factories. Big data comes with three main challenges:

  • Velocity 
  • Volume 
  • Variety 

Velocity means the speed at which data arrives. Volume means the size of the data: a large and ever-growing amount keeps coming in. Variety means the type of data: it can be audio, video, voice, images, documents, messages, email, photos, text, public records, log files, and so on. For example, around 400 million tweets are posted every day, and Walmart handles about 1 million transactions every hour. Handling this kind of data takes a powerful technology, and that technology is Hadoop.
The Hadoop framework is divided into two main components: MapReduce and HDFS (Hadoop Distributed File System). HDFS breaks big data into small pieces and stores them on a distributed system. Here the distributed system is not a few big, powerful computers; it is a large number of low-cost machines. Computation is divided into small pieces in the same way: when a request for the data comes in, the computation is performed on each piece individually, and the individual results are then combined and sent back to the application.
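As a rough illustration of that map-then-combine idea, here is a toy word-count sketch in plain Python (this is not the actual Hadoop API, just the shape of the computation): the map phase emits (word, 1) pairs from each chunk of data, and the reduce phase combines the counts per word.

```python
# Toy sketch of the MapReduce idea (plain Python, not the Hadoop API).

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in this chunk of text.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Combine all emitted pairs into per-word totals.
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

# In a real cluster each chunk would live on a different low-cost machine
# and the map phase would run on each machine independently.
chunks = ["big data is big", "data is data"]
pairs = []
for chunk in chunks:
    pairs.extend(map_phase(chunk))
result = reduce_phase(pairs)   # combined result sent back to the application
```

Running this gives `{"big": 2, "data": 3, "is": 2}`: each chunk was counted independently, and only the small per-word totals had to be combined at the end.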
Hadoop is widely used in the following areas:

  •  Social media
  •  Retail
  •  Search tools
  •  Government services 
  •  Financial services 
  •  Intelligence 
            If we look at HDFS and MapReduce, they are very similar to GFS (Google File System) and Google's MapReduce, because the technology was first invented and used at Google. In the 1990s people used search engines like Excite, AltaVista, Lycos, Infoseek, and many more. Then Google came into the picture, suddenly became the most popular, and remains the number one search engine to this day. How did Google achieve this victory? Google itself broke the suspense: in 2003 it released a paper on GFS, telling the world how it stored its data. But that was only part of the story, because the world still did not know how Google performed its computations. In 2004 Google released one more paper, explaining how it used MapReduce to perform computation on big data. Doug Cutting and Mike Cafarella, who were then working on an open-source search project, got very interested in these papers. They started building something based on them, and the result was Hadoop. Hadoop is indeed a strange name: it was the name of a toy elephant Doug's son used to play with. It was Doug's son who invented the name, which Doug later borrowed. In 2006 Cutting joined Yahoo, and the project was developed under the Apache Software Foundation with Yahoo as a major contributor.
      MapReduce and HDFS are the two main components, but a few more projects fall under the Apache Hadoop umbrella. Those projects are:
     Hive is a data warehouse structure built on top of Hadoop that provides data summarization, querying, and analysis.
      HBase is an open source, non-relational database written in Java that runs on the Hadoop file system.

    Mahout is a library of distributed, scalable machine learning algorithms. A typical Mahout output is a recommendation based on users' activity, such as their searches.
     The actual computation is done by MapReduce, but to make programming easier for programmers, a higher-level language called Pig was created.
    Oozie is a Java-based application responsible for scheduling jobs in a Hadoop system.
    Flume is used to collect, aggregate, and move large amounts of log data.
     Sqoop is a tool used to transfer bulk data between Hadoop and relational databases.

   Some companies that use Hadoop heavily are Yahoo, Facebook, Amazon, eBay, American Airlines, Walmart, The New York Times, the Federal Reserve Board, and IBM.

Monday, July 21, 2014

Breakthrough – Converting your Partition from Primary to Logical

In the technological race of computers, people and programmers are obsessed with operating systems. At the same time we are unable to let go of the OS we are used to, since we are so accustomed to it, and for that we have a solution called dual-booting. That is where partitioning comes into the picture.

Most desktops and laptops you buy nowadays come partitioned as primary; primary partitions are mostly what you see on a machine that shipped with Windows preinstalled. But in the craze for another OS you tend to take a risk, and that risk can be fatal to your system, since your data is at stake. The important part is that when you dual-boot, that is, when you want to install another OS, you should not install it on a primary partition: an MBR disk can hold at most four primary partitions, and to the system a primary partition is tied directly to your whole monolithic HDD. Converting a partition to logical (inside an extended partition) is what gives you room to work safely.

Sunday, July 20, 2014

Problem Step Recorder (PSR) In Windows 7/8

          Problem Steps Recorder (PSR) is a built-in step recorder provided in Windows 7 and Windows 8, launched by typing psr.exe in the Run dialog. It keeps a record of the steps taken while performing a sequence of actions on the computer: once PSR is enabled, snapshots of the screen are recorded into a single file.

Friday, July 18, 2014

New Born Malware

Today's blog is about newly emerged malware. Before getting to that new malware, let me first tell you what malware is.
Malware, short for malicious software, is any software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. It can appear in the form of executable code.

Wednesday, July 16, 2014

How to switch on LED on board using Raspberry pi

Hello folks !
Welcome to raspy tutorial ...

This post is about switching on an LED from the terminal on a Raspberry Pi.

To do this you first of all need administrative rights, so type the following command in your terminal.

$sudo su
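From here, one common route is the Linux sysfs GPIO interface, which lets you drive a pin by writing to files under /sys/class/gpio. The sketch below assumes the LED (with a resistor) is wired to GPIO pin 18 in BCM numbering; both the pin number and the wiring are assumptions, so adjust them for your board. The sysfs root is a parameter only so the logic can be tried off-board.

```python
# Sketch: switch an LED on or off through the Linux sysfs GPIO interface.
# Assumes the LED is wired to GPIO pin 18 (BCM numbering) with a resistor.
import os

def set_led(pin, on, gpio_root="/sys/class/gpio"):
    pin_dir = os.path.join(gpio_root, "gpio%d" % pin)
    if not os.path.isdir(pin_dir):
        # Exporting the pin makes the kernel create its control files.
        with open(os.path.join(gpio_root, "export"), "w") as f:
            f.write(str(pin))
    with open(os.path.join(pin_dir, "direction"), "w") as f:
        f.write("out")                     # configure the pin as an output
    with open(os.path.join(pin_dir, "value"), "w") as f:
        f.write("1" if on else "0")        # 1 drives the LED on, 0 off
```

For example, `set_led(18, True)` switches the LED on and `set_led(18, False)` switches it off; the same writes can be done by hand with `echo` in the root shell opened above.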

Monday, March 24, 2014

Photo Morphing

Morphing means the transformation of one image into another. The word derives from metamorphosis, which means a change in the size, scale, or appearance of a form. Different techniques are used to turn one image smoothly into another.
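Morphing is usually done in two steps: warping the geometry of one image toward the other using corresponding feature points, then cross-dissolving the colors. As a minimal pure-Python sketch of just the cross-dissolve step, treating images as 2-D lists of pixel intensities:

```python
# Cross-dissolve: blend two same-sized images with weight t in [0, 1],
# where t = 0 gives the first image and t = 1 gives the second.

def cross_dissolve(img_a, img_b, t):
    return [[(1 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
```

Stepping t from 0 to 1 over successive frames produces the smooth transition; a full morph also warps the geometry before blending, which this sketch leaves out.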

Friday, March 21, 2014


We all know that in the Linux operating system, whatever we execute is a process. Many times we don't know how to get proper information about a particular process; basically, what information we need depends on what we want to do with it. If we consider a normal system user, we can assume he needs at least the process ID to perform basic operations on that process. For example, if I open a Firefox browser and want to know its process ID, there are two ways to get it.
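The post breaks off before naming the two ways; the usual candidates are the `pidof` command (`pidof firefox`) and `ps` with a filter (`ps -e | grep firefox`). Both ultimately read the /proc filesystem, and a short Python sketch (Linux only) shows what they do under the hood:

```python
# Sketch (Linux only): find the PIDs of a process by name by scanning
# /proc, which is roughly what tools like pidof do under the hood.
import os

def pids_of(name):
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue                          # only numeric entries are PIDs
        try:
            with open("/proc/%s/comm" % entry) as f:
                if f.read().strip() == name:  # comm holds the process name
                    pids.append(int(entry))
        except OSError:
            pass                              # the process may have exited
    return sorted(pids)
```

For example, `pids_of("firefox")` returns the list of PIDs of all running Firefox processes, or an empty list if none is running.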