With my latest assignment I have started exploring Hadoop and related technologies. While exploring HDFS and playing with it, I came across these two syntaxes for querying HDFS:

> hadoop dfs
> hadoop fs

Initially I could not differentiate between the two and kept wondering why we have two different syntaxes for a common purpose. I searched the web and found that other people had the same question; below are their reasonings:

Per Chris's explanation, it looks like there is no difference between the two syntaxes. We can see this if we look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop:
...
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
...
Both commands resolve to the same class, org.apache.hadoop.fs.FsShell, and that is his reasoning for why the two behave identically.
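A quick way to observe this equivalence yourself is to run the same subcommand through both entry points (a minimal check; /user/hduser below is just a placeholder path, and the output depends on your cluster):

# Both entry points dispatch to org.apache.hadoop.fs.FsShell,
# so the same subcommand yields the same listing:
> hadoop fs -ls /user/hduser
> hadoop dfs -ls /user/hduser

On newer Hadoop releases, hadoop dfs additionally prints a deprecation warning suggesting hdfs dfs instead, but the listing itself is the same.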

I was not fully convinced by this, so I looked for a more convincing answer, and here are a few excerpts which made better sense to me:

fs relates to a generic filesystem and can point to any filesystem, such as the local filesystem or HDFS, whereas dfs is very specific to HDFS. So when we use fs it can perform operations from/to the local filesystem or HDFS, but dfs operations relate to HDFS only.
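As a sketch of this point (the paths below are placeholders), the fs shell can be pointed at the local filesystem or at HDFS explicitly via the URI scheme:

# Explicit file scheme: list a directory on the local filesystem
> hadoop fs -ls file:///tmp
# Explicit hdfs scheme: list a directory on HDFS
> hadoop fs -ls hdfs://namenodehost/tmp
# No scheme: resolved against the default filesystem in the configuration
> hadoop fs -ls /tmp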

Below are excerpts from the Hadoop documentation, which describes these two as different shells.


FS Shell
The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.

DFShell
The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.
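To illustrate the URI forms described in both excerpts (namenodehost and /parent/child are the documentation's own placeholders), the following two invocations refer to the same HDFS directory, provided the configuration points the default filesystem at that namenode:

# Fully qualified URI with scheme and authority
> hadoop fs -ls hdfs://namenodehost/parent/child
# Scheme and authority omitted; taken from the configuration
> hadoop fs -ls /parent/child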

So from the above it can be concluded that it all depends upon the configured scheme. When these two commands are used with an absolute URI, i.e. scheme://a/b, the behavior is identical. It is only the default configured scheme, file for fs and hdfs for dfs respectively, which causes the difference in behavior.
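As a final sketch using the same placeholder names, an absolute URI makes the two commands interchangeable; only a scheme-less path is subject to the default-scheme resolution, which on Hadoop 1.x comes from the fs.default.name property in core-site.xml:

# Absolute URI: identical behavior from either command
> hadoop fs -cat hdfs://namenodehost/parent/child/file.txt
> hadoop dfs -cat hdfs://namenodehost/parent/child/file.txt
# Scheme-less path: resolved via the configured default filesystem
> hadoop fs -cat /parent/child/file.txt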