With my latest assignment I have started exploring Hadoop and related technologies. While exploring HDFS and playing with it, I came across these two syntaxes for querying HDFS:

> hadoop dfs
> hadoop fs

Initially I could not differentiate between the two and kept wondering why we have two different syntaxes for a common purpose. I searched the web, found that other people had the same question, and below is their reasoning:

Per Chris's explanation, it looks like there is no difference between the two syntaxes. If we look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop:
...
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
...
Both commands dispatch to the same class, org.apache.hadoop.fs.FsShell, and that is his reasoning for saying there is no real difference between the two.

I was not entirely convinced by this, so I looked for a more convincing answer, and here are a few excerpts which made better sense to me:

FS relates to a generic file system which can point to any file system, such as the local file system or HDFS, whereas dfs is very specific to HDFS. So when we use fs, the operation can work from/to the local file system or the Hadoop distributed file system, but a dfs operation always relates to HDFS.

Below are the excerpts from the Hadoop documentation which describe these two as different shells.


FS Shell
The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.

DFShell
The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.

So from the above it can be concluded that it all depends upon the configured scheme. When using these two commands with an absolute URI, i.e. scheme://a/b, the behavior is identical. It is only the default configured scheme value, file and hdfs for fs and dfs respectively, which causes the difference in behavior.
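To see the same scheme resolution programmatically, here is a small sketch using the FileSystem API that backs FsShell. It assumes the Hadoop client libraries are on the classpath; the host namenodehost and the /tmp path are placeholders, not values from any real cluster.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: resolving file systems the same way the shell does.
public class SchemeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // No scheme given: the default file system from the configuration is used,
    // just like "hadoop fs -ls /parent/child".
    FileSystem defaultFs = FileSystem.get(conf);
    System.out.println("default scheme: " + defaultFs.getUri());

    // Absolute URIs: the scheme itself picks the file system, so the behavior
    // is identical no matter what the default is.
    FileSystem local = FileSystem.get(URI.create("file:///"), conf);
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenodehost/"), conf);

    // List a local directory through the generic FileSystem abstraction.
    for (FileStatus status : local.listStatus(new Path("/tmp"))) {
      System.out.println(status.getPath());
    }
  }
}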

You might be looking for a way to profile your Spring application (capturing each method's execution time). Spring provides different ways to profile an application. Profiling should be treated as a separate concern, and Spring AOP offers an easy way to separate it: with the help of Spring AOP you can profile your methods without making any changes to the actual classes.

You just need to perform a few simple steps to configure your application for profiling using Spring AOP:

In the application context file, add the <context:load-time-weaver/> tag to enable load-time weaving.

With this configuration, AspectJ's load-time weaver is registered with the current class loader.
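For reference, a minimal profiling aspect could look like the sketch below. The pointcut expression, and the com.example.service package it matches, are placeholders for your own classes, not something mandated by Spring.

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// Sketch: wraps matching methods and logs how long each call took.
@Aspect
public class ProfilingAspect {

  @Around("execution(* com.example.service..*(..))")
  public Object profile(ProceedingJoinPoint pjp) throws Throwable {
    long start = System.currentTimeMillis();
    try {
      return pjp.proceed(); // run the intercepted method
    } finally {
      long elapsedMillis = System.currentTimeMillis() - start;
      System.out.println(pjp.getSignature() + " took " + elapsedMillis + " ms");
    }
  }
}

Depending on whether you use proxy-based Spring AOP or AspectJ load-time weaving, this aspect is registered as a Spring bean or listed in META-INF/aop.xml.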

Why is Locking Required?

When two concurrent users try to update the same database row simultaneously, there is a real chance of losing data integrity. Locking comes into the picture to avoid simultaneous updates and ensure data integrity.

Types of Locking

There are two types of locking, optimistic and pessimistic. In this post, optimistic locking is described with an example.

Optimistic Locking: This locking is applied at transaction commit; the update succeeds only if no other transaction has modified the row in the meantime.
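As a minimal sketch, optimistic locking in JPA/Hibernate is typically enabled by adding a version attribute to the entity. The Account entity and its fields below are hypothetical.

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Sketch: the version column is checked and incremented at commit time.
@Entity
public class Account {

  @Id
  private Long id;

  private double balance;

  @Version // incremented automatically on every successful update
  private int version;

  // getters and setters omitted for brevity
}

If another transaction has already updated the row, the commit fails with an OptimisticLockException instead of silently overwriting the change.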

I came across a problem in a coding contest. In this post, I would like to share the approach I took to solve it. I would definitely not say that I have invented something new, and I am not trying to reinvent the wheel; I have simply described the end-to-end approach I followed to solve this problem. It was a great brainstorming activity.

Jenkins is an open source tool which provides continuous integration services for software development. If you want more detail about Jenkins and its history, I would suggest referring to this link. This post will help you install and configure Jenkins and create jobs to trigger Maven builds. Setting up Jenkins is not rocket science; in this post, I condense the installation and configuration steps.

Peer code review is an important activity for finding and fixing mistakes that were overlooked during development. It helps improve both software quality and developers' skills. Although it is a good process for quality improvement, it becomes tedious if you need to share files and send review comments through email, organize formal meetings, and coordinate with peers who are in different time zones.

If you are using Hudson as your continuous integration server and feel lazy about opening Hudson explicitly to check the build status, or about reading Hudson build status mails, there is an option to monitor Hudson builds and perform build activities from within the Eclipse IDE itself. This post describes installing, configuring and using Hudson in the Eclipse IDE.

As a developer or architect, you often need to draw sequence diagrams to demonstrate or document your functionality, and of course, if you do this manually you have to spend a lot of time on the activity. Just think: what if the sequence diagram were generated automatically and you got it free of cost? Your reaction would be 'wow, this is great', but the next question would be 'how'.

If you are using JPA 2.0 with Hibernate and you want to do audit logging from the middleware itself, I believe you have landed in exactly the place you should be. You can try out audit logging in your local environment by following this post.

Required JPA/Hibernate Maven Dependencies

JPA Configuration in Spring Application Context File

For JPA/Hibernate, you need to configure the entity manager, transaction manager, data source and JPA vendor in your 'applicationContext.xml' file.
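The post wires these beans in applicationContext.xml; purely as an illustration, the same four pieces (data source, JPA vendor, entity manager factory and transaction manager) expressed in Spring Java configuration might look like the sketch below. The H2 driver, the JDBC URL and the com.example.domain package are placeholders.

import javax.persistence.EntityManagerFactory;
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;
import org.springframework.transaction.PlatformTransactionManager;

// Sketch: Java-config equivalent of the beans described above.
@Configuration
public class JpaConfig {

  @Bean
  public DataSource dataSource() {
    DriverManagerDataSource ds = new DriverManagerDataSource();
    ds.setDriverClassName("org.h2.Driver"); // placeholder driver
    ds.setUrl("jdbc:h2:mem:auditdb");       // placeholder URL
    return ds;
  }

  @Bean
  public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
    LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
    emf.setDataSource(dataSource());
    emf.setPackagesToScan("com.example.domain"); // placeholder entity package
    emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    return emf;
  }

  @Bean
  public PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
    return new JpaTransactionManager(emf);
  }
}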

If an issue is observed in production, there are two major aspects of providing a solution: first, how quickly you can analyze the root cause, and second, how quickly you can fix the issue. The story starts with analyzing the root cause; unless you find the root cause, you cannot even think about providing a solution for it. Can you?

Now let's think about an actual production environment. There may be multiple JVMs on which the application is deployed.