background
It's by accident that I had to jump into the data center world, where 4 kinds of data need to be dealt with:

- sensor verification, with a huge amount of special raw sensor data
- AI perception training, with a huge amount of fused sensor data
- synthetic scenario data, which is used for resimulation
- intermediate status log data for Planning and Control (P&C)
A big data toolchain is a must-have for an L3+ ADS team; such systems have already been developed at top start-ups, e.g. WeRide and Pony.AI, as well as top OEMs from NA and Europe. Big data, as I understand it, matters at least as much to the business as to customers, compared to AI, which is more about the customer's experience. And 2B is a trend for Internet+ players diving into traditional industries. Anyway, it's a good chance to get some ideas about the big data ecosystem, and here is the first step: Hadoop.
prepare jdk and hadoop on a single node
Java sounds like a Windows language; there are a few apps requiring Java in Ubuntu, e.g. the OSM browser etc., but I couldn't tell the difference between `jdk` and `jre`, or OpenJDK vs Oracle. `jdk` is a dev toolkit, which includes `jre` and beyond, so it's always better to set `JAVA_HOME` to the `jdk` folder.
jdk in ubuntu
There are many different versions of the jdk, e.g. 8, 9, 11, 13 etc. Here `jdk-11` is used, which can be downloaded from the Oracle website; there are two zip files, the `src` one and the pre-compiled one. The pre-compiled zip is enough to run Hadoop in Ubuntu.
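The unpack step looks roughly like the following; the exact tarball name depends on the downloaded build, and `/usr/local/jdk` is just the target folder used below:

```bash
# unpack the pre-compiled Oracle tarball into /usr/local/jdk
# (the file name, e.g. jdk-11.0.2_linux-x64_bin.tar.gz, depends on the build)
sudo mkdir -p /usr/local/jdk
sudo tar -xzf jdk-11*_linux-x64_bin.tar.gz -C /usr/local/jdk --strip-components=1
```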
Append `JAVA_HOME=/usr/local/jdk` and `PATH=$PATH:$JAVA_HOME/bin` to `~/.bashrc`, then test with `java -version`.
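Concretely, in `~/.bashrc`:

```bash
# append to ~/.bashrc, then reload with: source ~/.bashrc
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:$JAVA_HOME/bin
```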
What needs care here: the current login user may not fit a multi-node cluster env, so it's better to create a `hadoop` group and an `hduser` user, and use `hduser` as the login user in the following steps.
create hadoop user
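On Ubuntu this is typically:

```bash
# create the hadoop group and an hduser account inside it
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
```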
The other thing about `hduser` is that it's not in the `sudo` group; it can be added by:
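Run as the original sudo-capable user, e.g.:

```bash
# add hduser to the sudo group (takes effect on next login)
sudo usermod -aG sudo hduser
```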
Now the current login user is `hduser`:
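The Hadoop install itself is just unpacking the tarball; a minimal sketch, assuming a Hadoop 3.x release from an Apache mirror (the version number here is an example), with `/usr/local/hadoop` as the target matching `HADOOP_HOME` below:

```bash
# unpack hadoop into /usr/local/hadoop and hand it over to hduser
sudo tar -xzf hadoop-3.2.1.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop
```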
Add `HADOOP_HOME=/usr/local/hadoop` and `PATH=$PATH:$HADOOP_HOME/bin` to `~/.bashrc`, and test with `hadoop version`.
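As with `JAVA_HOME` above:

```bash
# append to ~/.bashrc, then reload with: source ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# note: the start/stop scripts used later live in $HADOOP_HOME/sbin
```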
There is another issue with `JAVA_HOME not found`, which I fixed by modifying the `JAVA_HOME` variable in `$HADOOP_HOME/etc/hadoop/hadoop-env.sh`.
passwordless access among nodes
- generate SSH key pair
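e.g. an RSA key with an empty passphrase, keeping the default file name:

```bash
# generate the key pair as hduser; keep the default ~/.ssh/id_rsa name
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```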
The following two steps need to be done on both machines, so that the local machine can ssh both to itself and to the remote:

- enable SSH access to the local machine: `ssh-copy-id hduser@192.168.0.10`
- copy the public key to the remote node: `ssh-copy-id hduser@192.168.0.13`
Tips: if the default `id_rsa` name is changed to something else, it doesn't work. After the changes above, a `known_hosts` file is generated on the local machine, and an `authorized_keys` file, which holds the public key of the ssh client, on the remote machine.
test hadoop
- on master node
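What the master side typically looks like, assuming the cluster config (`core-site.xml`, `hdfs-site.xml`, the workers file) is already in place:

```bash
# format HDFS once, then start the daemons from the master
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
jps   # expect NameNode, SecondaryNameNode and ResourceManager here
```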
- on worker node:
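On the worker, a quick check that the daemons came up:

```bash
jps   # expect DataNode and NodeManager on a healthy worker
```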
And test with a MapReduce job.
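A common smoke test is the bundled examples jar, e.g. the pi estimator (the jar's version suffix depends on the install):

```bash
# run the pi example: 2 map tasks, 5 samples each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5
```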