Hadoop学习笔记:Hadoop安装(本地安装)

Stella981
• 阅读 725

最近开始研究大数据这块,现在从最基础的Hadoop开始,后续将逐渐学习Hadoop整个生态圈的各个部分组件。

Hadoop安装分为本地安装、伪分布式、完全分布式和高可用分布式,这里为个人学习用(实际情况是本人没有那么多机器,装虚拟机的话,内存可能也不够,T_T),仅涉及到本地安装和伪分布式安装。

环境准备

# 操作系统信息
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

# 系统内核信息
$ uname -r
3.10.0-693.11.6.el7.x86_64

# hostname信息 ( 配置 )
$ hostnamectl set-hostname v108.zlikun.com
$ hostnamectl status
   Static hostname: v108.zlikun.com
         Icon name: computer-vm
           Chassis: vm
        Machine ID: da1dac0e4969496a8906d711f95f2a7f
           Boot ID: 8ffc47fb1b7148ab992d8bf6f3f32ac1
    Virtualization: vmware
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-693.11.6.el7.x86_64
      Architecture: x86-64

# 在`/etc/hosts`文件中配置主机名 ( 不要在意为什么是v108这个细节,只是我电脑上的第8个虚拟机而已,^_^ )
192.168.1.108   v108.zlikun.com v108

安装JAVA

# 解压 `jdk-8u151-linux-x64.tar.gz` 包,并移动到 `/usr/local` 目录下
/usr/local/jdk1.8.0_151

# 在 `/etc/profile` 中配置环境变量
export JAVA_HOME=/usr/local/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin

# 查看JDK版本
$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

本地安装Hadoop

# Hadoop使用2.7.5版,文档地址:http://hadoop.apache.org/docs/r2.7.5/ ,下面是参考安装文档:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/SingleCluster.html

# 安装过程:本地安装只需要将 `hadoop-2.7.5.tar.gz` 安装包解压到指定目录即可
$ tar zxvf hadoop-2.7.5.tar.gz
$ mv hadoop-2.7.5 /opt/hadoop
# 这里删除全部 *.cmd 格式文件 ( 这些文件仅限Windows下使用,非必须删除,取决于个人习惯 )
$ rm -rf /opt/hadoop/*/*.cmd
# 配置环境变量 HADOOP_HOME
$ echo 'export HADOOP_HOME=/opt/hadoop' >> /etc/profile

# 下面是Hadoop的目录结构
/opt/hadoop/
├── bin
│   ├── container-executor
│   ├── hadoop
│   ├── hdfs
│   ├── mapred
│   ├── rcc
│   ├── test-container-executor
│   └── yarn
├── etc
│   └── hadoop
├── include
│   ├── hdfs.h
│   ├── Pipes.hh
│   ├── SerialUtils.hh
│   ├── StringUtils.hh
│   └── TemplateFactory.hh
├── lib
│   └── native
├── libexec
│   ├── hadoop-config.sh
│   ├── hdfs-config.sh
│   ├── httpfs-config.sh
│   ├── kms-config.sh
│   ├── mapred-config.sh
│   └── yarn-config.sh
├── LICENSE.txt
├── NOTICE.txt
├── README.txt
├── sbin
│   ├── distribute-exclude.sh
│   ├── hadoop-daemon.sh
│   ├── hadoop-daemons.sh
│   ├── hdfs-config.sh
│   ├── httpfs.sh
│   ├── kms.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── slaves.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── yarn-daemon.sh
│   └── yarn-daemons.sh
└── share
    ├── doc
    └── hadoop

# 执行 `bin/hadoop` 命令,可以查看帮助文档
$ cd /opt/hadoop
$ bin/hadoop 
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

# 通常安装Hadoop后第一件事就是配置它的JAVA_HOME参数
$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_151

词频统计示例

# 准备一个本地文件,后面将对其进行词频统计
$ mkdir input
$ echo 'java golang ruby rust erlang java javascript lua rust java' > input/lang.txt

# 运行官方自带 `MapReduce` 程序,进行词频统计
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount input output
18/01/30 08:42:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/01/30 08:42:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/01/30 08:42:34 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 08:42:34 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 08:42:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local935371141_0001
18/01/30 08:42:34 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/01/30 08:42:34 INFO mapreduce.Job: Running job: job_local935371141_0001
18/01/30 08:42:34 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/01/30 08:42:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:34 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/01/30 08:42:34 INFO mapred.LocalJobRunner: Waiting for map tasks
18/01/30 08:42:34 INFO mapred.LocalJobRunner: Starting task: attempt_local935371141_0001_m_000000_0
18/01/30 08:42:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:34 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 08:42:34 INFO mapred.MapTask: Processing split: file:/opt/hadoop/input/lang.txt:0+59
18/01/30 08:42:34 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/01/30 08:42:34 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/01/30 08:42:34 INFO mapred.MapTask: soft limit at 83886080
18/01/30 08:42:34 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/01/30 08:42:34 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/01/30 08:42:34 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/01/30 08:42:34 INFO mapred.LocalJobRunner: 
18/01/30 08:42:34 INFO mapred.MapTask: Starting flush of map output
18/01/30 08:42:34 INFO mapred.MapTask: Spilling map output
18/01/30 08:42:34 INFO mapred.MapTask: bufstart = 0; bufend = 99; bufvoid = 104857600
18/01/30 08:42:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
18/01/30 08:42:34 INFO mapred.MapTask: Finished spill 0
18/01/30 08:42:34 INFO mapred.Task: Task:attempt_local935371141_0001_m_000000_0 is done. And is in the process of committing
18/01/30 08:42:35 INFO mapred.LocalJobRunner: map
18/01/30 08:42:35 INFO mapred.Task: Task 'attempt_local935371141_0001_m_000000_0' done.
18/01/30 08:42:35 INFO mapred.Task: Final Counters for attempt_local935371141_0001_m_000000_0: Counters: 18
        File System Counters
                FILE: Number of bytes read=296042
                FILE: Number of bytes written=585271
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=96
                Combine input records=10
                Combine output records=7
                Spilled Records=7
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=16
                Total committed heap usage (bytes)=165744640
        File Input Format Counters 
                Bytes Read=59
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local935371141_0001_m_000000_0
18/01/30 08:42:35 INFO mapred.LocalJobRunner: map task executor complete.
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Starting task: attempt_local935371141_0001_r_000000_0
18/01/30 08:42:35 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:35 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 08:42:35 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@475ebd65
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/01/30 08:42:35 INFO reduce.EventFetcher: attempt_local935371141_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/01/30 08:42:35 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local935371141_0001_m_000000_0 decomp: 88 len: 92 to MEMORY
18/01/30 08:42:35 INFO reduce.InMemoryMapOutput: Read 88 bytes from map-output for attempt_local935371141_0001_m_000000_0
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 88, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->88
18/01/30 08:42:35 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/01/30 08:42:35 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
18/01/30 08:42:35 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 08:42:35 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merged 1 segments, 88 bytes to disk to satisfy reduce memory limit
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merging 1 files, 92 bytes from disk
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/01/30 08:42:35 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 08:42:35 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/01/30 08:42:35 INFO mapred.Task: Task:attempt_local935371141_0001_r_000000_0 is done. And is in the process of committing
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO mapred.Task: Task attempt_local935371141_0001_r_000000_0 is allowed to commit now
18/01/30 08:42:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local935371141_0001_r_000000_0' to file:/opt/hadoop/output/_temporary/0/task_local935371141_0001_r_000000
18/01/30 08:42:35 INFO mapred.LocalJobRunner: reduce > reduce
18/01/30 08:42:35 INFO mapred.Task: Task 'attempt_local935371141_0001_r_000000_0' done.
18/01/30 08:42:35 INFO mapred.Task: Final Counters for attempt_local935371141_0001_r_000000_0: Counters: 24
        File System Counters
                FILE: Number of bytes read=296258
                FILE: Number of bytes written=585433
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Combine input records=0
                Combine output records=0
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=7
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=2
                Total committed heap usage (bytes)=165744640
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Output Format Counters 
                Bytes Written=70
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local935371141_0001_r_000000_0
18/01/30 08:42:35 INFO mapred.LocalJobRunner: reduce task executor complete.
18/01/30 08:42:35 INFO mapreduce.Job: Job job_local935371141_0001 running in uber mode : false
18/01/30 08:42:35 INFO mapreduce.Job:  map 100% reduce 100%
18/01/30 08:42:35 INFO mapreduce.Job: Job job_local935371141_0001 completed successfully
18/01/30 08:42:35 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=592300
                FILE: Number of bytes written=1170704
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=96
                Combine input records=10
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=18
                Total committed heap usage (bytes)=331489280
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=59
        File Output Format Counters 
                Bytes Written=70

# 查看统计结果
$ cat output/*
erlang  1
golang  1
java    3
javascript      1
lua     1
ruby    1
rust    2

至此,本地Hadoop就安装完成了,可以进行一些简单的测试,这里可以参考官方演示示例:http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation

点赞
收藏
评论区
推荐文章
blmius blmius
3年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
待兔 待兔
6个月前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Jacquelyn38 Jacquelyn38
3年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
Wesley13 Wesley13
3年前
Hadoop使用学习笔记(1)
Hadoop使用学习笔记1.Hadoop安装与基本概念Hadoop发行版本地址(https://www.oschina.net/action/GoToLink?urlhttp%3A%2F%2Fhadoop.apache.org%2Freleases.html)1.1环境配置需
Stella981 Stella981
3年前
Hadoop完整搭建过程(二):伪分布模式
1伪分布模式伪分布模式是运行在单个节点以及多个Java进程上的模式。相比起本地模式,需要进行更多配置文件的设置以及ssh、YARN相关设置。2Hadoop配置文件修改Hadoop安装目录下的三个配置文件:etc/hadoop/coresite.xmle
Wesley13 Wesley13
3年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Stella981 Stella981
3年前
Hadoop伪分布式环境搭建之Linux操作系统安装
Hadoop伪分布式环境搭建之Linux操作系统安装本篇文章是接上一篇《超详细hadoop虚拟机安装教程(附图文步骤)》,上一篇有人问怎么没写hadoop安装。在文章开头就已经说明了,hadoop安装会在后面写到,因为整个系列的文章涉及到每一步的截图,导致文章整体很长。会分别先对虚拟机的安装、Linux系统安装进行介绍,然后才会写到had
Wesley13 Wesley13
3年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
Python进阶者 Python进阶者
1年前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这