Question
I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM).
The questions are:
Is there a good tutorial on how to configure a Hadoop cluster on Windows?
What are the requirements? Java + Cygwin + sshd? Anything else?
Does HDFS play nicely on Windows?
I'd like to use Hadoop in streaming mode. Any advice, tools, or tricks for developing my own mappers and reducers in C#?
What do you use for submitting and monitoring jobs?
Thanks
Recommended answer
While this may not be the answer you want to hear, I would highly recommend repurposing the machines as, say, Linux servers and running Hadoop there. You will benefit from the tutorials, experience, and testing done on that platform, and you can spend your time solving business problems rather than operational issues.
However, you can still write your jobs in C#. Because Hadoop supports a "streaming" mode, you can write your jobs in any language that can read from standard input and write to standard output. With the Mono framework, you should be able to take pretty much any .NET code written on Windows and run the same binary on Linux.
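To make the streaming contract concrete, here is a minimal word-count sketch. It is written in Python purely for illustration; a C# console program run under Mono would follow the same pattern of reading lines from standard input and writing tab-separated key/value lines to standard output. The function names and the "map"/"reduce" command-line convention are assumptions made for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key. Input must already be sorted by key,
    which is what the streaming framework does between the two phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    steps = {"map": map_lines, "reduce": reduce_lines}
    step = steps.get(sys.argv[1] if len(sys.argv) > 1 else "")
    if step is not None:
        for record in step(sys.stdin):
            print(record)
```

Saved as, say, wordcount.py (a hypothetical file name), the logic can be checked without a cluster by piping a file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`. On a cluster, the streaming jar's `-mapper` and `-reducer` options would point at the same two commands.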
You can also access HDFS from Windows fairly easily. While I don't recommend running the Hadoop services on Windows, you can certainly run the DFS client from a Windows machine to copy files into and out of the distributed file system.
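As a rough sketch of what copying files in and out looks like, the snippet below wraps the `hadoop fs -put` and `hadoop fs -get` shell commands. It assumes a Hadoop client is installed on the machine, configured to point at the cluster, and reachable as `hadoop` on the PATH; how the launcher is actually invoked on Windows (for example through Cygwin) depends on the installation. The helper names and the example paths are hypothetical.

```python
import subprocess


def put_command(local_path, hdfs_path):
    """Build the command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def get_command(hdfs_path, local_path):
    """Build the command that copies a file out of HDFS to local disk."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]


def run(command):
    """Run a command and raise CalledProcessError if it exits non-zero.
    Assumes the 'hadoop' launcher is on the PATH of the calling process."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(put_command("access.log", "/data/in/access.log")))
    print(" ".join(get_command("/data/out/part-00000", "result.txt")))
```

Keeping command construction separate from execution makes it possible to log or inspect the exact command before anything touches the cluster.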
For submitting and monitoring jobs, I think you're mainly on your own. I don't think any good general-purpose system for Hadoop job management has been developed yet.