Virtualization-aware Hadoop

Recently VMware started a new open source project called Serengeti, which aims to improve Hadoop usability and performance in virtual environments. It is no surprise that VMware is going in this direction, as it announced Spring for Hadoop just a few months ago. This is a clear sign that the company takes Hadoop very seriously and is pushing it to become a standard enterprise platform that runs on top of its vSphere cloud platform.

“Hadoop must become friendly with the technologies and practices of enterprise IT if it is to become a first-class citizen within enterprise IT infrastructure. The resource-intensive nature of large Big Data clusters make virtualization an important piece that Hadoop must accommodate,” said Tony Baer, Principal Analyst at OVUM. “VMware’s involvement with the Apache Hadoop project and its new Serengeti Apache project are critical moves that could provide enterprises the flexibility that they will need when it comes to prototyping and deploying Hadoop.” [source]

Another company, Atlantis, has just announced a new solution called Atlantis ILIO FlexCloud. It aims to boost the performance of virtualized data-intensive applications such as Hadoop by caching IO requests, or even the entire application, in RAM. The main features of FlexCloud are:

  • Application Characterization – Identifies and maps storage IO traffic characteristics of the application and responds intelligently based on patterns.
  • Inline IO Deduplication – Eliminates duplicate write and read IO traffic to reduce the amount of storage traffic.
  • IO Processing – Processes IO requests from the hypervisor in memory instead of passing them to storage, which improves overall performance.
  • Scatter/Gather Coalescing – Transforms small, randomized IO traffic into larger sequential blocks that are easier to consume, further boosting network and storage efficiency.
  • Fast Clone – Creates new virtual machine clones on demand using Atlantis ILIO without copying data from storage or introducing performance overhead.
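
To make the scatter/gather coalescing idea from the list above more concrete, here is a minimal Python sketch. It is not Atlantis's implementation, whose internals are not described here. It only illustrates the general technique of merging small, randomly ordered block requests into larger sequential extents. The `Extent` type, the `coalesce` function and the sample offsets are all hypothetical names and values chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of an IO request:
# (starting block offset, length in blocks).
Extent = Tuple[int, int]


def coalesce(requests: List[Extent]) -> List[Extent]:
    """Merge adjacent or overlapping requests into larger sequential extents."""
    merged: List[Extent] = []
    # Sorting by offset turns the random arrival order into a sequential one.
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The request touches or overlaps the previous extent: grow it.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            # There is a gap before this request: start a new extent.
            merged.append((offset, length))
    return merged


if __name__ == "__main__":
    # Five small requests arriving out of order.
    small_requests = [(8, 4), (0, 4), (4, 4), (20, 2), (21, 3)]
    print(coalesce(small_requests))  # [(0, 12), (20, 4)]
```

In this example five small requests become two sequential ones, which is the kind of reduction that makes the traffic cheaper for the network and the storage array to service. A real product would also have to respect write ordering and flush deadlines, which this sketch ignores.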

Marvell also offers similar solutions for improving virtualized IO-intensive applications. DragonFly is one such virtual storage hyper-accelerator, built as a hardware system-on-chip device, that aims to improve the scalability and performance of NAS/SAN arrays and enterprise servers.
