Home
Tachyon is a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory and lets different jobs, queries, and frameworks access the cached files at memory speed. As a result, Tachyon avoids going to disk to load datasets that are read frequently.
On April 10th, 2013, we put out a soft release of Tachyon 0.2.0 Alpha. The new version is more stable and includes performance improvements. It is, however, only a soft release; we are working full time toward a full release containing a stable version of the features we expect to be core to Tachyon. Stay tuned.
- Java-like File API: Tachyon's native API is similar to Java's file API, providing InputStream and OutputStream interfaces and efficient support for memory-mapped I/O. Using this API gives the best performance from Tachyon.
- Compatibility: Tachyon implements the Hadoop FileSystem interface, so Hadoop MapReduce and Spark can run with Tachyon without modification. However, closer integration is required to take full advantage of Tachyon, and we are working toward that. End-to-end latency speedup depends on the workload and the framework, since frameworks differ in execution overhead.
- Native support for raw tables: Table data with hundreds or more columns is common in data warehouses. Tachyon provides native support for multi-column data. The user can choose to put only hot columns in memory.
- Pluggable underlayer file system: Tachyon checkpoints in-memory data to the underlayer file system. It has a generic interface that makes plugging in an underlayer file system easy. It currently supports HDFS and single-node local file systems.
- Web UI: Users can browse the file system easily through the web UI. In debug mode, administrators can view detailed information about each file, including locations, checkpoint path, etc.
- Command line interaction: Users can use ./bin/tachyon tfs to interact with Tachyon, e.g. to copy data in and out of the file system.
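To make the "Java-like File API" feature above more concrete, here is a minimal sketch of the three access styles it names (InputStream, OutputStream, and memory-mapped I/O), written against the standard JDK only. It does not use Tachyon's client classes, whose names and signatures are not given on this page; the class `FileApiStyleSketch` and its helper methods are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class: plain JDK file I/O, not Tachyon's API.
public class FileApiStyleSketch {

    // Write a byte array to a file through an OutputStream.
    public static void writeWithStream(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(data);
        }
    }

    // Read the whole file back through an InputStream (readAllBytes needs Java 9+).
    public static byte[] readWithStream(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return in.readAllBytes();
        }
    }

    // Read the whole file by mapping it into memory and copying the mapped bytes out.
    public static byte[] readWithMemoryMap(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[buffer.remaining()];
            buffer.get(data);
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("file-api-sketch", ".dat");
        try {
            byte[] payload = "hello from a stream".getBytes(StandardCharsets.UTF_8);
            writeWithStream(path, payload);
            System.out.println(new String(readWithStream(path), StandardCharsets.UTF_8));
            System.out.println(new String(readWithMemoryMap(path), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The memory-mapped read avoids an explicit read loop; the page names memory-mapped I/O as something the native API supports efficiently, but how Tachyon implements it is not described here.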
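The "pluggable underlayer file system" feature above can be pictured roughly as follows. This is a sketch under stated assumptions, not Tachyon's actual interface: the names `UnderFileSystem`, `LocalUnderFileSystem`, and `MemoryTier`, and all of their methods, are hypothetical. It shows only the general idea of an in-memory layer that checkpoints data through a generic storage interface, with a single-node local file system as one backend.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; none of these types come from Tachyon.
public class UnderFsSketch {

    // Assumed generic interface that any backing store would implement.
    public interface UnderFileSystem {
        void write(String path, byte[] data) throws IOException;
        byte[] read(String path) throws IOException;
        boolean exists(String path) throws IOException;
    }

    // Backend for a single-node local file system, rooted at a directory.
    // Paths are expected to be relative to that root.
    public static final class LocalUnderFileSystem implements UnderFileSystem {
        private final Path root;

        public LocalUnderFileSystem(Path root) {
            this.root = root;
        }

        @Override
        public void write(String path, byte[] data) throws IOException {
            Path target = root.resolve(path);
            Files.createDirectories(target.getParent());
            Files.write(target, data);
        }

        @Override
        public byte[] read(String path) throws IOException {
            return Files.readAllBytes(root.resolve(path));
        }

        @Override
        public boolean exists(String path) {
            return Files.exists(root.resolve(path));
        }
    }

    // In-memory layer that checkpoints to whichever backend it is given.
    public static final class MemoryTier {
        private final Map<String, byte[]> cache = new HashMap<>();
        private final UnderFileSystem underFs;

        public MemoryTier(UnderFileSystem underFs) {
            this.underFs = underFs;
        }

        public void put(String path, byte[] data) {
            cache.put(path, data.clone());
        }

        // Persist the in-memory copy to the underlayer file system.
        public void checkpoint(String path) throws IOException {
            byte[] data = cache.get(path);
            if (data == null) {
                throw new IllegalArgumentException("not in memory: " + path);
            }
            underFs.write(path, data);
        }

        public void evict(String path) {
            cache.remove(path);
        }

        // Serve from memory when possible; otherwise reload from the checkpoint.
        public byte[] get(String path) throws IOException {
            byte[] data = cache.get(path);
            if (data == null) {
                data = underFs.read(path);
                cache.put(path, data);
            }
            return data.clone();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("underfs-sketch");
        MemoryTier tier = new MemoryTier(new LocalUnderFileSystem(root));
        tier.put("dataset.bin", new byte[] {1, 2, 3});
        tier.checkpoint("dataset.bin");
        tier.evict("dataset.bin");
        System.out.println(tier.get("dataset.bin").length);
        Files.deleteIfExists(root.resolve("dataset.bin"));
        Files.deleteIfExists(root);
    }
}
```

Under this design an HDFS backend would be another implementation of the same interface, so the in-memory layer would not need to change when the backing store does.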
Running Tachyon Locally: Get Tachyon up and running on a single node for a quick spin in ~ 5 mins.
Running Tachyon on a Cluster: Get Tachyon up and running on your own cluster.
Configuration-Settings: How to configure Tachyon.
Command-Line-Interface: Interact with Tachyon through command line.
Tachyon Developer Preview presentation at Spark User Meetup (May, 2013)
tachyon-0.2.1-bin.tar.gz: Tachyon 0.2.1 binary with Hadoop1/CDH3
tachyon-0.2.0-bin.tar.gz: Tachyon 0.2.0 binary with Hadoop1/CDH3
Tachyon Release 0.2.1 - Apr 26, 2013
Tachyon Release 0.2.0 - Apr 10, 2013
Tachyon Release 0.1.0 - Dec 21, 2012
Startup Tasks for New Contributors: For people who are interested in contributing.
Building Tachyon Master Branch
You are welcome to join our mailing list to ask questions and make suggestions. We use JIRA to track development and issues. If you are interested in trying out Tachyon on your cluster, please contact Haoyuan.
Tachyon is developed in the UC Berkeley AMP Lab. The research and development is supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, Blue Goji, Cisco, Clearstory Data, Cloudera, Ericsson, Facebook, General Electric, Hortonworks, Huawei, Intel, Microsoft, NetApp, Oracle, Quanta, Samsung, Splunk, VMware and Yahoo!.
Berkeley Data Analysis Stack (BDAS) from AMPLab at Berkeley