000 10188nam a22002537a 4500
008 210112b2010 a|||f mb|| 00| 0 eng d
040 _aEG-CaNU
_cEG-CaNU
041 0 _aeng
_beng
082 _a610
100 0 _aAhmed Ali ElDin Mohamed Hassan
245 1 _aReplication in NileStore:
_b A Peer To Peer Global Storage System /
_cAhmed Ali ElDin Mohamed Hassan
260 _c2010
300 _a116 p.
_bill.
_c21 cm.
500 _3Supervisor: Nael Osman
502 _aThesis (M.A.)—Nile University, Egypt, 2010 .
504 _a"Includes bibliographical references"
505 0 _aContents: Chapter 1 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2 An overview of distributed storage systems . . . . . . . . . . . . . . . . . . . . . 4 2.1 Examples of distributed storage systems . . . . . . . . . . . . . . . . . . . 5 2.1.1 The Farsite project . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1.1 Feasability of a distributed storage system . . . . . . . . . 6 2.1.1.2 Network problem not storage problem . . . . . . . . . . . 6 2.1.2 The OceanStore project . . . . . . . . . . . . . . . . . . . . . . . . 8 2.1.2.1 Thermodynamical approach . . . . . . . . . . . . . . . . . 8 2.1.2.2 Using Tapestry DHT . . . . . . . . . . . . . . . . . . . . . 9 2.1.2.3 Tackling versioning of mutable files . . . . . . . . . . . . . 11 2.1.3 The Tahoe project . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.1.3.1 Tahoe security explained . . . . . . . . . . . . . . . . . . . 12 2.1.3.2 Technologies used and system architecture . . . . . . . . 14 2.1.4 The GEMS project . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.4.1 A Global Storage for Academic Simulations . . . . . . . . 16 2.1.4.2 Replica Management as a Linear Control Problem . . . . 17 2.2 The availability and durability problem . . . . . . . . . . . . . . . . . . . 18 2.2.1 Replica types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.2 Replica placement strategies and algorithms . . . . . . . . . . . . . 19 2.2.2.1 Use Of AI hill-climbing algorithms . . . . . . . . . . . . . 20 2.2.2.2 Lazy Repair and Reintegration . . . . . . . . . . . . . . . 20 2.2.2.3 Using Age Estimation For Placement . . . . . . . . . . . 22 2.2.2.4 Proactive replication . . . . . . . . . . . . . . . . . . . . . 25 2.2.2.5 Gossiping is Practical . . . . . . . . . . . . . . . . . . . . 25 2.3 Security in distributed storage systems . . . . . . . . . . . . . . . . . . . . 26 2.3.1 The Sybil attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4 Incentives to use online storage systems . . . . . . . . . . . . . . . . . . . . 29 vii 2.4.1 Replica placement as a game . . . . . . . . . . . . . . . . . . . . . . 29 2.5 Measurements and models of distributed systems . . . . . . . . . . . . . . 30 2.5.1 Analysis of deployed P2P systems . . . . . . . . . . . . . . . . . . . 30 2.5.1.1 KAD network analysis . . . . . . . . . . . . . . . . . . . . 30 2.5.1.2 Availability analysis in Overnet . . . . . . . . . . . . . . . 31 2.5.1.3 Comparing different systems . . . . . . . . . . . . . . . . . 32 2.5.2 Modeling of distributed storage systems . . . . . . . . . . . . . . . 33 2.5.2.1 Failure Correlation Analysis . . . . . . . . . . . . . . . . . 33 2.5.2.2 Effect of environment on churn . . . . . . . . . . . . . . . 36 2.5.2.3 Redundancy Vs Scaling Vs Dynamical System . . . . . . . 38 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3 Validation of a Hybrid Replication Algorithm . . . . . . . . . . . . . . . . . . . 41 3.1 Why use an adaptive hybrid replication scheme? . . . . . . . . . . . . . . . 41 3.2 Modeling the distributed storage problem . . . . . . . . . . . . . . . . . . . 42 3.2.1 System assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2.2 Peers’ model description . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2.3 The system model . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.3 Replication as an adaptive control problem . . . . . . . . . . . . . . . . . . 45 3.3.1 The estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4 Building an adaptive controller . . . . . . . . . . . . . . . . . . . . . . . . 46 3.5 Choosing T: the time between two estimations of the churn . . . . . . . 46 3.6 Validating the algorithm using simulation . . . . . . . . . . . . . . . . . . . 47 3.7 Critique for this algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4 NileStore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.1 A framework for replica placement . . . . . . . . . . . . . . . . . . . . . . 51 4.2 Nomenclature and Notations . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3 NileStore system: Introduction and Design goals . . . . . . . . . . . . . . 53 4.3.1 System Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 NileStore: The protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.4.1 The Tracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.4.2 The peers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.5 From Replica Placement to Task Assignment . . . . . . . . . . . . . . . . . 61 4.5.1 Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.5.2 Profit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.5.2.1 Family 1: Profit function I . . . . . . . . . . . . . . . . . . 64 4.5.2.2 FAmily 1: Profit function II . . . . . . . . . . . . . . . . . 66 4.5.2.3 Family 1:Profit function III . . . . . . . . . . . . . . . . . 67 4.5.2.4 Family 1:Profit function IV . . . . . . . . . . . . . . . . . 68 4.5.2.5 Family 2:Profit function V . . . . . . . . . . . . . . . . . . 69 4.5.3 Solving the problem . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.1 Discrete Event Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 viii 5.1.1 Discrete Event Simulation Anatomy . . . . . . . . . . . . . . . . . . 72 5.1.1.1 Workload Modeling . . . . . . . . . . . . . . . . . . . . . . 72 5.1.1.2 Service Disciplines . . . . . . . . . . . . . . . . . . . . . . 72 5.1.1.3 Accuracy Vs Cost . . . . . . . . . . . . . . . . . . . . . . 73 5.1.2 Network Simulation: Practical design . . . . . . . . . . . . . . . . . 73 5.2 Simulating NileStore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.2.1 NileStore’s Simulator Anatomy . . . . . . . . . . . . . . . . . . . . 74 5.2.2 NileStore’s Workload modeling . . . . . . . . . . . . . . . . . . . . 74 5.2.2.1 Peers’ Upload and Download Rates . . . . . . . . . . . . . 75 5.2.2.2 Storage Model . . . . . . . . . . . . . . . . . . . . . . . . 75 5.2.2.3 Leave/Join Rates . . . . . . . . . . . . . . . . . . . . . . . 76 5.2.3 NileStore’s Simulator Practical Design . . . . . . . . . . . . . . . . 77 5.2.3.1 Classes for Managing the Queue . . . . . . . . . . . . . . 77 5.2.3.2 Networking Classes . . . . . . . . . . . . . . . . . . . . . . 78 5.2.3.3 NileStore’s Classes . . . . . . . . . . . . . . . . . . . . . . 79 5.2.3.4 The Main Loop . . . . . . . . . . . . . . . . . . . . . . . . 80 6 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.1 NileStore’s Performance with time . . . . . . . . . . . . . . . . . . . . . . . 81 6.1.1 Experiment conditions . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.1.2 Experiment results . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.1.2.1 Reason for the poor performance of profit function V . . . 83 6.1.2.2 Family 1 profit functions are similar . . . . . . . . . . . . 83 6.1.3 Changing the experiment conditions . . . . . . . . . . . . . . . . . 87 6.2 Effect of changing the churn rate . . . . . . . . . . . . . . . . . . . . . . . 89 6.3 Effect of changing τ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 6.4 Effect of changing r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 6.5 Time taken for replica placement . . . . . . . . . . . . . . . . . . . . . . . 95 7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 7.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 References . . . . . . . .
520 3 _aAbstract: Replica placement algorithms in distributed storage and backup systems have a direct effect on the required time for replication, the availability and durability of files and the latency of file retrieval. In this work we introduce Nilestore, a P2P-assisted cloud storage protocol that transforms the replica placement problem into a linear task assignment optimization problem. The task assignment problem is solved using an optimization engine. The assignment takes into account each peer’s upload bandwidth, download bandwidth, contributed free storage and the size of data that it needs to replicate. The task assignment is solved using a suboptimal optimization algorithm for speed. This approach to solving the replica placement problem has not been taken before for online storage systems. We show our simulation results for the algorithms under realistic network conditions that were obtained from many studies conducted on P2P networks and distributed storage. Our results show that using NileStore can offload the storage servers of the cloud by more than 75% using the available replicas in the system.
546 _aText in English, abstracts in English.
650 4 _aInformatics-IFM
_9266
655 7 _2NULIB
_aDissertation, Academic
_9187
690 _aInformatics-IFM
_9266
942 _2ddc
_cTH
999 _c8823
_d8823