Distributed wikis: a survey

‘Distributed wiki’ is a generic term covering various systems, including ‘peer-to-peer wiki’, ‘mobile wiki’, ‘offline wiki’, ‘federated wiki’ and others. Distributed wikis distribute their pages among the sites of autonomous participants to address various motivations, including high availability of data, new collaboration models and different viewpoints of subjects. Although existing systems share some common basic concepts, it is often difficult to understand the specificity of each one, the underlying complexities or the best context in which to use it. In this paper, we define, classify and characterize distributed wikis. We identify three classes of distributed wiki systems, each using a different collaboration model and distribution scheme for its pages: highly available wikis, decentralized social wikis and federated wikis. We classify existing distributed wikis according to these classes. We detail their underlying complexities and social and technical motivations. We also highlight some directions for research and opportunities for new systems with original social and technical motivations. Copyright © 2014 John Wiley & Sons, Ltd.


INTRODUCTION
Wikis are typical Web 2.0 applications [1]; they demonstrate how a system can turn strangers into collaborators. In these groupware [2] systems, a typical task for a group of users is to write and maintain wiki pages, which are Web pages (hypertext documents) that can be edited directly in a Web browser. Traditional wikis are centralized: the wiki engine and the collection of pages are hosted by a single organization, which users access remotely. The best known wiki is certainly Wikipedia, one of the most accessed sites on the Web. Despite their success, wiki systems still suffer from several issues: (i) Scalability and cost: as the number of users and contributions increases, so do the storage and bandwidth requirements. Every year, the Wikimedia Foundation (which hosts Wikipedia) needs to raise millions of dollars to finance its huge infrastructure costs. (ii) Centralized control/single point of failure: a centralized wiki is controlled by a single organization. If the organization disappears (for example, if the Wikimedia Foundation runs out of funds), all the knowledge contributed to the wiki may disappear as well. (iii) No offline access: if a user's Internet connection or the central wiki system is temporarily unavailable, the user cannot work.
The different categories of distributed wikis can be obtained by applying different approaches to the distribution and replication of wiki pages across the participants, ranging from the full replication of the wiki content to the partition of the collection of pages among the sites. The choice of distribution scheme, change propagation and conflict resolution strategies will significantly affect the performance of the system and its collaborative writing model [16].
In Section 2, we define a formal model and some basic operations for traditional wikis. Then, we show how this basic model is extended in the different categories of distributed wikis identified earlier. For the description of the three categories, we follow the same presentation plan. We start with an overview of the main characteristics of the category, then we define the data model, specific operations for the category and a collaboration scenario in the category. Finally, we describe some representative systems in the category. Section 3 presents highly available wikis. Section 4 presents decentralized social wikis. Section 5 presents federated wikis. The last section concludes the paper and points to future work.

TRADITIONAL WIKIS
According to Ward Cunningham (the creator of the original wiki), a wiki is 'the simplest online database that could possibly work' [1]. More specifically, it is a collection of Web pages hosted on a server running a wiki engine, which gives them the following characteristics.
(1) Page title uniqueness: Each page has a unique title within the wiki; (2) Read-write: A wiki page can be edited directly through a Web browser. The textual content of a wiki is usually referred to as wikitext and is typically written using a simple mark-up language rendered by a wiki engine; (3) Wikilinks: Wikis include internal hyperlinks; these are hyperlinks to other pages of the same wiki, called wikilinks, which are part of the wikitext. The target page of a wikilink can be specified simply by its title, as the title uniquely identifies a page.
In this section, we formalize the concept of a traditional wiki. Later, we will show how this formalization is extended in the different categories of distributed wikis.

Data model
We model a wiki as a graph of wiki pages connected by wikilinks.
Wiki. A wiki is a tuple $\langle id, G \rangle$, where $id$ is the identifier of the wiki (this is usually the domain name of the wiki server) and $G = \langle P, E \rangle$ is the graph of the pages. The nodes $P$ are the pages of the wiki, and the edges $E \subseteq P \times P$ are the wikilinks connecting them. The wiki identifier is a handle whereby users can locate and access the wiki.

Wiki page. A wiki page $P$ is a pair $(L, content)$, where $L$ is the title of the page and $content$ is the wikitext of the page, that is, a sequence of text, wikilinks (defined subsequently) and embedded media content.

Wikilink. A wikilink with label $L$ is an annotated instance of $L$ appearing within the content of a wiki page. We use the notation $[\![L]\!]$. Intuitively, the annotation means that $L$ is no longer simply text but a symbol recognized by the system.

Edges. Let $W$ be the wiki $\langle id, G \rangle$ where $G = \langle P, E \rangle$. The edges $E$ of the wiki graph are defined as follows: let $P_1 = (L_1, content_1)$ and $P_2 = (L_2, content_2)$ be two pages of $P$. Then $(P_1, P_2) \in E$ iff $content_1$ contains the wikilink $[\![L_2]\!]$.
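To make these definitions concrete, here is a minimal sketch of the data model in Python. It assumes (our choice, not part of the model) that a wikilink $[\![L]\!]$ is written as [[L]] in the wikitext, and it represents edges as pairs of titles, which is unambiguous because titles are unique. A wikilink to a missing page yields no edge, as in the definition.

```python
import re
from dataclasses import dataclass

# Assumed concrete syntax: the wikilink [[L]] is written literally as [[Title]].
WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

@dataclass(frozen=True)
class Page:
    title: str    # L: unique within the wiki
    content: str  # wikitext (media content omitted in this sketch)

def wikilinks(content: str) -> set:
    """Labels L of all wikilinks [[L]] occurring in the content."""
    return set(WIKILINK.findall(content))

def edges(pages: dict) -> set:
    """E as (source title, target title) pairs.

    (P1, P2) is in E iff P1's content contains [[L2]] and a page titled L2
    exists; a wikilink to a missing page produces no edge.
    """
    return {
        (page.title, target)
        for page in pages.values()
        for target in wikilinks(page.content)
        if target in pages
    }

# Usage: the pages P of a small wiki, indexed by title
pages = {
    "Home": Page("Home", "See [[Wiki]] and [[Missing]]."),
    "Wiki": Page("Wiki", "Back to [[Home]]."),
}
print(sorted(edges(pages)))  # [('Home', 'Wiki'), ('Wiki', 'Home')]
```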
Notice that while the annotation may be present in the wikitext, if the target page does not exist, then, formally, the edge does not exist. The edges in the graph represent the possibility of browsing from the page containing the wikilink to its target page. If the target page does not exist, then this navigation is not possible; typically, the link would then redirect to a page creation form. This last functionality is not expressed in our model.

[…] content associated with this title. If there is no page with the given title, then the user may be redirected to an error page or to a form to create the page (use case: create page). The page content is then displayed to the user.

2.2.2. Use case: create page
Summary: the user creates a new page from a blank editing form. This use case may be initiated through a special link in the user interface or by following a wikilink to a non-existent page. A blank editing form is displayed to the user; the page title may already be filled in. The user writes some content, then may decide to either submit the form and save this content or cancel. If the user chooses to submit the new page, the operation $save(L, content)$ is called. An error may occur if two users concurrently try to create a page with the same title; refer to Section 2.4.

2.2.3. Use case: delete page
Summary: the user deletes a wiki page. When a user is viewing a page, the user interface may provide a link for users to delete the page. When the user selects this link, the operation $delete(L)$ is invoked on the server, where $L$ is the page title. The result is that the wiki no longer has a page with the given title. In modern wikis, the user interface typically does not allow end users to delete pages immediately. Instead, they must initiate a deletion process in which approval of other users is requested, and after a delay, an administrator (a user with special privileges) deletes the page.

2.2.4. Use case: edit page
Summary: the user modifies an existing wiki page. The user selects a link to modify an existing wiki page. The page is then displayed in 'write mode', that is, as an editing form containing the page's current wikitext. The user modifies the page content by inserting or deleting text, wikilinks and media content. Typically, the user cannot modify the page title. The user may then submit the changes, which are sent to the server. This invokes the operation $save(L, content)$.
In the case where multiple users edit the page at the same time, there is a risk that some of the changes may be lost. This problem and several possible solutions are described in Section 2.4.

Operations
We now formalize the semantics of each atomic operation handled by the server. We express the operations lookup, delete and save as applied to a wiki $W = \langle id, G \rangle$ where $G = \langle P, E \rangle$; these operations are specified by Algorithm 1. In these definitions, we give the minimal definition of each operation, ignoring concurrent modifications. These definitions indicate the basic operation semantics and must be extended to handle concurrent modifications (Section 2.4).

Complexity. Executing these operations involves a request sent from the client to the server, local processing on the server and a response to the client. For such simple processing, the main latency factor is the HTTP request/response. For comparison with the distributed setting, and in particular with different strategies for the save operation, we therefore measure the message complexity of operations, that is, the number of messages exchanged between different networked components (here, clients and servers) until the final state is reached. For all of these basic operations, exactly two messages are exchanged.
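Algorithm 1 is not reproduced in this text. As a rough illustration of the semantics described above, the following sketch implements lookup, save and delete on a single server, ignoring concurrency, and counts two messages (one request, one response) per operation. The class name WikiServer and the identifier wiki.example.org are hypothetical.

```python
from typing import Optional

class WikiServer:
    """Centralized wiki W = <id, G>, with pages stored as title -> wikitext."""

    def __init__(self, wiki_id: str):
        self.id = wiki_id
        self.pages = {}    # title -> content
        self.messages = 0  # client/server messages exchanged so far

    def _round_trip(self) -> None:
        self.messages += 2  # one request and one response per operation

    def lookup(self, title: str) -> Optional[str]:
        self._round_trip()
        return self.pages.get(title)  # None: no page with this title

    def save(self, title: str, content: str) -> None:
        self._round_trip()
        self.pages[title] = content  # creates or replaces the page

    def delete(self, title: str) -> None:
        self._round_trip()
        self.pages.pop(title, None)  # afterwards no page has this title

w = WikiServer("wiki.example.org")
w.save("Home", "Welcome")
print(w.lookup("Home"))  # Welcome
w.delete("Home")
print(w.lookup("Home"))  # None
print(w.messages)        # 8
```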

Collaboration in a traditional wiki: the concurrent editing problem
The problem of concurrent editing occurs when different users simultaneously open the same page in editing mode, then make different changes. Without any safeguards to detect this case, the last user to save her changes will simply overwrite the changes made by the others, without being aware that she has done so. Among the solutions to this problem, there are two main strategies [16]: the sequential writing strategy, in which pages are locked while they are edited, and the parallel writing strategy, in which pages are not locked but concurrent editing is detected at save time. Within the latter strategy, we further distinguish the 'first arrived wins' approach, where the first save is applied and all subsequent saves must go through a manual conflict resolution (merge) step, from the solutions that use automatic merge algorithms. However, automatic merge algorithms may produce results that do not make sense in terms of the meaning of the text; therefore, many algorithms aim to detect the more problematic cases and revert to manual merging. This is the case for Wikipedia.
Both strategies require extending the data model of a page. Additional information is necessary to detect editing conflicts: either a flag must be added to indicate that the page is locked, or a time stamp must be used to identify out-of-date versions.
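A minimal sketch of the parallel writing strategy with time stamp detection ('first arrived wins'), assuming an integer version counter serves as the time stamp; the class names are hypothetical. A save is accepted only if it is based on the current version of the page; otherwise the user is asked to merge manually.

```python
from dataclasses import dataclass

class EditConflict(Exception):
    """The save was based on an out-of-date version of the page."""

@dataclass
class VersionedPage:
    content: str
    timestamp: int = 0  # incremented by every accepted save

class Server:
    """Parallel writing with 'first arrived wins' conflict detection."""

    def __init__(self):
        self.pages = {}  # title -> VersionedPage

    def lookup(self, title):
        page = self.pages[title]
        return page.content, page.timestamp  # the client keeps the time stamp

    def save(self, title, content, base_timestamp):
        page = self.pages.setdefault(title, VersionedPage(""))  # new page: version 0
        if base_timestamp != page.timestamp:
            # Someone saved since this user's lookup: reject; the user must merge by hand.
            raise EditConflict(f"{title!r} changed since version {base_timestamp}")
        page.content = content
        page.timestamp += 1

s = Server()
s.save("p1", "abc", base_timestamp=0)
content, t = s.lookup("p1")             # two users open version t = 1
s.save("p1", "ac", base_timestamp=t)    # the first save is applied
try:
    s.save("p1", "aXbc", base_timestamp=t)  # the second save is out of date
except EditConflict as err:
    print("manual merge needed:", err)
```

The sequential writing strategy would instead store a lock flag in the page and refuse to open a locked page for editing.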
The cost of the edit operation, as measured by the message complexity, differs depending on the chosen strategy. In the case of manual merges, the process will require many back-and-forth messages, as each user must reload the page and resubmit the resolved conflicts. If $m$ users concurrently edit a page and they all save their changes, then the total number of messages exchanged between the clients and the server is $\frac{m^2 + 3m}{2}$ [17].
Collaboration scenario. A scenario in which three contributors edit a Wikipedia page is illustrated in Figure 3. In this scenario, $t$ represents a time stamp variable associated with a page.
In this scenario, three contributors, $user_1$, $user_2$ and $user_3$, concurrently edit the page $(p_1, content)$. For simplicity, we assume that the content is a list of characters, $content = \langle abc \rangle$, and that each character has a position, starting from position 0. We assume that users will make simple changes, such as inserting or deleting a character.
First, a lookup operation is called, and a copy of the content of the page is displayed in the browser of each user. The users then modify the content of the page: $user_1$ inserts the character X between a and b (positions 0 and 1): $insert(X, \langle abc \rangle, 0, 1)$; $user_2$ deletes b: $delete(\langle abc \rangle, 1)$; and $user_3$ inserts Y between b and c: $insert(Y, \langle abc \rangle, 1, 2)$. First, $user_2$ saves her changes; no conflict is detected, and a new version of the page is submitted to the server with a new time stamp. Later, $user_1$ and $user_3$ want to save their changes; the system first handles $user_1$'s request, and $user_3$'s request is aborted. A conflict is detected with the current version on the server published by $user_2$, so $user_1$ has to solve the conflict manually. In this case, $user_1$ keeps $user_2$'s changes (deleting b), and the content of $p_1$ on the server becomes $\langle aXc \rangle$. $user_3$ then retries to save, and as a conflict is detected, she must solve it manually. She dismisses the other users' changes and imposes her own version. The final content of $p_1$ is $\langle abYc \rangle$. By delegating the conflict resolution task to users, Wikipedia cannot ensure that all users' contributions will be included: the last contributor decides on the final content of the page. Collaboration in Wikipedia produces content that is validated by at least the last writer, with the risk of lost updates.
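The scenario can be replayed in a few lines of Python. This is a sketch under simplifying assumptions: the page content is a plain string, the server is reduced to a dictionary holding the content of $p_1$ and its time stamp, and the manual merges are hard-coded to the choices made by $user_1$ and $user_3$ above.

```python
def insert(ch, content, left, right):
    """insert(ch, content, i, i + 1): put ch between positions i and i + 1."""
    assert right == left + 1, "positions must be adjacent"
    return content[:right] + ch + content[right:]

def delete(content, pos):
    """delete(content, i): remove the character at position i."""
    return content[:pos] + content[pos + 1:]

server = {"content": "abc", "t": 0}  # page p1 and its time stamp

def save(new_content, base_t):
    """Accept a save only if it was based on the latest version of p1."""
    if base_t != server["t"]:
        return False  # conflict: the user must merge manually and retry
    server["content"], server["t"] = new_content, server["t"] + 1
    return True

content0, t0 = server["content"], server["t"]  # lookup by all three users
v1 = insert("X", content0, 0, 1)  # user1 -> "aXbc"
v2 = delete(content0, 1)          # user2 -> "ac"
v3 = insert("Y", content0, 1, 2)  # user3 -> "abYc"

print(save(v2, t0))              # True: user2 saves first
print(save(v1, t0))              # False: conflict for user1
print(save(v3, t0))              # False: user3's request is aborted
print(save("aXc", server["t"]))  # True: user1's manual merge keeps the deletion of b
print(save(v3, t0))              # False: user3 retries, conflict again
print(save(v3, server["t"]))     # True: user3 imposes her own version
print(server["content"])         # abYc
```

The contributions of $user_1$ and $user_2$ are lost in the final version, which is the lost update problem described above.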

Distributed wikis
Distributed wikis extend this basic model to a distributed setting in which several physical sites, interconnected but managed by autonomous participants, host interoperable wiki engines and collections of wiki pages. In this setting, the traditional wiki model must be adapted. The infrastructure can be modeled as a graph $N = \langle S, C \rangle$, representing the participants $S = \{S_1, S_2, \ldots, S_n\}$ and their physical connections $C \subseteq \{(S_i, S_j) \mid S_i, S_j \in S\}$.
Beyond this common defining characteristic, the different distributed wiki systems are driven by different motivations and therefore make different choices regarding the distribution of resources between the participants and the functionality that they offer. The design of a distributed wiki can be characterized by the following criteria: the topology of the graph of participants; the physical distribution and replication of pages; and the change propagation and conflict resolution strategies.
In the following sections, we will represent these choices using the formal concepts introduced so far. We show how the data model and operations of traditional wikis are extended for distributed wikis: highly available wikis, decentralized social wikis and federated wikis.
The change propagation and conflict resolution strategies largely determine the collaboration model supported by a distributed wiki. We will illustrate the different collaboration models supported by these distributed wikis through extensions of the scenario given for the Wikipedia example (Figure 3).

HIGHLY AVAILABLE WIKIS
Highly available wikis use a peer-to-peer (P2P) infrastructure to provide scalability (by sharing the storage and workload), fault tolerance (by replicating the content in different locations), or both. In these systems, the distribution and replication of the content are 'transparent' to end users and maintained in the background, beyond the users' direct control. We distinguish between structured and unstructured highly available wikis, according to the underlying P2P overlay network architecture.

Highly available structured wikis
Highly available structured wikis are designed to share the load (and cost) of the wiki across a distributed infrastructure and thus provide scalability without requiring a single organization to bear the cost of the infrastructure. This distributed infrastructure is usually a distributed hash table (DHT) [18,19].

Definition 1
A highly available structured wiki is a tuple $\langle G, N, M \rangle$, where $G = \langle P, E \rangle$ is a (traditional) wiki graph and $N = \langle S, C \rangle$ is a graph representing the structured overlay network. The nodes $S = \{S_1, \ldots, S_n\}$ are participants; each participant is uniquely identified, and the edges $C \subseteq S \times S$ represent their physical interconnections. These connections $C$, and therefore the graph topology, are usually dictated by a protocol ensuring connectedness of the graph. $M : P \to S$ is a function that maps the wiki pages to the participants; this function is also implemented as part of the DHT protocol. Therefore, the graph $G = \langle P, E \rangle$ is distributed among the autonomous participants $\{S_1, \ldots, S_n\}$; that is, each page $p \in P$ is mapped to a host $S_i \in S$. Figure 4b shows an example of a structured P2P wiki. The wiki graph of the traditional wiki in Figure 2 is partitioned among the wiki servers, and the requests associated with the different use cases are transparently routed to the participant responsible for storing the relevant pages.
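The mapping $M$ is typically realized by consistent hashing. The sketch below assumes a Chord-style ring in which both page titles and participant identifiers are hashed with SHA-1 (truncated to 32 bits, our choice) and each page is assigned to the first participant at or after its position on the ring. Real DHTs add routing tables, replication and virtual nodes, which are omitted here.

```python
import hashlib
from bisect import bisect_left

def ring_position(key: str) -> int:
    """Hash a page title or participant id onto a 32-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

def make_mapping(participants):
    """Build M : P -> S, sending each title to its successor participant on the ring."""
    ring = sorted((ring_position(s), s) for s in participants)
    points = [pos for pos, _ in ring]

    def M(title: str) -> str:
        # first participant at or after the title's position, wrapping around
        i = bisect_left(points, ring_position(title)) % len(ring)
        return ring[i][1]

    return M

M = make_mapping(["S1", "S2", "S3", "S4"])
for title in ["Home", "Wiki", "P2P", "DHT"]:
    print(title, "->", M(title))
```

Because $M$ depends only on the hash values, every participant can compute it locally, which is what allows requests to be routed without a central directory.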
Highly available structured wikis may also include some degree of replication to ensure fault tolerance. In this case, the number of replicas is limited and determined by the system. The replicated pages are stored as backup and not directly accessed by the users. Optimistic replication [20] techniques can be used to manage the consistency of replicas. In optimistic replication, modifications are applied immediately to the replica where they were generated, then are propagated to other replicas to be integrated; conflict resolution is needed in some situations.

Use cases and operations.
Highly available wikis aim to reproduce the use cases of a traditional wiki, so these use cases can be considered identical to those described in Section 2.2. The traditional operations lookup, delete and save also have the same semantics with respect to the system model (specified in Algorithm 1). Typically, the page title is used as the key for the data being stored, and the operations are implemented using the classical DHT functions [19] get, remove and put, respectively, as shown in Algorithm 2. The functionality handled by the DHT is mainly key-based routing, which consists of routing the operation requests to the peers responsible for storing the pages being retrieved, deleted or modified. Formally, requests about a page $P_i$ must be routed along the edges of the graph $N$ to the node $S_i = M(P_i)$.
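Algorithm 2 is not reproduced here; the following sketch only shows the idea of mapping lookup, save and delete onto get, put and remove with the page title as key. It is an in-process toy: the hypothetical ToyDHT class computes the responsible participant directly instead of routing the request over several hops, and it does not replicate pages.

```python
import hashlib
from bisect import bisect_left

def ring_position(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

class ToyDHT:
    """In-process stand-in for a DHT: key-based routing is replaced by a
    direct computation of the responsible participant M(key)."""

    def __init__(self, participants):
        self.ring = sorted((ring_position(s), s) for s in participants)
        self.store = {s: {} for s in participants}  # each participant's local storage

    def responsible(self, key):
        points = [pos for pos, _ in self.ring]
        return self.ring[bisect_left(points, ring_position(key)) % len(self.ring)][1]

    def put(self, key, value):
        self.store[self.responsible(key)][key] = value

    def get(self, key):
        return self.store[self.responsible(key)].get(key)

    def remove(self, key):
        self.store[self.responsible(key)].pop(key, None)

# Wiki operations expressed with the DHT functions; the page title is the key.
def lookup(dht, title):
    return dht.get(title)

def save(dht, title, content):
    dht.put(title, content)

def delete(dht, title):
    dht.remove(title)

dht = ToyDHT(["S1", "S2", "S3"])
save(dht, "Home", "Welcome")
print(dht.responsible("Home"), lookup(dht, "Home"))
```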

Additional use cases
In addition to the traditional wiki use cases, peers can also join and leave the underlying network. These use cases are directly implemented by the underlying DHT's join and leave functions. These operations normally involve significant overhead, as the DHT routing tables must be updated and the pages in the network must be redistributed among the peers to ensure load balancing and availability. In terms of our model, this involves modifying the mapping function $M$.
Complexity. The message complexity of the operations in a structured highly available wiki is higher than that of a traditional wiki, because of the communication between the different DHT nodes. In a system with $n$ participants ($n = |S|$, as in Definition 1), where each wiki page is replicated $k$ times ($k < n$), the operations have the complexities of the underlying DHT functions, as follows:
lookup: the complexity of get, that is, the routing complexity, typically $O(\log n)$ [21].
remove, save: the routing complexity, plus the cost of propagating the changes to the $k$ page replicas; if these are stored in neighboring nodes of the main data location, this can be performed with $O(k)$ messages. The complexity of a remove or a basic save operation is therefore $O(\log n + k)$.
join, leave: depending on the DHT protocol, the message complexity of updating routing tables can range from $O(1)$ to $O(\log n)$. The cost of redistributing pages depends on the size of the wiki itself. If the pages are uniformly distributed, we can estimate that the number of pages to be redistributed is $O(k \cdot |P| / n)$, where $k$ is the degree of replication and $|P|$ the size of the wiki.
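The redistribution cost of a join can be observed on the same kind of hash ring. In the sketch below (one copy per page, that is $k = 1$, and no virtual nodes, both simplifying assumptions), a newcomer takes over only the pages that fall between its predecessor and itself on the ring; all other pages stay where they were.

```python
import hashlib
from bisect import bisect_left

def ring_position(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

def owner(participants, key):
    """Successor of the key on the ring (k = 1: one copy per page)."""
    ring = sorted((ring_position(s), s) for s in participants)
    points = [pos for pos, _ in ring]
    return ring[bisect_left(points, ring_position(key)) % len(ring)][1]

def pages_moved_on_join(participants, newcomer, titles):
    """Titles whose responsible participant changes when `newcomer` joins."""
    return [
        t for t in titles
        if owner(participants, t) != owner(participants + [newcomer], t)
    ]

titles = [f"Page{i}" for i in range(2000)]
nodes = [f"S{i}" for i in range(1, 20)]
moved = pages_moved_on_join(nodes, "S20", titles)
print(f"{len(moved)} of {len(titles)} pages move to S20")
```

The exact fraction of pages that move varies with the hash values; the uniform-distribution estimate $O(k \cdot |P| / n)$ holds only on average, which is why practical DHTs often assign several virtual positions to each participant.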

The concurrent editing problem.
The different strategies for handling the concurrent editing problem (sequential writing, and parallel writing with automatic or manual merge, discussed in Section 2.4) are also applicable to highly available structured wiki systems. For each wiki page $P_i$, there is a single node $S_i = M(P_i)$ responsible for managing this page. $S_i$ can therefore apply the same strategies that a centralized wiki server would.

Highly available structured wiki systems.
Piki [22], DistriWiki [7], DTWiki [23] and UniWiki [6] are examples of highly available structured wiki systems. Piki uses key-based routing [19] and a DHT-based version control system [24] as a storage backend. Each page is assigned to one primary owner and replicated by the primary owner on a number of peers. Piki allows concurrent modifications, which are handled by the primary owner using the 'first arrived wins' rule (with manual merges). Unlike most wiki systems, which are accessed through Web browsers, Piki is a stand-alone application. DistriWiki [7] uses the JXTA [25] protocol to build its P2P network. There is no automatic replication; each node stores a set of wiki pages, and users are expected to search for the latest version of each page in order to edit it. The problem of concurrent modifications is not addressed in DistriWiki.
DTWiki [23] is a wiki system that addresses the problem of operating a wiki in an intermittently connected environment. It is built on a delay-tolerant network (DTN) [26] and the TierStore [27] distributed file system. DTN manages communication links as they go up and down, and TierStore provides transparent synchronization of file system contents, partial replication of shared data, and detection and resolution of concurrent update conflicts on a single file. TierStore manages concurrent updates to file replicas by appending a suffix to each remotely conflicting replica. DTWiki detects the presence of the conflict, sends the user a message stating that a conflict has occurred and displays a merge of the contents of the conflicting revisions; the user can then choose the final content of the file. Conflict resolution in DTWiki is therefore similar to that in traditional wikis, as explained in Section 2.4.
UniWiki [6] consists of multiple wiki front-ends that fetch and store their data over a DHT. It combines optimistic replication [20] techniques and DHT techniques. The originality of UniWiki is that it performs automatic merges of concurrent edits by running the WOOT [

Highly available unstructured wikis
Most wikis in this category rely on a self-organized unstructured P2P network. The collection of wiki pages is fully replicated across all the participants' sites. The total replication scheme requires that all peers have the same storage capability. The users are connected to one peer and interact with the local page replicas; wikilinks are resolved locally; consequently, a user browsing pages hosted at one physical site cannot follow a wikilink to a page hosted elsewhere. Users are able to work even when their node is disconnected from the rest of the network. When the network is connected, changes are automatically propagated to the other nodes.
3.2.1. System model
An unstructured highly available wiki is conceptually similar to a set of $n$ interconnected and automatically synchronized wikis.

Definition 2
A highly available unstructured wiki is a tuple $\langle \mathcal{G}, N \rangle$, where $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$ is a set of wiki graphs and $N = \langle S, C \rangle$ is a graph representing an unstructured and self-organized overlay network; the nodes $S = \{S_1, \ldots, S_n\}$ are participants, each participant is uniquely identified, and the edges $C \subseteq S \times S$ represent their physical interconnections. Participant $S_i$ hosts $G_i$. The wiki graphs $G_i$ are eventually consistent; this notion is defined subsequently.
The wiki appears centralized because the participant directly interacts only with the local system, which is the wiki $\langle S_i, G_i \rangle$. Propagation of updates happens behind the scenes and, to the user, is indistinguishable from operations that might happen concurrently on the local wiki if it were an isolated system. Eventually, the set of local page replicas at each node should converge to be identical; as we will see, ensuring that this happens is difficult. In order to define eventual consistency, we must consider a highly available unstructured wiki to be a system that evolves over time as a result of the users' actions. We note $G_i^{(t)}$ the state of a graph $G_i$ at time $t$.
Eventual consistency. Let $W$ be a highly available unstructured wiki, $W = \langle \mathcal{G}, N \rangle$, as defined in Definition 2. We consider a finite sequence of (arbitrary) user actions, occurring at times $t_1, t_2, \ldots, t_k$. The wiki graphs $\{G_i\}_{i \in [1..n]}$ are eventually consistent if, at some time later than the last action at $t_k$, all of the graphs $G_i$ are identical. Formally:
$$\exists t > t_k,\ \forall i, j \in [1..n],\ G_i^{(t)} = G_j^{(t)}$$
A highly available unstructured wiki follows the optimistic replication technique [20], under the hypothesis of eventual delivery of operations; this is generally achieved by using a gossiping algorithm [29]. An anti-entropy algorithm [30] supports intermittent connections. Figure 4a shows an example of an unstructured wiki. In this figure, the graph of the traditional wiki from Figure 2 is replicated on each wiki server.
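A minimal sketch of how anti-entropy leads to eventual consistency. It abstracts each replica as the set of operations it has received (identified by hypothetical ids such as S1:op1) and assumes that replicas holding the same set of operations have identical wiki graphs, which is exactly what the merge algorithm must guarantee. In each round, every site exchanges its log with one random neighbour (push-pull).

```python
import random

def anti_entropy_round(logs, neighbours, rng):
    """Every site contacts one random neighbour; both keep the union of their logs."""
    for site in logs:
        peer = rng.choice(neighbours[site])
        union = logs[site] | logs[peer]
        logs[site], logs[peer] = set(union), set(union)

def converged(logs):
    """Eventual consistency check: all replicas hold the same operations."""
    first = next(iter(logs.values()))
    return all(log == first for log in logs.values())

# A line topology S1 - S2 - S3 - S4; each site holds one local, unpropagated operation.
neighbours = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2", "S4"], "S4": ["S3"]}
logs = {site: {f"{site}:op1"} for site in neighbours}
rng = random.Random(7)
rounds = 0
while not converged(logs):
    anti_entropy_round(logs, neighbours, rng)
    rounds += 1
print(f"converged after {rounds} rounds:", sorted(logs["S1"]))
```

Once users stop acting, no new operations enter the logs, so repeated rounds on a connected network eventually make all logs equal; this is the "time later than $t_k$" of the definition.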

Use cases and operations.
Each user interacts with a single local site $S_i$ hosting a wiki $W_i = \langle S_i, G_i \rangle$. Let $n = |S|$ be the number of peers in the system. The local wiki supports the traditional use cases view page, create page, delete page and edit page; the basic definition of these use cases, with respect to the local wiki $W_i$, is as in the traditional context (cf. Section 2.2). The view page use case is implemented by the lookup operation on the local wiki $W_i$. However, the propagation of changes implies that the modifying use cases (create/edit/delete page) also affect the other wikis in the network; the expectation is that every change initiated on any node of the wiki is eventually applied to all of the other nodes as well. For this purpose, the relevant use cases are extended (the base being the traditional operation, as described in Section 2.2, applied to the local wiki $W_i$) as follows: the create page and edit page use cases are extended so that once the user submits her changes, the save operation is called on every wiki of $\mathcal{G}$; the delete page use case is extended so that the delete operation is called on every wiki of $\mathcal{G}$.
The condition of eventual consistency means that after all of the save and delete operations have been applied, the wikis are identical. In order to ensure this condition, the save operation must rely on an automatic merge algorithm, with adequate consistency guarantees. This issue is discussed in more detail subsequently.
Finally, as in highly available structured wikis, peers can join and leave the network. For this class of systems, we will distinguish a node's initial join and final leave from a temporary disconnect, followed by a reconnect. For an initial join, the new peer must copy the local wiki of another (arbitrary) peer, as described in Algorithm 3. During temporary disconnections, pending changes are simply stored so that they can be propagated once the connections are re-established. This simply results in a delayed application of the save and delete operations generated by remote peers. The use cases leave and disconnect do not require any particular processing.
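Algorithm 3 is not reproduced here. The sketch below, built around a hypothetical Peer class, illustrates the behaviour just described: an initial join copies the wiki of an arbitrary peer, and changes produced or received during a temporary disconnection are buffered and applied on reconnection. As simplifying assumptions, changes are sent directly to every known peer instead of being gossiped, and remote saves simply overwrite the local copy; the overwrite stands in for the automatic merge discussed next and would not be consistent under concurrent edits.

```python
import copy
import random

class Peer:
    """A replica of a fully replicated wiki (hypothetical class, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}   # local wiki: title -> content
        self.peers = []   # neighbours in the overlay
        self.online = True
        self.outbox = []  # local changes not yet sent
        self.inbox = []   # remote changes received while disconnected

    def join(self, network, rng):
        """Initial join: copy the local wiki of an arbitrary existing peer."""
        if network:
            self.pages = copy.deepcopy(rng.choice(network).pages)
        for other in network:
            other.peers.append(self)
            self.peers.append(other)
        network.append(self)

    def local_change(self, kind, title, content=None):
        """kind is 'save' or 'delete'; apply locally, then propagate."""
        op = (kind, title, content)
        self.apply(op)
        self.outbox.append(op)
        self.flush()

    def flush(self):
        if not self.online:
            return  # temporary disconnect: keep pending changes
        for op in self.outbox:
            for other in self.peers:
                other.receive(op)
        self.outbox.clear()

    def receive(self, op):
        if self.online:
            self.apply(op)
        else:
            self.inbox.append(op)  # delayed application of remote changes

    def apply(self, op):
        kind, title, content = op
        if kind == "save":
            self.pages[title] = content  # placeholder for an automatic merge
        else:
            self.pages.pop(title, None)

    def reconnect(self):
        self.online = True
        for op in self.inbox:
            self.apply(op)
        self.inbox.clear()
        self.flush()

rng, network = random.Random(1), []
a, b = Peer("A"), Peer("B")
a.join(network, rng)
b.join(network, rng)
a.local_change("save", "Home", "v1")
c = Peer("C")
c.join(network, rng)                      # copies {"Home": "v1"} from A or B
b.online = False                          # B disconnects temporarily
a.local_change("save", "Home", "v2")      # waits in B's inbox
b.local_change("save", "Notes", "draft")  # waits in B's outbox
b.reconnect()
print([p.pages for p in network])
```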
Complexity. Aside from lookup operations, which involve only one peer, the operations in an unstructured wiki are costly because of full replication. Each modifying operation must be broadcast to the full network, which requires at least $O(n)$ messages. The join operation is also costly, as the full contents of the wiki must be copied to the new peer. Here, the number of messages involved is not particularly relevant; it is more appropriate to consider the number of bytes transferred because, with automatic merges, every page must be copied with its entire edit history. The entire edit history is required only once, when joining the network; reconnecting after a disconnection requires only a few anti-entropy rounds.

Concurrent editing and consistency.
We now expand on the save operation and its relation to the problem of concurrent editing and consistency. Unlike structured wikis, in unstructured wikis, the replicas of a page can be modified concurrently and independently on different nodes.
In a structured wiki, there is one 'master' copy of each page, and a small number of 'slave' replicas, which mirror the state of the 'master' copy, with some latency. In an unstructured wiki, the different replicas may be in an inconsistent state not only because of latency, but also because of modifications initiated on different replicas by different users. In order to maintain consistency and avoid lost updates, the sequential writing strategy is not applicable, as the network may be temporarily disconnected, which would prevent page locks from being propagated. With the parallel writing strategy, users may concurrently edit pages and save their changes on different nodes, temporarily disconnected from one another. This implies that conflicts may be detected only when the peers reconnect, possibly long after the editing has occurred. The 'first arrived wins' rule would therefore be very impractical, if not downright impossible.

The only viable solution is to automatically merge conflicting changes. However, synchronization algorithms produce content that is not validated by human users, which implies that the text could be nonsensical. An interesting solution to this problem could be to flag the content produced by algorithms, thus producing concurrency awareness [31].
The save operation with an automatic merge can therefore be described as in Algorithm 4. Concurrent modifications can be merged by different synchronization algorithms, such as WOOT [28] or Logoot [32].
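Algorithm 4 is not reproduced here. To make the automatic merge concrete, the following is a compact sketch of WOOT-style integration, written from the published description of WOOT as we understand it and simplified: no garbage collection, no duplicate detection, string site ids, and operations handed directly to each site instead of being broadcast. Each character carries a unique id and the ids of its neighbours at generation time; deletions only hide characters (tombstones); a pool defers operations until their neighbours exist. The class and method names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class WChar:
    cid: tuple     # (site, clock): unique id; ids compare lexicographically
    value: str
    prev: tuple    # id of the character it was inserted after
    next: tuple    # id of the character it was inserted before
    visible: bool = True

BEGIN, END = ("", 0), ("", 1)  # ids of the two boundary characters

class WootSite:
    def __init__(self, site):
        self.site, self.clock, self.pool = site, 0, []
        self.seq = [WChar(BEGIN, "", BEGIN, END), WChar(END, "", BEGIN, END)]

    def text(self):
        return "".join(c.value for c in self.seq if c.visible)

    def _pos(self, cid):
        return next(i for i, c in enumerate(self.seq) if c.cid == cid)

    def _has(self, cid):
        return any(c.cid == cid for c in self.seq)

    def _visible(self):
        return [c for c in self.seq if c.visible]  # boundaries included

    def local_insert(self, index, value):
        """Insert value at text position index; return the operation to broadcast."""
        vis = self._visible()
        self.clock += 1
        c = WChar((self.site, self.clock), value, vis[index].cid, vis[index + 1].cid)
        self._integrate(c, c.prev, c.next)
        return ("ins", replace(c))

    def local_delete(self, index):
        c = self._visible()[index + 1]
        c.visible = False  # tombstone: the character is hidden, never removed
        return ("del", c.cid)

    def _integrate(self, c, prev, nxt):
        i, j = self._pos(prev), self._pos(nxt)
        between = self.seq[i + 1:j]
        if not between:
            self.seq.insert(j, c)
            return
        # Keep characters whose own prev/next lie outside (prev, nxt), place c
        # among them by id, and recurse on the narrower interval.
        cands = [self.seq[i]]
        cands += [d for d in between if self._pos(d.prev) <= i and self._pos(d.next) >= j]
        cands.append(self.seq[j])
        k = 1
        while k < len(cands) - 1 and cands[k].cid < c.cid:
            k += 1
        self._integrate(c, cands[k - 1].cid, cands[k].cid)

    def receive(self, op):
        """Buffer a remote operation; integrate every operation that is executable."""
        self.pool.append(op)
        progress = True
        while progress:
            progress = False
            for kind, x in list(self.pool):
                ready = (self._has(x.prev) and self._has(x.next)) if kind == "ins" else self._has(x)
                if ready:
                    self.pool.remove((kind, x))
                    if kind == "ins":
                        self._integrate(replace(x), x.prev, x.next)
                    else:
                        self.seq[self._pos(x)].visible = False
                    progress = True

origin = WootSite("s0")
init = [origin.local_insert(i, ch) for i, ch in enumerate("abc")]
s1, s2, s3 = WootSite("s1"), WootSite("s2"), WootSite("s3")
for site in (s1, s2, s3):
    for op in init:
        site.receive(op)

op1 = s1.local_insert(1, "X")  # user1: X between a and b
op2 = s2.local_delete(1)       # user2: delete b
op3 = s3.local_insert(2, "Y")  # user3: Y between b and c

# Each site receives the two remote operations, in a different order.
for site, ops in ((s1, [op2, op3]), (s2, [op3, op1]), (s3, [op1, op2])):
    for op in ops:
        site.receive(op)
print(s1.text(), s2.text(), s3.text())  # aXYc aXYc aXYc
```

The three sites receive the operations in different orders yet reach the same text, aXYc, which is the merged result of the scenario in Figure 5.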
Synchronization algorithms implement different consistency models based on the history of changes [17]. This history can be computed by classical textual difference algorithms ('diff') between the old content and the new content of the page. Causality [33] ensures that all sites have the same causal history of changes but does not ensure that all copies are identical, whereas CCI consistency [34] enforces (C)ausality, (C)onvergence and (I)ntention preservation. These notions are defined as follows.
(i) Causality: all operations are ordered by a precedence relation, in the sense of Lamport's happened-before relation [33], and they will be executed in the same order on every site. (ii) Convergence: the system converges if all replicas are identical when the system is idle (eventual consistency). (iii) Intention preservation: the intention of an operation is the effect observed on the state when the operation was generated; executing the operation at all sites must preserve this intention.
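The causality condition is commonly tracked with vector clocks, which characterize Lamport's happened-before relation. The sketch below shows this standard technique, not necessarily the mechanism used by the systems surveyed here; clocks are dicts mapping hypothetical site ids to counters. It also shows the usual causal delivery test that decides whether a received operation can be executed yet.

```python
def happened_before(a: dict, b: dict) -> bool:
    """a -> b for vector clocks: a <= b component-wise, strictly less somewhere."""
    sites = a.keys() | b.keys()
    return all(a.get(s, 0) <= b.get(s, 0) for s in sites) and any(
        a.get(s, 0) < b.get(s, 0) for s in sites
    )

def concurrent(a: dict, b: dict) -> bool:
    """Neither operation happened before the other."""
    return not happened_before(a, b) and not happened_before(b, a)

def deliverable(op_clock: dict, sender: str, local: dict) -> bool:
    """Causal delivery: the op is the sender's next one, and every op it
    depends on from other sites has already been executed locally."""
    return op_clock.get(sender, 0) == local.get(sender, 0) + 1 and all(
        op_clock[s] <= local.get(s, 0) for s in op_clock if s != sender
    )

# Vector clocks of operations from the running scenario.
del_b = {"s2": 1}                    # user2 deletes b
ins_y = {"s3": 1}                    # user3 inserts Y, unaware of the deletion
later = {"s2": 1, "s3": 1, "s1": 1}  # an edit made after seeing both

print(concurrent(del_b, ins_y))             # True
print(happened_before(del_b, later))        # True
print(deliverable(later, "s1", {"s2": 1}))  # False: ins_y not yet executed
```

Delivering operations in causal order does not by itself make replicas converge: concurrent operations such as the deletion of b and the insertion of Y must still be integrated identically everywhere, which is what convergence and intention preservation add.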

Highly available unstructured wiki systems.
RepliWiki [35], Wooki [5], XWiki Concerto [36] and Swooki [37] are examples of wikis built on unstructured networks of wiki servers. RepliWiki [35] aims to provide a decentralized, multi-master implementation of Wikipedia by replicating its content. It uses the summary hash history [38], in which each site maintains a tamper-evident update history that is used to determine the exact set of updates to be transferred during the automatic synchronization between peers. The synchronization algorithm of RepliWiki ensures causality and convergence.
The aim of Wooki [5] and XWiki Concerto [36] is to support offline work; they also replicate wiki pages on all servers. A modification on a peer is immediately applied to its local copy, then it is propagated among peers using a probabilistic epidemic broadcast [29]. An anti-entropy algorithm [30] is used to recover missing updates for sites that were offline or crashed. Concurrent changes are merged using the WOOT algorithm. This algorithm ensures CCI consistency for connected peers. Swooki [37] is a P2P semantic wiki [39]. It extends the synchronization algorithm of Wooki to support semantic data.

Collaboration in highly available wikis
Here, we revisit the Wikipedia collaboration scenario from Section 2.4.
As noted previously, highly available wikis are designed to provide their users with the same functionality as a centralized wiki, with additional availability guarantees. However, the use of manual conflict resolution, as in the Wikipedia scenario, is very impractical in a distributed setting, particularly when network disconnections may occur. A viable scenario for a highly available wiki is one in which changes are merged automatically, as shown in Figure 5.
In this scenario, the concurrent edits made by the three users, user1, user2 and user3, are merged using a synchronization algorithm such as WOOT. As a result, they are all able to save their changes without errors, and they eventually see the result of the merge, which is the wikitext aXYc.
Highly available wikis provide traditional wiki functionality with additional performance guarantees. Structured wikis provide fault tolerance and allow the cost of managing a large wiki to be shared between different organizations, whereas unstructured wikis allow users to work even when the network is disconnected. However, as changes are automatically propagated and integrated, users have limited control over the collaboration process. Users might want to share their changes only with trusted peers, or to modify a set of pages and publish the full changeset in one transaction; the latter would be particularly useful in semantic wikis, where dependencies exist between pages. Transactional changes, trust and real autonomy for participants are the main motivations of decentralized social wikis.

DECENTRALIZED SOCIAL WIKIS
Decentralized social wikis aim to support a social collaboration network and adapt many ideas from decentralized version control systems (DVCS) used for software development. They promote the multi-synchronous collaboration model [14], in which multiple streams of activity proceed in parallel. The main structure of a decentralized social wiki is similar to that of a replicated wiki; however, the unstructured overlay network is a social collaboration network; its edges represent relationships between users who have explicitly chosen to collaborate.
The synchronization of the nodes is not fully automated; instead, users can choose which pages to replicate and manually publish changes, including sets of changes affecting multiple pages. The changes are propagated along the edges of the social network, and users can select which changes to integrate.
As the published changes are propagated through the network, each wiki graph incorporates a subset of the global sequence of changes, filtered through the participants' trust relationships. The task of integrating selected changes can be automated by algorithms that may enforce different consistency models, as in highly available wikis.
The explicit collaboration network and the manual publishing and integration of changes define the class of decentralized social wikis, an extension of the main wiki concept. The social propagation of the changes requires additional use cases, whereby the users publish and integrate changes:
Publish changes: the user selects a set of changes from one or several pages, represented as a list of atomic insertions and deletions, and stores this changeset in a location available to other users, using the operation publishChanges. A user can also publish other users' changes, so the same changes may reach a user several times through different channels. Consequently, the merge algorithm has to be idempotent to avoid duplication.
Integrate changes: after retrieving a changeset from another user through a social connection (operation retrieveChanges), the user selects a subset of the operations in the changeset and applies them to her local wiki, using the operation integrateChange. Integrating the changes may be automated, using algorithms such as those used in highly available wikis (e.g., WOOT). We discuss the issue of consistency in Section 4.3.
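Idempotent integration is commonly obtained by giving each operation a globally unique identifier and skipping identifiers that were already integrated. A minimal sketch, assuming identifiers of the form (origin site, sequence number); `LocalWiki` and `integrate_change` are hypothetical names that loosely mirror the integrateChange operation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

OpId = Tuple[str, int]   # (origin site, per-site sequence number): globally unique


@dataclass(frozen=True)
class Operation:
    op_id: OpId
    page: str
    kind: str            # "insert" or "delete"
    text: str


@dataclass
class LocalWiki:
    integrated: Dict[OpId, Operation] = field(default_factory=dict)

    def integrate_change(self, ops: Iterable[Operation]) -> int:
        """Integrate the selected operations; return how many were new."""
        applied = 0
        for op in ops:
            if op.op_id in self.integrated:
                continue                 # already received through another channel
            self.integrated[op.op_id] = op
            applied += 1                 # a real system would now update op.page
        return applied
```

Receiving user1's change directly and again through a user who republished it then has no further effect on the local wiki.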
We note that the synchronization process assumes that the users discover the published changes in some way, either through a formal publish/subscribe protocol or by a query protocol. We do not model this aspect, which may vary between systems and does not really affect the overall collaboration model.
Finally, users can establish or remove social connections to other users, the follow and unfollow use cases. These two use cases could happen in very different ways in different systems and we simply describe them subsequently as operations, giving their semantics on the system model.

Operations.
In addition to the traditional operations on the local wiki, the publishing and integration of changes is supported by the following additional operations: publishChanges, retrieveChanges and integrateChange. Conceptually, publishing a set of changes consists of making the state of the local wiki visible to other users, so that they can at least partially synchronize their local wikis with the published wiki. However, it would be impractical and extremely inefficient to transfer the full state of the wiki over the network, so most systems manipulate a representation of the wiki that describes the new state of the wiki as a list of changes from a shared previous version. The representation is a changeset, which includes a reference to a previous version, and a list of atomic operations. Therefore, while these notions are not indispensable to the DSW concept, they are the most sensible data model for synchronization. The synchronization operations in a decentralized social wiki are sketched in Algorithm 5 and make use of these concepts. See Reference [3] for a formal description of the aforementioned operations in the decentralized social wiki DSMW.
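A minimal Python sketch of the changeset data model described above, a reference to a shared previous version plus a list of atomic operations, with functions loosely mirroring publishChanges, retrieveChanges and integrateChange. It is a hypothetical illustration, not Algorithm 5 nor the DSMW formalization of [3]; the in-memory `channels` dictionary stands in for whatever publication location a system uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (page title, "ins" or "del", character offset, text)
Op = Tuple[str, str, int, str]


@dataclass(frozen=True)
class ChangeSet:
    base_version: str            # version shared by publisher and recipients
    operations: Tuple[Op, ...]   # atomic insertions and deletions, in order


# Hypothetical publication locations (e.g. 'url100'), kept in memory here.
channels: Dict[str, List[ChangeSet]] = {}


def publish_changes(location: str, cs: ChangeSet) -> None:
    channels.setdefault(location, []).append(cs)


def retrieve_changes(location: str) -> List[ChangeSet]:
    return list(channels.get(location, []))


def integrate_change(pages: Dict[str, str], cs: ChangeSet,
                     keep: Callable[[Op], bool]) -> None:
    """Apply the operations selected by `keep` to the local pages.

    Offsets are plain character positions, so skipping an operation can shift
    later ones; real systems delegate this step to an algorithm such as Logoot.
    """
    for op in cs.operations:
        if not keep(op):
            continue
        page, kind, pos, text = op
        body = pages.get(page, "")
        if kind == "ins":
            pages[page] = body[:pos] + text + body[pos:]
        else:  # "del": remove `text`, which starts at offset `pos`
            pages[page] = body[:pos] + body[pos + len(text):]


publish_changes("url100", ChangeSet("v1", (("p1", "ins", 1, "X"), ("p1", "del", 3, "c"))))
everything = {"p1": "abc"}
deletions_only = {"p1": "abc"}
for cs in retrieve_changes("url100"):
    integrate_change(everything, cs, keep=lambda op: True)
    integrate_change(deletions_only, cs, keep=lambda op: op[1] == "del")
# everything["p1"] == "aXb"; deletions_only["p1"] == "abc": offset 3 no longer points at "c"
```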

Consistency in a decentralized social wiki
In highly available unstructured wikis, synchronization algorithms with strong consistency guarantees (WOOT, Logoot, etc.) ensure that once the system is idle, the wikis on the different nodes converge and eventually reach a state where they are all identical. In a decentralized social wiki, users can choose which users they collaborate with and can choose to ignore some changes published even by the users they collaborate with. It can therefore be expected that the users' local wikis will be inconsistent. The rationale of this approach is that groups of users should collaborate on subsets of the wiki, and within such groups, sets of pages should be consistent while the collaboration lasts. Once a user chooses not to integrate an operation op0, none of the operations that causally follow op0 can be integrated by consistency-ensuring algorithms. This implies a trade-off between the benefits of user autonomy and the consistency guarantees provided by synchronization algorithms.
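The consequence of skipping op0 can be made concrete by reading "follow" as causal dependency: an operation is blocked if it depends, directly or transitively, on a rejected one. A hypothetical Python sketch, where operation names and dependency sets are illustrative:

```python
from typing import Dict, List, Set


def integrable(deps: Dict[str, Set[str]], rejected: Set[str]) -> List[str]:
    """Operations that a causality-preserving algorithm can still integrate.

    `deps` maps each operation to the operations it directly depends on.
    """
    blocked: Set[str] = set(rejected)
    changed = True
    while changed:                        # propagate blocking along dependencies
        changed = False
        for op, ds in deps.items():
            if op not in blocked and ds & blocked:
                blocked.add(op)
                changed = True
    return sorted(op for op in deps if op not in blocked)


deps = {"op0": set(), "op1": {"op0"}, "op2": {"op1"}, "op3": set()}
print(integrable(deps, {"op0"}))  # ['op3']
```

Rejecting op0 thus also excludes op1 and op2, although the user never explicitly rejected them.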

Decentralized social wiki systems
Gollum [41], git-wiki [8] and Olelo [42] are wiki systems based on the distributed version control system Git [43]. These systems support the multi-synchronous collaboration model, in which users can work in parallel on their local replica and synchronize their modifications when they decide to, using Git primitives such as pull and merge. We note that the Git merge algorithm is designed to identify edit conflicts at the granularity of a line (changes are conflicting if they affect the same line) and does not resolve these conflicts automatically. Git ensures convergence on shared histories. Convergence on shared objects in a workspace, however, is ensured only if the merge operation is commutative, associative and idempotent, as defined for commutative replicated data types (CRDTs) [44] and the summary hash history [38]. This is not the case for the merge algorithm in Git.
Distributed Semantic MediaWiki (DSMW) [3,40] is an extension of Semantic MediaWiki (SMW) [45] that allows SMW servers to be connected to form a decentralized social semantic wiki network. The social links in DSMW are 'follow and synchronize' relations. Users create their own collaboration network by creating and subscribing to feeds, which are named communication channels for propagating operations. In DSMW, when a wiki page is updated on a participating node, an operation describing the change is generated. The operation is executed immediately against the page and is logged for future publication. A user can then decide to publish a set of changes to a so-called push feed; subscribers to this feed may then pull the changeset through a pull feed and integrate the changes into their local wiki graph, as shown in Figure 6. A pull feed cannot exist alone: it must be associated with at least one push feed. If needed, multiple changesets, either generated locally or received from other participants, can be merged in the integration process.
DSMW manages the synchronization of shared pages with the Logoot [32] algorithm, ensuring CCI consistency.
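The three properties required of a convergent merge can be checked on a grow-only set, a simple state-based CRDT whose merge is set union. The Python sketch below tests the laws on sample states only (a check, not a proof); the concatenating merge used as a counterexample is hypothetical and is not a model of Git's merge.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence, TypeVar

T = TypeVar("T")


def union_merge(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """State-based merge of a grow-only set: the join is set union."""
    return a | b


def looks_like_join(merge: Callable[[T, T], T], samples: Sequence[T]) -> bool:
    """Check commutativity, associativity and idempotence on sample states."""
    for a, b, c in product(samples, repeat=3):
        if merge(a, b) != merge(b, a):
            return False
        if merge(merge(a, b), c) != merge(a, merge(b, c)):
            return False
        if merge(a, a) != a:
            return False
    return True


samples = [frozenset(), frozenset({"x"}), frozenset({"x", "y"}), frozenset({"z"})]
print(looks_like_join(union_merge, samples))                  # True
print(looks_like_join(lambda a, b: a + b, [("x",), ("y",)]))  # False: not idempotent
```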

Collaboration in decentralized social wikis
As mentioned earlier, decentralized social wikis promote the multi-synchronous collaboration model, in which multiple streams of activity proceed in parallel. The collaborative work is made up of divergence/convergence cycles; participants can work in isolation from each other, during which time divergence occurs; then, from time to time, users share their changes with each other and integrate these changes to achieve a consistent state.
Again, we revisit the scenario from Section 2.4. Each user can work in isolation on her own copy of the page. We suppose that the decentralized social wiki implements a synchronization algorithm that ensures CCI consistency. While saving their modifications, user1, user2 and user3 decide to make them available at the addresses url100, url200 and url300, respectively. In this scenario, user1 decides to communicate her modifications to user2, so a connection is created between user1 and user2. Similarly, user3 decides to communicate her modification to user2, and user2 communicates her modification to user1 only. As shown in Figure 7, the replicas of p1 of user1 and user2 then converge, but they diverge from user3's replica.
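The outcome of this scenario can be reproduced with a toy model, under two assumptions that the text does not state: each user publishes her whole local history, including changes integrated from others, and integration is idempotent set union, as a CCI-preserving algorithm would achieve. Operation names are hypothetical.

```python
from typing import Dict, Set


def synchronize(state: Dict[str, Set[str]], sends_to: Dict[str, Set[str]]) -> None:
    """Repeat publish/integrate along the social edges until no replica changes."""
    changed = True
    while changed:
        changed = False
        for sender, receivers in sends_to.items():
            for receiver in receivers:
                new_ops = state[sender] - state[receiver]
                if new_ops:
                    state[receiver] |= new_ops   # idempotent integration
                    changed = True


state = {"user1": {"op1"}, "user2": {"op2"}, "user3": {"op3"}}
sends_to = {"user1": {"user2"}, "user3": {"user2"}, "user2": {"user1"}}
synchronize(state, sends_to)
# user1 and user2 end with {op1, op2, op3}; user3 keeps only {op3}
```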
The divergence/convergence cycles also occur in unstructured wikis, but the divergence is supposed to be temporary and the role of the system is to ensure convergence. In decentralized social wikis, divergence is a possible choice for any user and is observable and measurable [46]. Users can define their own collaboration networks and synchronize their work with others at their chosen frequency.
A decentralized social wiki can have the same properties as an unstructured wiki if the social network is connected through the 'publish and synchronize' relation and the users publish all of their changes; the system can then ensure eventual consistency. The eventual consistency is defined only for shared objects of each strongly connected component of the social graph. The multi-synchronous collaboration model creates communities with different focal points within the wiki. In a way, decentralized social wikis allow divergence and multiple points of view, but they do not allow users to search and browse the global network and discover these different points of view. Federated wikis support this process, as we will discuss in the next section.
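Since eventual consistency is defined per strongly connected component, the components of a "publishes to" graph indicate which groups of users can converge. A simple quadratic Python sketch based on mutual reachability (a production system would rather use Tarjan's or Kosaraju's algorithm), applied to the communication edges of the scenario above:

```python
from typing import Dict, List, Set


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes reachable from `start` (including itself), by depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group nodes that can reach each other: changes can flow both ways."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for n in sorted(nodes):
        if n not in assigned:
            component = {m for m in reach[n] if n in reach[m]}
            components.append(component)
            assigned |= component
    return components


sends_to = {"user1": {"user2"}, "user3": {"user2"}, "user2": {"user1"}}
print(strongly_connected_components(sends_to))  # user1 and user2 together, user3 alone
```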

FEDERATED WIKIS
The term 'federated wiki' was coined by Ward Cunningham, for his Smallest Federated Wiki [9] project (hereafter SFW). The main principle of federated wikis is to allow divergence with no restrictions, that is, two participants can host pages on the same topic (identified by the page title), without having to synchronize them. This allows for multiple points of view to be represented.
The key difference from decentralized social wikis is that users can search and browse the global network and are thus exposed to the different points of view expressed by the participants. The participants of a federated wiki are organized in a social network and use this social network to search and browse the pages hosted by their peers. In federated wikis, users collaborate by copying and reusing material from their peers, without directly altering it in another user's repository.

Wikilink semantics and the hypergraph model.
In a federated wiki, the title of a page no longer uniquely identifies a page; property 2.1 no longer holds. Wikilinks therefore acquire different semantics. A wikilink is defined by a page title and gives access to any or all of the pages in the network sharing this title. In functional terms, following a wikilink implies selecting one of the target pages of the hyperedge. This selection can be carried out automatically by the system or by the user. We therefore model wikilinks as directed hyperedges, and the federated wiki as a directed hypergraph. Figure 8 shows an example hypergraph, a small set of pages from a hypothetical federated wiki. In contrast with the traditional wiki graph shown in Figure 1, there are several page versions for each page title. Version A of the 'Nantes' page has wikilinks to the pages 'Grand-Ouest' and 'Pays de la Loire', whereas version B of the 'Nantes' page has wikilinks to the pages 'Pays de la Loire' and 'Loire-Atlantique'. Each of these wikilinks is a hyperedge, because following it will retrieve all the versions of its target page.
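Wikilink resolution in the hypergraph model can be sketched as follows: a wikilink stores only a title, its hyperedge targets every page sharing that title, and following it requires a selection function. Page identifiers and the `select` parameter are hypothetical; the links of the two 'Nantes' versions follow the example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Page:
    title: str
    wikilinks: Tuple[str, ...]   # wikilinks name titles, not specific pages


def hyperedge_targets(pages: Dict[str, Page], title: str) -> List[str]:
    """All page versions sharing `title`: the target set of a hyperedge."""
    return sorted(pid for pid, page in pages.items() if page.title == title)


def follow_wikilink(pages: Dict[str, Page], title: str,
                    select: Callable[[List[str]], str]) -> Optional[str]:
    """Following a wikilink means selecting one target of its hyperedge."""
    targets = hyperedge_targets(pages, title)
    return select(targets) if targets else None


pages = {
    "nantes-A": Page("Nantes", ("Grand-Ouest", "Pays de la Loire")),
    "nantes-B": Page("Nantes", ("Pays de la Loire", "Loire-Atlantique")),
    "pdl-1": Page("Pays de la Loire", ()),
    "pdl-2": Page("Pays de la Loire", ()),
}
```

The selection function is where systems differ: it may be an automatic preference order or a choice presented to the user.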

Definition 4
A federated wiki is a tuple $\langle H, N, M \rangle$, where $H = \langle P, E_h \rangle$ is the hypergraph of wiki pages, composed of nodes (the pages $P$) and directed hyperedges $E_h \subseteq P \times \mathcal{P}(P)$; $N = \langle S, C \rangle$ is a graph representing the network of participants, where the nodes $S = \{S_i\}$ are the participant sites and the edges $C \subseteq S \times S$ represent their social connections; and $M : P \to \mathcal{P}(S)$ is a function that maps the wiki pages to sets of participants. $M$ is not a function that can be expressed by a defined algorithm (as in the case of structured wikis), but rather describes a relationship that is under the control of the users. Each page $p \in P$ may be hosted by one or several participants.
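Definition 4 can be transcribed into a Python data structure. In this hypothetical sketch, M is stored explicitly as the `hosts` mapping, since it is under the control of users rather than computed, and `is_well_formed` checks the constraints of the definition, including that every page has at least one host.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class FederatedWiki:
    pages: Set[str]                                # P
    hyperedges: Set[Tuple[str, FrozenSet[str]]]    # E_h, subset of P x powerset(P)
    sites: Set[str]                                # S
    social: Set[Tuple[str, str]]                   # C, subset of S x S
    hosts: Dict[str, FrozenSet[str]]               # M : P -> powerset(S)

    def is_well_formed(self) -> bool:
        edges_ok = all(src in self.pages and targets <= self.pages
                       for src, targets in self.hyperedges)
        social_ok = all(a in self.sites and b in self.sites for a, b in self.social)
        # every page is hosted by at least one participant, and only by known sites
        hosts_ok = all(len(self.hosts.get(p, frozenset())) > 0
                       and self.hosts[p] <= self.sites for p in self.pages)
        return edges_ok and social_ok and hosts_ok
```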

Unique page identification.
Although page titles are not globally unique in a federated wiki, they may be locally unique (property 5.1.2). In this case, any single participant of a federated wiki, taken in isolation, is a traditional wiki. In some federated wikis (such as P2Pedia), this weaker property does not hold either.
As pages are not uniquely identified by their title, additional identifiers can be introduced to act as globally unique identifiers (GUIDs), so that a page is a triple $(id, L, content)$, where $id$ is unique.
If page titles are locally unique, then for any page $p$, the combination of the page title $L$ with $M(p)$ (or any element of $M(p)$) uniquely identifies $p$ and can be used as a GUID.
Alternatively, a GUID can be obtained by hashing the page contents. Once each page is uniquely identified, the GUID can be used to create hyperlinks to specific pages, provided the system implements a mechanism to dereference a GUID. Such 'version-specific' links induce a graph structure and can complement the wikilinks.
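Both GUID schemes can be sketched in a few lines of Python. The function names and the in-memory `index` used to dereference content-hash GUIDs are hypothetical; SHA-256 is an assumed choice of hash function.

```python
import hashlib
from typing import Dict, Tuple


def guid_from_title(title: str, site: str) -> Tuple[str, str]:
    """Valid only when titles are locally unique: (hosting site, title)."""
    return (site, title)


def guid_from_content(content: str) -> str:
    """Content-hash GUID: identical copies of a page share the same identifier."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Hypothetical local index used to dereference content-hash GUIDs.
index: Dict[str, str] = {}


def store(content: str) -> str:
    guid = guid_from_content(content)
    index[guid] = content
    return guid
```

With content hashing, two participants holding an unmodified copy of a page automatically agree on its identifier, while any edit yields a new one; dereferencing, however, still requires some index or search over the network.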

Page distribution
As users are free to make any changes they like to a page, they are also free to host pages on whichever topics they like. In addition, they can copy another user's entire page without changing it, in which case the page is replicated. As a result, the different pages of the hypergraph are replicated 'socially': for each unique page, there may be any number of copies, distributed in arbitrary locations. Figure 9 shows an example distribution of the federated wiki pages of Figure 8. In this example, page titles are not locally unique; instead, the page title plus the version acts as a GUID for illustrative purposes. Some pages are more replicated (more 'popular') than others; the page 'Pays de la Loire v.C' is hosted by all three participants, whereas the 'Loire-Atlantique' pages are only hosted at one site.

Use cases
The 'social' distribution of pages requires an additional use case, which consists of copying a page version from one participant to another: the fork page use case. We note that in decentralized social wikis, content is also transferred manually between participants. However, a page version is not fully transferred; rather, the changes (the differences from a previous version shared by the transfer initiator and the recipient) are transferred. In a federated wiki, as there are no consistency guarantees or assumptions, no previous version can be expected to be shared between the participants. Pages are therefore copied in full. In federated wikis with locally unique page titles, any previously existing local version (i.e., another page sharing the same title) is deleted. The semantics of traditional wiki use cases are modified as follows.
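The fork page use case can be sketched as a full copy of a page version from one repository to another, with the optional deletion imposed by local title uniqueness. The repository layout and names are hypothetical.

```python
from typing import Dict, Tuple

# A participant's repository: (title, version id) -> page content.
Repo = Dict[Tuple[str, str], str]


def fork_page(source: Repo, target: Repo, key: Tuple[str, str],
              locally_unique_titles: bool) -> None:
    """Copy a page version in full from one participant to another."""
    title = key[0]
    if locally_unique_titles:
        # drop any local page that already has this title
        for old in [k for k in target if k[0] == title]:
            del target[old]
    target[key] = source[key]      # full copy; the source is left unchanged
```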
View page: As in a traditional wiki, users request specific pages by entering the page title in their browser or by following wikilinks. As several pages may share the same title, pages are requested in a two-step operation:
(1) Find a list of pages with the requested title in the network.
(2) Select one page to display; the selection can be manual or automatic. This triggers the lookup operation, which retrieves the page content based on the unique page identifier.
These two conceptual steps form a sequence: a search by title, followed by a lookup by identifier.
Create page: Users can create new pages, which are stored locally. There is no longer any precondition limiting which pages may be created, unless the page titles are locally unique. In the system model, all hyperedges pointing to other pages with the same title now also point to the new page.
Delete page: Users can delete only local pages. If other copies of the page exist elsewhere, they are not deleted.
Edit page: As users have control over their local material only, the two-step edit-save action cannot be reduced to the effect of saving a new version. Instead, there are now three steps: fork, edit and save. The fork step consists of making a local copy of the original page to be edited. This local copy is then edited and saved, while the remote page remains unchanged. The fork operation is specific to federated wikis and is detailed subsequently. Once the page has been edited, it is saved locally, as for a new page (see the discussion of page creation above). Then, outgoing links must be updated, as in a traditional wiki.
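The two-step view operation and the three-step fork-edit-save sequence can be sketched as follows. This is a hypothetical Python model in which each site's repository maps (title, version) keys to content; for brevity it ignores local title uniqueness when saving.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]                  # (title, version id)
Network = Dict[str, Dict[Key, str]]    # site -> repository


def find(network: Network, title: str) -> List[Tuple[str, Key]]:
    """Step 1: list every (site, page key) with the requested title."""
    return sorted((site, key) for site, repo in network.items()
                  for key in repo if key[0] == title)


def view_page(network: Network, title: str,
              select: Callable[[List[Tuple[str, Key]]], Tuple[str, Key]]) -> Optional[str]:
    """Step 2: select one result, then look it up by its identifier."""
    results = find(network, title)
    if not results:
        return None
    site, key = select(results)
    return network[site][key]


def edit_page(network: Network, me: str, site: str, key: Key,
              new_version: str, edit: Callable[[str], str]) -> None:
    """Fork, edit, save: only the local copy changes."""
    local_copy = network[site][key]                          # fork
    network[me][(key[0], new_version)] = edit(local_copy)    # edit, then save locally
```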
The manual management of the social network requires the following additional use cases: Joining and leaving the network: The participants can join or leave the network. Formally, this corresponds to the node being added to or removed from the graph N , as defined

Federated wiki systems
The Smallest Federated Wiki project [9], led by W. Cunningham, is a set of interconnected and interoperable wiki servers belonging to different users. Users can seamlessly browse the pages of the different wiki servers; wikilinks are dereferenced by automatically selecting the first available page according to a preference function over the known participating servers. Pages from the user's local server are selected first, then pages from the neighboring servers in the network. If no page with a given title is found, the user is directed to a page creation form. The user interface shows several pages at the same time, side by side, which makes it easy to edit pages by dragging and dropping content from other users' pages. The pages also have a 'fork' button. As page titles are locally unique, when a page is forked, any existing local page with the same title is deleted. The network of servers is automatically discovered and maintained by browsing and forking pages (forks are recorded in the history of pages), which gives the user only indirect control over the social network structure. The full federated wiki (i.e., the known network) can also be searched, and users can see all the available versions of each page. Search results are displayed as version-specific links, identifying the host of each version.
The P2Pedia wiki [10,48] explores a very similar idea but is implemented over a P2P file-sharing network. The peers may share any version (or set of versions) of each page. Wikilinks are dereferenced by a P2P file-sharing 'search' function, followed by the user's manual selection of the target page among the search results. To assist the user in this choice, search results can be ranked according to different trust indicators, based on the popularity of each page version and on the social network. Page replicas are identified by a GUID based on the hash of the page content.
When the user browses pages, each page is automatically forked; that is, it is downloaded not only to the browser cache but also to the user's local repository. When a page is edited, the previous version is also kept, unless the user explicitly deletes it.
In addition to these two federated wikis, the different language editions of Wikipedia share some aspects of federated wikis. Articles on the same topic in different languages may represent alternative views on the topic, and interlanguage links provide a means of navigating between them. Cap [15] also discusses the different points of view adopted by different Wikipedia languages and proposes an 'every point of view' approach that matches the motivation of the federated wiki idea. The approach is not implemented, and its technical details have only been sketched out. Presumably, the different versions of each page would still be stored in a centralized repository and users would have access to all of them. This proposal could be a centralized equivalent of the federated wiki concept. However, centralization goes against one of the fundamental principles of federated wikis, their decentralized control model, in which each user can modify only her own set of pages.
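The two dereferencing strategies described for SFW and P2Pedia can be contrasted in a short sketch. Both functions are hypothetical simplifications: SFW's preference function is modeled as an ordered list of neighbor sites, and P2Pedia's popularity indicator as the number of peers sharing each version hash; the social-network-based indicators are not modeled.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def sfw_style_resolve(title: str, local: str, neighbors: List[str],
                      hosted: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first site, in preference order, that hosts `title`."""
    for site in [local, *neighbors]:       # local server first, then neighbors
        if title in hosted.get(site, set()):
            return site
    return None                            # the caller would show a page creation form


def rank_by_popularity(results: List[Tuple[str, str]]) -> List[str]:
    """Order version hashes by how many peers share them (most shared first)."""
    counts = Counter(version for _, version in results)
    return sorted(counts, key=lambda v: (-counts[v], v))
```

The first strategy hides divergence behind an automatic choice, whereas the second exposes every version and only orders them, leaving the final selection to the user.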

Summary and comparison
Distributed wikis consist of a network of autonomous participants that host a set of wiki pages. A distributed wiki must handle the maintenance of the network, the distribution of the pages in the network, the corresponding retrieval of pages and the propagation and integration of the edits made to the pages. The different existing systems are motivated by different social and technical issues and therefore adopt different solutions for each of these technical problems, with different complexities. We have classified them into three general classes, defined by their general motivation.
Highly available wikis, the largest category of distributed wiki systems, are designed to address the technical limitations of centralized infrastructures: lack of scalability, high cost and a single point of failure. Their defining characteristic is that all of the technical problems listed above (network maintenance, page distribution and retrieval, and the propagation and integration of edits) are handled automatically by the system: wiki pages are either partitioned or replicated across a self-organized network of wiki servers, and edits to the pages are automatically propagated and integrated using algorithms that guarantee the consistency of the replicated pages. The complexity of the lookup and save operations depends on the size of the network.
Decentralized social wikis are designed to support the multi-synchronous collaboration model and to allow users to organize social collaboration networks. The participants are therefore organized in a social network and modifications to the wiki are manually published and propagated through communication channels following the social network edges. Different communities will be formed around different focal topics. Within each collaborative community, the convergence of page replicas is ensured by automatic synchronization algorithms.
Federated wikis allow and encourage divergence, in order to accommodate multiple points of view. Federated wikis achieve this by giving users the greatest level of control over the different functionalities. In particular, the system does not enforce any consistency between pages on the same topic, and collaboration is limited to manually copying and reusing the work of others. Many pages may therefore share the same title, and wikilinks cannot simply point to a single wiki page. Instead, they point to all the versions sharing a given title, allowing users to browse the different versions. This is best modeled by a hypergraph of pages that are socially replicated across the sites. A key difference with decentralized social wikis is that users can browse pages from the whole network.
These classes of systems are summarized in Table I. For each system, the table indicates the network organization, the page distribution scheme, the consistency model, the change propagation method, the scope of page retrieval (i.e., the set of pages that users can directly retrieve for viewing) and finally the complexities of the lookup and save operations. The main parameters of the complexity values are n, the size of the network, and d, the average node degree (for decentralized social networks).
[Table I. Summary of distributed wiki systems; decentralized social wikis: Gollum [41], git-wiki [8], Olelo [42], DSMW [3,40]; federated wikis: SFW [9], P2Pedia [10,48].]

New challenges and opportunities for distributed wikis
Recent developments in Web technologies are offering new opportunities for distributed wikis.
Technological opportunities and real-time editing
Existing distributed wikis are complex to deploy, which severely limits their adoption by end users. Recent advances in Web protocols, such as WebRTC, make it easy to deploy complex distributed infrastructures, even for end users. New systems, such as ShareFest, PeerCDN or webtorrent, demonstrate how direct connections, unstructured P2P networks or DHTs can be deployed directly in Web browsers. Such technology can turn any of the billions of devices running compatible browsers into a distributed wiki participant in one click. This can greatly improve the user experience with distributed wikis and allow researchers to set up new experiments much more easily. Furthermore, such technologies also introduce distributed real-time editing for Web authoring. It is already possible for a wiki instance to integrate a real-time editor, such as ShareJS, but WebRTC can improve the user experience with fully decentralized browser-to-browser data channels. This raises issues about collaboration models in which some contributors edit together in real time, while others may prefer to edit offline. How should slow, coarse-grained changes be combined with fast, fine-grained changes? How can real-time editing sessions served by different wiki instances be detected and accommodated?

Federated semantic wikis
The Semantic Web is a major opportunity and an interesting challenge for distributed wikis. Thanks to the Linking Open Data (LOD) project [49], the Semantic Web makes millions of RDF triples from a network of autonomous participants available to the public. However, data quality is a major issue [50,51]. Wiki systems have demonstrated how communities of users can improve the quality of shared documents, and semantic wikis [39] or Wikidata [52] apply the wiki approach to improving semantic documents and data. Other work [53] aims to transform wikis into collaborative integrated development environments mixing text, data and applications. Such wikis allow simple Semantic Web applications to be quickly developed and shared.
Surprisingly, in such approaches, wikis remain centralized, while the data are hosted in a federation of linked data. In fact, current semantic wikis allow authoring of only one local dataset, even if this dataset is linked to other datasets of the LOD cloud. A federated semantic wiki should allow authoring of linked data across the federation, either as part of it or as a new federation accessing the existing federation of linked data.
The decentralized social wiki approach has already been applied to building a decentralized social semantic wiki [40]. However, this system has several drawbacks. Semantic data are modified as a side effect of text modification. The advantage is that the text is kept synchronized with the semantic data embedded in it, but the drawback is that semantic data cannot be modified directly. Consequently, other Semantic Web authoring tools, such as Protégé or SPARQL Update, cannot be used safely on the same semantic data authored through a semantic wiki. If different autonomous participants collaborate directly on semantic data in a system such as [54], then text and semantic data get out of sync. Moreover, the fundamental replication techniques used to build distributed wikis force all participants, in the worst case, to have the same storage capacity, and they generate considerable network traffic. As the amount of data hosted can be very large, this approach is problematic.
The federated class of wikis seems more appropriate for building a federated semantic wiki. A federated semantic wiki should be able to 'editorialize' data collected from LOD, that is, to author meaningful, human-readable documents from linked data. Such documents could then be further edited, and changes would be propagated to the federation of semantic wikis and the federation of linked data. In other words, a federated semantic wiki should be able to make the Web of data understandable and editable by humans.
However, a major issue is that the federated wiki and semantic wiki models cannot be applied directly to federated semantic wikis. In particular, there is a mismatch between the graph model of most semantic wikis, where concepts are generally associated with a unique page, and the hypergraph model of federated wikis, which represents the multiplicity of perspectives over shared concepts.
Many more issues can be added to this list. What is a suitable graph model? How should collected data be 'editorialized'? Since linked data is mainly read-only, how should federated semantic wikis push back changes? How should data providers trust changes authored in federated semantic wikis? How should concurrent changes be managed, specifically the changes coming from human users within federations of semantic wikis and the changes computed by algorithms on federations of linked data?