Concurrency control

What is Concurrency control?

Concurrency control is a DBMS concept used to address conflicts that arise when multiple users access or alter data simultaneously in a multi-user system. Applied to a DBMS, it coordinates simultaneous transactions while preserving data integrity. In short, concurrency control is about controlling multi-user access to the database.

Why is Concurrency control needed?

It is needed to allow a set of transactions to execute concurrently, to resolve conflicts among those transactions, and to ensure that the overall effect of their execution is correct.

Concurrency Control in Mobile Database System
By Nitin Prabhu, Vijay Kumar, Indrakshi Ray and Gi-Chul Yang

Introduction

In this paper a Concurrency Control Mechanism (CCM) that ensures epsilon serializability is proposed and its performance is reported.
The ability to perform transactions such as bill payments and fund transfers from anywhere at any time is in great demand. CCMs proposed earlier [1, 2, 3, 4] did not address conservation of resources, and some of them are message intensive. This paper argues that a weaker form of consistency is desirable, proposes a CCM which uses semantic properties [2] of broadcast data, and reports its performance.

Approach

In data processing, a weaker correctness criterion is acceptable in a number of situations. The paper uses ε-serializability (ESR) [5, 6], which permits a limited amount of inconsistency specified by ε, to develop this CCM. It is based on a two-tier replication scheme [7] that produces an ε-serializable schedule.
The idea behind two-tier replication is to let the user run transactions on the Mobile Unit (MU), which makes data updates locally; whenever the MU connects to the servers, these transactions are re-executed on the servers as base transactions. Base transactions are serialized on the master copy of the data, and MUs are informed about failed base transactions. The disadvantage of this approach is twofold. A large number of transactions are rejected, because the MU executes transactions locally. The commit time of a transaction at the MU is also large, because the MU learns the outcome only after the base transaction has been executed. This paper modifies the two-tier replication scheme to reduce the number of rejections and the transaction commit time at the MU.

The Concurrency control mechanism

The paper assumes that there is a central server that holds and manages the database.
The data records are replicated at the Mobile Unit, and a limit δ is kept on the amount of change that can be made to the replicated record at each MU. If a transaction changes the data value at an MU by at most δ, it can be committed locally instead of waiting for the result of the base transaction at the DBS. This reduces the transaction's commit time and also helps reduce the number of rejections that may arise when a base transaction cannot commit. To control the validity of δ, a timeout parameter τ is defined whose value indicates the duration within which the value of δ is valid. The timeout value of a data item should be a multiple (I) of the broadcast cycle time (T). The steps of the algorithm are given below.

At the DBS:
δ is calculated for each data object D by using a function δ = f(d, n).
Each D has a function f(d, n) associated with it, and this function depends on the application semantics.
A timeout value τ is linked with the δ of each data item.
At the start of each broadcast cycle, the DBS broadcasts (d, δ) for each data item together with the τ for these values.
The DBS can receive pre-committed transactions (transactions that made updates to replicas on the MU and were committed there) or request transactions (transactions that are sent directly to the DBS by the MU).
A request transaction is not executed at the MU because it would change the value of the replica of D by more than δ at the MU.
Execution of transactions on the master copy of the data record: (i) the DBS serializes the pre-committed transactions on the master copy in the order in which they arrived at the DBS; (ii) after τ has expired, the DBS runs the request transactions and reports to the MU whether each transaction was committed or aborted.
After τ expires, the DBS returns to the first step (calculating δ).

At the MU:
The MU has (d, δ) for each D it has cached, along with their τ value.
The MU runs transaction t. Let Δ_t be the change t makes to D, and let Δ_c be the current value of the total change in D since the last broadcast of δ. If transaction t changes the value of D by an amount Δ_t, then Δ_t is added to Δ_c.
The following cases are possible:
If Δ_t <= δ and Δ_c <= δ, then transaction t is committed at the MU and sent to the server for re-execution as a base transaction on the master copy.
If Δ_t <= δ and Δ_c > δ, then transaction t is blocked at the MU until the new set of (d, δ) is received from the server.
If Δ_t > δ, then transaction t is blocked at the MU and submitted to the server as a request transaction.

Relationship with ESR: The mechanism for maintaining ESR consists of divergence control (DC) and consistency restoration. A transaction imports inconsistency by reading uncommitted data of other transactions, and exports inconsistency by allowing other transactions to read its uncommitted data. Each transaction has an export counter and an import counter.
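The three cases handled at the MU can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Replica`, `Outcome`, `run` and `on_broadcast` are hypothetical. The sketch also makes two assumptions the text does not state. The size of a change is measured as an absolute value, and the accumulated change is only recorded when the transaction actually commits locally.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMMIT_LOCAL = "commit at MU, re-execute at DBS as a base transaction"
    WAIT_FOR_BROADCAST = "block at MU until the next broadcast arrives"
    REQUEST_TO_DBS = "block at MU and submit to DBS as a request transaction"


@dataclass
class Replica:
    """Cached copy of one data item D at a Mobile Unit (hypothetical structure)."""
    d: float                  # value received in the last broadcast, plus local commits
    limit: float              # per-item limit on change from the last broadcast
    accumulated: float = 0.0  # total change committed locally since that broadcast

    def on_broadcast(self, d: float, limit: float) -> None:
        """A new broadcast replaces the cached value and resets the accumulated change."""
        self.d, self.limit, self.accumulated = d, limit, 0.0

    def run(self, change: float) -> Outcome:
        """Classify a transaction that would change D by `change`."""
        amount = abs(change)  # assumption: the limit bounds the magnitude of a change
        if amount > self.limit:
            # The transaction alone exceeds the limit: hand it to the server.
            return Outcome.REQUEST_TO_DBS
        tentative = self.accumulated + amount
        if tentative > self.limit:
            # Small enough alone, but the running total would exceed the limit.
            return Outcome.WAIT_FOR_BROADCAST
        # Within both bounds: commit locally (assumption: only now is the total updated).
        self.accumulated = tentative
        self.d += change
        return Outcome.COMMIT_LOCAL


if __name__ == "__main__":
    seats = Replica(d=100.0, limit=10.0)
    for change in (-4, -5, -3, -25):
        print(change, seats.run(change).name)
```

In the usage loop, the first two transactions fit within the limit and commit locally. The third is small on its own but would push the accumulated change past the limit, so it waits for the next broadcast. The fourth exceeds the limit by itself and is forwarded as a request transaction.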
[Figure: Intermediate Stages in the CCM scheme]

Conclusion

CCMs for MDS should take into account the inherent resource limitations of an MDS when managing transaction execution. With this in mind, the paper proposes a new concurrency control mechanism that uses ε-serializability as the correctness criterion.

Big Data In Mobile Computing

Introduction

Mobile has transformed the way we live. Phone adoption has increased rapidly across the globe, and this has widespread social implications. In the past few years smartphones have started to carry sensors such as GPS, microphone, accelerometer, gyroscope, camera and Bluetooth. Related applications and services cover health care, entertainment and information search. The ubiquity of mobile phones and the increasing amount of data generated by applications and sensors are giving rise to a new domain in computing.
The Mobile Data Challenge (MDC) by Nokia was motivated by a belief in the value of research that can lead to a deeper scientific understanding of human and social phenomena, advanced mobile experiences and technological innovations. Guided by this principle, in January 2009 Nokia Research Center Lausanne (NRCL) and its Swiss academic partners Idiap and EPFL started an initiative to create large-scale mobile data research resources. This included the design and implementation of the Lausanne Data Collection Campaign (LDCC), in which a longitudinal smartphone data set was collected from nearly 200 volunteers in the Lake Geneva region over a period of a year. It also included the definition of a number of research tasks with clearly specified experimental protocols. From the start, the intention was to share these research resources with the research community, which required a holistic and proactive approach to privacy following privacy-by-design principles.
After three years of work in this direction, the outcome was the MDC. The Challenge gave researchers an opportunity to analyze a relatively unexplored data set including rich mobility, communication, and interaction information. The MDC comprised two research alternatives: an Open Research Track and a Dedicated Research Track.
In the Open Track, researchers were given the opportunity to approach the data set from an exploratory perspective by proposing their own tasks according to their interests and background. The Dedicated Track gave researchers the possibility to take on up to three tasks, related to the prediction of mobility patterns, the recognition of place categories, and the estimation of demographic attributes. Each of these tasks had properly defined experimental protocols and standard evaluation measures to assess and rank all contributions.

Approach

THE LAUSANNE DATA COLLECTION CAMPAIGN (LDCC)

The LDCC aimed at designing and implementing a large-scale campaign to collect smartphone data in everyday life conditions, grounding the study in a European culture. The goal was to collect quasi-continuous measurements covering all sensory and other available information on a smartphone.
In this way the LDCC was able to capture the daily activities of phone users unobtrusively, in a setting that implemented privacy-by-design principles. The collected data included a significant amount of behavioral information, covering both personal and relational aspects. This enabled the investigation of a large number of research questions related to personal and social context, including mobility, phone usage, communication, and interaction.
The only content excluded was image files and the content of messages, because it was considered too intrusive for a longitudinal study. Instead, log files with metadata were collected for both imaging and messaging applications. This section provides a summary of the LDCC implementation and the captured data types.

Data Collection

Data was collected using Nokia N95 phones and a client-server architecture that made the data collection invisible to the participants. A seamless implementation of the data recording process was key to making a longitudinal study feasible in practice. Another important target for the client software design was to reach an appropriate trade-off between the quality of the collected data and phone energy consumption.
The collected data was first stored on the device and then uploaded automatically to a Simple Context server via WLAN. The server stored the data in a database that could be accessed by the campaign participants. Additionally, a data visualization tool was developed which offered campaign participants a “life diary” type of view of their data.

MDC DATA

This section presents an overview of the MDC datasets and the corresponding preparation procedures.

Division of the Dataset

The datasets provided to the participants of the MDC consist of slices of the full LDCC dataset. Slicing the data was needed in order to create separate training and test sets for the tasks in the Dedicated Track, but was also useful to assign the richest and cleanest parts of the LDCC dataset to the right type of challenge.
Four data slices were created for the MDC:
Set A: Common training set for the three dedicated tasks.
Set B: Test set for the demographic attribute and semantic place label prediction tasks.
Set C: Test set for the location prediction task.
Open set: Set for all Open Track entries.
The rationale behind this structure was the following.
The participants of the LDCC were separated into three groups according to the quality of their data and other aspects. The 80 users with the highest-quality location traces were assigned to Sets A and C. Set A contains the full data for these users except the last 50 days of traces, whereas Set C contains the last 50 days for which location data is available, for testing. In order to maximize the use of the available data, Set A was reused as a training set for the two other dedicated tasks. A set of 34 users was selected as a test set for these tasks and appears as Set B. In this way, models trained on the users of Set A can be applied to the users of Set B.

Data Types

For both the Open and Dedicated Tracks, most data types were released in raw format, except for a few data types that needed to be anonymized.

Common data types: Each data type corresponds to a table in which each row is a record, such as a phone call or an observation of a WLAN access point.
User IDs and timestamps are the basic information for each record.

Data types for Open Track only: Geo-location information was only available in the Open Track. In addition to GPS data, WLAN data was used for inferring user location.
The locations of WLAN access points were determined by matching WLAN traces with GPS traces.

Location data in Dedicated Track: Physical location was not disclosed in the Dedicated Track. For each user, the raw location data (based on WLAN and GPS) was transformed into a symbolic space which captures most of the mobility information and excludes actual geographic coordinates. This was done by detecting visited places and then mapping the sequence of coordinates into the corresponding sequence of visited places.

Conclusions

Collecting such a large amount of data requires extensive effort and heavy investment, which often means that the collected data sets are available to researchers only in a limited manner.
This has generated some discussion about Big Data driven research. Protecting the privacy of users is the key reason for access and usage limitations on Big Data. The examples described in this paper make it clear that open data sharing with the research community, and therefore wider open innovation momentum around the same commonly available data set, is possible. Achieving that requires a proactive and holistic approach to privacy throughout the research. Privacy protection requires extremely careful consideration because of the multimodality of rich smartphone data. This paper describes the countermeasures needed when the smartphone data was originally collected and when it was later released to the research community. In that manner it was possible to achieve an appropriate balance between the necessary privacy protection and maintaining the richness of the data for research purposes.