Organisation: |
Orange Labs
CONVECS and POLARIS project-teams, Inria Grenoble - Rhône-Alpes and LIG |
---|---|
Method: |
LNT
MCL |
Tools used: |
CADP (Construction and Analysis of Distributed Processes)
|
Domain: |
Internet of Things.
|
Period: |
2019
|
Size: |
2500 lines of LNT
600 lines of MCL |
Description: |
Recent computing trends have been advocating for more distributed
paradigms, namely Fog computing, which extends the capacities of the
Cloud at the edge of the network, that is close to end devices and
end users in the physical world. The Fog is a key enabler of the
IoT (Internet of Things) applications, as it resolves some of
the needs that the Cloud fails to provide, such as low network
latencies, privacy, QoS, and geographical requirements. Recent
applications of the Fog include smart homes and cities, agriculture,
healthcare, transportation, etc. The Fog, however, is unstable because
it is constituted of billions of heterogeneous devices in a dynamic
ecosystem. IoT devices may regularly fail because of bulk production
and cheap design. Moreover, the Fog-IoT ecosystem is cyber-physical
and thus devices are subjected to external physical world conditions
that increase the occurrence of failures, resulting in application
inconsistencies, which in turn affect the physical world by inducing
hazardous and costly situations.
We propose an end-to-end autonomic failure management approach for IoT applications deployed in the Fog. The approach manages IoT applications and is composed of four functional steps: (i) state saving, (ii) monitoring, (iii) failure notification, and (iv) recovery. Each step is a collection of similar roles and is implemented, taking into account the specificities of the ecosystem (e.g., heterogeneity, resource limitations). State saving aims at saving data concerning the state of the managed application. These include runtime parameters and the data in the volatile memory, as well as messages exchanged and functions executed by the application. Monitoring aims at observing and reporting information on the lifecycle of the application. When a failure is detected, failure notifications are propagated to the part of the application which is affected by that failure. The propagation of failure notifications aims at limiting the impact of the failure and providing a partial service. To recover from a failure, the application is reconfigured and the data saved during the state saving step are used to restore a cyber-physical consistent state of the application. Cyber-physical consistency aims at maintaining a consistent behaviour of the application with respect to the physical world, as well as avoiding dangerous and costly circumstances. The proposed distributed failure management approach was validated formally using model checking techniques. The various components of the protocol (simulator, global managers, local agents associated to software elements and appliances) were modeled in LNT, and 12 key correctness properties (final objective, architectural invariants, functional properties) were specified in MCL. These properties were checked using the EVALUATOR tool on a set of model instances generated automatically, corresponding to applications with different fog nodes, software elements, appliances, and bindings. The verification provided insight about several finer points of the recovery procedure (functioning in degraded mode after a failure and missing replacement appliance, propagation of failure notifications in applications with multiple dependencies, and handling of multiple failures in sequence). Finally, the approach was implemented as a framework called F3ARIoT, which was evaluated on a smart home application. The results showed the feasibility of deploying F3ARIoT on real Fog-IoT applications, as well as its good performances in regards to end user experience. |
Conclusions: |
Failure management protocols for IoT applications in dynamic Fog
environments are complex distributed artifacts, which can benefit
from rigorous design approaches based on formal methods and
verification techniques. The verification of the proposed failure
management approach, with a focus on the recovery step, was useful
to assess its correctness and clarify several issues, and also helped
in its implementation.
|
Publications: |
[Ozeer-19]
Umar Ozeer.
"Autonomic Resilience of Applications in a Largely Distributed
Cloud Environment". PhD Thesis, Université Grenoble Alpes,
December 2019.
Available on-line at: https://tel.archives-ouvertes.fr/tel-02570825 or from the CADP Web site in PDF or PostScript [Ozeer-Salaun-Letondeur-et-al-20] Umar Ozeer, Gwen Salaün, Loïc Letondeur, François-Gaël Ottogalli, and Jean-Marc Vincent. "Verification of a Failure Management Protocol for Stateful IoT Applications". Proceedings of the 25th International Conference on Formal Methods for Industrial Critical Systems FMICS'2020 (Vienna, Austria), Lecture Notes in Computer Science vol. 12327, pp. 272-287. Springer Verlag, September 2020. Available on-line at: https://hal.inria.fr/hal-02930872/en or from the CADP Web site in PDF or PostScript |
Contact: | Gwen Salaün Inria - LIG / CONVECS 655, avenue de l'Europe CS 90051 38334 Montbonnot FRANCE Email: [email protected] |
Further remarks: | This case-study, amongst others, is described on the CADP Web site: http://cadp.inria.fr/case-studies |