The Rise of IoT and Big Data Challenges
The growing number and diversity of connected Internet of Things (IoT) devices have raised significant concerns about storing and protecting large volumes of IoT data. In conventional cloud-centric IoT architectures, storage requirements and computational costs keep rising. In addition, reliance on a centralized server raises significant trust issues and leaves the system vulnerable to security risks.
To mitigate these challenges, a layer-based distributed data storage design for a blockchain-enabled large-scale IoT system has been proposed and implemented. It is built on Hyperledger Fabric (HLF), a platform for distributed ledger solutions. Because HLF peers perform transaction verification and record auditing for the big data system, the design eliminates the need for a centralized server and a third-party auditor.
Only lightweight verification tags are stored on the HLF blockchain ledger, while the actual metadata are stored in the off-chain big data system; this split reduces communication overheads and enhances data integrity. In addition, a prototype has been implemented on embedded hardware, showing that the proposed solution can be deployed in IoT edge computing and big data ecosystems.
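To make the on-chain/off-chain split concrete, here is a minimal sketch. It assumes SHA-256 as the verification tag and uses two plain dictionaries as hypothetical stand-ins for the HLF ledger and the off-chain big data store; neither is the HLF or Hadoop API. Whatever the payload size, the on-chain entry stays small.

```python
import hashlib
import json

# Hypothetical stand-ins (not real APIs): one dict plays the HLF ledger,
# the other plays the off-chain big data store.
ledger: dict[str, dict] = {}
off_chain_store: dict[str, bytes] = {}


def verification_tag(payload: bytes) -> str:
    """Fixed-size SHA-256 tag (64 hex characters), whatever the payload size."""
    return hashlib.sha256(payload).hexdigest()


def store_object(object_id: str, payload: bytes) -> dict:
    """Keep the bulky payload off-chain and only a small entry on-chain."""
    off_chain_store[object_id] = payload
    entry = {"id": object_id, "tag": verification_tag(payload), "size": len(payload)}
    ledger[object_id] = entry
    return entry


store_object("sensor-1/reading-0001", b'{"temp": 21.5}')
store_object("camera-7/frame-0001", bytes(5_000_000))  # 5 MB placeholder frame

# Compare how many bytes each object costs on-chain versus off-chain.
for key, entry in ledger.items():
    on_chain = len(json.dumps(entry).encode())
    off_chain = len(off_chain_store[key])
    print(f"{key}: on-chain {on_chain} B, off-chain {off_chain} B")
```

The 5 MB frame and the 14-byte reading produce ledger entries of roughly the same size, which is what keeps the tags "lightweight".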
Experimental Evaluation and Performance Analysis
Experiments were conducted to evaluate the proposed scheme in terms of throughput, latency, communication cost, and computation cost. The results indicate that the proposed solution can feasibly store and retrieve the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain.
The experimental results show a throughput of about 600 transactions per minute, an average response time of 500 ms, CPU consumption of about 23% in the peer process, and approximately 10-20% on the client node. The minimum latency stayed below 1 s, whereas the maximum latency increased once the sending rate reached about 200 transactions per second (TPS).
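The reported figures can be put on a common scale with a little arithmetic. The sketch below converts the throughput to TPS and, assuming steady-state operation, applies Little's law (L = λW) to estimate how many transactions are in flight on average. The text does not say whether the 200 TPS sending rate and the 600 transactions/minute throughput come from the same configuration, so the final ratio is only indicative.

```python
# Figures reported in the text.
throughput_per_minute = 600   # transactions per minute
avg_response_s = 0.5          # 500 ms average response time
sending_rate_tps = 200        # sending rate at which the maximum latency grew

throughput_tps = throughput_per_minute / 60          # 10.0 TPS

# Little's law, L = lambda * W: average number of transactions in the system,
# assuming steady state at the measured throughput.
in_flight = throughput_tps * avg_response_s          # 5.0 transactions

# Indicative only: how far the 200 TPS sending rate exceeds the measured
# throughput, if both figures describe the same configuration.
overload_ratio = sending_rate_tps / throughput_tps   # 20.0

print(throughput_tps, in_flight, overload_ratio)
```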
Addressing Security and Privacy Challenges
The exponential growth in generated data brings its own security and privacy challenges, including issues of data-source reliability and data sharing. Blockchain features such as decentralized storage, transparency, immutability, and consensus mechanisms can help address these challenges in the Big Data ecosystem.
The integration of blockchain and Big Data can further enhance Big Data security and privacy, improve data integrity, provide fraud prevention, facilitate real-time data analytics, expand data sharing, enhance data quality, and streamline data access.
Blockchain-based Data Provenance and Auditing
This work aims to develop a blockchain-enabled public data provenance and auditing model for the Big Data ecosystem (specifically the Hadoop ecosystem) that is more efficient and secure than previously reported solutions. A blockchain provides a decentralized database that records the history of every transaction appended to the shared ledger, which improves data traceability.
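The traceability property rests on each ledger entry committing to the one before it. The following is a simplified hash-chain sketch, assuming SHA-256 over canonical JSON; it illustrates tamper evidence in general and does not reproduce Hyperledger Fabric's actual block structure. The class name `HashChainLedger` is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class HashChainLedger:
    """Append-only log in which each entry commits to its predecessor's hash.

    A simplified illustration; not Hyperledger Fabric's block format.
    """
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Sorted keys give a canonical encoding, so equal content hashes equally.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, tx: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_PREV
        body = {"index": len(self.entries), "prev": prev, "tx": tx}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash and link; any edit to past entries is detected."""
        prev = GENESIS_PREV
        for i, entry in enumerate(self.entries):
            body = {"index": entry["index"], "prev": entry["prev"], "tx": entry["tx"]}
            if entry["index"] != i or entry["prev"] != prev:
                return False
            if entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True


ledger = HashChainLedger()
ledger.append({"op": "store", "object": "sensor-1/batch-001"})
ledger.append({"op": "read", "object": "sensor-1/batch-001"})
print(ledger.verify())  # True

ledger.entries[0]["tx"]["op"] = "delete"  # attempt to rewrite history
print(ledger.verify())  # False
```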
The proposed data provenance model aims to record how data were derived and to provide data confidentiality, integrity, and availability. HLF, a permissioned blockchain in which all members are registered, offers these functionalities. The system architecture consists of three layers: a blockchain layer, a Big Data system (off-chain storage) layer, and an authentication provider layer.
Secure IoT Data Management with Blockchain
To address the issues mentioned earlier, the proposed model stores the data provenance (a small portion of the metadata) in the shared ledger, while the actual metadata are placed in the Hadoop ecosystem. The data themselves are kept in off-chain storage, and data checksums are computed for verification and integrity checks.
The HLF blockchain can verify the integrity of stored data by comparing the immutable information recorded in the shared ledger with the checksum of the data stored in the Hadoop system. A ChainCode that runs on each peer node in the HLF network was developed to support these operations.
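The comparison described above can be sketched as an audit loop. This is an illustration under stated assumptions: SHA-256 is assumed as the checksum algorithm, and two dictionaries stand in for the checksums recorded on the HLF ledger and the files held in Hadoop. In the real system the recorded values would come through the ChainCode and the data from HDFS.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 hex digest (assumed checksum algorithm)."""
    return hashlib.sha256(data).hexdigest()


def audit(ledger: dict[str, str], hadoop: dict[str, bytes]) -> list[str]:
    """Return ids whose off-chain copy is missing or no longer matches the ledger.

    ledger: object id -> checksum recorded on-chain (hypothetical stand-in)
    hadoop: object id -> bytes held off-chain (hypothetical stand-in)
    """
    failed = []
    for object_id, recorded in ledger.items():
        stored = hadoop.get(object_id)
        if stored is None or checksum(stored) != recorded:
            failed.append(object_id)
    return failed


hadoop = {"a": b"reading-1", "b": b"reading-2", "c": b"reading-3"}
ledger = {key: checksum(value) for key, value in hadoop.items()}

hadoop["b"] = b"reading-2 (modified)"  # silent change in off-chain storage
del hadoop["c"]                         # lost file
print(audit(ledger, hadoop))            # ['b', 'c']
```

Because the recorded checksums are immutable on the ledger, an operator who alters a file in Hadoop cannot also quietly update the value it is checked against.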
The Hadoop ecosystem serves as a pluggable storage solution for secure, verified data. The client library handles data invocation: it puts the data into Hadoop storage and sends the data checksum and provenance data to the blockchain for verification.
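A minimal sketch of the client's invocation flow follows, written so that the storage backend is pluggable. The interfaces `OffChainStore` and `LedgerGateway`, the ChainCode function name `StoreData`, and the in-memory implementations are all hypothetical; a real client would call the HLF SDK and a Hadoop client instead. The payload is written off-chain before its checksum is submitted, so a failed storage write is not followed by a ledger entry.

```python
import hashlib
import time
from typing import Protocol


class OffChainStore(Protocol):
    """Hypothetical pluggable storage interface (Hadoop would be one backend)."""
    def put(self, path: str, data: bytes) -> str: ...


class LedgerGateway(Protocol):
    """Hypothetical wrapper around an HLF SDK transaction submission."""
    def submit(self, function: str, args: dict) -> str: ...


class InMemoryStore:
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> str:
        self.files[path] = data
        return f"mem://{path}"


class RecordingLedger:
    def __init__(self) -> None:
        self.transactions: list[tuple[str, dict]] = []

    def submit(self, function: str, args: dict) -> str:
        self.transactions.append((function, args))
        return f"tx-{len(self.transactions)}"


class ProvenanceClient:
    def __init__(self, store: OffChainStore, ledger: LedgerGateway) -> None:
        self.store = store
        self.ledger = ledger

    def invoke(self, path: str, data: bytes, provenance: dict) -> str:
        # 1. Put the payload into off-chain storage first.
        address = self.store.put(path, data)
        # 2. Then send only its checksum, address and provenance on-chain.
        args = {
            "checksum": hashlib.sha256(data).hexdigest(),
            "address": address,
            "timestamp": time.time(),
            **provenance,
        }
        return self.ledger.submit("StoreData", args)


store, ledger = InMemoryStore(), RecordingLedger()
client = ProvenanceClient(store, ledger)
tx_id = client.invoke("/iot/sensor-1/batch-001.json", b'{"temp": 21.5}', {"source": "sensor-1"})
print(tx_id, ledger.transactions[0][1]["address"])
```

Because `ProvenanceClient` depends only on the two protocols, replacing the in-memory store with an HDFS-backed class would leave the invocation logic unchanged.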
Enabling Secure and Scalable IoT Data Management
The proposed design stores the checksum of every data object, data addresses, location information about the workers that stored the data, object-creation information, data lineage, a timestamp, a certificate ID, and additional fields that can be customized for various data structures (e.g., a JSON structure).
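One way to represent such a record is a Python dataclass serialized to JSON. The field names, the `hdfs://` address, and the sample values below are illustrative assumptions; the source lists the kinds of information stored but not an exact schema. The `extra` dictionary plays the role of the customizable fields.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    # Illustrative schema; not the exact layout used by the proposed system.
    object_id: str
    checksum: str                 # digest of the data object (SHA-256 assumed)
    address: str                  # where the object lives off-chain
    worker_locations: list[str]   # workers that stored the data
    created_by: str               # information on creating the object
    lineage: list[str]            # ids of the objects it was derived from
    timestamp: str                # ISO 8601 string
    certificate_id: str           # certificate ID of the submitting client
    extra: dict = field(default_factory=dict)  # customizable fields

    def to_json(self) -> str:
        # Sorted keys give a canonical form, convenient if the JSON is hashed.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ProvenanceRecord":
        return cls(**json.loads(text))


payload = b'{"temp": [21.5, 21.7, 21.6]}'
record = ProvenanceRecord(
    object_id="sensor-1/batch-002",
    checksum=hashlib.sha256(payload).hexdigest(),
    address="hdfs://namenode:8020/iot/sensor-1/batch-002.json",  # hypothetical path
    worker_locations=["datanode-1", "datanode-3"],
    created_by="edge-gateway-01",
    lineage=["sensor-1/batch-001"],
    timestamp="2024-01-01T12:00:00Z",
    certificate_id="cert-1234",
    extra={"unit": "celsius", "format": "json"},
)
print(record.to_json())
```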
Storing data in the HLF ledger begins with invoking the ChainCode functions, passing the values associated with the data as parameters. A dedicated ChainCode function handles data retrieval. The client library was built using the Software Development Kit (SDK) to interact with the HLF blockchain platform for data verification and provenance operations.
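The store-and-retrieve pair of ChainCode functions can be sketched as follows. HLF chaincode is normally written in Go, Java, or JavaScript against Fabric's contract API, where the stub exposes `PutState` and `GetState`; this Python simulation only mirrors that key-value pattern with a hypothetical `WorldState` class. Refusing to overwrite an existing key is a design assumption of this sketch, not a documented behavior of the proposed ChainCode.

```python
import json
from typing import Optional


class WorldState:
    """In-memory stand-in (hypothetical) for a peer's key-value world state."""

    def __init__(self) -> None:
        self._kv: dict[str, bytes] = {}

    def put_state(self, key: str, value: bytes) -> None:
        self._kv[key] = value

    def get_state(self, key: str) -> Optional[bytes]:
        return self._kv.get(key)


class ProvenanceContract:
    """Hypothetical contract: one function stores records, another retrieves them."""

    def __init__(self, state: WorldState) -> None:
        self.state = state

    def store_data(self, object_id: str, record_json: str) -> None:
        # Sketch assumption: an existing record is never overwritten.
        if self.state.get_state(object_id) is not None:
            raise ValueError(f"{object_id} is already recorded")
        record = json.loads(record_json)  # raises on malformed JSON
        if "checksum" not in record:
            raise ValueError("record must include a checksum")
        self.state.put_state(object_id, record_json.encode())

    def retrieve_data(self, object_id: str) -> dict:
        raw = self.state.get_state(object_id)
        if raw is None:
            raise KeyError(object_id)
        return json.loads(raw)


contract = ProvenanceContract(WorldState())
contract.store_data(
    "sensor-1/batch-001",
    # "ab12" is a placeholder checksum; the address is a hypothetical HDFS path.
    json.dumps({"checksum": "ab12", "address": "hdfs://namenode:8020/iot/sensor-1/batch-001.json"}),
)
print(contract.retrieve_data("sensor-1/batch-001")["address"])
```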
The edge computing device serves as the central node of the blockchain-based IoT Big Data storage scheme. It offloads tasks from small IoT devices, yielding significant energy savings. It also performs the associated computations, manages data storage, and relays transactions and messages for the IoT devices.
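The offloading role can be illustrated with a gateway that buffers readings from devices, hashes each batch, and relays one record per batch. Batching, the batch size, and the `EdgeGateway` name are assumptions made for illustration; the source states only that the edge device performs the computations and relays transactions for the IoT devices.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class EdgeGateway:
    """Hypothetical edge node that hashes and relays data for IoT devices.

    Devices only send raw readings; the gateway does the hashing and emits
    one record per batch (what would go on to HLF and Hadoop).
    """
    batch_size: int = 100
    buffer: list[dict] = field(default_factory=list)
    relayed: list[dict] = field(default_factory=list)

    def receive(self, reading: dict) -> None:
        self.buffer.append(reading)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        payload = json.dumps(self.buffer, sort_keys=True).encode()
        self.relayed.append({
            "count": len(self.buffer),
            "checksum": hashlib.sha256(payload).hexdigest(),
            "payload": payload,
        })
        self.buffer = []


gateway = EdgeGateway(batch_size=100)
for i in range(250):
    gateway.receive({"device": f"sensor-{i % 5}", "seq": i, "temp": 20 + i % 3})
gateway.flush()  # relay the final partial batch
print(len(gateway.relayed), [b["count"] for b in gateway.relayed])  # 3 [100, 100, 50]
```

With 250 readings and a batch size of 100, the devices do no hashing themselves and only three records are relayed.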
Conclusion and Future Directions
The blockchain-enabled data provenance mechanism for Big Data applications in IoT systems guarantees data verifiability and integrity because every data operation is recorded as a transaction in the blocks of the blockchain network. The experimental results show that the proposed solution can feasibly store and retrieve the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain.
Future research directions may include integrating the proposed scheme with a distributed database such as Apache Cassandra to store transaction data, together with more detailed performance evaluations, and developing a sharding-based consensus mechanism that handles network partitions. Supporting MQTT-based communication among the blockchain, IoT sensors, and Hadoop off-chain storage for storing transaction data could also be explored.