Other Abstract | Astronomical data processing has entered the era of data-intensive astroinformatics. Solar observation exhibits the typical characteristics of big data: large data volumes, high capture rates, and continuous data growth. Traditional local host storage such as DAS, and network storage technologies such as NAS and SAN, show many limitations in storing, processing, and managing astronomical big data, which slows down scientific research. Modern astronomical observation needs advanced big data technologies to accelerate data processing. The storage system underlying these technologies has to provide high-performance, scalable parallel reads and writes and efficient data indexing and querying, and it must be able to manage rapidly growing observation data. The New Vacuum Solar Telescope (NVST) has begun routine observation and has produced over 200 TB of solar observation data in its high-speed, multi-channel, multi-wavelength mode. When the photospheric and chromospheric channels observe simultaneously under good observing conditions, the chromospheric channel can reach 60 GB per hour and the photospheric channel 190 GB per hour, so about 2 TB of data can be produced in 8 hours of continuous observation. Given NVST's requirements for high temporal and spatial resolution, and with more channels working in parallel in the future, the one-way write speed may reach the order of TB per second. If real-time data processing is taken into account, the rate doubles. Although some storage technologies achieve good performance and are scalable, the continuous-storage nature of the data ultimately limits the use of these mainstream technologies. 
Traditional local file systems such as Ext3, Ext4 and ZFS can hardly satisfy the requirements of NVST, so we need a storage technology that can manage massive data, delivers high performance, is highly scalable, can adapt to the future data storage needs of NVST, and can support massive high-speed data processing. As devices such as larger telescopes come into use, the storage system needs suitable technologies to support massive high-speed data storing, reading, and processing. Distributed parallel storage satisfies these needs well, because a distributed architecture provides high performance, parallel storage, and the ability to scale out, which suits the multi-channel, multi-waveband, high-speed, continuously growing massive data of instruments like NVST. This dissertation mainly studies the key techniques of distributed storage. A NoSQL-based bitmap index is also studied to meet the needs of massive data indexing and retrieval. The research covers the following aspects. 1) Applying distributed storage to solar observation. We use experiments to verify the feasibility, high performance, and scalability of distributed storage. We achieve a data acquisition rate of 3.4 Gb/s by using bonding technology in a 1 Gb network environment. 2) High-speed data storage may lead to inconsistency between metadata and data that are stored separately. How to keep metadata and data consistent through an effective mechanism is an often overlooked issue in data storage. This dissertation analyzes the causes, states, and models of the inconsistency, and adopts the two-phase commit (2PC) algorithm to ensure consistency. 3) We design a distributed storage system called AstroFS, based on the RAID0 mechanism applied over the network, in order to achieve high performance. Its key techniques, such as data aggregation, splitting algorithms, and data balancing strategies, are implemented. 
4) This dissertation uses a compressed word-aligned bitmap index to index massive solar data. We also design and implement an astronomical data archiving system (DAS) based on FastBit. Compared with techniques based on relational databases, DAS offers advantages such as more efficient retrieval and faster index building. The distributed storage and massive data retrieval technologies studied in this dissertation satisfy the requirements of NVST data storage and management. The research methods also provide a reference for the design of massive data storage and retrieval applications for large solar telescopes at home and abroad. |
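
The abstract describes AstroFS as applying the RAID0 mechanism over the network, with splitting and aggregation as key techniques. The sketch below illustrates only the general RAID0 idea (round-robin striping and reassembly) on in-memory byte strings. It is not the AstroFS implementation: the function names `split_stripes` and `aggregate_stripes`, the stripe size, and the use of Python lists to stand in for storage nodes are all assumptions made for illustration.

```python
from typing import List


def split_stripes(data: bytes, n_nodes: int, stripe_size: int) -> List[List[bytes]]:
    """Cut `data` into fixed-size stripes and assign them round-robin to nodes.

    Hypothetical helper: each inner list stands in for one storage node.
    """
    if n_nodes < 1 or stripe_size < 1:
        raise ValueError("n_nodes and stripe_size must be positive")
    nodes: List[List[bytes]] = [[] for _ in range(n_nodes)]
    for offset in range(0, len(data), stripe_size):
        stripe_index = offset // stripe_size
        # Stripe k goes to node k mod n_nodes, as in RAID0.
        nodes[stripe_index % n_nodes].append(data[offset:offset + stripe_size])
    return nodes


def aggregate_stripes(nodes: List[List[bytes]]) -> bytes:
    """Reassemble the original byte string by reading stripes in round-robin order."""
    out = bytearray()
    depth = max((len(node) for node in nodes), default=0)
    for row in range(depth):
        for node in nodes:
            # Nodes after the last stripe's node hold one stripe fewer.
            if row < len(node):
                out += node[row]
    return bytes(out)


if __name__ == "__main__":
    frame = bytes(range(10))  # stand-in for one chunk of observation data
    nodes = split_stripes(frame, n_nodes=3, stripe_size=4)
    assert aggregate_stripes(nodes) == frame
```

Because consecutive stripes land on different nodes, writes and reads of a large file can proceed on all nodes in parallel, which is the property the abstract relies on for high throughput. Plain RAID0 striping provides no redundancy, so a real system would need to handle node failure separately.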