Advanced Technology and Social Media Influence on Research, Industry and Community

The rapid development of technology and social media has gradually shifted the focus in research, industry and community from traditional to dynamic environments where creativity and innovation dominate many aspects of daily life. This shift has enabled the automated collection and storage of huge amounts of data, which are necessary for effective decision making. Indeed, the value of data is increasingly recognized, and there is a tremendous need for effective techniques to maintain and handle the collected data, from storage to processing and analysis, leading to knowledge discovery. This chapter cites our published works, which focus on techniques and structures that could maximize the benefit from data beyond what is traditionally supported. In the listed publications, we emphasized data-intensive domains that require developing and utilizing advanced computational techniques for informative discoveries. We describe some of our accomplishments, ongoing research and future research plans. The notion of big data is addressed to show how incrementally available big data can be processed using limited computing resources. The benefit of various data mining and network modeling mechanisms for data analysis and prediction is also discussed, with emphasis on practical applications ranging from forums and reviews to social media as effective means for communication, sharing and discussion, leading to collaborative decision making and the shaping of future plans.


Introduction
Data is a major resource for decision making. Its value and importance have never been ignored since the existence of mankind on earth. It has been collected, stored and maintained using a wide variety of affordable means, ranging from primitive to advanced. Indeed, collecting, storing and maintaining data was a cumbersome task in the past, mainly prior to the development of the various technologies that gradually helped humans handle data. However, recent developments in technology have rapidly influenced data collection, storage and maintenance. For instance, sensors are becoming popular in all aspects of daily life; they have been installed in almost every piece of indoor and outdoor equipment. They are widely available and equipped with wireless communication capabilities, which allow them to feed huge amounts of data that should be captured, stored, cleaned and processed for knowledge discovery, the main ingredient of effective decision making.
In the past, humans used computing devices in a limited way. Database management systems were developed to provide flexibility in storing and retrieving data. Making sense of data was left to domain experts, who were expected to retrieve and study the data related to a specific problem and draw conclusions that might guide the decision-making process. Automating the knowledge discovery process was better realized towards the end of the 20th century, when various machine learning and data mining techniques were developed and put into practice to serve a variety of application domains, including business, health, security, etc.
To cope with the new era, researchers, developers and practitioners realized the need to develop new techniques and technologies capable of handling growing volumes of data captured incrementally from heterogeneous sources. In other words, the volume and variety of data to be processed suddenly boomed. Social networks and social media platforms are gaining popularity and are generating tremendous amounts of data. Surveillance devices are available almost everywhere. Even traditional archives are being digitized. Consequently, storage media and techniques that were previously accepted as sufficient are no longer capable of handling the new needs. For instance, the hard drives of personal computers held only a couple of megabytes when they were first manufactured, and the machines had less than one megabyte of main memory. People nevertheless competed happily for the honor of owning and using such devices. It is hard to imagine using the same computing platform in the current era, where gigabytes of storage are no longer sufficient. In fact, computing resources have improved rapidly to partially meet human needs, but they will never be fully satisfactory. Therefore, researchers and developers are always seeking new technologies and techniques, and conducting and advancing research will continue to attract more attention and investment.
Put plainly, data volume, characteristics and associated expectations may be described as a moving target. This necessitates enough room for storage and sophisticated techniques for processing. People will continue to collect more data as time passes, but they will never be able to increase their computing power enough to handle all of their data effectively. Thus, there is a need for algorithms and techniques that depend only on limited computing resources to deal with various aspects of data, from dynamicity to volume, among others. Along this direction, we contributed various techniques and algorithms that successfully serve a variety of applications requiring the handling of large volumes of dynamic and streaming data. These techniques are described in our published papers listed in the references at the end of this paper. Scalability is the main aspect considered by our techniques, which include frequent pattern mining, clustering, network analysis and finding repeating patterns in long sequences.
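As one illustration of mining a stream in memory that is bounded independently of the stream's length, the sketch below uses the Misra-Gries frequent-items algorithm. This is a well-known textbook technique swapped in for illustration, not one of the techniques described in the papers cited here; the function name `misra_gries` and the parameter `k` are hypothetical. With k - 1 counters, every item occurring more than n/k times in a stream of n items is guaranteed to survive, and each surviving count underestimates the true count by at most n/k.

```python
def misra_gries(stream, k):
    """Approximate the frequent items of a stream with at most k - 1 counters.

    Hypothetical helper for illustration (Misra-Gries algorithm).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Usage: 'a' occurs 5 times in 9 items, more than n/k = 3, so it must survive.
stream = "a b a c a b a d a".split()
print(misra_gries(stream, k=3))  # {'a': 3}
```

Because the function consumes any iterable one item at a time, the same call works on a generator reading an unbounded feed (for example, sensor readings); only the dictionary of at most k - 1 counters is held in memory.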
Our completed and ongoing research addresses various aspects of data, from definition to construction to manipulation and analysis, leading to knowledge discovery for decision making. Our initial contributions focused on traditional aspects of data handling and manipulation, which were popular during the last two decades of the 20th century. We then gradually moved on to more advanced techniques, which we recognized as necessities from the 1990s onwards. These techniques include network analysis, data mining and machine learning, which have tremendously and visibly served various applications. We also recognized scalability as a serious need, especially in the current era of big data. We developed advanced techniques and adapted them to various domains, including:
§ Bioinformatics and health informatics
§ Data partitioning and allocation
§ Homeland security and terror/criminal network analysis
§ Financial data analysis: from stock market to FOREX to fraud detection
§ Web/network data analysis: from structure to content to usage
§ Social media analysis and opinion mining, including spam detection
§ Recommendation and customer behavior analysis
Network representation is a powerful mechanism for modeling many-to-many relationships. A network consists of a set of nodes corresponding to the entities in the application domain and a set of links representing certain types of relationships between the entities. On the other hand, data mining includes a set of powerful techniques for studying the relationships/connections between various objects, and it may also be used in the network construction phase. To construct a network, the data may first be analyzed using techniques such as frequent pattern mining or clustering. Once the network is constructed, it can be analyzed for knowledge discovery.
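The construction step just described, in which frequent pattern mining derives network links from raw data, can be sketched as follows. This is a minimal illustration under stated assumptions, not NetDriller's actual procedure: each transaction (for example, the set of people appearing together in one record) is treated as a set of entities, and two entities are linked when they co-occur in at least `min_support` transactions, i.e. when the pair is a frequent 2-itemset. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(transactions, min_support):
    """Link entity pairs that co-occur in at least min_support transactions.

    Returns a dict mapping an (u, v) pair, with u < v, to its support count.
    """
    pair_counts = Counter()
    for t in transactions:
        # Deduplicate and sort so each unordered pair is counted once per transaction.
        for u, v in combinations(sorted(set(t)), 2):
            pair_counts[(u, v)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


# Hypothetical records, e.g. people mentioned together in the same report.
records = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "carol"],
    ["alice", "dave"],
]
edges = build_cooccurrence_network(records, min_support=2)
print(edges)  # {('alice', 'bob'): 2, ('bob', 'carol'): 2}
```

The resulting weighted edge list can then be handed to the network analysis step; a clustering step could replace pair counting when links should reflect similarity rather than co-occurrence.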
Most traditional approaches to frequent pattern mining assume unlimited main memory, which is not realistic. Scalability is therefore a major concern in practical applications where data arrive as dynamic streams and in large volumes. To tackle these problems, we developed a novel approach that satisfies the following:
§ The ability to mine in a bounded amount of memory space that may vary based on task priority, so that mining is possible on a common PC.
§ Improved external data access, making the mining process more I/O conscious.
§ A specialized, mining-task-aware memory manager for both RAM and external memory.
We build a tree structure, namely the Frequent-Pattern (FP) tree, which summarizes the given data and allows for effective discovery of frequent patterns. Each branch represents at least one transaction. We build the tree from left to right and from top to bottom, as shown in Figure 1. This way, the left side of the tree can be stored on disk as the tree grows to the right. Therefore, our upper bound is the size of the free disk space rather than the available memory.
To facilitate effective data investigation and analysis, we built our own tool, NetDriller, which is capable of analyzing raw data to derive a network. Various network analysis techniques can then be applied to the network to identify actors that may reveal important aspects of the analyzed network, such as the most knowledgeable employee, the most dangerous criminal, the lowest-performing student or the best team to undertake the next project. The basics of NetDriller are summarized in Figure 2.
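The FP-tree construction described above can be sketched as the classic two-pass, in-memory procedure: a first pass counts items, and a second pass inserts each transaction's frequent items, ordered by descending support, along a shared prefix path. This sketch does not implement the bounded-memory, disk-spilling variant described in this section (storing the completed left part of the tree on disk as it grows to the right); the class and function names are hypothetical and the whole tree is kept in RAM.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an in-memory FP-tree and a header table of node-links per item."""
    # First pass: count each item once per transaction; keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header = {}  # item -> list of nodes holding that item
    # Second pass: insert frequent items in descending-support order
    # (ties broken alphabetically) so common prefixes share a path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"], ["e"]]
root, header = build_fp_tree(transactions, min_support=2)
print(root.children["b"].count)  # 4: every transaction with a frequent item starts with b
```

Summing the counts along each item's node-links in `header` recovers that item's support, which is what the subsequent pattern-growth step relies on.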
We used NetDriller to analyze the September 11 terror network. It is striking that those who planned the attacks considered all the difficulties they could face: the network remained connected even after removing the terrorists identified as leaders down to level 6 (Figure 3). The same is not true of the Madrid attacks network, which became dispersed once the second-level leaders were removed, as shown in Figure 4. NetDriller was also applied to gene expression data related to prostate cancer to identify proteins associated with the disease. The main result reported by NetDriller is shown in Figure 5.
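The level-by-level removal analysis behind Figures 3 and 4 can be approximated by the sketch below. It is an assumption-laden illustration, not NetDriller's implementation: eigenvector centrality is computed by power iteration on A + I (the identity shift avoids oscillation on bipartite graphs without changing the eigenvectors), one "level" is assumed to mean removing the single currently most central node, and connectivity is measured as the number of connected components after each removal. All names and the toy graph are hypothetical.

```python
import math


def from_edges(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, iters=500, tol=1e-10):
    """Power iteration on (A + I), normalised to unit Euclidean length."""
    x = {n: 1.0 for n in adj}
    for _ in range(iters):
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in nxt.values())) or 1.0
        nxt = {n: v / norm for n, v in nxt.items()}
        if sum(abs(nxt[n] - x[n]) for n in adj) < tol:
            return nxt
        x = nxt
    return x


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nbr in adj[stack.pop()]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components


def removal_profile(adj, levels):
    """Remove the most central node once per level; record (node, #components)."""
    profile = []
    for _ in range(levels):
        if not adj:
            break
        c = eigenvector_centrality(adj)
        top = max(c, key=lambda n: (c[n], str(n)))  # ties broken by node name
        # Build a new map without `top`; the caller's graph is not mutated.
        adj = {n: nbrs - {top} for n, nbrs in adj.items() if n != top}
        profile.append((top, count_components(adj)))
    return profile


# Hypothetical toy network: hub H ties two otherwise separate groups together.
graph = from_edges([("H", "A"), ("H", "B"), ("H", "C"), ("A", "B"), ("C", "D")])
print(removal_profile(graph, levels=1))  # [('H', 2)]
```

Running the profile for several levels on a real network shows whether it stays in a single component as the most central actors are removed, which is the kind of contrast reported above between the two terror networks.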
Finally, building on the promising results reported in our published papers, we outline some of our ongoing and planned research activities. First, in bioinformatics, we are tracking disease evolution (spatial and temporal aspects), drug repositioning, etc. Second, in health informatics, we are working on patient monitoring, referral optimization and prediction, etc. Third, we have demonstrated the applicability and effectiveness of sequence analysis and prediction in various domains, including finance (e.g., stock and forex), weather, traffic and energy. Fourth, other domains and applications considered in our research include recommendation, sentiment analysis, opinion mining, spam detection, homeland security, and close monitoring and analysis for early warning.
To sum up, the research efforts described in our published papers, listed in the bibliography, illustrate that data mining and network analysis are powerful techniques for data analysis. Further, it is possible to analyze huge amounts of data using limited computing resources, and to develop integrated solutions that combine various aspects into a robust framework. We have developed some techniques from scratch and extended some existing techniques to produce working solutions for our industrial and academic partners, supporting sophisticated data analysis that maximizes knowledge discovery for informed decision making.

Fig. 1. Construction of the FP-tree top-down and left-to-right

Fig. 3. Changes in the Sep. 11 network after excluding the nodes at each level, using the eigenvector centrality measure.

Fig. 4. Changes in the Madrid network after excluding the nodes at each level, using the eigenvector centrality measure.

Fig. 5. A list of genes and part of the gene-gene network related to prostate cancer.