Friday, August 2, 2013

BigData - Health Industry usages - Part 1

"Information is the oil of the 21st century, and analytics is the combustion engine." Big data makes it practical to run deep analytics on very large data sets and streams, in batch or in real time, using parallel programming and the cheap processing power of commodity servers. Big data is typically characterized by huge volumes of data, high velocity of data capture and a high variety of collected data.

Because there is a limit to vertical scalability, relying on a single very powerful server is no longer practical, and scaling up tends to be much more expensive than scaling out horizontally. As an analogy, if you need to lift a weight of 300 lbs, it is far harder and costlier to find and hire one person who can do it than to employ 6 people of average strength who can lift 50 lbs each. The heavier the weight, the more difficult and expensive it becomes to find a single person for the job.

We hear a great deal about how big data is going to make health services more personalized and more economical. Meaningful Use Stage 1 and Stage 2 procedures are already implemented, or being implemented, to streamline the management of patients' data. Several technologies come into play in applying the big data paradigm to the health industry. Listed below are some of the important procedures and technologies that can make the health industry a pro-active, dynamic, economical, personalized and self-aware entity, changing the way health services are made available to communities across the world.
In-Memory and CEP (Complex Event Processing):
=============================================

Pro-active and Predictive Monitoring and Alert system - personalized:
=====================================================================

Currently, when a patient is in the ICU, various monitoring systems are hooked up to measure heart rate, blood pressure, oxygen saturation and so on. When any reading crosses a set value, an alert is sent to the monitoring control room and medical staff attend to the patient as needed. However, this stream of monitoring data is generally not collected or stored for further analysis, because doing so would need huge amounts of data storage and, later, very high processing power to analyze it.

With the arrival of big data (Hadoop's HDFS and MapReduce and related ecosystem technologies such as HBase, Hive, Sqoop, Pig, ZooKeeper, Mahout, R, CouchDB, MongoDB and Neo4j; the Cloudera and Hortonworks platforms; Amazon's elastic cloud storage; Oracle Exadata and Exalytics; SQL Server 2014; Attensity; OpenChorus; and the various column-oriented, document-oriented, graph-oriented and spatial NoSQL databases), organizations can now store and process huge amounts of structured, semi-structured and unstructured data, in batch or in real time.

One can ask how it helps if we not only monitor the patient's vital signs but also store and analyze them in real time. The system can keep all these monitoring streams in an in-memory database and run analytics on them to predict, and pro-actively advise the medical staff, that a serious condition may develop in the near future even though no reading has crossed its threshold yet. This gives medical staff a much-needed few seconds or minutes of advance warning in which to provide life-saving care.
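As a minimal sketch of the predictive idea above, and not a clinical algorithm, the following Python keeps the most recent readings of one vital sign in memory, fits a straight line to them, and raises an early warning when the projected value would leave the safe range within a chosen horizon. The class name `TrendMonitor`, the window size, the horizon and the linear-trend model are all assumptions made for illustration; a real CEP engine or in-memory database would take their place.

```python
from collections import deque


class TrendMonitor:
    """Hypothetical monitor: in-memory window of readings plus a linear trend."""

    def __init__(self, low, high, window=30, horizon_s=120.0):
        self.low = low                # lower safe limit (assumed per patient)
        self.high = high              # upper safe limit (assumed per patient)
        self.horizon_s = horizon_s    # how far ahead to project, in seconds
        self.samples = deque(maxlen=window)  # (timestamp_s, value) pairs

    def add(self, t, value):
        """Store one reading and return 'alert', 'warning' or 'ok'."""
        self.samples.append((t, value))
        if not (self.low <= value <= self.high):
            return "alert"            # the classic threshold alarm
        if len(self.samples) < 5:
            return "ok"               # too little history to project a trend
        slope, intercept = self._fit()
        projected = slope * (t + self.horizon_s) + intercept
        if not (self.low <= projected <= self.high):
            return "warning"          # still in range, but trending out of it
        return "ok"

    def _fit(self):
        """Ordinary least-squares line through the samples in the window."""
        n = len(self.samples)
        mean_t = sum(t for t, _ in self.samples) / n
        mean_v = sum(v for _, v in self.samples) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in self.samples)
        if var_t == 0:
            return 0.0, mean_v        # all timestamps equal: assume flat trend
        cov = sum((t - mean_t) * (v - mean_v) for t, v in self.samples)
        slope = cov / var_t
        return slope, mean_v - slope * mean_t
```

For example, with limits of 90 and 100 and a 180 second horizon, an oxygen-saturation stream that falls steadily from 98 by half a point every 10 seconds returns "ok" for the first four readings and "warning" on the fifth, while every individual reading is still inside the safe range.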
The monitoring streams held in the in-memory database can then be persisted, either in full or in filtered form, in an HDFS-based database for further analysis.

SmartPhones and Location Services:
==================================

Smartphones have become ubiquitous across the world, and the location data collected by communication service providers can now be used pro-actively to predict health risks and provide better health services. Recently Edward Snowden, a former NSA contractor, generated public discussion about the types of data being collected and used by the US Government. That such data is collected is nothing new to people who follow the advancement of technology and related application development. Back to the topic: how can the location services provided by smartphones be used to build better health systems? A smartphone with a built-in GPS receiver needs signals from at least four GPS satellites to determine its position on the ground, typically to within a few meters. Earlier cell phones could estimate their location using algorithms based on communication with cell towers. This topic focuses on smartphones.

Pro-Active Disease Transmission Alert - at personal level:
=========================================================

Let us say a family member or friend of a registered member/patient of a clinic or hospital travelled to China at the height of the SARS epidemic, returned to the US, and either stayed at the same residence as the registered member or interacted closely with him/her somewhere else. With the help of big data, the health system can pro-actively alert the registered member just by monitoring the location data of the registered member's cell phone and of all other cell phones that are physically close to it during any given time period. One can ask how such information can be determined with high accuracy, and where big data and location services come into play.
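One piece of the answer is a proximity check. As a rough sketch, assuming each phone reports a (latitude, longitude) fix at the same fixed interval, the code below uses the haversine great-circle distance to add up how long two phones stayed within a given radius of each other. The function names, the 10 m radius, the 15 minute minimum exposure and the 60 s sampling step are hypothetical values chosen for illustration, not epidemiological parameters.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def exposure_seconds(track_a, track_b, radius_m, step_s):
    """Total time two phones were within radius_m of each other.

    Each track is a dict mapping a timestamp (seconds) to a (lat, lon) fix.
    Assumption: both phones are sampled at the same timestamps, step_s apart.
    """
    shared = track_a.keys() & track_b.keys()
    close = sum(
        1 for ts in shared
        if haversine_m(*track_a[ts], *track_b[ts]) <= radius_m
    )
    return close * step_s


def is_contact(track_a, track_b, radius_m=10.0, min_exposure_s=900, step_s=60):
    """True if the phones were close for at least min_exposure_s in total."""
    return exposure_seconds(track_a, track_b, radius_m, step_s) >= min_exposure_s
```

In practice the radius and minimum exposure would be set per disease, and the pairwise comparison would run over a very large number of tracks, which is where the distributed storage and processing come in.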
Let us say the registered member's smartphone number is on file with a local clinic or hospital. Based on his/her daily locations and movements, and on the proximity to and time spent near other cell phone users, the health system can pro-actively estimate the possible transmission rate of various viruses. Each virus may have a different effective spread distance and exposure time. The travel data of all the cell phone users who came within close physical range can be gathered from travel organizations and the government. As you can see, the amount of data needed for such an analysis is huge: you are literally storing and accessing the cell phone location data of everyone who was in close proximity to the registered member over a given time period, say 1 day, 1 month, 1 year or multiple years. Storing and analyzing such huge data sets was not practical earlier. Now, with big data, we can not only store huge amounts of highly varied data reliably but also process it economically and in near real time.

Pro-Active Food contamination Alerts - at personal level:
========================================================

We often see food items recalled by supermarket chains. We can combine registered members' food purchasing habits with food contamination alerts to send personalized, real-time alerts to those members. Let us say I shop at Stop & Shop, I am a registered member with LIJ, and my or my parents' credit card information is held by LIJ (in a secure fashion). Now let us say a particular brand of milk product is recalled by Stop & Shop.
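The matching step in this scenario is essentially a join between recall notices and purchase records. Here is a minimal sketch, assuming purchases and recalls are already available as simple records; the `Purchase` and `Recall` classes, their fields and the `members_to_alert` function are hypothetical names invented for this illustration, and no real store or hospital data feed is implied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Purchase:
    member_id: str      # registered member the purchase is linked to
    store: str
    product_code: str
    bought_on: date


@dataclass(frozen=True)
class Recall:
    store: str
    product_code: str
    sold_from: date     # first sale date covered by the recall
    sold_until: date    # last sale date covered by the recall


def members_to_alert(purchases, recalls):
    """Return {member_id: sorted recalled product codes that member bought}."""
    # Index recalls by (store, product) so each purchase is a dict lookup.
    by_key = {}
    for r in recalls:
        by_key.setdefault((r.store, r.product_code), []).append(r)

    alerts = {}
    for p in purchases:
        for r in by_key.get((p.store, p.product_code), []):
            if r.sold_from <= p.bought_on <= r.sold_until:
                alerts.setdefault(p.member_id, set()).add(p.product_code)
    return {member: sorted(codes) for member, codes in alerts.items()}
```

A member is flagged only when the store, the product and the purchase date all fall inside a recall notice, so a purchase of the same product outside the recalled date range does not trigger an alert.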
With the help of big data, the health system can pro-actively detect possible food poisoning by matching food recall alerts from Stop & Shop against my household's food purchasing habits and recent purchases at Stop & Shop. It can then alert me in near real time to stop using the specific food item(s), and suggest possible remedies in case the food has already been consumed. The same can be applied to my eating-out habits, local restaurants and related food poisoning alerts.

Evidence Based Medical Practices - personalized:
================================================

As more people around the world travel and societies become more diverse, it becomes more complex for a medical practitioner to identify and treat the illnesses of patients who may come from over 20 nationalities. Patients can differ in country of origin, religion, food habits, blood-related diseases, allergies, age group, lifestyle and so on, which in turn can make diagnosis more complex for medical practitioners. With large-scale data collection, storage and analytics, and now with big data, it is possible to access and analyze such data, whether structured, semi-structured or unstructured, economically and in near real time.

Just as Deep Blue defeated Garry Kasparov at chess (logical analysis) and, more recently, Watson won at Jeopardy! (hypothesis generation, massive evidence gathering, analysis and scoring), we are going to see such systems used in medical practice in a big way in the near future. Watson-based medical analysis systems are already set to go live this year, for utilization management decisions in lung cancer treatment at Memorial Sloan–Kettering Cancer Center in conjunction with the health insurance company WellPoint.
With the help of big data, huge amounts of patient information can now be collected, processed and analyzed to identify patterns and to determine the most relevant and economical treatment for a patient's specific condition. In this way medical practice can become highly personalized and, at the same time, more effective and economical (by eliminating non-relevant tests and by providing an effective treatment option based on each patient's condition). Over time, as more data is stored and more advanced analytics are implemented, a medical Watson system could become a one-stop source of answers for medical practitioners and staff. At Memorial Sloan–Kettering Cancer Center this year, reportedly over 90% of nurses already follow the guidance given by the medical Watson system. The time is not far off when most doctors will do the same.

The practice of medical insurance can be extended further to areas like educational insurance: by pro-actively monitoring each individual from pre-K onward at a very granular level, and predicting the most relevant areas for a successful career path, such a system could help each individual become a greater contributor to society later in life. Evidence-based courses or degrees could then be offered to registered students in the educational insurance system at a discounted price or for free.

Indirect usages: determining patient satisfaction and sentiment, patient behavior analysis, pro-active community engagement for managing and controlling the spread of disease, automated personalized travel medical advice and nutrition suggestions, personalized medical advice based on family and community health data, and so on.

Big Data Eco system: using Hadoop, one can eliminate the need for the expensive clustering solutions that vendors offer to support high availability; using Hive, employees who are already good with SQL can be trained to work on big data through its SQL-like data warehouse layer; using Sqoop, one can import data into Hadoop from relational databases (RDBMS); using Mahout, one can run various data mining methods; using R, one can develop statistical analysis programs; using Pig, system administrators and analysts can interact with Hadoop through a high-level scripting language; using the Hortonworks and Cloudera platforms, one can accelerate big data implementations; using HBase, one can implement a column-oriented data store on top of Hadoop; using Neo4j, one can model graph-based relationships; using Riak, one can implement key-value storage; using Amazon's elastic cloud or Microsoft Azure, one can go for cloud-based solutions; using Storm, one can analyze streams of data in real time; and so on. Big data is not just one technology but rather a set of technologies that work together to provide the needed solutions.
