Lecture: Database Security - Security Methods for Statistical Databases
- Security Methods for Statistical Databases
- Introduction ▪ Statistical Databases containing medical information are often used for research ▪ Some of the data is protected by laws to help protect the privacy of the patient ▪ Proper security precautions must be implemented to comply with laws and respect the sensitivity of the data
- Accuracy vs. Confidentiality ▪ Accuracy – Researchers want to extract accurate and meaningful data ▪ Confidentiality – Patients, laws and database administrators want to maintain the privacy of patients and the confidentiality of their information
- Laws ▪ Health Insurance Portability and Accountability Act – HIPAA (Privacy Rule) ▪ Covered organizations must comply by April 14, 2003 ▪ Designed to improve efficiency of healthcare system by using electronic exchange of data and maintaining security ▪ Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required ▪ Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions
- HIPAA Compliance ▪ Companies offer 3rd Party Certification of covered entities ▪ Such companies will check your company and associated companies for compliance with HIPAA ▪ Can help with rapid implementation of, and compliance with, HIPAA regulations
- Types of Statistical Databases ▪ Static – a static database is made once and never changes ▪ Example: U.S. Census ▪ Dynamic – changes continuously to reflect real-time data ▪ Example: most online research databases
- Security Methods ▪ Access Restriction ▪ Query Set Restriction ▪ Microaggregation ▪ Data Perturbation ▪ Output Perturbation ▪ Auditing ▪ Random Sampling
- Access Restriction ▪ Databases normally have different access levels for different types of users ▪ User IDs and passwords are the most common methods for restricting access ▪ In a medical database: ▪ Doctors/Healthcare Representatives – full access to information ▪ Researchers – access only to partial information (e.g. aggregate information)
- Query Set Restriction ▪ A query-set size control sets the minimum number of records that must be in the result set ▪ Query results are displayed only if the size of the query set satisfies this condition ▪ Setting a minimum query-set size can help protect against the disclosure of individual data
- Query Set Restriction ▪ Let K represent the minimum number of records that must be present in the query set ▪ Let R represent the size of the query set ▪ The query set can only be displayed if K ≤ R
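The size check on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production access layer: the `PATIENTS` records, their field names, the helper `restricted_mean` and the choice K = 3 are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; the field names are assumptions for illustration.
PATIENTS = [
    {"age": 34, "diagnosis": "flu", "cost": 120.0},
    {"age": 51, "diagnosis": "flu", "cost": 150.0},
    {"age": 47, "diagnosis": "flu", "cost": 90.0},
    {"age": 62, "diagnosis": "asthma", "cost": 300.0},
]

K = 3  # assumed minimum query-set size


def restricted_mean(records, predicate, field, k=K):
    """Return the mean of `field` over the query set, or None if K > R."""
    query_set = [r for r in records if predicate(r)]
    r_size = len(query_set)  # R, the size of the query set
    if r_size < k:
        # Refuse to answer: with so few records an individual could be exposed.
        return None
    return mean(r[field] for r in query_set)


flu_cost = restricted_mean(PATIENTS, lambda r: r["diagnosis"] == "flu", "cost")
asthma_cost = restricted_mean(PATIENTS, lambda r: r["diagnosis"] == "asthma", "cost")
print(flu_cost, asthma_cost)
```

The flu query matches three records, so it is answered; the asthma query matches a single record and is refused, since its "average" would be that patient's exact value.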
- Query Set Restriction [Figure: Query 1 and Query 2 are issued against the original database; the size of each query's results is compared with K]
- Microaggregation ▪ Raw (individual) data is grouped into small aggregates before publication ▪ The average value of the group replaces each value of the individual ▪ Data with the most similarities are grouped together to maintain data accuracy ▪ Helps to prevent disclosure of individual data
- Microaggregation ▪ National Agricultural Statistics Service (NASS) publishes data about farms ▪ To protect against data disclosure, data is only released at the county level ▪ Farms in each county are averaged together to maintain as much purity as possible while still protecting against disclosure
- Microaggregation
    Age    Microaggregated Age
    10     11.67
    12     11.67
    13     11.67
    57     56.67
    54     56.67
    59     56.67
  Each group of three ages is replaced by its average (11.67 and 56.67).
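The age example can be reproduced with a short Python sketch. It assumes a simple fixed-size grouping: values are sorted so that similar values fall into the same group, each group holds k = 3 values, and a short trailing group is folded into the previous one. The function name `microaggregate` and these grouping rules are illustrative assumptions; the slides do not prescribe a particular grouping algorithm.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of (at least) k similar values."""
    # Sort positions by value so that similar values are grouped together.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + k] for s in range(0, len(order), k)]

    # Assumption: a trailing group smaller than k is merged into the previous group.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)

    result = [0.0] * len(values)
    for group in groups:
        avg = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = round(avg, 2)  # every member gets the group average
    return result


ages = [10, 12, 13, 57, 54, 59]
print(microaggregate(ages))  # [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

Results are returned in the original record order, so the published column lines up with the rows it replaces.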
- Microaggregation [Figure: the original data is averaged into microaggregated data; the user's query is answered with results from the microaggregated data]
- Data Perturbation ▪ Perturbed data is raw data with noise added ▪ Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed ▪ Con: Data perturbation runs the risk of presenting biased data
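A minimal Python sketch of the idea follows. The slides do not say what kind of noise is added, so the zero-mean Gaussian noise, the `sigma` parameter, the fixed seed and the name `perturb_database` are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def perturb_database(values, sigma, seed=0):
    """Return a perturbed copy of `values`; the original list is left untouched."""
    rng = random.Random(seed)  # fixed seed so the perturbed copy is reproducible
    # Assumption: independent zero-mean Gaussian noise is added to every value.
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical confidential values (for example, treatment costs).
original = [120.0, 150.0, 90.0, 300.0, 210.0, 175.0]
perturbed = perturb_database(original, sigma=10.0)

# Users only ever query `perturbed`, so a leaked record is not the true value.
print(mean(original), mean(perturbed))
```

The perturbed database is built once and all queries run against it, which is why the processing overhead per query stays low. On a small table like this one the perturbed mean can drift noticeably from the true mean, which is one way the bias concern on the slide shows up.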
- Data Perturbation [Figure: noise is added to the original database to produce a perturbed database; User 1 and User 2 send queries to the perturbed database and receive results from it]
- Output Perturbation ▪ Instead of the raw data being transformed as in Data Perturbation, only the output or query results are perturbed ▪ The bias problem is less severe than with data perturbation
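For contrast with data perturbation, here is a minimal Python sketch in which the stored values stay exact and only the answer is perturbed. As before, the Gaussian noise, the `sigma` parameter and the name `noisy_mean` are assumptions for illustration, not a method taken from the slides.

```python
import random
from statistics import mean


def noisy_mean(values, sigma, rng):
    """Compute the exact mean on the raw data, then perturb only the output."""
    exact = mean(values)
    # Assumption: zero-mean Gaussian noise is added to the query result.
    return exact + rng.gauss(0.0, sigma)


# Hypothetical confidential values; these are never modified.
costs = [120.0, 150.0, 90.0, 300.0, 210.0, 175.0]
rng = random.Random(42)

print(noisy_mean(costs, sigma=2.0, rng=rng))  # answer shown to User 1
print(noisy_mean(costs, sigma=2.0, rng=rng))  # answer shown to User 2
```

Because the statistic is computed on the true data and noise is added once at the end, the error in each answer is bounded by the noise alone rather than accumulated from every perturbed record.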
- Output Perturbation [Figure: User 1 and User 2 send queries to the original database; noise is added to the query results before they are returned]
- Auditing ▪ Auditing is the process of keeping track of all queries made by each user ▪ Usually done with up-to-date logs ▪ Each time a user issues a query, the log is checked to see if the user is querying the database maliciously
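The log check can be sketched as follows. The slides do not define what counts as a malicious query, so this example assumes one simple rule: a query is refused if its query set differs from an earlier query set of the same user by fewer than `min_difference` records, since subtracting the two answers would then isolate those few records. The class `QueryAuditor` and its methods are hypothetical names.

```python
from collections import defaultdict


class QueryAuditor:
    """Keeps a per-user log of answered query sets and screens new queries."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        self.log = defaultdict(list)  # user -> list of frozensets of record ids

    def allow(self, user, record_ids):
        """Return True and log the query if it passes the check, else False."""
        new = frozenset(record_ids)
        for past in self.log[user]:
            # Records that appear in exactly one of the two query sets.
            differing = len(new ^ past)
            # Assumed rule: a small non-zero difference suggests a differencing attack.
            if 0 < differing < self.min_difference:
                return False
        self.log[user].append(new)
        return True


auditor = QueryAuditor(min_difference=2)
print(auditor.allow("alice", {1, 2, 3, 4}))  # True: first query is logged
print(auditor.allow("alice", {1, 2, 3}))     # False: isolates record 4
print(auditor.allow("bob", {1, 2, 3}))       # True: bob has no earlier queries
```

Every new query is compared against the user's whole history, so the log grows and the check gets slower over time, which is consistent with auditing being rated as high cost in the comparison.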
- Random Sampling ▪ Only a sample of the records meeting the requirements of the query is shown ▪ Must maintain consistency by giving the exact same results to the same query ▪ Weakness – Logically equivalent queries can result in different query sets
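One way to get a sample that is random-looking yet repeatable is to derive the keep-or-drop decision from a hash of the query and the record. The sketch below assumes that approach; the helper names, the use of SHA-256 over the query text, and the 50% sampling rate are illustrative assumptions rather than the method described in the slides.

```python
import hashlib


def in_sample(query_text, record_id, rate):
    """Decide deterministically whether a record is kept for this query."""
    digest = hashlib.sha256(f"{query_text}|{record_id}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < rate


def sampled_query_set(query_text, matching_ids, rate=0.5):
    """Return the subset of matching records that the statistic is computed on."""
    return [rid for rid in matching_ids if in_sample(query_text, rid, rate)]


matching = list(range(1, 21))  # hypothetical ids of records matching the query

first = sampled_query_set("age > 30 AND sex = 'F'", matching)
again = sampled_query_set("age > 30 AND sex = 'F'", matching)
reordered = sampled_query_set("sex = 'F' AND age > 30", matching)

print(first == again)   # True: the same query always gets the same sample
print(first, reordered)
```

The last line illustrates the weakness: the two query strings are logically equivalent and match the same records, but because the decision here depends on the query text they are generally sampled differently, and comparing several such answers lets a user average away the sampling error.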
- Comparison Methods The following criteria are used to determine the most effective methods of statistical database security: ▪ Security – possibility of exact disclosure, partial disclosure, robustness ▪ Richness of Information – amount of non-confidential information eliminated, bias, precision, consistency ▪ Costs – initial implementation cost, processing overhead per query, user education
- A Comparison of Methods
    Method                   Security        Richness of Information   Costs
    Query-set Restriction    Low             Low (1)                   Low
    Microaggregation         Moderate        Moderate                  Moderate
    Data Perturbation        High            High-Moderate             Low
    Output Perturbation      Moderate        Moderate-Low              Low
    Auditing                 Moderate-Low    Moderate                  High
    Sampling                 Moderate        Moderate-Low              Moderate
  (1) Quality is low because a lot of information can be eliminated if the query does not meet the requirements
- Sources ▪ Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989 ▪ Official HIPAA ▪ Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road ▪ Service Bureau; 3rd Party Testing