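Following a link Maomao sent me, I downloaded the leaked Tianya forum user data. Unpacked, it comes to 50 txt files and 1.7 GB of data, reportedly holding 40 million user records.
Walking through 50 files by hand is painful, so I naturally thought of letting Oracle process the data.
SQL*Loader (sqlldr) is a good tool for this.
Create the table: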
create table tianya (username varchar2(500),pwd varchar2(500),mail varchar2(500));
Create a control file named 1.ctl with the following content:
load data
infile 'tianya_1.txt' infile 'tianya_2.txt' infile 'tianya_3.txt' infile 'tianya_4.txt' infile 'tianya_5.txt'
infile 'tianya_6.txt' infile 'tianya_7.txt' infile 'tianya_8.txt' infile 'tianya_9.txt' infile 'tianya_10.txt'
infile 'tianya_11.txt' infile 'tianya_12.txt' infile 'tianya_13.txt' infile 'tianya_14.txt' infile 'tianya_15.txt'
infile 'tianya_16.txt' infile 'tianya_17.txt' infile 'tianya_18.txt' infile 'tianya_19.txt' infile 'tianya_20.txt'
infile 'tianya_21.txt' infile 'tianya_22.txt' infile 'tianya_23.txt' infile 'tianya_24.txt' infile 'tianya_25.txt'
infile 'tianya_26.txt' infile 'tianya_27.txt' infile 'tianya_28.txt' infile 'tianya_29.txt' infile 'tianya_30.txt'
infile 'tianya_31.txt' infile 'tianya_32.txt' infile 'tianya_33.txt' infile 'tianya_34.txt' infile 'tianya_35.txt'
infile 'tianya_36.txt' infile 'tianya_37.txt' infile 'tianya_38.txt' infile 'tianya_39.txt' infile 'tianya_40.txt'
infile 'tianya_41.txt' infile 'tianya_42.txt' infile 'tianya_43.txt' infile 'tianya_44.txt' infile 'tianya_45.txt'
infile 'tianya_46.txt' infile 'tianya_47.txt' infile 'tianya_48.txt' infile 'tianya_49.txt' infile 'tianya_50.txt'
append into table system.tianya
fields terminated by ' ' optionally enclosed by '"'
trailing nullcols
(username,pwd,mail)
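Typing 50 infile clauses by hand is tedious and easy to get wrong, so the control file can also be generated. What follows is only a minimal Python sketch, assuming the files are named tianya_1.txt through tianya_50.txt as above; the function name build_control_file and its parameters are my own invention for illustration, not part of SQL*Loader.

```python
def build_control_file(n_files=50, table="system.tianya"):
    """Return SQL*Loader control-file text for tianya_1.txt .. tianya_<n_files>.txt."""
    # One INFILE clause per data file; SQL*Loader accepts several in one control file.
    infiles = "\n".join(f"infile 'tianya_{i}.txt'" for i in range(1, n_files + 1))
    return (
        "load data\n"
        f"{infiles}\n"
        f"append into table {table}\n"
        # Same field rules as the hand-written file: space-separated,
        # values optionally wrapped in double quotes.
        "fields terminated by ' ' optionally enclosed by '\"'\n"
        "trailing nullcols\n"
        "(username,pwd,mail)\n"
    )


if __name__ == "__main__":
    # Print the control file; redirect the output to 1.ctl to save it.
    print(build_control_file(), end="")
```

Saved under a hypothetical name such as make_ctl.py, it could be run as `python make_ctl.py > 1.ctl`, producing a file equivalent to the one written by hand.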
Run the command:
sqlldr system/system control=1.ctl
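In the end a little over 31 million rows were loaded. My laptop (i5, 6 GB of RAM) got through the 1.7 GB of txt in 10 minutes, so Oracle's processing power is still pretty formidable.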
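As a rough sanity check on those figures, the load rate can be worked out directly. This is only back-of-the-envelope arithmetic: it rounds the row count down to 31 million and assumes 1 GB = 1024 MB.

```python
rows_loaded = 31_000_000   # "a little over 31 million" rows, rounded down
data_mb = 1.7 * 1024       # 1.7 GB expressed in MB (binary units assumed)
elapsed_s = 10 * 60        # 10 minutes in seconds

rows_per_s = rows_loaded / elapsed_s
mb_per_s = data_mb / elapsed_s

print(f"{rows_per_s:,.0f} rows/s, {mb_per_s:.1f} MB/s")
# prints "51,667 rows/s, 2.9 MB/s"
```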
Once the load finishes, create indexes to make lookups easier:
create index idx_tianya_username on tianya (username);
create index idx_tianya_mail on tianya (mail);
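I did find my own password, and the consequence is that I now have to change it on every site.
I keep one password for email, one for messaging tools, one for website accounts, and a separate one for payments.
Using different passwords for different security levels keeps a leak at one site from exposing everything else.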