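After a round of googling and testing, I have mostly solved the small-files problem in Hive.
Below is the solution from my own practice; corrections are very welcome~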
——————————————————
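My understanding: the direct cause of too many small files is too many reducers, because every reducer produces at least one small file on each DML operation.
In my case, the reducer count usually gets too high when I use dynamic partitioning to query a fairly large time range and write the result into a report table.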
——————————————————
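My solution (a combination of several approaches I found on Baidu; my tests show it solves the problem effectively)
Set the following parameters: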
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set hive.hadoop.supports.splittable.combineinputformat = true;
-- Merge small files at the end of a map-reduce job.
set hive.merge.mapredfiles=true;
-- Merge small files at the end of a map-only job.
set hive.merge.mapfiles=true;
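The two merge switches are usually tuned together with two size thresholds. The sketch below is not something I tested for this post; the values shown are simply Hive's documented defaults, so treat them as a starting point and adjust for your cluster:

```sql
-- Assumption: values are Hive's default settings, shown only as a starting point.
-- Target size (in bytes) of each merged file.
set hive.merge.size.per.task=256000000;
-- If the average output file size of a job is below this (in bytes),
-- Hive launches an extra job to merge the output files.
set hive.merge.smallfiles.avgsize=16000000;
```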
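In addition, add distribute by rand() after the outermost select in the query. This reportedly redistributes the rows fairly evenly across the output files, so I include it every time.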
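As a rough illustration of where distribute by rand() goes in a dynamic-partition insert, here is a minimal sketch. The table names (report_daily, access_log), the columns, and the date range are all hypothetical and only serve the example:

```sql
-- Hypothetical tables and columns, for illustration only.
-- Allow all partition columns to be resolved dynamically.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table report_daily partition (dt)
select user_id, pv, dt
from (
  select user_id, count(*) as pv, dt
  from access_log
  where dt between '2015-01-01' and '2015-03-31'
  group by user_id, dt
) t
-- Spread the rows randomly across the reducers of the final stage.
distribute by rand();
```

Because rand() assigns rows to reducers at random, each reducer receives a similar share of the data, so the files it writes end up close in size.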