Robomongo: Exceeded memory limit for $group

mailnote 2023. 7. 15. 10:25

I'm using a script to remove duplicates in mongo. It worked on a test collection with 10 items, but when I ran it against the real collection with 6 million documents, I got an error.

This is the script I ran in Robomongo (now Robo 3T):

var bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();
var count = 0;

db.getCollection('RAW_COLLECTION').aggregate([
  // Group on unique value storing _id values to array and count 
  { "$group": {
    "_id": { RegisterNumber: "$RegisterNumber", Region: "$Region" },
    "ids": { "$push": "$_id" },
    "count": { "$sum": 1 }      
  }},
  // Only return things that matched more than once. i.e a duplicate
  { "$match": { "count": { "$gt": 1 } } }
]).forEach(function(doc) {
  var keep = doc.ids.shift();     // takes the first _id from the array

  bulk.find({ "_id": { "$in": doc.ids }}).remove(); // remove all remaining _id matches
  count++;

  if ( count % 500 == 0 ) {  // only actually write per 500 operations
      bulk.execute();
      bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();  // re-init after execute
  }
});

// Clear any queued operations
if ( count % 500 != 0 )
    bulk.execute();

This is the error message:

Error: command failed: {
    "errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.",
    "code" : 16945,
    "ok" : 0
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1215:5
@(shell):1:1

So do I need to set allowDiskUse:true for this to work? Where in the script should I do that, and is there any problem with doing so?

{ allowDiskUse: true }

should be placed right after the aggregation pipeline, as the second argument to aggregate().

In your code it should look like this:

db.getCollection('RAW_COLLECTION').aggregate([
  // Group on unique value storing _id values to array and count 
  { "$group": {
    "_id": { RegisterNumber: "$RegisterNumber", Region: "$Region" },
    "ids": { "$push": "$_id" },
    "count": { "$sum": 1 }      
  }},
  // Only return things that matched more than once. i.e a duplicate
  { "$match": { "count": { "$gt": 1 } } }
], { allowDiskUse: true } )
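
As a minimal sketch of where the option goes, the helper below builds the two arguments separately: the pipeline array first, the options document second. buildDuplicateAggregateArgs is a hypothetical name used only for illustration and is not part of the shell. It is plain JavaScript, so the shape can be checked without a database.

```javascript
// Hypothetical helper: builds the two arguments for aggregate().
// keyFields are the fields that together define a duplicate.
function buildDuplicateAggregateArgs(keyFields) {
  var groupId = {};
  keyFields.forEach(function (field) {
    groupId[field] = "$" + field;   // e.g. RegisterNumber -> "$RegisterNumber"
  });

  var pipeline = [
    { $group: { _id: groupId, ids: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ];
  // allowDiskUse is an option of aggregate(), not a pipeline stage.
  var options = { allowDiskUse: true };

  return { pipeline: pipeline, options: options };
}

var args = buildDuplicateAggregateArgs(["RegisterNumber", "Region"]);
// In the shell this would be passed as:
//   db.getCollection('RAW_COLLECTION').aggregate(args.pipeline, args.options)
```

Keeping the options object out of the pipeline array matters: placed inside the array, it would be treated as a pipeline stage rather than as an option.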

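Note: Using { allowDiskUse: true } may introduce performance issues, because the aggregation pipeline will read and write temporary files on disk. The impact also depends on disk performance and on the size of your working set. Test the performance for your use case.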

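When the data is large, it is always better to use $match before $group. With a $match in front, $group only has to hold the matching documents in memory, which usually avoids this error.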

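db.getCollection('sample').aggregate([
   // Filter first so that $group only sees the matching documents
   {$match:{State:'TAMIL NADU'}},
   {$group:{
       _id:{DiseCode:"$code", State:"$State"},
       totalCount:{$sum:1}
   }},
   {
     $project:{
        Code:"$_id.DiseCode",   // must use the key name defined in $group's _id
        totalCount:"$totalCount",
        _id:0
     }
   }
])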
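
To illustrate why filtering first helps, here is a rough in-memory analogy in plain JavaScript. It is not how the server implements $group, and the sample documents are made up. Grouping keeps one entry per distinct key, so filtering first leaves fewer entries to hold.

```javascript
// Made-up sample documents, for illustration only.
var docs = [
  { code: "A1", State: "TAMIL NADU" },
  { code: "A1", State: "TAMIL NADU" },
  { code: "B2", State: "TAMIL NADU" },
  { code: "A1", State: "KERALA" },
  { code: "C3", State: "KERALA" },
  { code: "D4", State: "GOA" }
];

// Counts documents per (code, State) key, like $group with $sum: 1.
function groupCounts(input) {
  var groups = {};
  input.forEach(function (doc) {
    var key = doc.code + "|" + doc.State;
    groups[key] = (groups[key] || 0) + 1;
  });
  return groups;
}

// Grouping everything versus grouping only the documents that pass the filter.
var allGroups = groupCounts(docs);
var filtered = docs.filter(function (doc) { return doc.State === "TAMIL NADU"; });
var matchedGroups = groupCounts(filtered);

console.log(Object.keys(allGroups).length);     // 5
console.log(Object.keys(matchedGroups).length); // 2
```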

If you really need to overcome this issue without a $match, then the solution is { allowDiskUse: true }.

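Here is a simple, undocumented trick that helps avoid disk usage in many cases.

You can use an intermediate $project stage to reduce the size of the records passed to the memory-bound stage ($sort in general, $group in this example).

In this example it leads to: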

var bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();
var count = 0;

db.getCollection('RAW_COLLECTION').aggregate([
  // here is the important stage
  { "$project": { "_id": 1, "RegisterNumber": 1, "Region": 1 } }, // this will reduce the records size
  { "$group": {
    "_id": { RegisterNumber: "$RegisterNumber", Region: "$Region" },
    "ids": { "$push": "$_id" },
    "count": { "$sum": 1 }      
  }},
  { "$match": { "count": { "$gt": 1 } } }
]).forEach(function(doc) {
  var keep = doc.ids.shift();     // takes the first _id from the array

  bulk.find({ "_id": { "$in": doc.ids }}).remove(); // remove all remaining _id matches
  count++;

  if ( count % 500 == 0 ) {  // only actually write per 500 operations
      bulk.execute();
      bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();  // re-init after execute
  }
});

// Clear any queued operations (same final flush as in the question's script)
if ( count % 500 != 0 )
    bulk.execute();

Note the first $project stage, which is there only to avoid disk usage.

This is especially useful when the records are large and most of their data is not used in the aggregation.
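
As a rough illustration of the size reduction, the sketch below trims a made-up document down to the three fields the grouping needs. The project function is a hypothetical stand-in for the $project stage, and JSON length is only a loose proxy for the BSON size the server actually works with.

```javascript
// Made-up document: a few small key fields plus a large unused payload.
var doc = {
  _id: "5926a1c0e4b0a1b2c3d4e5f6",
  RegisterNumber: "RN-0001",
  Region: "North",
  payload: new Array(1001).join("x") // 1000 characters not needed by the grouping
};

// Keeps only the listed fields, like { $project: { field: 1, ... } }.
function project(source, fields) {
  var out = {};
  fields.forEach(function (f) {
    if (Object.prototype.hasOwnProperty.call(source, f)) out[f] = source[f];
  });
  return out;
}

var slim = project(doc, ["_id", "RegisterNumber", "Region"]);

// JSON length is only a rough proxy for the BSON size the server sees.
var fullSize = JSON.stringify(doc).length;
var slimSize = JSON.stringify(slim).length;
console.log(fullSize, slimSize);
```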

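From the MongoDB docs:

The $group stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $group will produce an error. However, to allow for the handling of large datasets, set the allowDiskUse option to true to enable $group operations to write to temporary files. See the db.collection.aggregate() method and the aggregate command for details.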
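
A back-of-the-envelope estimate suggests why the question's pipeline can hit this limit: $push collects the _id of every document before the later $match discards the non-duplicates. The sketch below assumes the _id values are default 12-byte ObjectIds and uses an invented per-value overhead. Neither figure comes from the question or from the MongoDB documentation, so treat the result as an order-of-magnitude guess only.

```javascript
var docCount = 6000000;               // collection size from the question
var objectIdBytes = 12;               // ASSUMPTION: _id values are default ObjectIds (12 bytes each)
var limitBytes = 100 * 1024 * 1024;   // 100 MB, taken here as 104,857,600 bytes

// Raw bytes of every _id pushed into the "ids" arrays, ignoring all overhead.
var rawIdBytes = docCount * objectIdBytes;   // 72,000,000

// ASSUMPTION: 20 bytes of bookkeeping per pushed value. The real figure depends
// on the server version and is not taken from the MongoDB documentation.
var assumedOverheadPerId = 20;
var estimatedBytes = docCount * (objectIdBytes + assumedOverheadPerId);   // 192,000,000

console.log(rawIdBytes < limitBytes);      // true
console.log(estimatedBytes > limitBytes);  // true
```

Even the raw identifiers alone come close to the limit, so any realistic overhead, plus the group keys themselves, would push the stage past it.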

Source: https://stackoverflow.com/questions/44161288/robomongo-exceeded-memory-limit-for-group
