Grok parsing pattern for CloudWatch logs

Hi,

Has anyone tried to parse CloudWatch logs with a Grok pattern? I’m trying to parse my index, application, and search logs, but I keep failing, especially with the JSON data in the source attribute.

Can you help me with it?

[2022-05-15T05:22:52,435][TRACE][index.indexing.slowlog.index] [4388a5411b6eabcac004fa2c8acca911] [sys_abc-mw-svcs_prd-20220515-000221/ABCsg8_pREa7L49RY3ct3g] took[1.4s], took_millis[1485], type[_doc], id[qnabc4ABmr-WAurf3RUy], routing[], source[{"className":"com.syf.mw.makepayment.dao.MakePaymentDao","app_id":"abcdddc-0771-4d0a-bd40-14f8b30e0e44","source_type":"APP/PROC/WEB","logLevel":"INFO","system":{"syslog":{"version":"1"}},"unknown_stg":"[APP/PROC/WEB/4]","space_name":"ABC-MW-SVCS","proces]

Hi @Santu

Thanks for reaching out!

This is a little outside my realm of understanding; however, I do want to make sure you get the support you need here.

I have gone ahead and looped in the Log Engineering team, as they will have a better understanding here. Please note they will reach out here with their findings!

Please feel free to reach out with any questions or updates you may have!

Thank you @dcody. So far I have come up with the pattern below. As you can see, the last source section is where I’m struggling.

GROK Pattern:

%{TIMESTAMP_ISO8601:time}\]\[%{WORD:loglevel}\]\[%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\] \[%{NOTSPACE:key}\] \[%{NOTSPACE:source}\] took\[%{NOTSPACE:took_seconds}\], took_millis\[%{NOTSPACE:took_millis}\], type\[%{NOTSPACE:type}\], id\[%{NOTSPACE:id}\], routing\[\], source\[%{NOTSPACE:source}\]

Output:

{
  "time": [
    [
      "2022-05-15T05:22:52,435"
    ]
  ],
  "YEAR": [
    [
      "2022"
    ]
  ],
  "MONTHNUM": [
    [
      "05"
    ]
  ],
  "MONTHDAY": [
    [
      "15"
    ]
  ],
  "HOUR": [
    [
      "05",
      null
    ]
  ],
  "MINUTE": [
    [
      "22",
      null
    ]
  ],
  "SECOND": [
    [
      "52,435"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "loglevel": [
    [
      "TRACE"
    ]
  ],
  "logname": [
    [
      "index",
      "indexing",
      "slowlog",
      "index"
    ]
  ],
  "key": [
    [
      "4388a5411b6eabcac004fa2c8acca911"
    ]
  ],
  "source": [
    [
      "sys_abc-mw-svcs_prd-20220515-000221/ABCsg8_pREa7L49RY3ct3g",
      "{“className”:“com.syf.mw.makepayment.dao.MakePaymentDao”,“app_id”:“abcdddc-0771-4d0a-bd40-14f8b30e0e44”,“source_type”:“APP/PROC/WEB”,“logLevel”:“INFO”,“system”:{“syslog”:{“version”:“1”}},“unknown_stg”:"[APP/PROC/WEB/4]",“space_name”:“ABC-MW-SVCS”,"proces"
    ]
  ],
  "took_seconds": [
    [
      "1.4s"
    ]
  ],
  "took_millis": [
    [
      "1485"
    ]
  ],
  "type": [
    [
      "_doc"
    ]
  ],
  "id": [
    [
      "qnabc4ABmr-WAurf3RUy"
    ]
  ]
}

Hey @Santu,

This is a bit of a difficult one to solve. As long as the structure of the log doesn’t change much, you can use this to parse out all of the attributes:

%{TIMESTAMP_ISO8601:time}\]\[%{WORD:loglevel}\]\[%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\] \[%{NOTSPACE:key}\] \[%{NOTSPACE:source}\] took\[%{NOTSPACE:took_seconds}\], took_millis\[%{NOTSPACE:took_millis}\], type\[%{NOTSPACE:type}\], id\[%{NOTSPACE:id}\], routing\[\], source%{NOTSPACE}className\"\:\"%{DATA:className}\"%{NOTSPACE}app_id\"\:\"%{DATA:app_id}\"%{NOTSPACE}source_type\"\:\"%{DATA:source_type}\"%{NOTSPACE}logLevel\"\:\"%{DATA:logLevel}\"%{NOTSPACE}version\"\:\"%{DATA:systemSyslogVersion}\"%{NOTSPACE}unknown_stg\"\:\"%{DATA:unknown_stg}\"%{NOTSPACE}space_name\"\:\"%{DATA:space_name}\"
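If it helps to test the header part of this outside the Grok debugger, here is a minimal Python sketch of a roughly equivalent regular expression. It is not the Grok engine itself: the function name parse_slowlog and the group names index and routing are my own assumptions for illustration, and it assumes every line follows the bracket layout of the sample in this thread.

```python
import re

# Rough regex counterpart of the header part of the Grok pattern above.
# Assumption: the bracket layout matches the sample slow log line in this thread.
SLOWLOG_HEADER = re.compile(
    r"\[(?P<time>[^\]]+)\]"
    r"\[(?P<loglevel>\w+)\s*\]"
    r"\[(?P<logname>[\w.]+)\]\s+"
    r"\[(?P<key>[^\]]+)\]\s+"
    r"\[(?P<index>[^\]]+)\]\s+"
    r"took\[(?P<took_seconds>[^\]]+)\],\s+"
    r"took_millis\[(?P<took_millis>\d+)\],\s+"
    r"type\[(?P<type>[^\]]*)\],\s+"
    r"id\[(?P<id>[^\]]*)\],\s+"
    r"routing\[(?P<routing>[^\]]*)\],\s+"
    # Everything after "source[" is kept raw; it may be truncated JSON.
    r"source\[(?P<source>.*)"
)


def parse_slowlog(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = SLOWLOG_HEADER.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "[2022-05-15T05:22:52,435][TRACE][index.indexing.slowlog.index] "
        "[4388a5411b6eabcac004fa2c8acca911] "
        "[sys_abc-mw-svcs_prd-20220515-000221/ABCsg8_pREa7L49RY3ct3g] "
        "took[1.4s], took_millis[1485], type[_doc], "
        'id[qnabc4ABmr-WAurf3RUy], routing[], source[{"logLevel":"INFO"'
    )
    fields = parse_slowlog(sample)
    print(fields["took_millis"])  # prints 1485
```

Capturing the index name under its own group (index here) rather than reusing source avoids getting two values under one attribute, as seen in the output posted earlier.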

You could also use something like this to capture the source as JSON; our ingestion pipeline should parse that JSON automatically if you name the attribute message:

%{TIMESTAMP_ISO8601:time}\]\[%{WORD:loglevel}\]\[%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\.%{WORD:logname}\] \[%{NOTSPACE:key}\] \[%{NOTSPACE:source}\] took\[%{NOTSPACE:took_seconds}\], took_millis\[%{NOTSPACE:took_millis}\], type\[%{NOTSPACE:type}\], id\[%{NOTSPACE:id}\], routing\[\], source\[%{DATA:message}\,\"unknown_stg\"\:\"%{DATA:unknown_stg}\"%{NOTSPACE}space_name\"\:\"%{DATA:space_name}\"
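One thing worth checking by hand: against the sample line in this thread, the text captured as message appears to stop at "version":"1"}}, which leaves the outermost object without its closing brace. The Python sketch below shows one way to verify and repair that before parsing. It is an illustration only, not a description of what the ingestion pipeline does; the function name parse_source_prefix is hypothetical, and it assumes the cut falls right after a complete key/value pair and that only objects (not arrays) are left open.

```python
import json


def parse_source_prefix(prefix):
    """Parse a JSON object prefix whose trailing '}' characters were cut off.

    Assumption: the prefix ends right after a complete key/value pair,
    and only objects (no arrays) remain unclosed.
    """
    depth = 0          # number of '{' not yet matched by '}'
    in_string = False  # True while scanning inside a JSON string
    escaped = False    # True right after a backslash inside a string
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    # Append the missing closing braces, then let json validate the result.
    return json.loads(prefix + "}" * depth)


if __name__ == "__main__":
    captured = (
        '{"className":"com.syf.mw.makepayment.dao.MakePaymentDao",'
        '"logLevel":"INFO","system":{"syslog":{"version":"1"}}'
    )
    doc = parse_source_prefix(captured)
    print(doc["system"]["syslog"]["version"])  # prints 1
```

If the source is cut in the middle of a key or value, as at the end of the sample line, this will still raise a json.JSONDecodeError, so capturing up to a known complete attribute (as the pattern above does) remains necessary.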

I hope this helps!