Create an extractor

The DNIF library has extractors ready to ingest data from all types of devices, including IT infrastructure such as networking equipment, servers, and laptops, as well as applications such as internal portals, email, and document storage. Ingested logs are parsed to extract relevant fields, enriched with additional context, and stored in the Datanode. A standard, domain-specific, extensible schema is applied during ingestion to help you organize your data and build and maintain quality analytics across the enterprise.

How to view Extractors?

  • Hover over the System icon on the left navigation panel and select Extraction. The following page will be displayed.

[Screenshot: Extraction page listing the available extractors]

The above page displays the following details.

Field Description
Name Displays the name of the extractor.
Log Name Displays the specific log type and log name of this particular extractor.
Author Displays the email address of the user who created this extractor.
Type Displays the type of extractor, i.e. whether it is a custom or native extractor.
Version Displays the version number of the extractor.
Delete icon Allows you to delete the extractor; this is displayed only for custom extractors.
Refresh icon Click this to refresh the extractor list.
Search icon Click this to search for a particular extractor.
Add (+) icon Click this to add a new extractor.

How to create Custom Extractors?

Method I

To create a custom extractor, click the plus icon on the extractor list page. A side panel will be displayed as follows.

[Screenshot: Add extractor side panel]

  • Enter a name for the extractor you are about to create; you can then start writing directly in the YAML editor.

[Screenshot: YAML editor in the add extractor panel]

  • After writing the parser, click Submit.

By default, the existing native extractor will be disabled and custom will be appended to the name of the extractor.

The extractor will then be listed on the extractor list page.

Method II

In this method you create a custom extractor by cloning an existing native extractor.

  • To create a custom extractor by cloning, click the name of an existing native extractor; the YAML page will be displayed.

[Screenshot: YAML page of the native extractor]

  • Make the required changes and click Submit at the end of the page; the following screen will be displayed.

[Screenshot: confirmation screen with Save option]

  • Click Save to create a custom extractor.

How to write an extractor using a YAML file?

Extractors are built as .yaml files; the values for ExtractorID, SourceName, and SourceType are populated along with their assigned key values.

Basic Information

Each extractor will have the following basic information

Field Description
schema-version The version assigned to the extractor.
extractor-id The unique ID assigned to the extractor.
source-name The name assigned to the extractor as per the device. Example: Fortigate, Checkpoint, etc.
source-type The type of device. Example: Firewall, OS, switch, etc.
source-description A short description of the extractor.
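
A minimal sketch of this header section, assuming illustrative values (the field names are taken from the table above; the values themselves are placeholders):

    schema-version: '1.0'                 # illustrative version string
    extractor-id: fortigate-custom        # hypothetical unique ID
    source-name: Fortigate                # device name, e.g. Fortigate, Checkpoint
    source-type: Firewall                 # device type, e.g. Firewall, OS, switch
    source-description: Extractor for Fortigate firewall logs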

Stream

Stream is a domain-specific collection of data from different sources that contributes to a unique dataset and a unique set of use-cases. Each value in the Stream field within the extractor can be used to generate a search that returns a particular dataset with information.
[Screenshot: Stream field in an extractor]

This is the section where the streams included in the extractor are defined. There are various streams, such as AUTHENTICATION, SYSMON-PROCESS, SYSMON-NETWORK, IAM, etc.
Example: AUTHENTICATION refers to login and logout activity events; IAM refers to user management events such as create user and delete user.
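
As a rough illustration only, the streams covered by an extractor could be declared as a simple list; the key name streams below is an assumption, while the stream names are the uppercase identifiers listed above:

    streams:              # key name is an assumption
      - AUTHENTICATION    # login / logout activity events
      - IAM               # user management events such as create user, delete user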

Master Filters

The Master Filters and First Matches help identify which extractor should be applied to a given log source; this step has been heavily optimized for performance.

Event Details:

The following configuration should be done under Event Details.

  • First Match
    First Matches will help us identify different patterns associated with a log source.

    • Each first match will be associated with a decoder.
    • First matches can now yield multiple events if used with decoder=json or custom-kv.

The decoder=regex option is the legacy event detail approach and remains relevant for older devices.
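
To make the relationship concrete, here is a hypothetical fragment showing a master filter and a first match with its decoder. The key names master-filters and first-match, as well as the patterns, are assumptions for illustration; only the decoder values (json, custom, regex) are taken from this article:

    master-filters:                  # key name assumed; coarse patterns that identify the log source
      - 'devname=\S+ devid=FG'
    first-match:                     # key name assumed; one entry per pattern family
      - match: 'type="event"'        # pattern that selects this group of events
        decoder: json                # decoder applied to events under this first match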

  • Decoder
    The Decoder section defines the type of decoder to be used based on the log format. Decoders are defined at the First Match level; therefore, multiple decoders can be used in an extractor file.
    There are 3 decoders available:

    • JSON: It is written as ‘decoder: json’ in the extractor files. Log samples that are in JSON format can be parsed using this decoder. It correctly parses all the key-value pairs rendered in the log sample.
      Note: Regex is not required to parse key values.
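
      For instance, given a hypothetical JSON event, the decoder reads the keys directly and no field-extraction regex is needed:

        # hypothetical JSON log sample:
        #   {"EventID": 4624, "user": "jdoe", "src_ip": "10.0.0.5"}
        decoder: json   # EventID, user and src_ip become available for mapping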

    • Custom (key-value): It is written as ‘decoder: custom’ in the extractor files. Log samples that are in key-value format can be parsed using this decoder. Here, we have to write a generic regex that captures the keys and values from the log samples appropriately.
      Example: Refer to the snapshot below:
      [Screenshot: generic key-value regex in the extractor]
      As per the snapshot, a generic regex is written to capture the keys and values in the log sample. This regex results in groups of keys and values, displayed in the image below. Each key can then be annotated as per the field annotations in the extractor.
      [Screenshot: resulting groups of keys and values]
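
      A minimal sketch of this idea, assuming a log line like user=jdoe action=login status=success; both the sample and the regex key name (kv-regex) below are illustrative assumptions, only decoder: custom is from this article:

        decoder: custom
        # generic regex capturing each key and its value from "key=value" pairs
        kv-regex: '(?P<key>\w+)=(?P<value>[^\s]+)'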

    • Regex: It is written as ‘decoder: regex’ in the yml files. Log samples that are in Syslog (values only) format can be parsed using this decoder. Here, we have to write the regex and define the field names in it. These field names can then be mapped and annotated in the extractor accordingly.
      Example: Refer to the snapshot below:
      [Screenshot: regex decoder with named capture groups]
      In the snapshot, field names are defined in the regex. This is achieved by writing ?P<field_name> at the start of a capture group, i.e. (?P<field_name>...).
      [Screenshot: fields extracted from the named groups]
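
      For example, a sketch of a named-group pattern for a hypothetical syslog-style line user jdoe logged in from 10.0.0.5; the sample, the pattern, and the regex key name are illustrative assumptions:

        decoder: regex
        # named capture groups define the field names directly in the pattern
        regex: 'user (?P<user>\w+) logged in from (?P<src_ip>[\d.]+)'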

  • Event Key Format
    In the event-key-format section we define the field on the basis of which the events under a First Match are further segregated, so that each event in the log can be identified accurately.
    For example, refer to the snapshot below:
    [Screenshot: event-key-format section]
    In the snapshot above, the First Match is defined on the basis of SourceName, and the events are further segregated on the basis of EventID in the ‘event-key-format’ section.
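
    A hypothetical fragment for a Windows-style source segregated by EventID; the section name event-key-format is from this article, while the value syntax is an assumption:

      event-key-format: '{EventID}'   # events under this First Match are keyed by EventID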

  • Event Key Mapping:
    In the ‘event-key-mapping’ section, the events are defined with their appropriate Streams. While specifying an event in this section, ensure that each event identifies itself with a Stream.
    [Screenshot: event-key-mapping section]
    In the snapshot, EventID is defined as the pointer that provides the most information about the log event. Refer to the table below to understand the annotate and translate fields.
    [Screenshot: annotate and translate definitions]

Field Description
annotate Static key values for Stream, Action, and Status to be added as per the log event’s information. In the above snapshot, the relevant Stream, Action, and Status are defined as per the log event’s information.
  • Stream: Type of log
  • Action: Action performed in the log event. Eg: Login, Logout
  • Status: Status of the action performed in the log event. Eg: Passed, Failed
translate All the relevant fields as per the stream should be defined under the translate section. This allows you to replace the fields as per DNIF terminology.

  • Fallback:
    Fallback is a mandatory field. All the events that are defined with a Stream will be parsed accurately, while the undefined events for that particular First Match will be parsed under the fallback section.
    Example: Let us consider that we have created the First Match on the basis of SourceName for the Windows extractor and that the further division is on the basis of EventID. Some EventIDs are defined with a proper stream while others are not; these undefined EventIDs will then be parsed under the fallback section.

    Refer to the snapshot for the fallback events field definition:

    [Screenshot: fallback definition]
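
    Putting these pieces together, a hypothetical event-key-mapping fragment could look like the following. The section names (event-key-mapping, annotate, translate, fallback) are from this article; the EventID, field names, and exact nesting are illustrative assumptions:

      event-key-mapping:
        '4624':                        # hypothetical Windows logon EventID
          annotate:
            Stream: AUTHENTICATION     # stream for this event
            Action: Login              # action performed in the event
            Status: Passed             # status of the action
          translate:
            TargetUserName: User       # rename source fields to DNIF terminology (names illustrative)
            IpAddress: SrcIP
        fallback:
          annotate:
            Stream: OTHER              # undefined events for this First Match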

  • Globals

    Globals is a non-mandatory field. In this section we can define the generic fields that are present throughout the extractor.

    Refer to the snapshot for the globals definition:
    [Screenshot: globals definition]
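
    As an illustration, globals could hold translations that apply across the whole extractor; the globals section name is from this article, while the field names below are assumptions:

      globals:
        translate:
          devname: DeviceName    # generic fields present throughout the extractor (illustrative)
          devid: DeviceId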

  • Substitutions

    For most devices, substitutions are provided for some fields. These substitutions can be defined under the globals section as follows:

    [Screenshot: substitutions under globals]

    To make this work for multiple samples (First Matches) in the extractor, the following procedure can be followed.

    For example:
    [Screenshots: substitutions reused across two First Matches]

    We have two First Matches here.

    In the first occurrence of First Match, we have subs defined for it, the only addition being &id001 placed against subs. The character ‘&’ denotes assigning the value of subs to the variable id001. Once we assign this to a variable, we can reuse it wherever the same values are needed, as in the case of subs. (The ‘&’ sign is placed before the variable name as an assignment.)

    In the second occurrence of First Match, we again have subs, but now we only use *id001. So we are simply reusing the subs defined before. The symbol ‘*’ is used with id001 to refer to (&id001).

    For all further occurrences of subs, we can simply refer to the first occurrence. We just need to make sure that the value is assigned to a variable at the first occurrence of subs and then used later by reference, and that basic variable naming is followed (alphabetic/alphanumeric names are preferred).
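
    A minimal sketch of this anchor-and-alias reuse (the & and * syntax is standard YAML; the substitution values and the first-match key name are illustrative):

      first-match:
        - match: 'pattern-one'
          subs: &id001             # '&' assigns this subs block to the variable id001
            '0': Success
            '1': Failure
        - match: 'pattern-two'
          subs: *id001             # '*' reuses the subs block defined above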

Pitfalls to avoid in the new way of building parsers

The procedure for creating an extractor has been described in detail above. If any of the steps are not followed correctly, it will result in poor extractor performance on the setup, and this could also affect the EPS handled.

How do Extractors work?

On adding a new extractor, DNIF performs the following functions to extract relevant data from the incoming events.

  • The extractor database is rebuilt as per the newly added extractor and the Adapter pipeline is restarted.
  • First, the master filter is validated; if a match is found, the extractor is identified and applied to the incoming log source.
  • The event is passed on to the first match. If a first match is found, the event is routed to the appropriate event key for annotation and translation. Here, the event is annotated to a particular stream based on the event type, and the relevant fields are extracted to standard DNIF keywords.
  • If the event doesn’t match the master filter, it will not be parsed and will be tagged as NLF (No Log Found).
  • If the event key doesn’t match after the master filter and first match have matched, the event will be tagged as OTHER (if configured in the fallback clause).

Introduced in v9.1.1

UNET sync is a process that runs on the core and automatically syncs your extractors every 30 minutes.

