The DNIF library includes extractors ready to ingest data from all types of devices, including IT infrastructure such as networking equipment, servers, and laptops, as well as applications such as internal portals, email, and document storage. Ingested logs are parsed to extract relevant fields, enriched with additional context, and stored in the Datanode. A standard, domain-specific, extensible schema is applied during ingestion to help you organize your data and to build and maintain quality analytics across the enterprise.
How to view Extractors?
- Hover over the System icon on the left navigation panel and select Extraction. The extractor list page will be displayed.
The page displays the following details.
| Field | Description |
|---|---|
| Name | Displays the name of the extractor. |
| Log Name | Displays the specific log type and log name of this particular extractor. |
| Author | Displays the email address of the user who created this extractor. |
| Type | Displays the type of extractor: custom or native. |
| Version | Displays the version number of the extractor. |
| Delete icon | Allows you to delete the extractor. Displayed only for custom extractors. |
| Refresh icon | Click this to refresh the extractor list. |
| Search icon | Click this to search for a particular extractor. |
| Plus icon | Click this to add a new extractor. |
How to create Custom Extractors?
To create a custom extractor, click the plus icon on the extractor list page. A side panel will be displayed as follows.
- Enter a name for the extractor, then write the parser directly in the YAML editor.
- Click Submit after writing the parser. The new extractor will appear in the extractor list.
By default, the existing native extractor will be disabled and "custom" will be added to the name of the extractor.
The extractor will be listed on the extractor list page.
In this method, you create a custom extractor by cloning an existing native extractor.
- To create a custom extractor by cloning, click the name of an existing native extractor. The YAML page will be displayed.
- Make the required changes and click Submit at the end of the page. The following screen will be displayed.
- Click Save to create a custom extractor.
How to write an extractor using a yaml file?
Extractors are built-in .yaml files. The values for ExtractorID, SourceName, and SourceType are populated along with their assigned keys.
Each extractor has the following basic information:
| Field | Description |
|---|---|
| schema-version | The version assigned to the extractor. |
| extractor-id | The unique ID assigned to the extractor. |
| source-name | The name assigned to the extractor as per the device. Example: Fortigate, Checkpoint, etc. |
| source-type | The type of device. |
| source-description | A short description of the extractor. |
Stream is a domain-specific collection of data from different sources that contributes to a unique dataset and a unique set of use-cases. Each value in the Stream field within the extractor can be used to generate a search that returns a particular dataset with information.
This section defines the streams that are included in the extractor. Available streams include AUTHENTICATION, SYSMON-PROCESS, SYSMON-NETWORK, IAM, and others.
Example: AUTHENTICATION refers to login and logout activity events; IAM refers to user management events such as create user and delete user.
Master Filters and First Matches help identify the extractor to be applied to a given log source. This matching has been heavily optimized for performance.
The following configuration should be done under Event details:
First Matches will help us identify different patterns associated with a log source.
- Each first match will be associated with a decoder.
- First matches can now yield multiple events if used with decoder=json or custom-kv.
The decoder=regex is the legacy event detail approach that will be relevant for older devices.
The Decoder section defines the type of decoder to use, based on the log format. Decoders are defined at the First Match level, so a single extractor file can use multiple decoders.
Three decoders are available:
JSON: It is written as ‘decoder: json’ in the extractor files. Log samples in JSON format can be parsed using this decoder. It parses all the key-value pairs present in the log sample.
Note: Regex is not required to parse key values.
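To illustrate why no regex is needed for JSON logs, here is a minimal Python sketch of JSON decoding. It is not DNIF's implementation: the function name `decode_json`, the sample log, and the choice to flatten nested objects into dotted keys are all assumptions made for illustration.

```python
import json


def decode_json(raw_log: str) -> dict:
    """Parse a JSON log line into a flat dict of key-value pairs.

    Flattening nested objects into dotted keys is an illustrative
    assumption, not documented DNIF behavior.
    """
    def flatten(obj: dict, prefix: str = "") -> dict:
        flat = {}
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                flat.update(flatten(value, name))
            else:
                flat[name] = value
        return flat

    return flatten(json.loads(raw_log))


# Hypothetical sample log, not taken from a real device.
sample = '{"user": "alice", "action": "login", "src": {"ip": "10.0.0.5", "port": 443}}'
fields = decode_json(sample)
print(fields)
```

Every key in the sample is recovered by the JSON parser itself, so the extractor author only needs to annotate the resulting keys.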
Custom (key-value): It is written as ‘decoder: custom’ in the extractor files. Log samples in key-value format can be parsed using this decoder. Here, you write a generic regex that captures the key and the value from the log samples.
Example: Refer to the below snapshot:
As per the snapshot, a generic regex is written to capture the keys and values in the log sample. This regex produces groups of keys and values, displayed in the image below. The keys can then be annotated as per the field annotations in the extractor.
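The idea of a single generic regex that yields key and value groups can be sketched in Python as follows. The pattern, the sample log, and the helper name `decode_kv` are assumptions for illustration; the regex used in an actual extractor depends on the device's log format.

```python
import re

# Generic pattern (an assumption): a word-like key, '=', then either a
# double-quoted value or a run of non-space characters.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_kv(raw_log: str) -> dict:
    """Return every key=value pair found in the log line."""
    fields = {}
    for key, value in KV_PATTERN.findall(raw_log):
        fields[key] = value.strip('"')  # drop surrounding quotes, if any
    return fields


# Hypothetical sample log in key-value format.
sample = 'date=2023-01-15 srcip=10.0.0.5 action="accept" msg="user login ok"'
fields = decode_kv(sample)
print(fields)
```

Because the pattern is generic, one regex covers every pair in the line; only the resulting keys need to be annotated afterwards.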
Regex: It is written as ‘decoder: regex’ in the yml files. Log samples in Syslog (values only) format can be parsed using this decoder. Here, you write the regex and define the field names in it. These field names can then be mapped and annotated in the extractor.
Example: Refer to the below snapshot
In the snapshot, field names are defined in the regex. This is done by writing ?P<field_name> immediately after the opening parenthesis of a group, as in (?P<field_name>...).
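Named groups of this form can be tried out in Python, whose `re` module supports the same (?P<field_name>...) syntax. The pattern and the sample syslog line below are assumptions for illustration and are not taken from a DNIF extractor.

```python
import re

# Each (?P<name>...) group names the field that its match will populate.
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Hypothetical syslog line containing only values, no keys.
sample = "Jan 15 10:32:01 fw01 sshd[2214]: Failed password for root from 10.0.0.5 port 52311 ssh2"

match = SYSLOG_PATTERN.match(sample)
fields = match.groupdict() if match else {}
print(fields)
```

The field names come entirely from the regex, which is why this decoder suits logs that carry values without keys.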
Event Key Format
In the event-key-format section, define the field that is used to accurately distinguish between the events present in the log.
For example: Refer to the snapshot below:
In the snapshot above, the First Match is defined on the basis of SourceName, and events are further segregated on the basis of EventID in the ‘event-key-format’ section.
Event Key Mapping:
In the ‘event-key-mapping’ section, events are mapped to the appropriate Streams. When specifying an event in this section, ensure that each event is associated with a Stream.
In the snapshot, EventID is defined as the pointer that provides the most information about the log event. Refer to the table below to understand the annotate and translate fields.
| Field | Description |
|---|---|
| annotate | Static key-value pairs for Stream, Action, and Status, added according to the information in the log event. In the above snapshot, the relevant Stream, Action, and Status are defined this way. |
| translate | All the relevant fields for the stream should be defined under the translate section. Allows you to rename fields as per DNIF terminology. |
Fallback is a mandatory field. All events that are defined with a Stream will be parsed accurately, while undefined events for that particular First Match will be parsed under the fallback section.
Example: Consider a Windows extractor where the First Match is created on the basis of SourceName and events are further divided on the basis of EventID. Some EventIDs are defined with a proper stream, while others are not. The undefined EventIDs will then be parsed under the fallback section.
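The interaction between event key mapping, annotate, translate, and fallback can be sketched in Python as below. This is only a model of the behavior described above, not DNIF code. The dictionary layout, the target field names (`User`, `SrcIP`), the Action and Status values, and the decision to keep untranslated fields unchanged are assumptions. The Windows EventIDs 4624 (successful logon) and 4720 (user account created) are used as familiar examples.

```python
# Hypothetical in-memory model of an event-key-mapping section.
EVENT_KEY_MAPPING = {
    "4624": {
        "annotate": {"Stream": "AUTHENTICATION", "Action": "LOGIN", "Status": "SUCCESS"},
        "translate": {"TargetUserName": "User", "IpAddress": "SrcIP"},
    },
    "4720": {
        "annotate": {"Stream": "IAM", "Action": "USER_CREATE", "Status": "SUCCESS"},
        "translate": {"TargetUserName": "User"},
    },
}

# Applied to any event key that has no explicit mapping.
FALLBACK = {"annotate": {"Stream": "OTHER"}, "translate": {}}


def apply_event_key(event: dict, key_field: str = "EventID") -> dict:
    """Annotate and translate an event, using the fallback for unknown keys."""
    rule = EVENT_KEY_MAPPING.get(str(event.get(key_field)), FALLBACK)
    result = dict(rule["annotate"])  # static key-value pairs
    for name, value in event.items():
        # Rename fields listed under translate; keep the rest as-is (assumption).
        result[rule["translate"].get(name, name)] = value
    return result


known = apply_event_key({"EventID": "4624", "TargetUserName": "alice", "IpAddress": "10.0.0.5"})
unknown = apply_event_key({"EventID": "9999", "Detail": "unmapped event"})
print(known)
print(unknown)
```

The mapped EventID receives its Stream and translated field names, while the unmapped EventID still produces an event through the fallback rule instead of being dropped.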
Refer to the snapshot for the fallback events field definition:
Globals is an optional field. This section defines the generic fields that are present throughout the extractor.
Refer to the snapshot for the globals definition:
For most devices, substitutions are provided for some fields. These substitutions can be defined under the globals section as follows:
To make this work for multiple samples (First Matches) in the extractor, follow the procedure below.
We have two first matches here.
Referring to the first occurrence of First Match, subs is defined for it, with the addition of (&id001) against subs. The character ‘&’ assigns the value of subs to the variable id001. Once the value is assigned to a variable, it can be reused wherever the same values are needed, as in the case of subs. (The & sign is placed before the variable name to mark the assignment.)
Referring to the second occurrence of First Match, subs is present again, but now it only uses (*id001), which reuses the subs defined earlier. The symbol ‘*’ is used with id001 to refer to (&id001).
For all further occurrences of subs, simply refer to the first occurrence. Make sure the value is assigned to a variable at the first occurrence of subs and only referenced afterwards. Follow basic variable naming (alphabetic or alphanumeric names are preferred).
Pitfalls to avoid in a new way of building parsers
- The procedure for creating an extractor has been described in detail. If any of the steps are not followed correctly, extractor performance on the setup will degrade, which could also affect the EPS hits.
How do Extractors work?
On adding a new extractor, DNIF performs the following functions to extract relevant data from the incoming events.
- The extractor database is rebuilt as per the newly added extractor and the Adapter pipeline is restarted.
- First, the master filter is validated. If a match is found, the extractor is identified and applied to the incoming log source.
- The event is passed on to the first match. If the first match is found, then the event is routed to the appropriate event key for appropriate annotation and translation. Here, the event is annotated to a particular stream based on the event type and relevant fields are extracted to standard DNIF keywords.
If the event doesn’t match the master filter, it will not be parsed and will be tagged as NLF (No Log Found).
If the event key doesn’t match after matching master filter and first match, then the event will be tagged as OTHER events (if configured in the fallback clause).
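The matching order described above (master filter, then first match, then event key, with NLF and OTHER as the two miss cases) can be modelled with a short Python sketch. It is a simplified illustration, not DNIF's pipeline: the extractor name, the substring-based master filter, the stream name FIREWALL, and the data layout are all hypothetical. The source does not say what happens when the master filter matches but no first match does, so the sketch assumes such events also end up as NLF.

```python
import re

# Hypothetical extractor definitions held in memory.
EXTRACTORS = [
    {
        "name": "example-firewall",
        "master_filter": "devname=",  # assumed to be a simple substring test
        "first_matches": [
            {
                "pattern": re.compile(r"type=(?P<LogType>\w+)"),
                "event_key": "LogType",
                "event_key_mapping": {"traffic": "FIREWALL", "auth": "AUTHENTICATION"},
                "fallback_stream": "OTHER",
            }
        ],
    }
]


def route(raw_log: str) -> dict:
    """Return the stream assigned to a raw log line."""
    for extractor in EXTRACTORS:
        if extractor["master_filter"] not in raw_log:
            continue  # master filter miss: try the next extractor
        for first_match in extractor["first_matches"]:
            match = first_match["pattern"].search(raw_log)
            if match is None:
                continue  # first match miss: try the next pattern
            key = match.group(first_match["event_key"])
            stream = first_match["event_key_mapping"].get(
                key, first_match["fallback_stream"]
            )
            return {"extractor": extractor["name"], "stream": stream, **match.groupdict()}
    # No extractor matched (assumption: this also covers a first match miss).
    return {"stream": "NLF"}


mapped = route("devname=fw01 type=auth user=alice")
unmapped = route("devname=fw01 type=dns query=example.com")
no_match = route("an unrelated log line")
print(mapped["stream"], unmapped["stream"], no_match["stream"])
```

Checking the cheap master filter before running any regex mirrors the ordering in the steps above and keeps the expensive pattern matching limited to logs that belong to the extractor.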
UNET sync is a process that runs on the core and automatically syncs your extractors every 30 minutes.