YARA webinar follow up

yara webinar main 990x400 1

If you read my previous blogpost, “Hunting APTs with YARA” then you probably know about the webinar we’ve done on March 31, 2020, showcasing some of our experience in developing and using YARA rules for malware hunting.

In case you’ve missed the webinar or if you attended and want to re-watch it, you can find the recording here:
yara webinar 01

As requested by many of you, we are also making the slides available through SlideShare:
yara webinar 02

Unfortunately, we were forced to cut short the broadcast as we were running out of time. Nevertheless, we received a number of interesting questions and as I promised, I will try to answer them below. Thanks to everyone who participated and appreciate all the feedback and ideas!

YARA webinar – questions

  1. Can you share the presentation? (multiple)

    Sure, please find the link above for SlideShare.

  2. Hi Costin! what is the point of writing a rule on the exploit and not about the vulnerability? (from Ari)

    Hi Ari, hope you guys are doing well! In this case, we are trying to hunt an unknown 0day exploit, therefore, we don’t know which vulnerability it exploits. The only thing we can try to hunt for are the artifacts that the exploit developer left in his older exploits of the same kind (in this case, Silverlight). For more details, please see our blogpost: The mysterious case of CVE-2016-0034: the hunt for a Microsoft Silverlight 0-day.

  3. I’ll add an xml-based switch to show Imphash in lowercase, in pestudio! (from Marc)

    Thanks Marc, appreciated, and sorry for mispronouncing your last name! Everyone, in case you aren’t already using Pestudio for your initial malware assessment, go check it out.

  4. “Your italian is pretty good man / your italian is not so bad / Your italian is great 🙂 ” – various amici

    Thank you! Perhaps not surprisingly, Romania used to be a Roman colony 2000 years ago, which is why our languages are so similar. Wishing you guys all the best, stay safe and stay healthy!

  5. When you are looking or other languages, does the “pe.language” catch all hexbyte formats? (I.e. UTF-8 and UTF-16 will show mandarin characters in different hex bytes) (from Jono)

    That’s a good question. In reality, pe.language actually cycles through all the resources in the PE file and returns true if the language of at least one resource matches the one you are looking for. So it doesn’t really searching for any characters in the file, only using the metadata from the resource section.

  6. Can please explain “not for all i” in criteria – from Rohit, referring to the generic YARA rule from example 3
    Indeed, this is one tricky rule. Just to make it easier, I’m showing the solution below:
    yara webinar 03

    In essence, the rule works as follows: first, the version_info structure field named “CompanyName” should contain “Microsoft”, which means the file is claiming to be from Microsoft. Secondly, it needs to be signed with a digital certificate, so pe.number_of_signatures should be larger than 0. Finally, we check if there is at least one issuer for all the certificates used to sign the file that is not Microsoft nor VeriSign. Why “not for all”? Well, it’s a reverse logic – for all the certificates, we want to make sure the signatures are either from Microsoft or VeriSign. If at least one sig is found that is not from these two, the file is suspicious. Another way to do this would be to keep “and for all” and apply the not inside the loop, switching the “or” for an “and”. (because not (a or b) ==not a and not b)

  7. Do you have any open source database of good and benign files to test against false positives? (from Ramon)

    Hey Ramon, thanks for the question! Please turn to slide 37 for advice on how to build a benign sample set for QA and false positives testing.

  8. When you specify the “filesize” attribute within your rule – what denomination do you target? Bytes, Kilobytes, Megabytes etc…? (from James)

    By default, the filesize is expressed in bytes, so 200000 would be 200000 bytes. The YARA syntax also supports KB and MB, with KB multiplying by 1024 and MB by 2^.20.

  9. Would you recommend using the xor modifier now for this stuff? (from John) referring to slide 39:
    yara webinar 04

    In particular, the example on the right side is from Shamoon2 samples, where some of the strings would be XOR’ed by a one byte key which kept changing from sample to sample. Interesting enough, YARA supports the “xor” modifier, since version 3.8 (or so). However, the xor modifier is always applied last, so for our case above, it would work, as the zeroes in the wide strings would be xor’ed as well! Therefore, we need to bruteforce the strings and use them like in the case above, if zeroes are not xor’ed.

  10. How long does it take to scan your full collection with a normal YARA rule? (from Juan Aleister-Crowley)

    The entire Kaspersky malware collection, which is possibly one of the largest in the world, takes between 1 and 2 weeks to scan entirely, on a cluster of a few hundred computers. However, in most case, we resort to scanning subsets, such as recent samples or known APT samples already tagged by our robots, which takes between minutes and up to a day or two.

  11. What is your experience of using matching on the PE Rich Header? (from Axel)

    Good question! While in theory the pe module could allow for creation of rules that match on the decrypted Rich header, we haven’t played much with that. This is however something we’ve explored in connection to the Hades APT attack on the Winter Olympics and the associated false flag that relied on the Rich header from a Lazarus sample.

  12. What are some best practices around managing a collection of YARA rules? Rules harvested from the web as well as the ones internally developed. Are there any specific tools dedicated to maintaining such a collection? Do you just use Git? (from V)

    Hey V, thanks for the question! This is indeed one of the trickiest things and I have to admit that I do not know of a perfect solution yet. Indeed, there are some YARA management frameworks, but I can’t say I’m a big fan of any of them in particular. I do use Git for this purpose, but I also lack a nice visual interface that would allow me to search, edit and run them against samples with a click.

  13. Better speed if checking the file size before the rules? (from Damien)

    That’s a good question. According to Victor, the condition is evaluated by a decision tree, so the order is not necessarily the one that you put in the syntax. To be honest, I do prefer to put the filesize check first, perhaps for “superstition” reasons 🙂

  14. Here is a question “5 of ($b*)” means “any 5 of ($b*)” or “first 5 of ($b*)” (from Yerbol)

    Indeed, that means any (sub-)group of five $b strings.

  15. Hi, why is important and good indicator to use PDB paths in a YARA sigs? (from Adrian)

    Based on our experience, PDB paths, in particular unique looking folder names from PDB paths, are very good for detection of future malware from the same author. For example, taking an EternalBlue scanner from Omerez, that is used by the CobaltGoblin group, it has the following PDB inside:
    C:OmerezProjectsEternal BluesEternalBlueScannerobjReleaseEternalBlues.pdb
    A YARA rule that matches on “C:OmerezProjects” could find other tools from the same author.

If you have more questions about the YARA webinar, please feel free to drop us a line in the comments box below or on Twitter: @craiu.

P.S. Special note for those trying to do the iOS/MacOS homework – if you write the rules but don’t have access to a platform to run them for hunting purposes, please drop us a note at: yarawebinar [at] kaspersky.com

Thanks and stay safe!
Costin

Original Source