<http://www.uwasa.fi/~ts/info/proctips.html>
Copyright © 1999-2001 by Prof. Timo Salmi
Last modified Wed 2-May-2001 23:03:40
Although there already is an abundance of procmail material on the net, here are some of my own tips and observations. This tips page is a companion to my Foiling Spam with an Email Password System page. The items on this page are in no particular order.
Unix email can conveniently be preprocessed with automatic filters such as procmail, the "Autonomous mail processor". This item repeats what is already presented about getting started in many of the other FAQs, including mine on spam foiling. Nevertheless, this is so crucial that I'll give the essential outline here as well.
Find out what your email directory is. Go ("cd") to the directory where your email folders are located and type "pwd". Assume in this item that you get "/home/myid/Mail". Further assume in the example that "/home/myid" is your home directory so that you can use the variable "${HOME}" to denote it.
Find out where your system's Bourne shell is located by typing "which sh". Assume that you get "/usr/bin/sh".
Prepare a "~/.procmailrc" file with a suitable editor. For example you might use "emacs ~/.procmailrc". To start with, put something like this into the ~/.procmailrc file:
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "
#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null
# Accept all the rest to your default mailbox
:0:
${DEFAULT}
For the "~/.procmailrc" file, read permission for the user him/herself is sufficient. To make sure, give the command "chmod u+r ~/.procmailrc".
Find out where the "procmail" program is located on your system by typing "which procmail". Assume below that you get "/usr/local/bin/procmail". Also check what your id is: "whoami". Assume that you get "myid".
Next comes the crucial step. Put the following line in your "~/.forward" file. Include the double quotes (") in the ~/.forward file contents.
"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"
Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward". Lastly, check ("ls -lFd ~/") that your home directory permissions are at least (the equivalent of) "drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".
You should now be set to go. To check, send an email to yourself to see if it gets through. If there is a problem see the advice on troubleshooting.
There are several options. One very convenient method is to build a simple test environment as follows. Applied correctly, it allows testing without affecting your normal flow of email in any way. Create the following "proctest" file, preferably in a directory that is on your PATH. Make it executable using "chmod u+x proctest". Thus you'll have a new command "proctest" available.
#!/bin/sh
#The executable file named "proctest"
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d "${TESTDIR}" ] ; then
echo "Directory ${TESTDIR} does not exist; First create it"
exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail
The beauty of this method is that besides "proctest.rc" you can also easily edit "mail.msg" to test different kinds of incoming mail and the behavior of your recipes in various situations. Note, however, that it is best to test only one email message at a time. In other words, do not put more than one email message into the mail.msg test file.
A question remains. Where does one get the structure of a posting for the "mail.msg" test file? Easy. Invoke elm, select a suitable existing posting, and make a copy of it to "mail.msg" by pressing C (capital C) and answering mail.msg at the "Copy message to:" prompt. Other mail programs probably have similar options.
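If no mail reader is at hand, you can also write a test message by hand. The sketch below is only an illustration. The addresses, date and subject are made-up placeholders, and the leading "From " line imitates the mailbox separator line that a copied message would normally start with.

```shell
#!/bin/sh
# Write a minimal, hypothetical test message to mail.msg in the current
# directory. All addresses, the date and the subject are placeholders.
cat > mail.msg <<'EOF'
From sender@example.org Wed May  2 23:03:40 2001
From: sender@example.org (A. Sender)
To: myid@myhost.mydom
Subject: proctest sample

A one-line body for testing the recipes.
EOF
```

Edit the header lines to imitate whatever kind of incoming mail a recipe should catch, or should leave alone.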
Below is the proctest.rc recipe file which I used in preparing for this item:
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"
#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail
#Otherwise, discard the email message
:0
/dev/null
Feedback: The header stripping does not work if any of those header lines is continued. It is almost always an error to use grep/egrep/fgrep when filtering a message header. A better recipe would be the following, utilizing formail:
#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-
To continue myself. The flags are as follows: "f" use the pipe as a filter, "w" wait for the filter to finish and check its exit code before proceeding, "h" feed only the header of the email message to the filter.
The formail -I switch removes the matching existing header fields. When, as here, only the field name is given, those fields are simply deleted. (The lowercase -i switch instead keeps the old fields by renaming them with an "Old-" prefix.)
Just in case, let's first revisit an "and" rule by a common example:
#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uwasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail
For entering an "or" rule, consider the following example:
#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
^From:.*era@iki\.fi
${DEFAULT}
Let's look at a few details:
:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}
Likewise, you could alternatively use ( ) grouping:
:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}
See the action line below (i.e. the one starting with the "|" pipe). Separate the commands with "&&". If you wish to continue on a second line for readability, end the first line with "\". Alternatively, just one long line could have been used. The recipe below is from a test with the testbench, so its purpose is just to show this method of giving multiple commands.
#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail
Likewise, a single command can be split across lines for readability:
| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail
Below is another example with a slightly different syntax using the semicolon ";" as the separator. The example also demonstrates how to save disk space by zipping email from a particular source. You'll need Info-ZIP's zip and unzip in order to be able to apply it. (They are available from the proper Unix section of Garbo program archives at the University of Vaasa, Finland.)
:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
cat >> Test.mail; \
zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
rm -f Test.mail
What happens on the action line is this:
Now is a good time to use my testbench to find out whether the logic works. Build a /home/myid/test/proctest.rc file.
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
First, a few environment variables are included.
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
The above means: use full reporting for the debugging.
#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[ ]*\/[^ ].*"
If the same expression is used several times in a recipe file, it is convenient to put the expression into an environment variable instead of writing it out repeatedly.
#Test if the message has a "Subject:" at all
:0c:${TESTDIR}/Proctest.mail.lock
* !^Subject:
| echo "No ^Subject: header was found" >> ${TESTDIR}/Proctest.mail
#Otherwise, discard the message
:0
/dev/null
After the recipes above have been testbenched and cleared, you know that the methods used in them will work for you in your own environment.
Of course, there are other options for extracting the subject into an environment variable. One is to utilize "formail" which is a companion to the procmail program. If you include the following expression at the beginning of your ~/.procmailrc recipes file, you will have the variable ${SUBJECT} available for the rest of the recipes file.
#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
| sed -e 's/[;\`\\]/ /g' \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
For an example of usage see the Foiling Spam with an Email Password System page.
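You can try the cleanup part of the SUBJECT definition outside procmail. In this sketch, clean_subject is a hypothetical helper name, and a plain string stands in for formail's output. The extra backslashes in the procmail version are needed there only because that pipeline sits inside backquotes.

```shell
#!/bin/sh
# clean_subject (hypothetical helper): the same cleanup that the SUBJECT
# definition applies to formail's output. Semicolons, backquotes and
# backslashes become blanks, tabs are expanded, and leading and trailing
# blanks are removed.
clean_subject() {
  printf '%s\n' "$1" \
    | sed -e 's/[;`\\]/ /g' \
    | expand \
    | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'
}

clean_subject '  Hello; world  '               # Hello  world
clean_subject "$(printf '\tRe: rm `ls`\t')"    # Re: rm  ls
```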
Feedback: Extracting the header from inside procmail using the \/ token is _much_ faster than the formail solution.
#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head
You can use
:0 hc:
$HOME/headers.cut
That eliminates a cat and a shell process, plus the pipe and extra reads and writes.
Now, if you want to overwrite the file with each new message [or do some further shell operations within the pipe], then the cat command is a reasonable choice.
[A further point] That would have been an odd name for the lockfile. Why not $HOME/headers.cut.$LOCKEXT?
Besides what is on my page Foiling Spam with an Email Password System and a separate item on detecting the sender, below are some instructive little tricks.
Perhaps the strongest generic trick against spam is to set aside any email that is not addressed to you directly, since most spam is addressed to some kind of mailing list. Of course, you first have to accept email from every legitimate mailing list you have subscribed to. If you put a suitable recipe after the recipes that accept the legitimate mailing lists, much of the incoming spam will be caught. Below is a simplified (and a bit munged) version of what I do in my own ~/.procmailrc:
#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:
Spam.mail
}
If you look carefully through this page, you'll find explanations for all the details in the above recipe. It will be a good exercise to do so. :-)
Since much, if not practically all, spam comes with forged sender addresses, it is much more effective to block certain suspect email routes than to try to match the elusive spammers. The scoring recipe example below treats as spam all email that is routed via dialsprint.net and that is not addressed to "me" personally.
#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
* 1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
* 1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail
:0B:
* (remove@|removeme@)
PotentialSpam.mail
:0D:
* ^Subject:.*ADV
PotentialSpam.mail
:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail
Feedback: The regexp:
(remove@|removeme@)
is much slower than
remove(me)?@
Having the 'top-level' of the regexp be an alternation (via '|') slows down matching by quite a bit. The more that can be factored out at the beginning of the regexp, the better. The same goes for the recipe that matches against the Subject: header field:
^Subject:.*(make.*money.*fast|\$\$\$)
is faster than:
(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
My comment: Of course it is commendable to be efficient, especially where easy understanding is not compromised. However, if the two clash, I often prefer clarity of expression and convenience over a strict maximization of code efficiency. Don't we have our powerful modern computers to perform our tasks for us, and not vice versa? :-) (This is not about the particular feedback above. The improvements are useful. They are both legible and instructive.)
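You can check the claim that the factored expressions match the same lines as the original alternations with grep -E (the modern spelling of egrep) on a few sample lines. This is only a sketch. The sample lines are made up, and procmail's regular expressions closely follow egrep's but are not identical to them.

```shell
#!/bin/sh
# Made-up sample lines for comparing the two forms of each expression.
samples='To be removed, write to remove@example.com
Or to removeme@example.com
Please do not remove this line
Subject: Make money fast!!!
Subject: $$$ for you
Subject: Meeting notes'

# -i mimics procmail's default case-insensitive matching.
slow1=$(printf '%s\n' "$samples" | grep -Ei '(remove@|removeme@)')
fast1=$(printf '%s\n' "$samples" | grep -Ei 'remove(me)?@')
slow2=$(printf '%s\n' "$samples" | grep -Ei '(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)')
fast2=$(printf '%s\n' "$samples" | grep -Ei '^Subject:.*(make.*money.*fast|\$\$\$)')

# Each pair should select exactly the same lines.
[ "$slow1" = "$fast1" ] && echo "remove forms agree"
[ "$slow2" = "$fast2" ] && echo "Subject forms agree"
```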
More feedback: The "* ^Subject:.*ADV" rule is overly simplistic and catches many non-spam subjects. Maybe rather something like "* ^Subject:\<*ADV\>"
My comment: OK. Let's try
:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
It is far from perfect, but it should work reasonably well for regular purposes. Spam detection requires experimenting anyway. Regular expressions are not easy. They are quite a large subject area of their own.
The above assumes that there is (at least) one space after the "Subject:" field name before the subject begins. You can ensure this by first applying "formail -z", which you can place high up in your ~/.procmailrc. For example, I have the upper two lines in mine.
:0 fwh
| formail -z -iContent-Length:
:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
See the other items in this tips file for an explanation of the "fwh" flags. The formail program with the "-z" switch will insert the desired blanks into the header. The "-iContent-Length:" switch (which is outside the theme of the current item) will replace the Content-Length: headers with Old-Content-Length: headers.
I use a slightly different recipe in my own ~/.procmailrc recipes file:
:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
:0
{ RULE="Catch potential spam by detecting an ADV keyword" }
:0
/dev/null
}
If you wonder about the "RULE" variable, see the item about logging which rules have been used.
On to a different facet. Some ISPs (Internet Service Providers) do not allow numbers in email addresses. Thus, you may identify some of the forged spam by catching a violation in this respect. The following recipe catches email with numbers in the user id before the @ mark from all the various nodes on "respectable.net".
:0:
* ^From:.*[0-9].*@([-a-z0-9_]+\.)*respectable\.net
PotentialSpam.mail
Before we proceed any further, there is a very important email feature to observe. If you alter the content-length of a message it is highly advisable first to discard any "Content-Length:" lines from the email's header. If you fail to do that, there is the danger that next time you read the relevant email folder your email program will break your folder because of erroneous length information. Many email programs are brain-dead that way.
#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:Truncated.mail.lock
| head -100 >> Truncated.mail
}
Some details:
#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0c:Truncated.mail.lock
| head -100 >> Truncated.mail &&\
echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail
:0:Truncated.mail.lock
| tail -n +101 | tail -n 10 >> Truncated.mail
}
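You can preview the effect of the "100 + 10 lines" recipe in plain shell before trusting it with real mail. This is a sketch, not the recipe itself. truncate_msg is a hypothetical name, the Content-Length: removal step is left out, mktemp and seq are assumed to be available, and the POSIX form "tail -n +101" is used for the line-offset option of tail.

```shell
#!/bin/sh
# truncate_msg (hypothetical name): mimic the "100 + 10 lines" recipe on a
# whole message read from stdin. Messages of 4000 bytes or less are passed
# through unchanged, mirroring the "* > 4000" size condition.
truncate_msg() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  if [ "$(wc -c < "$tmp")" -gt 4000 ]; then
    head -n 100 "$tmp"
    echo "-:-:-:- (snip) -:-:-:-"
    # The last 10 of the lines that come after line 100.
    tail -n +101 "$tmp" | tail -n 10
  else
    cat "$tmp"
  fi
  rm -f "$tmp"
}

seq 1 2000 | truncate_msg | wc -l    # 100 + 1 + 10 = 111 lines
```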
A few observations:
Let's see. A lite version of the testbench could be the following. Put the rules you wish to try out into a "greptest" file, written as egrep expressions, since procmail's matching closely (but not quite!) follows egrep's. Make the file executable with "chmod u+x greptest". Then make a "mail.msg" file with the texts you wish to try to match (or not to match). Thus you might have:
#!/bin/sh
#The executable file named "greptest"
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg
#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
Then, just give the command "greptest" and visually compare the outputs.
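A variant of greptest can also check itself instead of relying on a visual comparison. The sketch writes the same trial lines to a temporary file and collects the matches. For these five lines the pattern should give three hits: ts@uvasa.Fi, ts@loisto.uvasa.fi and Timo.Salmi@uvasa.Fi match, while the bare Timo.Salmi and null@uvasa.fi do not. mktemp is assumed to be available.

```shell
#!/bin/sh
# Self-checking variant of greptest: same pattern, same trial lines.
msg=$(mktemp) || exit 1
cat > "$msg" <<'EOF'
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
EOF
# grep -E is the POSIX spelling of egrep.
hits=$(grep -Ei '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' "$msg")
rm -f "$msg"
printf '%s\n' "$hits"
```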
Miscellaneous notes:
I have been puzzling over this item myself, because it is not as trivial as it first appears. The catch is that the ".com" must be exactly at the end of the address. The problem, naturally, is that in the email headers there can be text after the email address, such as the sender's name. E.g.
From: scam@cyberspam.com (The Big Bad Spammer)
The first solution that comes to mind is the following, but it is not entirely accurate.
:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail
# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail
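The $ anchor in '\.com$' is what makes the last recipe work. Here is a quick shell check of that expression. It assumes FROMADDR_ already holds a bare address; is_com is a hypothetical helper, and the addresses are made up.

```shell
#!/bin/sh
# is_com (hypothetical helper): succeed if the address ends in ".com",
# using the same expression as the recipe (grep -E is the POSIX egrep).
is_com() {
  printf '%s\n' "$1" | grep -Eiq '\.com$'
}

for a in scam@cyberspam.com user@mail.com.au someone@example.org; do
  if is_com "$a"; then echo "$a: .com"; else echo "$a: not .com"; fi
done
```

Without the anchor, an address like user@mail.com.au would also be caught.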
:0:
* ^From:.*\.hk|\
^From:.*\.kr|\
^From:.*\.tr
* !^From:.*\.hk\.|\
^From:.*\.kr\.|\
^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
An aside: You could also use a more condensed format:
* ^From:.*\.(hk|kr|tr)
(Condensing the rest of the above recipe is left as an exercise.)
Using scoring is one option. The recipe could also be rewritten as
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Spam screening of certain susceptible domains
:0:
* -1^0
* 1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
* 1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
There also is the option
:0:
* ^From:.*\.hk([ >]|$)|\
^From:.*\.kr([ >]|$)|\
^From:.*\.tr([ >]|$)
* ! ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
If we only look at the "From:" field in the header we have the familiar:
#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}
Next, let's extend the matching to more fields in the header:
:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
| egrep -i "scam@cyberspam\.com"
/dev/null
FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null
:0
* ? formail -x"Received:"\
| egrep -i "cyberspam\.com"
/dev/null
Spam email is sometimes indicated by a missing or an empty "From:" line in the header. Furthermore, the "From:" line might contain an empty <> instead of a proper address within the <>. Using scoring we might have something like
:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail
Under a worst-case scenario, the various sender headers might all be empty. To test for this unlikely eventuality we can utilize the fact that formail would put a "foo@bar" into the "FROM_" under such circumstances.
# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail
As always, there are several alternatives for solving a problem. Consider a potential case where a spammer poses as the mailer-daemon but the "From:" header is either missing or total gibberish. How can you detect this situation? The second condition in the recipe below tests whether the header has a "From:" line with some elementary validity (at least one letter). The recipe catches the message when it does not.
:0:
* ^From[ ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail
One trick is to utilize the following variable definition letting formail do the worrying about the proper address format.
REPLYTO_=`egrep "^Reply-To:" | head -1 \
| formail -c -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
Feedback:
Let me suggest this:
Timo's further comments:
Put these definitions high up in your ~/.procmailrc :
#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"
Thus, you have the postmaster's alleged address available as ${FMAST_} from this point on in your recipes file. Note, however, that all validity testing of the address is missing.
What happens in the FROM_ formula:
#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
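To see what FHOST_ and FMAST_ end up holding, you can run the FHOST_ and FMAST_ definitions on sample strings. In this sketch, postmaster_of is a hypothetical helper and the addresses are placeholders. The last two calls show why validity testing matters. awk simply returns the second @-separated field, trailing comment included, or nothing at all when there is no @.

```shell
#!/bin/sh
# postmaster_of (hypothetical helper): the FHOST_ and FMAST_ steps applied
# to an address given as the first argument. No validity checks.
postmaster_of() {
  FHOST_=`echo "$1" | awk -F@ '{ print $2 }'`
  echo "postmaster@${FHOST_}"
}

postmaster_of 'someone@mail.example.org'        # postmaster@mail.example.org
postmaster_of 'someone@example.org (Some One)'  # postmaster@example.org (Some One)
postmaster_of 'foo'                             # postmaster@
```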
The (only slightly modified) example below is based on a true situation from my own ~/.procmailrc.
#Ensure a whitespace exists between field name and content
#Comment "Old-" the Content-Length field from all the headers
:0 fwh
| formail -z -i"Content-Length:"
#(whatever else in between)
:0
* From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
:0 fw
| formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:"\
-A "To: Maintainer's long recipient list suppressed" \
| sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
-e '/------=_NextPart_/,$d'
:0:
${DEFAULT}
}
This is a somewhat complicated subject with material dispersed throughout the various procmail FAQs. Basically scoring is a method to count how many of the conditions are fulfilled in a recipe and if the "score" is positive, that is the score is 1 or more, the action line in the recipe will be performed. There is much, much more to scoring, but this is a good starting point.
Consider the following simple spam foiling recipe. It will put the email into the ProbableSpam.mail file if the score adds up to at least one. If the first condition is met, 1 is added to the score. Ditto for the second condition. Thus if either of the tell-tale spam signals occurs, the score will be positive (that is, greater than zero) and the action (storing the email message into the ProbableSpam.mail file) will be carried out.
:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
The example above uses equally-weighted scoring. One can also have unequal scores. Below, a hit of the second condition gives two points while a hit of the first only gives one.
* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$
Scoring can be used to build some extremely trivial artificial intelligence into the recipes. Consider the following:
:0:
* -1^0
* 1^0 ^Subject:.*money
* 1^0 ^Subject:.*fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
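Here is the arithmetic behind the -1^0 recipe. The score starts at -1, so at least two of the three conditions have to match before the total becomes positive and the message is filed. The shell sketch below merely imitates that bookkeeping for a few sample subjects. score_subject is a hypothetical name, and grep -i stands in for procmail's default case-insensitive matching.

```shell
#!/bin/sh
# score_subject (hypothetical): add up the same weights as the recipe.
score_subject() {
  score=-1                                  # * -1^0
  for re in 'money' 'fast' '\$\$\$'; do      # * 1^0 ^Subject:.*<re>
    if printf 'Subject: %s\n' "$1" | grep -Eiq "^Subject:.*$re"; then
      score=$((score + 1))
    fi
  done
  echo "$score"
}

for s in 'Make money fast' 'Fast delivery' 'Meeting notes'; do
  sc=$(score_subject "$s")
  if [ "$sc" -gt 0 ]; then verdict=ProbableSpam.mail; else verdict=kept; fi
  echo "$s: score $sc -> $verdict"
done
```

Only the first subject reaches a positive score (1). "Fast delivery" stops at 0 and "Meeting notes" at -1.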
:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail
#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail
#Avoid a specific forgery spam
:0:
* -1^0
* 1^0 ^From:.*mikerobbins2000@hotmail\.com
* 1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail
Scoring and ordinary conditions can be mixed in the rules. For example, the two recipes below achieve roughly the same thing, but the latter option takes fewer steps if the email is for you.
:0:
* -1^0
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
* 1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail
The formail switches in the above are
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail
Scoring seems to be the answer:
:0:
* 1^0 ^Subject:([ ]$|$)
* 1^0 !^Subject:
NoSubject.mail
As usual, the brackets [] contain a space and a tab.
There are other options to test for an empty "Subject:" or an entirely missing "Subject:" field. The one below puts the subject contents in a variable. The actual recipe then tests if the value of the "SUBJ_" variable is empty.
#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail
I am not exactly sure why you wish to do this, but here is how to replace the "To:" header field of a message using formail. Choose the formail "-i" option to rename the old "To:" field to "Old-To:" and to insert the new "To:" header field. The flags in the recipe are as follows: "f" use the pipe as a filter, "h" feed only the header of the email message to the filter, "w" wait for the filter to finish before proceeding down the rest of the "~/.procmailrc".
:0 fhw
* To.*myoldid@myoldhost.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"
The technique is fairly simple. Put this in your "~/.procmailrc" file:
MAILDIR=/home/myid/Mail #The location of your own mail directory
# Whatever other preliminaries
# Whatever other recipes
# Test if the email's sender is blacklisted
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f black.lst
/dev/null
The black.lst file in your mail directory then lists the addresses, one per line, for example:
abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com
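The "egrep -is -f black.lst" condition treats every line of black.lst as an extended regular expression. It succeeds if any of them matches the extracted header text. Here is a small sketch of that behavior, with a hypothetical is_blacklisted helper and a two-line list written to a scratch directory. Since the dots in the list are not escaped, each one matches any character, which is usually harmless for a blacklist.

```shell
#!/bin/sh
# Scratch copy of a (shortened) black.lst, removed when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'adexec@mail.com' 'no@body.com' > "$dir/black.lst"

# is_blacklisted (hypothetical): $1 stands in for the formail -x output.
is_blacklisted() {
  printf '%s\n' "$1" | grep -Eisq -f "$dir/black.lst"
}

is_blacklisted 'From: Ad Exec <adexec@mail.com>' && echo "blacklisted"
is_blacklisted 'From: friend@example.org' || echo "not blacklisted"
```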
Below is an example:
#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| awk '{ print $1 }'`
#Get the original subject of the email
#Discard superfluous tabs and spaces
SUBJ_=`formail -c -xSubject: \
| expand \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes you'll use
:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c:
#Preserve a copy of the email
Infolist.mail
:0fwh
#Adjust some headers before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-From-Origin: ${FROM_}" \
-i"Subject: $SUBJ_ (fwd)"
# Forward the email
:0
!mydept@myhost.mydom
}
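You can check the FROM_ and SUBJ_ cleanups the same way. In this sketch the helper names are hypothetical, and plain strings replace formail's field extraction. Since the sketch receives the whole "From " line, it strips the keyword with sed. Note the two spaces in 's/  */ /g': the expression means "one or more spaces". A single space followed by * would also match the empty string and put a space between every character.

```shell
#!/bin/sh
# envelope_sender (hypothetical): $1 is a whole mbox "From " line. Drop
# the keyword, trim blanks and keep the first word, i.e. the address.
envelope_sender() {
  printf '%s\n' "$1" | sed -e 's/^From //' \
    | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
    | awk '{ print $1 }'
}
# squeeze_subject (hypothetical): the SUBJ_ cleanup on a plain string.
squeeze_subject() {
  printf '%s\n' "$1" | expand \
    | sed -e 's/  */ /g' \
    | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'
}

envelope_sender 'From infolist@infohost.infodom Wed May  2 23:03:40 2001'
squeeze_subject "$(printf '  Weekly   news\t digest ')"   # Weekly news digest
```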
I have the following recipe in my ~/.procmailrc file, but the
email does not get forwarded to the myid2@myhost.mydom
address.
:0 c
*^From.*info.gov
! friend@somehost.domain myid2@myhost.mydom
I am not sure what is wrong with that, but at least the solution below should work:
:0
* ^From.*info.gov
* ! ^X-Loop: myid@myhost\.mydom
{
:0fwh
| formail -A"X-Loop: myid@myhost.mydom"
:0c
! friend@somehost.domain
:0
! myid2@myhost.mydom
}
The X-Loop is not relevant to the stated problem, but using it as a safeguard is always advisable.
Feedback: The reason that the first one does not work is that the recipients' addresses are separated by space while they should be separated by a comma [as in]
:0
! friend@somehost.domain,myid2@myhost.mydom
(I have not tested this one.)
Ah! Another potential case of spam avoidance? (This is a companion page to Foiling Spam with an Email Password System, remember.) Below is an example. But be sensible in using the method, since most spam has forged senders.
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyberspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
# Make a temporary file of the message to be returned
:0c:formail.lock
# Discard whitespaces, insert a leading blank
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
# Prepare and send the rejection
# Be sure to customize your sendmail path
:0:formail.lock
| (formail -r -I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }
:0
* ! REJECT ?? ^^^^
{
:0
{ RULE="These users I do not want to talk with" }
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0:procmail.lock
| (formail -r -I"To: ${REJECT}" \
-I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
:0
* ! REJECT ?? ^^^^
{
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0 fwh
| formail -r \
-A"Subject: Rejected mail: Recipient refusal" \
-A"From: myid@myhost.mydom" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp
:0
! ${REJECT}
}
This is a theme whose constituents already are covered throughout this material. But also take a look at "man procmailex" for the "vacation database" idea even if a better name here would be something like "dejatold database".
SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages from daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0 c
! myid@myhost.mydom
# Remember the sender (as in the vacation example of "man procmailex")
:0 Whc:dejatold.lock
| formail -rD 8192 dejatold.cache
# Reply only if the sender was not yet in the cache
:0 ehc
| (formail -r \
-A"X-Loop: myid@myhost.mydom" \
-I"Subject: Changed email address" ; \
echo "Dear Sender," ; \
echo "" ; \
echo "Thank you for your email about" ; \
echo "\"${SUBJ_}\"" ; \
echo "" ; \
echo "My email address has changed." ; \
echo "Old: myoldid@myoldhost.myolddom" ; \
echo "New: myid@myhost.mydom" ; \
echo "Your email has been forwarded to my new address." ) \
| /usr/lib/sendmail -oi -t
}
Some explanations:
Let's start with another, much simpler question:
From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified
I am trying to save all the messages that come to me with "mypassword" in the body to a folder called password. How do I do that?
As the manuals state:
Hence, all there is to it is
:0 B:
* mypassword
password
If you want your password to be case sensitive, use ":0 BD:".
From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified
How could I solve the following problem with procmail: I receive
e-mails with a body like this:
I need to store this mail to the folder aaa/bbb/ccc, so procmail
should create directories aaa/bbb . What kind of .procmailrc should
I write?
The trick is to extract the appropriate text from the body of the email message and to set procmail variable values on the basis of the results. This is how it can be done.
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`
#Whatever other recipes
:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
cat >> ${CATE}/${SCAT}/${FILE}
#Whatever other recipes
As a validity check the condition lines require that all the key-lines are present in the email message body and that the lines contain names.
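The egrep stage is not strictly needed, since awk can do the pattern matching itself:
CATE=`cat | awk '/^Category:/ { print $2 }'`
and similarly for SCAT and FILE.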
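To try the extraction outside procmail, here is a plain sh sketch. It assumes the whole message arrives on standard input and that the directory tree goes under a base directory given as the first argument. It uses mkdir -p in place of the two mkdir calls, so an existing directory is not an error. The name file_by_category is made up for illustration.

```shell
#!/bin/sh
# Read a message on stdin, take the Category:/Subcategory:/File: lines
# from its body, and append the message to $1/Category/Subcategory/File.
file_by_category() {
  base=$1
  msg=$(cat)
  body=$(printf '%s\n' "$msg" | sed -e '1,/^$/ d')   # drop the header
  cate=$(printf '%s\n' "$body" | awk '/^Category:/ { print $2; exit }')
  scat=$(printf '%s\n' "$body" | awk '/^Subcategory:/ { print $2; exit }')
  file=$(printf '%s\n' "$body" | awk '/^File:/ { print $2; exit }')
  # Same validity idea as the recipe: all three names must be present
  [ -n "$cate" ] && [ -n "$scat" ] && [ -n "$file" ] || return 1
  mkdir -p "$base/$cate/$scat" || return 1
  printf '%s\n' "$msg" >> "$base/$cate/$scat/$file"
}
```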
Next, let's consider a trickier task. Find, in the body of the message, the last line that contains the string "mailto:", and put the address given on that line into a MAILTO_ variable.
:0
* ^Subject:.*Whatever
{
:0
{
MAILTO_=`sed -e '1,/^$/ d' \
| egrep "mailto:" \
| tail -1 \
| expand \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| sed -e 's/[^o]://g' -e 's/^://g' \
| awk -F: '{ print $2 }' | awk '{ print $1 }'`
}
:0:
WhichEverFolderYouWant
}
Consider the MAILTO_ construct. (The rest of the recipe should be self-explanatory.)
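To see what each stage of the MAILTO_ pipeline does, you can run it by hand on a test message. The sketch below wraps the same stages in a sh function (extract_mailto is a made-up name). grep -E and tail -n 1 stand in for egrep and tail -1, which do the same job.

```shell
#!/bin/sh
# The MAILTO_ pipeline from the recipe above, reading a whole message
# (header and body) on stdin and printing the address it finds.
extract_mailto() {
  sed -e '1,/^$/ d' |                    # drop the header
  grep -E "mailto:" |                    # lines mentioning mailto:
  tail -n 1 |                            # keep the last such line
  expand |                               # tabs to blanks
  sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' |  # trim leading/trailing blanks
  sed -e 's/[^o]://g' -e 's/^://g' |     # remove colons (and the char before) unless after "o"
  awk -F: '{ print $2 }' |               # text after "mailto:"
  awk '{ print $1 }'                     # its first word, i.e. the address
}
```

Note the second sed. It removes any colon that does not follow an "o", together with the character before it, so that a line such as "Note: mailto:x@y.org" still yields x@y.org.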
I have a really simple procmail question. All I want to do is add a line "======= Forwarded Mail ==========" to the top of the body of all incoming messages, and forward them to another account.
Let's start by considering the first part of the question only. This is how it is done. The solution owes much to Philip Guenther.
:0
{
:0 fhw
| cat - ; \
echo "===== Filtered email ====="
:0:
${DEFAULT}
}
So far so good. Next let's add the forwarding so that the token will only appear in the forwarded message. (If you wish to change that, adjust the order of the rules.)
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0
!forward@myhost.mydom
}
Finally, let's add protection against email loops.
# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0 fhw
| formail -A"X-Loop: myid@myhost.mydom"
:0
!forward@myhost.mydom
}
This is a terribly complicated subject involving many, many features that I do not know. Let's nevertheless look at some further example recipes.
# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail
Consider the first rule of the recipe above. It will match all email with the following on the "Subject:" line in the header:
A stricter variant of the first rule would be
* ^Subject:[ ]+Undeliver(ed|able) (e)?mail
In other words, only spaces and/or tabs are allowed between "Subject:" and the start of the actual subject.
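Procmail's regular expressions are close to egrep's and are case-insensitive by default. So grep -Ei gives a rough way to try such conditions on sample Subject: lines before trusting them. This is an approximation, not procmail's own matcher, and the subjects below are made up.

```shell
#!/bin/sh
subjects='Subject: Undeliverable mail
Subject: Re: Undeliverable mail
Subject:   Undelivered email'

# Loose rule: anything may precede "Undeliver..."
loose=$(printf '%s\n' "$subjects" | grep -ciE '^Subject:.*Undeliver(ed|able) (e)?mail')
# Strict rule: only blanks between "Subject:" and the subject text
strict=$(printf '%s\n' "$subjects" | grep -ciE '^Subject:[ ]+Undeliver(ed|able) (e)?mail')
echo "loose=$loose strict=$strict"   # prints: loose=3 strict=2
```

The "Re:" reply is caught only by the loose rule.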
Let's consider another example. Say that we have two hosts
:0:
* ^From:.*cyber.com([^\.]|$)
ProbableSpam.mail
That is, do not allow a dot after the .com, or alternatively require that the line ends there. However, cyber.comet would be matched! Thus, depending on what you want to achieve, you might have e.g.
:0:
* ^From:.*cyber.com( |"|>|$)
ProbableSpam.mail
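The same grep -E approximation shows the difference between the two cyber.com conditions on a few made-up From: lines:

```shell
#!/bin/sh
froms='From: x@cyber.com
From: x@cyber.com.au
From: x@cyber.comet.net
From: Joe <x@cyber.com>'

# Rule 1: no dot right after .com (or end of line)
r1=$(printf '%s\n' "$froms" | grep -ciE '^From:.*cyber.com([^.]|$)')
# Rule 2: .com must be followed by a blank, a quote, a ">" or end of line
r2=$(printf '%s\n' "$froms" | grep -ciE '^From:.*cyber.com( |"|>|$)')
echo "rule1=$r1 rule2=$r2"   # prints: rule1=3 rule2=2
```

Rule 1 also catches cyber.comet.net. Rule 2 does not.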
What is the difference between the rules below?
* ^From:.*myid@([-a-z0-9_]+\.)*myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost.mydom
The first one matches any of
Symbol | Interpretation |
---|---|
* | Match zero or more times |
? | Match zero or one times |
+ | Match one or more times |
. | Any character |
[ ] | Match from the list within the brackets |
^ | The start of the line (within [] however, a negation) |
$ | The end of the line |
\ | Quote the next character to take it literally |
( ) | Grouping |
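The difference between the *, ? and + rules above can be checked the same way. The three sample addresses are hypothetical: one with no subdomain, one with one, and one with two.

```shell
#!/bin/sh
froms='From: myid@myhost.mydom
From: myid@pc1.myhost.mydom
From: myid@pc1.dept.myhost.mydom'

count() { printf '%s\n' "$froms" | grep -ciE "$1"; }

star=$(count '^From:.*myid@([-a-z0-9_]+\.)*myhost.mydom')   # zero or more subdomains
opt=$(count '^From:.*myid@([-a-z0-9_]+\.)?myhost.mydom')    # zero or one
plus=$(count '^From:.*myid@([-a-z0-9_]+\.)+myhost.mydom')   # one or more
echo "star=$star opt=$opt plus=$plus"   # prints: star=3 opt=2 plus=2
```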
Basically the syntax for variable value tests is
VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever
But you can build rules like
VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever
Note, however, that the above is still regular expression matching, not a test for equality.
The blank after the first $ is significant. The $ tells procmail that the variable references on the line (${VAR2_}) are to be expanded, not taken as literal text.
Feedback: That's easily resolved using $\var expansion and anchoring both ends of the regexp:
* VAR1_ ?? $ ^^$\VAR2_^^
That condition will succeed if and only if VAR1_ and VAR2_ have the same contents, with the possible exception of VAR1_ having one more trailing newline than VAR2_.
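The gap between a regular expression test and an equality test can be shown in plain sh with grep, using made-up single-line values. An unanchored grep -E behaves like "* $ VAR1_ ?? ${VAR2_}": a dot in VAR2_ matches any character, and a match anywhere counts. grep -xF compares whole lines literally, roughly as the anchored $\VAR2_ form does. The function names are hypothetical.

```shell
#!/bin/sh
# Unanchored regular-expression match of $2 anywhere in $1
regex_match() { printf '%s\n' "$1" | grep -qE -- "$2"; }
# Literal, whole-line comparison of $1 and $2
exact_match() { printf '%s\n' "$1" | grep -qxF -- "$2"; }

regex_match 'myid@myhost.mydom' 'myhost.mydom' && echo "regexp: match"
exact_match 'myid@myhost.mydom' 'myhost.mydom' || echo "exact: no match"
```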
From: Philip Guenther
Newsgroups: comp.mail.misc
Subject: Re: procmail, trivial html detection, and a quirk
Date: 09 Dec 1999 23:06:41 -0600
ts@UWasa.Fi (Timo Salmi) wrote:
> I just noted that, at least in procmail v3.13.1 1999/04/05
>
> :0B:
> * </body>
> * </html>
>
> does not work. Instead one has to apply
>
> :0B:
> * [<]/body>
> * [<]/html>
That last one is generally avoided because it looks like you're
using the \< regexp special when you really aren't. Putting the
'<' or '>' in brackets also works, as you did above, but it
slows down the matching ever so slightly as a character class is
slower to match than a single normal character. Thus, one of the
above four methods is usually preferred.
Philip Guenther
(Timo's addendum: As far as I understand, \< is a word boundary in procmail. Hence \< is best avoided when it is not meant as an actual boundary.)
I know how to sort my incoming email with procmail into different folders, but how do I use formail to automatically add some suitable identification text to the subject line of the email that I receive?
The general idea is this
#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* YourFirstSelectionCriterion
{
:0 fwh
| formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
:0:
YourFirstFolder
}
:0
* YourSecondSelectionCriterion
{
:0 fwh
| formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
:0:
YourSecondFolder
}
The flags are as follows: "f" treat the pipe as a filter, "w" wait for the filter to finish (and check its exit code) before proceeding, "h" feed only the header of the email message to the filter.
The -I option in formail removes the old header field and replaces it with the new one. Should you wish to retain the old subject header, renamed with an "Old-" prefix, use -i instead.
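Outside procmail you can preview what such a recipe would put on the Subject: line. This sh sketch does not assume formail is installed. It takes the first Subject: line of the header with sed, so folded multi-line subjects are not handled. It trims the blanks as SUBJ_ does and prefixes a tag. The name tag_subject is made up.

```shell
#!/bin/sh
# Print the Subject: line that the recipe would insert, given a tag in $1
# and a message on stdin.
tag_subject() {
  # Stop at the first empty line (end of header); strip "Subject:" and blanks
  subj=$(expand | sed -n -e '/^$/q' -e 's/^Subject:[ ]*//p' | sed -e 's/[ ]*$//')
  printf 'Subject: %s %s\n' "$1" "$subj"
}
```

For example, `tag_subject '[TAG]' < message` prints `Subject: [TAG]` followed by the trimmed original subject.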
Here are a few ideas:
The combination of quoting and regular expressions can cause some subtle problems when the Unix echo and one of the greps (grep, fgrep, egrep) is used in the procmail recipes.
Consider
SUBJ_=`formail -c -xSubject:`
# Responses to filter reports
:0:
* -1^0
* 1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
* 1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail
SUBJ_=`formail -c -xSubject: \
| expand | sed -e 's/[;|\$\`\\]/ /g' \
| sed -e 's/  */ /g' \
| sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
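Run outside procmail, the same clean-up steps can be checked on a hostile-looking subject. The sh function below is a sketch with the extra backslash level of the procmail backquotes removed (sanitize_subject is a made-up name). It also uses printf '%s\n' rather than echo, since some echo implementations interpret backslashes themselves.

```shell
#!/bin/sh
# Blank out ; | $ ` \ , squeeze runs of blanks, backslash-escape
# parentheses for later regexp use, and trim the ends.
sanitize_subject() {
  printf '%s\n' "$1" \
  | expand \
  | sed -e 's/[;|$`\\]/ /g' \
  | sed -e 's/  */ /g' \
  | sed -e 's/(/\\(/g' -e 's/)/\\)/g' \
  | sed -e 's/^ *//' -e 's/ *$//'
}
```

For instance, `sanitize_subject 'Re: (urgent) $100; rm -rf |x'` gives `Re: \(urgent\) 100 rm -rf x`.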
To get a log of what happens you set at the beginning of your ~/.procmailrc recipes file
SHELL=/usr/bin/sh # Use Bourne shell
MAILDIR=${HOME}/Mail # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log # Your procmail log
VERBOSE=yes # Produce full information
LOGABSTRACT=all # - " -
However, this produces so much information that it is not convenient for routine visual checking. But you can include a suitable (dummy) variable definition in each one of your recipes and then search the log file for occurrences of that variable. Here is an example demonstrating how it goes. Consider a recipe that originally is
# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail
Change this to be
:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
:0
{ RULE="Discard probable spam mail, set 1" }
:0:
ProbableSpam.mail
}
Apply the same principle to all the recipes in your ~/.procmailrc file. Then, as email has arrived, you can check which rules have been used by searching the log file with the command grep "RULE=" ${HOME}/Mail/procmail.log. If you need this regularly, make the grep search one of your Unix scripts:
#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log
In the altered procmail recipe, further up, carefully note some of the syntax
:0
* ^TO_my-mailing-list
{
:0
* ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{ RULE="To my-mailing-list, probably legitimate" }
:0:
${DEFAULT}
}
:0E
{
:0
{ RULE="To my-mailing-list, probably spam" }
:0:
Spam.mail
}
}
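If you want a summary of the RULE= markers rather than the raw log lines, the grep search can be turned into a small tally. This sketch assumes the VERBOSE log lines look like procmail: Assigning "RULE=..." (as the grep above implies) and that each RULE text fits on one line. The name rule_tally is made up.

```shell
#!/bin/sh
# Count how many times each RULE= marker was assigned in a procmail log,
# most frequent first.
# Usage: rule_tally [logfile]   (defaults to ${HOME}/Mail/procmail.log)
rule_tally() {
  log=${1:-${HOME}/Mail/procmail.log}
  grep 'Assigning "RULE=' "$log" \
  | sed -e 's/.*Assigning "RULE=//' -e 's/"$//' \
  | sort | uniq -c | sort -rn
}
```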
There is a very good page by Walter Dnes explaining the method. So for once I'll direct you elsewhere. The method relies on ad-hoc approximation. In brief, scoring is used to detect if more than 5 per cent of the characters in the body of the message are high-bit characters typical of the said language codes. If you have gone through the items in my procmail FAQ, it should be easy to understand the inventive method given on Walter's page. Also see the exercise at the end of the current FAQ involving detecting Korean.
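The 5 per cent idea itself is simple arithmetic. The sketch below is an illustration only, a shell stand-in rather than Walter's scoring recipe. It counts the bytes with the high bit set in a body read from standard input and prints their share as a whole percentage. The name highbit_percent is hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Print the percentage (rounded down) of bytes >= 128 in stdin.
highbit_percent() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  total=$(wc -c < "$tmp" | tr -d ' ')
  # Delete all 7-bit ASCII bytes; what remains are the high-bit bytes
  high=$(LC_ALL=C tr -d '\000-\177' < "$tmp" | wc -c | tr -d ' ')
  rm -f "$tmp"
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $(( high * 100 / total ))
  fi
}
```

A body could then be flagged with something like `[ "$(sed -e '1,/^$/ d' message | highbit_percent)" -gt 5 ]`.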
I have a cellular phone. I want to save the incoming email normally and also to send a modified copy to my second account (a Short Message Service). The forwarded copy should include the original subject AND five lines of the original message text. The original body should not be included. Is this possible with procmail?
Well, yes, it is. It takes some figuring out and needs many of the principles presented in the other items of my proctips collection. It also needs a few tricks of Bourne shell programming. Perhaps most importantly, this item demonstrates how to put the body of the message into a variable.
# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail
:0
* ^Subject:.*Timo testing
{
# Put the email intact in the default folder
:0c:
${DEFAULT}
# The "c" flag above tells the recipe to continue
# Now we prepare a different version of the message
:0
{
# Get the subject into a variable
# Expand the possible tabs into blanks
# Discard any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the body of the message into a variable
# Accept only the first five lines
# Discard newlines, i.e. put everything on one line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
}
# Prepare and send a message with no body
# -X "" extracts just the header (discards the body)
# Plug in the new subject
# Content fields might cause problems if not discarded
# Change the To: address
:0:proc.lock
| formail -X "" \
-I"Subject: ${SUBJ_} ${BODY_}" \
-i"Content-Type:" \
-i"Content-Length:" \
-I"To: your@second.address" \
| ${SENDMAIL} -t
}
The line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
retrieves the first five lines from the body of the text. It would be more useful to retrieve a specified number of characters from it. Say we wish to retrieve 160 characters. This is how to do that.
BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160 2>/dev/null`
The 2>/dev/null discards the transfer statistics that dd writes to standard error. Solving the alternative of having a maximum of 160 characters in the concatenated SUBJ_ and BODY_ is left as an exercise to the reader.
There is also another, more important improvement that can be made to the action above. Replace tr -d '\n' with tr '\n' ' ' so that a space is put between the lines when they are concatenated.
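Putting the two suggestions together, here is a plain sh sketch of the body extraction with joined lines and a length cap. The name sms_body is made up. dd counts bytes, so with multi-byte characters the cap is in bytes rather than characters.

```shell
#!/bin/sh
# Strip the header, join the body lines with blanks and keep at most
# $1 bytes (default 160) of the result.
sms_body() {
  sed -e '1,/^$/ d' | tr '\n' ' ' | dd bs=1 count="${1:-160}" 2>/dev/null
}
```

For example, a message whose body is "line one" and "line two", passed through `sms_body 12`, yields "line one lin".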
The recipe below assumes that the signature properly adheres to the Internet "-- " convention to denote where the signature starts.
:0
* ^Subject: Whatever
{
:0 fbw
| sed -e '/^-- /,$ d'
:0:
${DEFAULT}
}
Let's look at what we've got:
Replacing
sed -e '/^-- /,$ d'
with
sed -e '/^-- /,/^$/ d'
will instead delete everything from the "-- " line up to the first empty line. Thus if there is, e.g., an attachment after the signature, the attachment will not be thrown away.
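The two sed variants can be compared on a made-up body in which the "-- " signature is followed by an empty line and some further (attachment) text:

```shell
#!/bin/sh
sample_body() {
  printf 'Hello there\n-- \nTimo\n\nattachment text\n'
}
# Variant 1: delete from the "-- " line to the end
sample_body | sed -e '/^-- /,$ d'      # prints: Hello there
# Variant 2: delete from the "-- " line up to the first empty line only
sample_body | sed -e '/^-- /,/^$/ d'   # prints: Hello there, then attachment text
```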
Unix manuals are not very helpful as starting points, but after you have got the rudiments under your belt, you may wish to browse the following manuals for additional information. Below is a simple "manuals" Bourne shell script. It prepares plain text format files of some of the essential Unix man manuals for a procmail user, especially suited for offline reading.
Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII 8 (the backspace character). To make the "manuals" file executable type "chmod u+x manuals".
#!/bin/sh
TODIR=${HOME}/myman
mkdir -p ${TODIR}
echo ${TODIR}
man egrep | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}
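The s/_^H//g substitution removes only underlining (an underscore followed by a backspace). Manual pages also embolden text by overstriking a character with itself, which leaves pairs like "B^HB" behind. Below is a sketch of a more general filter that deletes any character immediately followed by a backspace (strip_overstrike is a made-up name). It builds the backspace with printf '\b' instead of a literal CTRL-H typed into the script. Where it is installed, col -b does the same job.

```shell
#!/bin/sh
BS=$(printf '\b')   # the backspace character, ASCII 8

# Remove overstriking: "_<BS>x" (underline) and "x<BS>x" (bold) both become "x"
strip_overstrike() {
  sed -e "s/.$BS//g"
}

# Example use in the manuals script:
# man procmail | strip_overstrike > ${TODIR}/procmail.man
```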
Many of the recipes in this FAQ utilize sed and/or awk. Some useful links (note, however, as is common with links, I can't guarantee that they still are current):
Yes, it is, but it is not quite as straightforward as one would expect.
Since this is a procmail advice collection, not a vacation program one, I'll assume that you are reasonably familiar with the vacation program. If not, start with "man vacation". You have to use procmail to customize the ~/.vacation.msg file because, when vacation is invoked via procmail, its $SUBJECT variable is not necessarily set.
Usually, when vacation is used, it is first called interactively to create the ~/.vacation.msg file and to replace the ~/.forward file. If you are going to use the procmail solution, it is very important not to do this. In particular, the ~/.forward file must not be touched in any way. The reason is that in this solution it is used to invoke procmail, not vacation. (The vacation program is, of course, now called by procmail.)
# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom
#
# Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Thank you for your email about:" >> ${VACMSG} ;\
echo "\"$SUBJ_\"" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Your email will be seen to when I return." >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
cat ${HOME}/.signature >> ${VACMSG}
# Here we go invoking vacation and also saving the email
# You might have several different recipes of this kind
#
:0
* ^Subject:.*Whatever
{
:0
{ RULE="Testing" }
:0 cwi
* ONVACAT ?? ^^yes^^
* ! ^X-Loop:.*myid@myhost\.mydom
| ${VACATION} -t${VACFREQ} myid
:0:
WhateverFolder
}
Feedback: Maybe I [Collin Park] can add one more comment: I think you need a global LOCKFILE to cover the area from when you generate the vacation message to the place where you invoke $VACATION.
Otherwise, message #N may generate .vacation.msg, then message #N+1 overwrites it before #N invokes $VACATION.
It is nice that you have found my proctips so useful that you ask for my personal advice. Nevertheless, if you ask me by email for individualized procmail consultation, my response has to be the same as to any request for programming advice: I do not do email consultation. If you have a procmail-related problem, please post your question to a Usenet newsgroup such as comp.mail.misc. The added advantage of posting is that in a newsgroup both the question and the potential answers reach a wider forum. That way everyone will benefit.
On rare occasions I have also been asked to email my own personal ~/.procmailrc or my own spamfoiling scripts. The answer is a definite no, for two main reasons. First, that material is private. Second, I have neither the willingness nor the time to send out material on individual requests. If and when I want to share my material, I make it available for users to retrieve themselves via WWW or FTP.
Yes, notably this:
Programming | |
---|---|
Turbo Pascal programming material | |
MS-DOS batch programming material | |
Unix Bourne shell scripts programming material | |
Etc | |
More links to Timo's FAQ materials | |
Let's see if we can put the methods presented in this FAQ to work on some tasks, some of which have come up on Usenet news.
Ex.1) Keep a copy of incoming email, and at the same time, get only the first five lines from the message body and forward it to another account.
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
* Any rule(s) you might wish to have
{
# Keep a copy, but don't stop yet ( the c )
:0c:
${DEFAULT}
# Comment with "Old-" the Content-Length field from the header
# Ensure that a whitespace exists between field name and content
:0 fwh
* ^Content-Length:
| formail -z -i"Content-Length:"
# Add the loop avoidance
# ( f for piping; w for waiting for completion; h for headers )
:0 fwh
| formail -A"X-Loop: myid@myhost.mydom"
# Truncate the body ( the b ) to five lines
:0 fwb
| head -5
# Forward to the other account
:0
! myid2@myhost.mydom
}
It is important to handle the Content-Length header field when the length of the email is altered, so that the receiving email program will not break the forwarded message when it is read. The -i switch keeps the original length, renamed to an "Old-Content-Length:" field, for the attention of the receiver.
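To preview the effect of the truncation outside procmail, here is an awk sketch equivalent to the "fwb" filter with head -5. The header is passed through unchanged and only the first five body lines are kept. The name truncate_body is hypothetical, and procmail itself is not involved.

```shell
#!/bin/sh
# Copy the header (up to and including the first empty line) unchanged,
# then keep at most $1 lines (default 5) of the body.
truncate_body() {
  awk -v max="${1:-5}" '
    inbody { if (n < max) print; n++; next }
    { print }
    /^$/ { inbody = 1 }
  '
}
```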
Ex.2) Forward the first 10 lines of the message body to the user's second account while preserving all the original message headers, i.e., at the receiving side, the user wants to see all of the message's travel history and only the first 10 lines of the message body.
This is a more complicated version of the first exercise. The transformed task is not trivial, since when you forward, the original message headers will be replaced by your forwarding headers. Therefore, you'll have to see to preserving also the original headers. Below is how I would solve the problem based on several items in this FAQ.
# A trick to extract the subject into a variable
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c: #If you want to, preserve a full copy of the email, else omit
${DEFAULT}
:0fwh #Preserve the information about the original content length
* ^Content-Length:
| formail -z -i"Content-Length:"
:0fwb #Truncate the body of the message to ten lines
| head -10
:0fwh #Insert a blank line at the beginning of the body for clarity
| cat - ; echo ""
:0fwh #Store the original headers, quoting them to avoid problems
| sed -e 's/^/\> /'
:0fwh #Insert some of your own information before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-Info: Forwarded body truncated to 10 lines" \
-i"Subject: $SUBJ_ (fwd)"
#Finally, forward the adjusted email
:0
!my2dnId@myhost.mydom
}
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
Feedback:
The recipe with head probably needs an "i" on the flags line, as:
:0fwbi
| head -10
since write errors on the pipe are likely for messages larger than a certain size. (I've seen numbers like 4096 and 10240... it apparently varies with the system.)
Ex.3) Match a potential [TS999] identification in the Subject header, such as "[TS001] Timo testing". If found, insert a "Subject id: [TS999]" as the first line in the body of the message. (The rest of the original subject line must not reappear in the id.)
:0
* ^Subject:.*\/\[TS[0-9]+\]
{
:0 fhw
| cat - ; \
echo "Subject id: ${MATCH}"
:0:
${DEFAULT}
}
But what if you do want to include the rest of the original subject line? In that case use
* ^Subject:.*\/\[TS[0-9]+\].*
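The [TS999] extraction can also be tried with sed. This is a sketch, not procmail's \/ matching, and it is case-sensitive where procmail is not. When a subject holds several tags, the greedy .* in front makes this version pick the last one. The name extract_ts_id is made up.

```shell
#!/bin/sh
# Print the "[TSnnn]" tag from the Subject: line of the header on stdin.
extract_ts_id() {
  sed -n -e '/^$/q' \
         -e 's/^Subject:.*\(\[TS[0-9][0-9]*]\).*/\1/p'
}
```

The id line of the exercise would then be `printf 'Subject id: %s\n' "$(extract_ts_id < message)"`.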
Ex.4) Multi-part messages (which typically include attachments) have in their headers a field like the two examples below:
Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Write a recipe that inserts into a variable (call it BOUND) the boundary string. Note that the potential quotes (") are not to be part of that string. Also note that the header might be divided on multiple lines as in
Content-Type: multipart/mixed;
boundary="------------BA45271FBDAA479CECA7E20A"
There are alternative solutions, which are not necessarily quite equivalent. The first one is to put the line(s) below high up in your ~/.procmailrc recipe file:
BOUND1=`formail -z -x"Content-Type:" \
| awk -F= '{ print $2 }' \
| sed -e 's/\"//g' | tr -d '\n'`
A second one is:
:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
| awk -F= '{ print $2 }' | sed -e 's/\"//g'` }
This was not in the exercise, but you can then have recipes like
:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
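A caveat on the awk -F= approach: a boundary that itself contains "=", such as the common "----=_NextPart_..." style, is cut short at its first "=". The plain sh sketch below avoids this. It first unfolds continuation lines of the header, then takes the quoted value after boundary=, or else the unquoted value up to a blank or semicolon. The name get_boundary is made up, and only the message's own header is examined.

```shell
#!/bin/sh
# Print the boundary parameter of the Content-Type: header field of the
# message on stdin, without surrounding quotes.
get_boundary() {
  sed -e '/^$/q' |                       # header only
  awk '
    /^[ \t]/ { line = line $0; next }    # continuation line: append it
    { if (line != "") print line; line = $0 }
    END { if (line != "") print line }
  ' |
  grep -i '^Content-Type:' |
  sed -n -e 's/.*[Bb][Oo][Uu][Nn][Dd][Aa][Rr][Yy]="\([^"]*\)".*/\1/p' \
         -e 't' \
         -e 's/.*[Bb][Oo][Uu][Nn][Dd][Aa][Rr][Yy]=\([^ ;]*\).*/\1/p'
}
```

The `t` command ends processing of the line once the quoted form has matched, so the unquoted pattern is tried only as a fallback.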
Ex.5) Identify if the arriving email is in Korean. If so, return the message to the sender and his/her postmaster. Ignore a potential Reply-To: field in the header. Avoid email loops. Avoid forgeries which appear to come from your own host. Avoid forgeries which lack a host name. Be careful not to take Finnish/Swedish or French as Korean.
This is quite a difficult exercise with many details involved.
# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"
# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
:0BD
* -1^1 .
* 2^1 =[0-9A-F][0-9A-F]
* 20^1 [¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[89A-F][0-9A-F]
* -20^1 [åÅäÄöÖàáâçèéêë]
* -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
{
:0
{ RULE="Probable Korean email" }
#
:0c:${HOME}/procmail.lock
| expand | sed -e 's/[ ]*$//g' \
| sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
#
:0:${HOME}/procmail.lock
| (formail -r -I"Subject: Autorejected email" \
-I"To: ${FROM_}" \
-I"Cc: postmaster@${FHOST_}" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected probable Korean email ---" ; \
echo "" ; \
cat ${HOME}/procmail.reject.korean ; \
echo "--- end of rejected probable Korean email ---" ; \
rm -f ${HOME}/procmail.reject.korean) \
| ${SENDMAIL} -t
}
}
Ex.6) If the subject of the email contains the identifier [INFO], in capitals, put the body of the incoming email into a temporary file. Ensure that the name of the temporary file is unique. Insert the full subject line at the top of the temporary file. (Why, and what happens next, is beyond this exercise.)
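#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Assign a temporary file name
TMPFILE_=proctemp.$$
:0D
* ^Subject.*\[INFO\]
{
# b: feed the body only; c: work on a copy so the message itself is kept
# (as a filter, "f", the pipe's empty output would replace the body)
:0 bcwi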
| echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
echo >> ${TMPFILE_}; \
cat >> ${TMPFILE_}
}
Ex.7) If the email comes from a certain sender, check if the time-zone information is present in the Date header. If not, add it assuming +3 hours.
#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
:0 fwhi
| formail -i"Date: ${DATE_} +0300 (EET DST)"
:0:
${DEFAULT}
}
Ex.8) The simple spamfoiling recipe below won't work. Correct it.
:0:
* !^TO$USER@xxxxxxx.xxx
ProbableSpam.mail
One possible solution:
:0
{
:0
{ USER=`whoami` }
:0:
* $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
}
Another solution:
:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
Ex.9) Insert at the beginning of the subject the date/time of receiving the incoming message in the YYYYMMDD HHMMSS format.
:0
* Whatever rules
{
:0
{ SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
:0
{ DATETIME_=`date "+%Y%m%d %H%M%S"` }
:0 fhwi
| formail -I"Subject: ${DATETIME_} ${SUBJ_}"
:0:
${DEFAULT}
}
Ex.10) This is partly based on an actual incident. Consider the following recipe with three small but crucial syntax errors, and one omission. Find them.
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{RULE="Abuse reception notes"}
:0
ReceivedNotes
}
The answer is a bit further down
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{ RULE="Abuse reception notes" }
:0:
ReceivedNotes
}
Ex.11) Write a recipe to match the subject line below. The (RECENT) may or may not be there, and the numbers will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$
:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
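A quick check of the answer with grep -E, as an approximation of procmail's matcher. It uses the subject from the exercise, a made-up variant without (RECENT), and a made-up line that should not match:

```shell
#!/bin/sh
re='^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]'
n=$(printf '%s\n' \
  'Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$' \
  'Subject: Re: [SpamCop:38.204.225.29,id:16135685] Another one' \
  'Subject: Re: [SpamCop:id:16135686] Missing address' \
  | grep -cE "$re")
echo "$n"   # prints: 2
```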
Ex.12) It is fairly common that spam email has the same sender and recipient in the From: and To: fields. Devise a recipe that detects such postings.
This is not quite as simple as it first sounds, since it is advisable to take into account the fact that the contents of the two fields may not be quite identical even when the actual addresses are the same. Thus, as one possible solution, I would use regular expression matching both ways, as below. By default, variable comparisons are regular expression matching, not strict equalities. Also note the avoidance of email loops and of falsely targeting email that one may have sent to oneself.
WHOFROM=`formail -xFrom: \
| expand \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
WHOTO=`formail -xTo: \
| expand \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
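Regular expression matching both ways is one approach. Another, sketched below in plain sh as an alternative rather than a replacement, reduces each field to its bare address and compares those for equality. The bare address is the part inside <...>, with (comments) removed, in lower case. The names bare_addr and same_address are made up, and group syntax or several addresses per field are not handled.

```shell
#!/bin/sh
# Reduce a From:/To: field value to a bare, lower-case address.
bare_addr() {
  printf '%s\n' "$1" \
  | sed -e 's/([^)]*)//g' \
        -e 's/.*<\([^>]*\)>.*/\1/' \
        -e 's/^[ ]*//' -e 's/[ ]*$//' \
  | tr '[:upper:]' '[:lower:]'
}

# Succeed when both field values name the same (non-empty) address.
same_address() {
  a=$(bare_addr "$1")
  b=$(bare_addr "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```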
Ex.13) Write a (spam avoidance) recipe to detect email with more than seven recipients in the "To:" header field. Assume for simplicity that each address will have exactly one "@" character in it.
:0
* ^Subject:.*The information you requested
{
:0
{
WHOTO=`formail -z -xTo:`
COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
COUNT1=`expr ${COUNT} - 1`
ISGT=`expr ${COUNT1} \> 7`
}
:0:
* ISGT ?? ^^1^^
ProbableSpam.mail
}
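The counting step can be checked separately. The recipe deletes everything but "@" with sed and subtracts one for the newline. Alternatively, tr -cd '@' keeps only the "@" characters, so wc -c gives the count directly. The name count_at is made up.

```shell
#!/bin/sh
# Count the "@" characters, i.e. the addresses, in a To: field value.
count_at() {
  printf '%s' "$1" | tr -cd '@' | wc -c | tr -d ' '
}
```

The spam test then becomes `[ "$(count_at "$WHOTO")" -gt 7 ]`.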
Ex.14) Make procmail forward email that arrives between 9am and 5pm to a predefined daytime email address.
:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
:0
{
TIME=`date +%H%M`
ISGT=`expr ${TIME} \> 0900`
ISLT=`expr ${TIME} \< 1700`
}
:0
* ISGT ?? ^^1^^
* ISLT ?? ^^1^^
! daytime_forward_address
}
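The time-window test can also be written with the shell's own test command. The sketch below (in_daytime is a made-up name) takes an HHMM time, or the current time when none is given. Like the expr version, it excludes 0900 and 1700 themselves. test compares the values as decimal integers, so a leading zero as in 0830 does no harm.

```shell
#!/bin/sh
# Succeed if the time (HHMM, default: now) is strictly between 0900 and 1700.
in_daytime() {
  t=${1:-$(date +%H%M)}
  [ "$t" -gt 900 ] && [ "$t" -lt 1700 ]
}
```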
Davey, David
Dnes, Walter
Eriksson, Era
Guenther, Philip
Hebeisen, Christoph
Hirvonen, Hannu
Menezes, Evandro
Park, Collin
Van Steenkist, Vernon
Any errors and inadequacies are, however, solely my own responsibility.
A legal note: The author shall not be liable to the user, the reply target or any third party for any direct, indirect or consequential loss or damage arising from using, abusing, or a failure to be able to use, the information in this message/file howsoever caused. No warranty is given that all the information contained is correct, or that it is current.