I'm trying to figure out why this script:
# read input file
open(INFILE, "< $infile") || die "Could not open file: $infile ($!)\n\n";
open(OUTFILE, ">$outfile") || die "Could not open file: $outfile ($!)\n\n";
while (<INFILE>) {
    chomp();
    # data has begun
    $string = "";
    if ($_ =~ m/Content \(results\/alpha\)/i) {
        $go = "true";
    }
    # data has ended
    $string = "";
    if ($_ =~ m/Content \(results_next\)/i) {
        $go = "false";
    }
    # process data
    if ($go eq 'true') {
        # make it all one line
        #$_ =~ s/\n//ig;
        #$_ =~ s/\t//ig;
        $_ =~ s/\s{2,}//ig;
        # concatenate fields
        #$_ =~ s/<\/font>/|/ig;
        # break one record per row
        #$_ =~ s/<\/tr>/<\/tr>/ig;
        # remove all HTML
        #$_ =~ s/-//ig;
        #$_ =~ s/%//ig;
        #$_ =~ s/\///ig;
        #$_ =~ s/://ig;
        #$_ =~ s/_//ig;
        #$_ =~ s/=//ig;
        #$_ =~ s/"//ig;
        #$_ =~ s/\?//ig;
        #$_ =~ s/\.//ig;
        #$_ =~ s/&amp;/&/g;
        #$_ =~ s/<([^>]|\n)*>//gs;
        #$_ =~ s/<(.*?)>//gs;
        #$_ =~ s/(\S{2})\s{1}(\d{5})/$1|$2/gs;
        # write the line to the output file
        print OUTFILE $_;
    }
}
close(INFILE) || die "Could not close file: $infile ($!)\n\n";
close(OUTFILE) || die "Could not close file: $outfile ($!)\n\n";
print "\n\nParsing Completed!\n\n";
exit;
won't trim certain HTML bits out of this…
This is the result I get: ADLER AFC AG AGREDA AGREDA, AGUIAR AGUIRRE AHLRICH ALAM, ALBERNI ALDECOA ALDO, ALFANO ALFONSO ALFREDO ALJOE, ALONSO, ALVAREZ AMADO, AMDUR, ANDRADE ANTONIO ANWER APPEL, ARAZOZA AREEA ARES,
from this link
also here – saved from url=(0364)http://yp.bellsouth.com/yp/layout/layout2.jhtml?from=yp&page=results&mousetrap=A1=null|G1=254535176|M7=F|Q1=yp|Q14=0%2C243|Q15=Accountants|Q17=Accountants|Q18=null|Q19=null|Q2=results|Q20=null|Q21=search|Q22=100%2C105|Q25=Miami|Q26=FL|Q27=923368|Q3=MIFL|Q31=0|Q32=A|Q33=R|Q4=C|Q5=ACCOUNTANTS+CERTIFIED+PUBLIC|Q6=C|Q7=0%2C243|Q8=5|R1=1|R2=true&sn=2&_requestid=68411
My brain's not getting the missing bit. I get that instead of the whole shebang… Any Perl geniuses have an idea? I'm on crunch time.
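
One thing that might matter here: the script works on one chomped line at a time, so a tag that wraps onto the next line can never match a pattern like <[^>]*>, and s/\s{2,}// deletes whitespace runs outright instead of collapsing them to a single space. Below is a minimal sketch, assuming that is the problem (I can't confirm it without the saved page). The helper name extract_text and the sample markup in the usage note are made up for illustration; the marker patterns are the ones from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the lines between the two marker comments into
# one buffer, then strip tags from the buffer as a whole so that tags broken
# across lines are still matched.
sub extract_text {
    my (@lines) = @_;
    my $go  = 0;
    my $buf = '';
    for my $line (@lines) {
        $go = 1 if $line =~ m/Content \(results\/alpha\)/i;   # data has begun
        $go = 0 if $line =~ m/Content \(results_next\)/i;     # data has ended
        next unless $go;
        $buf .= $line;                 # keep the newline; no chomp here
    }
    $buf =~ s/<\/tr>/\n/gi;            # one record per table row
    $buf =~ s/<\/font>/|/gi;           # field separator between cells
    $buf =~ s/<[^>]*>//g;              # remove remaining tags, even multi-line ones
    $buf =~ s/&amp;/&/g;               # decode the one entity the script handled
    $buf =~ s/[ \t]{2,}/ /g;           # collapse runs of blanks to one space
    return $buf;
}
```

Usage would look something like this (sample input invented for the example):

    my @lines = <INFILE>;
    print OUTFILE extract_text(@lines);

Regex-based tag stripping is fragile on real-world HTML; a proper parser module would be the sturdier route if the page layout varies.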