Discussion:
header_re patch for Page.getPageText() (Thomas Werschlein)
Nir Soffer
2005-05-05 06:37:31 UTC
I noticed that Page.getPageText() does not limit its header search for
pragmas and comments to the beginning of the page. Therefore, if
a user enters, e.g., Java comments into a verbatim section such
as
{{{
# java comment
}}}
AND there are no pragmas/comments before this java comment hash,
the body will start only after the java comment.
Removing the re.MULTILINE flag when compiling the regexp solved the
problem for me.
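To illustrate why re.MULTILINE matters here, below is a minimal sketch. It assumes a header pattern of the form r'(^#+.*(?:\n\s*)+)+' (a hypothetical stand-in; the real Page.header_re may differ) and assumes getPageText() returns the text after the end of the first match. With re.MULTILINE, ^ also matches right after every newline, so search() picks up the "#" line inside the verbatim section.

```python
import re

# ASSUMPTION: hypothetical header pattern; the real Page.header_re may differ.
HEADER_PATTERN = r'(^#+.*(?:\n\s*)+)+'

def strip_header(body, flags):
    """Return the text after the first header match, or the whole body if none."""
    header = re.compile(HEADER_PATTERN, flags).search(body)
    return body[header.end():] if header else body

page = "Some text\n{{{\n# java comment\n}}}\nmore\n"

# With re.MULTILINE, ^ matches at the start of every line, so search()
# finds "# java comment" inside {{{ }}} and the "body" starts after it.
buggy = strip_header(page, re.MULTILINE | re.UNICODE)   # '}}}\nmore\n'

# Without re.MULTILINE, ^ only matches at position 0, so nothing matches
# and the whole page is kept.
patched = strip_header(page, re.UNICODE)                # unchanged page
```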
diff -u -r1.1 -r1.2
--- Page.py 12 Apr 2005 15:00:23 -0000 1.1
+++ Page.py 4 May 2005 11:32:00 -0000 1.2
@@ -1389,7 +1389,7 @@
# Lazy compile regex on first use. All instances share the
# same regex, compiled once when the first call in an instance is done.
- self.__class__.header_re = re.compile(self.__class__.header_re, re.MULTILINE | re.UNICODE)
+ self.__class__.header_re = re.compile(self.__class__.header_re, re.UNICODE)
body = self.get_raw_body() or ''
header = self.header_re.search(body)
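A possible side effect of the patch, sketched under the same assumption (a hypothetical header pattern r'(^#+.*(?:\n\s*)+)+', not necessarily the real Page.header_re): if the pattern relies on ^ for each repeated header line, dropping re.MULTILINE means only the first header line is consumed.

```python
import re

# ASSUMPTION: hypothetical header pattern; the real Page.header_re may differ.
HEADER_PATTERN = r'(^#+.*(?:\n\s*)+)+'

def strip_header(body, flags):
    """Return the text after the first header match, or the whole body if none."""
    header = re.compile(HEADER_PATTERN, flags).search(body)
    return body[header.end():] if header else body

page = "#format wiki\n## a comment\nBody text\n"

# Original flags: ^ can match at each later line start, so the group
# repeats over both header lines.
before = strip_header(page, re.MULTILINE | re.UNICODE)  # 'Body text\n'

# Patched flags: ^ only matches at position 0, so the group runs once
# and the second header line is left in the body.
after = strip_header(page, re.UNICODE)                  # '## a comment\nBody text\n'
```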
The search is done on the raw body of the page. The header_re is used
to show the first interesting search result, which the current search
defines as the first match in the page body.

I'm not sure that removing re.M doesn't break that regex in other
cases. In any case, I suspect that regex is wrong; I have another,
well-tested regex used by the section parser.
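One alternative, sketched here as a swapped-in technique (it is not the section parser regex mentioned above): keep re.MULTILINE but call match() instead of search(). match() only tries position 0, so a "#" line deeper in the page cannot be mistaken for the header, while re.MULTILINE still lets the repeated group consume consecutive header lines. The pattern is again a hypothetical stand-in for Page.header_re.

```python
import re

# ASSUMPTION: hypothetical header pattern standing in for Page.header_re.
HEADER_RE = re.compile(r'(^#+.*(?:\n\s*)+)+', re.MULTILINE | re.UNICODE)

def strip_header(body):
    """Strip pragmas/comments only when they start at the top of the body."""
    # match() anchors the attempt at position 0; re.MULTILINE still lets
    # each ^ in the repeated group match at the next line start.
    header = HEADER_RE.match(body)
    return body[header.end():] if header else body

verbatim_page = "Some text\n{{{\n# java comment\n}}}\nmore\n"
header_page = "#format wiki\n## a comment\nBody text\n"

kept = strip_header(verbatim_page)    # unchanged: no header at the top
stripped = strip_header(header_page)  # 'Body text\n'
```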


Best Regards,

Nir Soffer
