beautifulsoup pickle dict bytestream discrepancy

Bug #995733 reported by Toby Borland
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
Beautiful Soup  New     Undecided   Unassigned

Bug Description

Python dictionaries created using output from Beautiful Soup 4.0.4 are incorrectly pickled to a bytestream under Python 3.2.2, causing a runtime exception on PythonWin under Windows XP SP3 x86.
The problem does not manifest when the bytestream is unpickled. See the attached script to reproduce the problem.

Toby Borland (tobyborland) wrote :

Leonard Richardson (leonardr) wrote : Re: [Bug 995733] [NEW] beautifulsoup pickle dict bytestream discrepancy

I can't duplicate the error, but I'm not on Windows. I believe the
problem is real, but I have no idea how to go about diagnosing or
fixing it.

I would not expect pickle.dumps(D_type) to equal pickle.dumps({'bug':
'test'}), because BS4_test.body.string is a NavigableString and 'bug'
is a string. When you pickle a NavigableString you are pickling the
entire Beautiful Soup tree to which it is connected.
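
The effect described above can be reproduced without Beautiful Soup. The sketch below is only a stand-in: `Node` and `LinkedString` are hypothetical classes invented for illustration, not Beautiful Soup's own, and the assumption is simply that a `str` subclass holding a reference to its parent behaves like a NavigableString as far as pickle is concerned.

```python
import pickle


class Node:
    """Hypothetical tree element; a stand-in, not a Beautiful Soup class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class LinkedString(str):
    """Hypothetical str subclass that keeps a reference to its parent node."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent
        return obj


# Build a small tree: html -> body -> "test", plus 200 unrelated siblings.
root = Node("html")
body = Node("body", parent=root)
text = LinkedString("test", parent=body)
body.children.append(text)
for i in range(200):
    Node("p%d" % i, parent=root)

# The two dictionaries compare equal, but their pickles do not.
linked = pickle.dumps({"bug": text})
plain = pickle.dumps({"bug": "test"})
print({"bug": text} == {"bug": "test"}, len(linked), len(plain))

# Unpickling the linked version restores the whole tree, siblings included.
restored = pickle.loads(linked)
print(len(restored["bug"].parent.parent.children))
```

The string carries its parent, the parent carries the root, and the root carries every sibling, so one short value drags the whole structure into the bytestream.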

This is inefficient, but it should work without giving a RuntimeError.
A more efficient technique, and one you can probably use as a
workaround, is to convert the NavigableString into a normal string
with str(soup.body.string) when you remove it from a Beautiful Soup
context.
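
For values that are already buried in dictionaries or lists, the same str() conversion can be applied recursively before pickling. This is a minimal sketch rather than anything Beautiful Soup provides: `detach_strings` is a hypothetical helper name, and `Marked` is a hypothetical `str` subclass used here only so the example runs without Beautiful Soup installed.

```python
import pickle


def detach_strings(value):
    """Return a copy of value with str subclass instances replaced by plain str."""
    if isinstance(value, str):
        # str() on a str subclass yields an exact str with no extra attributes.
        return str(value)
    if isinstance(value, dict):
        return {detach_strings(k): detach_strings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [detach_strings(item) for item in value]
    if isinstance(value, tuple):
        return tuple(detach_strings(item) for item in value)
    return value


class Marked(str):
    """Hypothetical stand-in for a string type that carries extra context."""


data = {"bug": Marked("test"), "tags": [Marked("a"), ("b", Marked("c"))]}
clean = detach_strings(data)

# Only plain built-in types remain, so pickle has nothing extra to follow.
payload = pickle.dumps(clean)
print(type(clean["bug"]).__name__, pickle.loads(payload) == clean)
```

With Beautiful Soup itself, the equivalent of `Marked("test")` would be a value such as soup.body.string, converted at the point where it leaves the tree.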

Is your RuntimeError the same one seen here?

http://groups.google.com/group/beautifulsoup/browse_thread/thread/acc192642652aa4f

If you're up to it, you could run the unit test suite on Windows and
see what results you get.


Toby Borland (tobyborland) wrote :

Thanks for quick analysis turnaround.
I was unaware of the NavigableString object in tow, so the behaviour is not a bug. The runtime error would be the same for a longer HTML instance, triggering a recursion limit exception.
BeautifulSoup testing.py fails at line 271 with a syntax error, possibly quote handling.
BeautifulSoupTests.py fails at line 84 with a syntax error.
I will analyse platform issues further later.

