PhpSpreadsheet

alex/PhpSpreadsheet

Commit Graph

Author	SHA1	Message	Date
Owen Leibman	6080c4561d	Improve Coverage for HTML Reader	2020-06-25 22:42:38 -07:00

Improve Coverage for HTML Reader

Reader/Html is now covered except for 1 statement.
There is some coverage of RichText when you know in advance that the
html will expand into a single cell.
It is a tougher nut, one that I have not yet cracked,
to try to handle rich text while converting unknown HTML to multiple cells.
The original author left this as a TODO, and so for now must I.

It made sense to restructure some of the code. This brings some changes in behavior.
- Issue #1532 is fixed (links are now saved when using rowspan).
- Colors can now be specified by HTML color name. To accomplish this,
  Helper/Html function colourNameLookup was changed from protected
  to public, and made static.
- Superfluous empty lines were eliminated in a number of places, e.g.
  <ul><li>A</li><li>B</li><li>C</li></ul>
  had formerly caused a wrapped cell to be created with 2 empty lines
  followed by A, B, and C on separate lines; it will now just have the
  3 A/B/C lines, which seems like a more sensible interpretation.
- The img alt attribute, which had been cast to float, is now used as a string.

Private member "encoding" is not used. Functions getEncoding and setEncoding
have therefore been marked deprecated. In fact, I was unable to get
SecurityScanner to pass *any* html which is not UTF-8. There are
possibly ways of getting around this (in Reader/Html - I have no
intention of messing with SecurityScanner), as can be seen in my
companion pull request for Excel2003 Xml Reader. Doing this would be
easier for ASCII-compatible character sets (like ISO-8859-1),
than for non-compatible charsets (like UTF-16). I am not
convinced that the effort is worth it, but am willing to investigate
further.
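
To illustrate why ASCII-compatible charsets are the easier case, here is a
minimal, language-neutral sketch in Python. It is not PhpSpreadsheet code and
not what Reader/Html does; the helper name to_utf8 and the charset-sniffing
regex are assumptions made for the example. It looks for a charset declaration
in the raw bytes, re-encodes the document as UTF-8, and rewrites the
declaration to match.

```python
import re

# Hypothetical helper for illustration only; not part of PhpSpreadsheet.
# Matches a charset declaration such as charset="ISO-8859-1" in raw bytes.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def to_utf8(raw: bytes) -> bytes:
    """Re-encode HTML bytes as UTF-8 when a charset declaration can be found."""
    match = CHARSET_RE.search(raw[:1024])
    if match is None:
        # No declaration is visible at the byte level; leave the input untouched.
        return raw
    declared = match.group(1).decode("ascii")
    # bytes.decode raises LookupError for charset names Python does not know.
    utf8 = raw.decode(declared).encode("utf-8")
    # Rewrite the first declaration so it agrees with the new encoding.
    return CHARSET_RE.sub(
        lambda m: m.group(0).replace(m.group(1), b"UTF-8"), utf8, count=1
    )


# ISO-8859-1 keeps ASCII bytes unchanged, so the declaration is found directly.
latin1 = '<meta charset="ISO-8859-1"><p>caf\u00e9</p>'.encode("iso-8859-1")
converted = to_utf8(latin1)

# UTF-16 interleaves zero bytes, so the same byte-level search finds nothing.
utf16 = '<meta charset="UTF-16"><p>caf\u00e9</p>'.encode("utf-16")
unchanged = to_utf8(utf16)
```

In the UTF-16 case the declaration is invisible to a byte-level search, so the
encoding would have to be detected some other way (for example from a byte
order mark) before the declaration could even be read.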

I added a number of tests, creating an Html directory, and moving
HtmlTest to that directory.

1 Commit