Commit Graph

93 Commits

Author SHA1 Message Date
Adrien Crivelli 395b750030
Stricter visibility 2020-07-19 12:30:31 +09:00
Adrien Crivelli c3fa31de13
Missing typing 2020-07-19 12:21:40 +09:00
MarkBaker 6cbb622a9e Minor refactoring 2020-07-05 18:25:39 +02:00
MarkBaker cf6769eab1 Hopefully a final phpcs fix before I start looking at how to run a pre-commit hook on Windows 10 2020-07-05 17:17:56 +02:00
MarkBaker e89196f65b phpcs fixes, although I thought I'd successfully added the pre-commit hook to pick those up before the commit, guess the problem is running from Windoze, so I'll have to address that 2020-07-05 17:05:44 +02:00
MarkBaker 8629337101 Retrieving print/page setup for the Xml Reader 2020-07-05 16:22:35 +02:00
oleibman 38fab4e632
Fix for #1505 (#1525)
This problem is the same as #1238, which was resolved by #1239.
For that issue, the fix was to check in one place whether
$this->mapCellXfIndex[$xfIndex] was set before using it.
The sample spreadsheet supplied as a description for this
problem had exactly the same problem in 2 other places in the code.
In addition, there were 7 other places in the code where that
particular item was used unchecked. This fix corrects all 9 locations.
The spreadsheet supplied with the problem is used as the basis
for some new tests, which particularly test column dimensions
and styles, the problems involved in this case.
2020-06-19 21:01:18 +02:00
oleibman 262896086a
Improve Coverage for Sylk (#1514)
* Improve Coverage for Sylk

I believe that both BaseReader and Sylk Reader are now 100% covered.

Documentation available for this format is sparse.
It was always incomplete, and in some cases inaccurate.
My goal was to use PhpSpreadsheet to load the test file,
save it as Xlsx, and visually compare the two, then add a test
loaded with assertions. Cell values and calculated values,
and border styles were generally handled pretty well without changes.
Other types of styling were not handled so well. I added a few cells
to exercise some previously uncovered code.

Sylk files must be ASCII. I have deprecated the use of the
setEncoding and getEncoding functions, which had no test cases.
2020-06-19 20:35:44 +02:00
oleibman 73379cdfb1
Improve Coverage for Gnumeric (#1517)
* Improve Coverage for Gnumeric

I believe that both BaseReader and Gnumeric Reader are now 100% covered.

My goal was to use PhpSpreadsheet to load the test file,
save it as Xlsx, and visually compare the two, then add a test
loaded with assertions. Results were generally pretty good,
but there were no tests with assertions. I added a few cells
to exercise some previously uncovered code. Code was extensively
refactored; logic changes are noted below.

Code allowed for specifying document properties in an old format.
I considered removing that, but I found the original spec at
http://www.jfree.org/jworkbook/download/gnumeric-xml.pdf
This allowed me to create an old file, which was not handled
correctly because of namespace differences. The code was corrected
to allow for this difference.

Added support for textRotation.

Mapping of fill types was not correct.

* PHP7.2 Error

One assertion failed under PHP7.2. Apparently there was some change in
the handling of SimpleXMLElement between 7.2 and 7.3. Casting to string
before use eliminates the problem.

* Scrutinizer Recommendations

All minor, solved (hopefully) mostly by casts.

* One Last Scrutinizer Fix

... I hope.
2020-06-19 20:34:02 +02:00
oleibman 585409a949
Testing - Delete Temp Files When No Longer Needed (#1488)
No code changes. The tests in all of these scripts write to at least
one temporary file, which is then read and not used again. The file
should be deleted to avoid filling up the disk system.
2020-05-24 20:03:07 +09:00
oleibman 41b95c1542
CSV Sample File Was Miscoded (#1489)
File author erroneously assumed that backslash was used to escape
quotes in CSV; in fact, doubling the quote is used for escape.
The test still worked, but mainly because the content of the cell
with the escape wasn't tested. The file is now fixed, and
a new test added.
2020-05-24 19:57:39 +09:00
Adrien Crivelli 137268d61a
Remove undesired annotations 2020-05-18 15:49:29 +09:00
Adrien Crivelli fcd9f10663
Update PHP-CS-Fixer rules 2020-05-18 13:49:57 +09:00
Adrien Crivelli e868e58d20
Allow to run an entire folder of tests
We now can do something like:

```sh
./vendor/bin/phpunit tests/PhpSpreadsheetTests/Reader/
```
2020-05-17 18:35:55 +09:00
oleibman 7517cdd008
Improve Coverage for CSV (#1475)
I believe that both CSV Reader and Writer are 100% covered now.

There were some errors uncovered during development.

The reader specifically permits encodings other than UTF-8 to be used.
However, fgetcsv will not properly handle other encodings.
I tried replacing it with fgets/iconv/strgetcsv, but that could not
handle line breaks within a cell, even for UTF-8.
This is, I'm sure, a very rare use case.
I eventually handled it by using php://memory to hold the translated
file contents for non-UTF8. There were no tests for this situation,
and now there are (probably too many).

"Contiguous" read was not handle correctly. There is a file
in samples which uses it. It was designed to read a large sheet,
and split it into three. The first sheet was corrrect, but the
second and third were almost entirely empty. This has been corrected,
and the sample code was adapted into a formal test with assertions
to confirm that it works as designed.

I made a minor documentation change. Unlike HTML, where you never
need a BOM because you can declare the encoding in the file,
a CSV with non-ASCII characters must explicitly include a BOM
for Excel to handle it correctly. This was explained in the Reading CSV
section, but was glossed over in the Writing CSV section, which I
have updated.
2020-05-17 18:15:18 +09:00
Adrien Crivelli f1a019e492
Upgrad PHP deps 2020-04-27 19:29:45 +09:00
bbinotto e2f87e8b7a
Load with styles should not default to black fill color
Fixes #1353
Closes #1361
2020-04-26 22:33:30 +09:00
Matthijs Alles 87f71e1930 Support whitespaces in CSS style in Xlsx
Indentation in the xml leaves spaces in style string even after
replacing newlines. Replacing the spaces ensures no spaces in keys
of the resulting style-array

Fixes #1347
2020-04-05 19:50:57 +09:00
Stronati Andrea 9f5a472426 Fix XLSX file loading with autofilter containing '$'
The `setRange` method of the `Xlsx/AutoFilter` class expects a filter
range format like "A1:E10". The returned value from
`$this->worksheetXml->autoFilter['ref']` could contain "$" and returning
a value like "$A$1:$E$10".

Fixes #687
Fixes #1325
Closes #1326
2020-03-02 18:43:27 +07:00
oleibman 082266aacd Conditionals - Extend Support for (NOT)CONTAINSBLANKS (#1278)
Support for the CONTAINSBLANKS conditional style was added a while ago.
However, that support was on write only; any cells which used
CONTAINSBLANKS on a file being read would drop that style.

I am also adding support for NOTCONTAINSBLANKS, on read and write.
2020-01-04 18:50:04 +01:00
oleibman afd070a756 Handle ConditionalStyle NumberFormat When Reading Xlsx File (#1296)
* Handle ConditionalStyle NumberFormat When Reading Xlsx File

ReadStyle in Reader/Xlsx/Styles.php expects numberFormat to be a string.
However, when reading conditional style in Xlsx file, NumberFormat
   is actually a SimpleXMLElement, so is not handled correctly.
While testing this change, it turned out that reader always expects
   that there is a "SharedString" portion of the XML, which is not
   true for spreadsheets with no string data, which causes a
   run-time message.
Likewise, when conditional number format is not one of the built-in
   formats, a run-time message is issued because 'isset' is used
   to determine existence rather than 'array_key_exists'.
The new workbook added to the testing data demonstrates both those
   problems (prior to the code changes).

* Move Comment to Resolve Conflict

Github reports conflict involving placement of one comment statement.

* Respond to Scrutinizer Style Suggestion

Change detection for empty SimpleXMLElement.
2020-01-04 00:10:41 +01:00
coolhub 86fa5424a6
Correct column style even when using rowspan
Closes #1249
2019-11-30 15:40:42 +01:00
Nathanael d. Noblet 22bf54ca11 Allow Html Reader to write into existing spreadsheet
Sometimes you may want to read html into multiple worksheets within one
spreadsheet. Allowing the passing of a spreadsheet in makes this possible.
2019-11-17 21:17:56 +01:00
Nathanael Noblet 95c8bb9918
Allow HTML Reader to load from string
We often want to export a table as an excel sheet. The system renders the
html and it seems like a waste of time to write it to the file system to
use the reader. This allows us to render the html and then just pass it to
a reader

Closes #1136
2019-08-17 12:54:22 -07:00
Mahmoud Abdo 785705b712
Best effort to support invalid colspan values in HTML reader
Closes #878
2019-07-27 23:31:23 -07:00
Adrien Crivelli fa54ca79a3
Migrate away from deprecated PHPUnit asserts 2019-07-25 10:15:53 -07:00
Mark Baker bf59cf0cbc
Html cellwrapping (#1075)
* When <br> appears in a table cell, set the cell to wrap.

If the cell is not set to wrap, it appears as a single line when first
displayed in Excel, although editing the cell will cause Excel to wrap
it.

* fix whitespace

Upstream has a coding standard that includes whitespace

* Add Unit tests for cell wrapping

* Update changelog
2019-07-12 07:52:03 +02:00
Mark Baker d8047b071b
Basic unit test and fix for loading data validations from xlsx file (#1063) 2019-07-08 19:55:14 +02:00
rtek 6ab969e9cc Allow XmlScanner to correctly restore libxml entity_loader setting (#1050)
XmlScanner was not restoring libxml_disable_entity_loader since
destruct was not being called until script shutdown. This is because
the shutdown handler required an XmlScanner instance.

Also fix an unrelated bug where the UTF-8 encoding test was
case sensitive.
2019-07-03 09:53:43 +02:00
Mark Baker 0e6238c69e
CVE-2019-12331 (#1041)
* Detect doubly-encoded xml to hide XXE attacks
Correct use of LibXml_Disable_Entity_Loader

* New test for double-encoded xml in security scanner
2019-07-01 00:55:25 +02:00
Mark Baker 1e711541f1
Refactoring xlsx reader (#1033)
Start work on breaking up monolithic Reader and Writer classes into dedicated subclasses to make maintenance work easier
2019-06-30 23:42:25 +02:00
Mark Baker 6c25b6f422
Refactor Xlsx Properties Reader code into a separate class (#1001)
* Unit tests for refactoring Spreadsheet properties
* Refactor Xlsx Properties Reader code into a separate class
2019-06-10 16:44:55 +02:00
MarkBaker d6018a273e Codestyle fixes in tests.... spawn of the devil 2019-05-30 12:23:25 +02:00
MarkBaker 9ba96efc97 Still test against 5.6, but with allowed failures, and skip tests explicitly for features that require PHP >= 7.0.0 2019-05-30 12:11:49 +02:00
kraser 906bdc613c Fix failure when parsing xlsx with drawing having double (redefined) … (#945)
* Fix failure when parsing xlsx with drawing having double (redefined) attributes

* Fix failure when parsing xlsx with drawing having double (redefined) attributes
2019-05-30 11:42:00 +02:00
AlexPravdin ebc0b56959 Fix #853 when loading and saving XLSX file with empty drawing cause c… (#882)
* Fix #853 when loading and saving XLSX file with empty drawing cause corrupted output file. Store empty drawing as unparsed entity and save it as is when saving the file.

* Fix code style
2019-05-30 10:38:03 +02:00
Mark Baker 9b004b1e6a
Ignore escaped enclosures within an enclosure when inferring csv separator (#906) 2019-02-25 23:20:50 +01:00
Patrick Brouwers 1c99f4999c [Feature] Html reader improvements (#884)
* Extract character set, so we can convert to UTF-8 if required

* Set column width and row height when defined on tr/td

* Parse align and valign on td

* Specify number format of cell via html attribute

* Formatting of b, strong, i and em tags

* Inserting image in cell when using img tag in html

* Add applying inline styles: border, fonts, alignment, dimensions

* Add tests for applying inline styles
2019-02-16 23:11:16 +01:00
Adrien Crivelli d0dea580ad
Fix a few Scrutinizer issues 2019-01-02 15:38:13 +11:00
Mahmoud Abdo 86c635b3f5
Fix color from CSS when reading from HTML
In case we generate Spreadsheet from html file and the code
in file have text color in css "color:#FF00FF" it will showing
as black color because it will render like rgb content with } "FF00FF}"
So, we fix it by adding missing bracket "{".

Closes #831
2019-01-02 11:57:30 +11:00
Philipp Kolesnikov 8918888e7c
libxml_disable_entity_loader() changes global state so it should be used as local as possible
Fixes #801
Closes #802
Closes #803
2019-01-01 17:25:24 +11:00
Dennis Birkholz e56fbe2745
Fix column names if read filter calls in XLSX reader skip columns
Fixes #777
Closes #778
2018-12-10 20:00:26 +11:00
MarkBaker 3abb7ccb35 CS Complaining about not uisng $this->assertInternalType('object', $scanner); 2018-11-25 14:41:11 +01:00
MarkBaker 14159d985c Coding standards 2018-11-25 14:33:01 +01:00
MarkBaker 41bcf9a21c Support for additional callback in XML Security Scanner 2018-11-25 14:00:35 +01:00
MarkBaker c708411529 Refactor scanner into base reader class 2018-11-25 12:14:54 +01:00
MarkBaker abad49d426 Use factory for XMLcanner 2018-11-23 23:05:17 +01:00
MarkBaker 5854ce3738 phpcs cleanup 2018-11-20 08:18:35 +01:00
MarkBaker 7a06d71e1c Add UTF-7 XXE Unit test data 2018-11-19 23:22:59 +01:00
MarkBaker a4d97ba896 Clean handle charset in XXE scanner 2018-11-19 22:47:34 +01:00