While investigating something else in Shared, I noticed that CodePage
had poor test coverage and a high complexity rating. This change
addresses both; Scrutinizer would love it, although its interface on
GitHub seems broken at the moment (all PRs show "Waiting for External
Code Coverage").
There are a number of situations where HTML write was producing
HTML which could not be validated. These include:
- inconsistent use of backslash terminating META, IMG, and COL tags
- @page style tags in body rather than header. Aside from being
non-standard, HTML Reader treats those as spreadsheet data.
- <div style="page-break-before:always" />, a construct which is
usually better handled through css anyhow.
- no alt tag for images (drawings and charts)
Other problems:
- Windows file names not handled correctly for images
- Memory drawings not handled in extendRowsForChartsAndImages
- No handling of different values for showing gridlines
for screen and print
- Mpdf and Dompdf do not require the use of inline css.
Tcpdf remains a holdout in the use of this inferior approach.
- no need to chunk base64 encoding of embedded images
- support for colors in number format was buggy (html tags
run through htmlspecialchars)
Code has been refactored when practical to reduce the number of
very large functions.
Coverage is now 100% for the entire HTML Writer module,
from 75% lines and 39% methods beforehand.
All functions dealing only with charts
are bypassed for coverage because the version of Jpgraph available in
Composer is not suitable for PHP7. The code will, nevertheless,
run successfully, but with warning messages. I have confirmed that
the code is entirely covered, without warnings, when the current
version of Jpgraph is used in lieu of the one available in Composer.
I will be glad to revisit this when the Jpgraph problem is resolved.
Directory PhpSpreadsheetTests/Writer/Html was created to house
the new tests. It seemed logical to move HtmlCommentsTest to
the new directory from PhpSpreadsheetTests/Functional.
A function to generate all the HTML is useful, especially for testing,
but also in lieu of the multiple other generate* functions. I have
added and documented generateHTMLAll.
The documentation for the generate* functions (a) produces invalid html,
(b) produces html which cannot be handled correctly by HTML reader,
and (c) even if those were correct, does not actually affect
the display of the spreadsheet. The documentation has been replaced
by a valid, and more instructive, example.
The (undocumented) useEmbeddedCss property, and the functions
to test and set it are no longer needed. Rather than breaking
existing code by deleting them, I marked the functions deprecated.
This change borrows a change to LocaleFloatsTest from
pull request 1456, submitted a little over a week before this one.
## Improve NumberFormat Support
First phase of this change included correcting NumberFormat handling
in HTML Writer. Certain complex formats could not be handled without
changes to Style/NumberFormat, and I did not wish to combine those changes.
Once the original change had been pushed, I took this part of it back up.
HTML Writer can now handle conditions in formats like:
[Blue][>=3000.5]$#,##0.00;[Red][<0]$#,##0.00;$#,##0.00
In testing, I discovered several errors and omissions
in handling of some other formats.
These are now corrected, and tests added.
For functions introduced in Excel 2010 and beyond, Excel saves them
in formulas with the xlfn_ prefix. PhpSpreadsheet does not do this;
as a result, when a spreadsheet so created is opened, the cells
which use the new functions display a #NAME? error.
This the cause of bug report 1246:
https://github.com/PHPOffice/PhpSpreadsheet/issues/1246
This change corrects that problem when the Xlsx writer encounters
a 2010+ formula for a cell or a conditional style. A new class
Writer/Xlsx/Xlfn, with 2 static methods,
is introduced to facilitate this change.
As part of the testing for this, I found some additional problems.
When an unknown function name is used, Excel generates a #NAME? error.
However, when an unknown function is used in PhpSpreadsheet:
- if there are no parameters, it returns #VALUE!, which is wrong
- if there are parameters, it throws an exception, which is horrible
Both of these situations will now return #NAME?
Tests have been added for these situations.
The MODE (and MODE.SNGL) function is not quite in alignment with Excel.
MODE(3, 3, 4, 4) returns 3 in both Excel and PhpSpreadsheet.
However, MODE(4, 3, 3, 4) returns 4 in Excel, but 3 in PhpSpreadsheet.
Both situations will now match Excel's result.
Also, Excel allows its parameters for MODE to be an array,
but PhpSpreadsheet did not; it now will.
There had not been any tests for MODE. Now there are.
The SHEET and SHEETS functions were introduced in Excel 2013,
but were not introduced in PhpSpreadsheet. They are now introduced
as DUMMY functions so that they can be parsed appropriately.
Finally, in common with the "rate" changes for which I am
creating a pull request at the same time as this one:
samples/Basic/13_CalculationCyclicFormulae
PhpUnit started reporting an error like "too much regression".
The test deals with an infinite cyclic formula, and allowed
the calculation engine to run for 100 cycles. The actual number of cycles
seems irrelevant for the purpose of this test. I changed it to 15,
and PhpUnit no longer complains.
There were about 20 skipped tests for RATE and PRICE marked
"This test should be fixed". This change does that by fixing
the code for those functions, validating the existing tests,
and adding new ones. XIRR and XNPV are also substantially changed.
As part of this change, the following functions also have minor changes:
- isValidFrequency
- COUPDAYBS
- COUPNUM (additional tests)
- DB
- DDB
PhpUnit reports 100% coverage for all the changed functions.
Since I was dealing with skipped tests, I also fixed
tests/PhpSpreadsheetTests/Writer/Xlsx/LocaleFloatsTest,
which was being skipped in Windows. I also delete the temporary
file which it creates.
There is now only one remaining test which is skipped -
ODS Reader is not complete enough to run some tests against it.
Unfortunately, that test is too complicated for me to deal with now.
In researching this change, I found several places in the code where special code was added for Gnumeric claiming:
- Gnumeric does not handle free-format string dates
- Gnumeric adds extra options, not available in Excel,
for the frequency parameter for functions such as YIELD
- Gnumeric rounds the results for DB and DDB to 2 decimal places
None of these claims is true, at least not on a recent version
of Gnumeric, and the code which supports these differences is removed.
There did not appear to be any tests targeted for
these supposed properties of Gnumeric.
The PRICE function needed relatively minor changes - mostly
additional tests for invalid input. The main problem with the PRICE
tests is that Excel appears to have a bug. The algorithm is published:
https://support.office.com/en-us/article/price-function-3ea9deac-8dfa-436f-a7c8-17ea02c21b0a
The results that Excel returns for basis codes 2 and 3 appear to be
incorrect in many cases. I have segregated these tests into a
new test PRICE3. The results of these tests agree with the published
algorithm, and to the results for LibreOffice and Gnumeric.
The results returned by Excel do not agree with them.
The tests which remain in the test PRICE all use basis codes other
than 2 or 3, and all agree with Excel, LibreOffice, and Gnumeric.
For the RATE function, there appears to be a problem with how the
secant method was implemented. I studied the implementation of RATE
in Python numpy, and adapted its implementation of secant method.
The results now agree with numpy, and, more important, with Excel.
XIRR, which calls XNPV, permits its dates to be earlier than the
start date, whereas XNPV does not. I dealt with this by renaming
the existing XNPV function to xnpvOrdered, adding a parameter to
indicate whether start date has to be earliest. XNPV calls the new
function with that parameter set to TRUE, and XIRR calls it with
the parameter set to FALSE. Some additional error checking was
added to xnpvOrdered, and also to XIRR. XIRR tests benefited
from increasing the value of FINANCIAL_MAX_ITERATIONS.
Finally, since this change is very test-related:
samples/Basic/13_CalculationCyclicFormulae
PhpUnit started reporting an error like "too much regression".
The test deals with an infinite cyclic formula, and allowed
the calculation engine to run for 100 cycles. The actual number of cycles
seems irrelevant for the purpose of this test. I changed it to 15,
and PhpUnit no longer complains.
I believe that both CSV Reader and Writer are 100% covered now.
There were some errors uncovered during development.
The reader specifically permits encodings other than UTF-8 to be used.
However, fgetcsv will not properly handle other encodings.
I tried replacing it with fgets/iconv/strgetcsv, but that could not
handle line breaks within a cell, even for UTF-8.
This is, I'm sure, a very rare use case.
I eventually handled it by using php://memory to hold the translated
file contents for non-UTF8. There were no tests for this situation,
and now there are (probably too many).
"Contiguous" read was not handle correctly. There is a file
in samples which uses it. It was designed to read a large sheet,
and split it into three. The first sheet was corrrect, but the
second and third were almost entirely empty. This has been corrected,
and the sample code was adapted into a formal test with assertions
to confirm that it works as designed.
I made a minor documentation change. Unlike HTML, where you never
need a BOM because you can declare the encoding in the file,
a CSV with non-ASCII characters must explicitly include a BOM
for Excel to handle it correctly. This was explained in the Reading CSV
section, but was glossed over in the Writing CSV section, which I
have updated.
Returns #N/A, unless the element searched for is at the end of the array.
The problem is in Calculation.php line 4231:
if (!is_array($functionCall)) {
foreach ($args as &$arg) {
$arg = Functions::flattenSingleValue($arg);
}
unset($arg);
}
I believe this code is intended to handle functions where PhpSpreadsheet just passes
the call on to PHP without implementing the code on its own, e.g. for atan or acos.
In the bug report, the following code fails:
$flat_rate = "=MATCH(6,{4,5,6,2}, 0)";
$sheet->getCell('A1')->setValue($flat_rate);
The expected value is 3, but the actual result is "#N/A".
The reason for this result is that the parser replaces the braces with calls
to the MKMATRIX internal function, whose value for functioncall was:
'self::MKMATRIX'. Since this isn't an array, the flattening code is executed,
and the unintended result occurs. The fix is to change the definition for
functioncall in that case to [__CLASS__, 'mkMatrix'], avoiding the flattening.
However, there is also another part to this bug. The flattening should be
returning the first entry in the array, but is in fact returning the last.
This explains why the bug report specified "unless ... end of the array".
I confirmed that Excel does use the first item in the array rather than the last,
e.g. =atan({1,2,3}) entered into a cell will return atan(1), not atan(3).
The problem here is that flattenSingleValue, which says in its comments that
it is supposed to be returning the first item, uses array_pop rather than array_shift.
I have changed that as well. The same mistake was also present in
Cell.php function getCalculatedValue. The correct behavior can be verified
by entering =minverse({-2.5,1.5;2,-1}) into an Excel cell'
Excel flattens the result ({2,3;4,5}) to 2, and so should PhpSpreadsheet.
Fixes#1271Closes#1332
`$highestRow = $this->getHighestDataRow();` was calculated after `$this->getCellCollection()->removeRow($pRow + $r);` - this is the root reason for incorrect rows removal because removing last row will change '$this->getHighestDataRow()' value, but removing row from the middle will not change it. So, removing last row causes incorrect `$highestRow` value that is used for wiping out empty rows from the bottom of the table:
```php
for ($r = 0; $r < $pNumRows; ++$r) {
$this->getCellCollection()->removeRow($highestRow);
--$highestRow;
}
```
To prevent this incorrect behavior I've moved highest row calculation before row removal.
But this still doesn't solve another problem when trying remove non existing rows: in this case the code above will remove `$pNumRows` rows from below of the table, e.g. if `$highestRow=4` and `$pNumRows=6`, than rows 4, 3, 2, 1, 0, -1 will be deleted. Obviously, this is not good, that is why I've added `$removedRowsCounter` to fix this issue.
And finally, moved Exception to early if statement to get away from unnecessary 'if-else'.
Fixes#1364Closes#1365
Indentation in the xml leaves spaces in style string even after
replacing newlines. Replacing the spaces ensures no spaces in keys
of the resulting style-array
Fixes#1347
The `setRange` method of the `Xlsx/AutoFilter` class expects a filter
range format like "A1:E10". The returned value from
`$this->worksheetXml->autoFilter['ref']` could contain "$" and returning
a value like "$A$1:$E$10".
Fixes#687Fixes#1325Closes#1326
When freeze pane is in use on a worksheet, PhpSpreadsheet saves to Xlsx in such
a way that the active cell is always set to the top left cell below the freeze
pane. I find it difficult to understand why:
1. You have given users the setSelectedCells function, but then choose to
ignore it.
2. Excel itself does not act in this manner.
3. PHPExcel did not act in this manner.
4. PhpSpreadsheet when writing to Xls does not act in this manner.
This is especially emphasized because the one test in FreezePaneTest which
would expose the difference is the only test in that member which is
not made for both Xls and Xlsx.
5. It is *really* useful to be able to open a spreadsheet anywhere, even when
it has header rows.
Closes#1323
Support for the CONTAINSBLANKS conditional style was added a while ago.
However, that support was on write only; any cells which used
CONTAINSBLANKS on a file being read would drop that style.
I am also adding support for NOTCONTAINSBLANKS, on read and write.
* Handle Error in Formula Processing Better for Xls
When there is an error writing a formula to an Xls file,
which seems to happen when a defined name is part of a formula,
the cell is currently left blank. A better result would be to
write the calculated value of the formula.
* Making Changes Suggested in Review
Per comment from Mark Baker:
1. Made return codes from writeFormula into constants.
2. Skipped redundant call to getCellValue when possible.
3. Added support for bool type, adding additional tests
for bool and string.
Per comment from PowerKiki:
1. Used standardized convention for assigning file name in test.
Since before this change, save would throw Exception, I kept
the unlink for the file in tearDown.
* Initial unit test for locale floats
This will require potential modification of the TravisCI environment to support other locales
* var_dump to check output on TravisCI
* Fix assertions for double/float and with/without line reference
* Style in unit test
* Handle ConditionalStyle NumberFormat When Reading Xlsx File
ReadStyle in Reader/Xlsx/Styles.php expects numberFormat to be a string.
However, when reading conditional style in Xlsx file, NumberFormat
is actually a SimpleXMLElement, so is not handled correctly.
While testing this change, it turned out that reader always expects
that there is a "SharedString" portion of the XML, which is not
true for spreadsheets with no string data, which causes a
run-time message.
Likewise, when conditional number format is not one of the built-in
formats, a run-time message is issued because 'isset' is used
to determine existence rather than 'array_key_exists'.
The new workbook added to the testing data demonstrates both those
problems (prior to the code changes).
* Move Comment to Resolve Conflict
Github reports conflict involving placement of one comment statement.
* Respond to Scrutinizer Style Suggestion
Change detection for empty SimpleXMLElement.
Prior to 1.10, all numeric values where read as floats. In 1.10
numeric values are read using 0 + x, which relies on PHP type
juggling rules. As a result, float(0.0) is written as string('0'),
then read back as int(0). This fix causes the writer to retain the
the decimal for float values such that a reader can differentiate
floats from ints.
Closes#1262
CALCULATION_REGEXP_CELLREF is not sufficiently robust.
It treats some perfectly legal defined names, e.g. A1A, as cell refs.
When the Xlsx Writer tries to save a worksheet which uses such a name
in a formula in a cell, it throws an exception.
The new DefinedNameConfusedForCellTest is a simple demonstration.
The Regexp has been changed to ensure the name starts on a Word boundary,
and to make sure it is not followed by a word character or period.
This fixes the problem, and does not appear to cause any regression
problems in the test suite.
Closes#1263
Fix: Return #NUM! if values and dates contain a different number of values
Fix: Return #NUM! if there is not at least one positive cash flow and one negative cash flow
Fix: Return #NUM! if any number in dates precedes the starting date
Fix: Return #NUM! if a result that works cannot be found after max iteration tries
Fix: Correct DocBlocks for XIRR & XNPV
Add: Validate XIRR with unit tests
Closes#1177
* Call garbage collector after removing a column
Otherwise callers of getHighestColumn get stale values
* Update changelog
* Fix remove a column out of range removes the last column
Given:
+---+---+
| A | B |
+---+---+
Attempting to remove 'D', should not alter the worksheet
* Avoid side effects when trying to remove more columns than exists
We often want to export a table as an excel sheet. The system renders the
html and it seems like a waste of time to write it to the file system to
use the reader. This allows us to render the html and then just pass it to
a reader
Closes#1136
Calculation engine was resolving every function by first resolving its arguments
including IFs, this was causing significant over evaluation when IFs were used
as it meant for every case to be evaluated.
Introduce elements to identify ifs and enable better branch resolution
(pruning). We tag parsed tokens to associate a branch identifier to them.
Closes#844
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* Argument fix
* Text Test functions refactored into individual test files
* Codestyle (line at eof)
* docblocks
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* New statistical tests
* Sniffs
* Additional statistical function unit tests
* Additional statistical function unit tests
* Fix case-sensitivity
* Fix HARMEAN code logic
* Unit tests refactored into individual files for all logical functions
Implemented IFNA()
* Fix silly typo
* NOT needs ...args to allow for test when no argument passed
* Codestyle
* Use instance asserts
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* Bessels, and set some date tests to defined/named arguments
* Fix test class naming
* Names arguments for math/trig tests
* Docblock updates
* More engineering function unit test refactorings
* More engineering function unit test refactorings. This time, moving on to the Complex engineering functions
* Fix ImConjugate test
* Fix parseComplex test
* Fix parseComplex test
* More of the complex number function unit tests refactored
* Finish refactoring of the complex number function unit tests
* Newer phpunit assertions
* Add parsecomplex unit test back until we're ready to drop the deprecated function; but as it doesn't use the specified data provider at all, drop reference to that
Sheet titles containing " " or "!" will be quoted in formulas. This commit
fixes the lookup of sheets with this kind of title by trimming the quotes
during the lookup.
Without this any defined range referencing a sheet with " " or "!" in the title
name will be lost when reading the workbook from file.
Fixes#928
Closes 930
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* First pass at moving MathTrig tests into individual test files
* Appeasement to the great goddess PHPCS
* Appeasement to the great goddess PHPCS
* Minor scrutinizer issue resolved
* More refactoring of tests into individual test files fr each math/trig function
* More work on the math/trig test refactoring, plus a bit of tidyup of date/time tests as well
* Fix test
* Fix docblock in test
* Finish refactoring Math/Trig tests into separate files
* Fix SubTotal Test
* Import additional classes for SubTotal test
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* Separate out date/time tests into individual tests
* Need to update the version of phpunit at some point to deal with the new assertions and deprecated assertions
* Appease the CS Gods
* More refactoring of Date/Time tests
* Replace self assertions with instance assertions (looking forward to upgrading phpunit)
* Finish refactoring of date/time tests as individual tests
* Test for DateTimeInterface rather than for DateTime
* A few strict comparisons
* Fix to test names
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* Adjusted logic for COUNT() function to handle differences in EXCEL, GNUMERIC and OPENOFFICE modes for cells and for literal values
* Fix case-sensitivity in filenames
* Appeasing Codesniffer
* Resolve COUNTA() differences between cell values and literals
* Style fixes
* Start refactoring statistical function tests into individual tests rather than having a single, giant test for all statistical functions.... first step toward doing this for all tests
* More refactoring into separate tests
If all functions have their own individual test files, it should be a lot easier to identify which functions aren't covered by tests yet
* Missing last lines in files
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* More trend function unit tests
* Yet more trend function unit tests
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* More statistical tests
* Further statistical tests
* Unit tests for some of the trend functions
* resolve scrutiniser objections
* Fix order of @return types :-(
* Merge branch 'master' of C:\Projects\PHPOffice\PHPSpreadsheet\develop with conflicts.
* Additional unit tests for average functions, and fix to AVERAGEIF() function if third argument is passed
* Update change log
* Stricter typed comparisons in AVERAGEIF() conditions
* Unit tests for BETADIST() and BETAINV()
* When <br> appears in a table cell, set the cell to wrap.
If the cell is not set to wrap, it appears as a single line when first
displayed in Excel, although editing the cell will cause Excel to wrap
it.
* fix whitespace
Upstream has a coding standard that includes whitespace
* Add Unit tests for cell wrapping
* Update changelog
* New Unit Tests for COUPNUM()
* COUPNUM should not return zero when settlement is in the last period
* Additional tests and fixes for COUPNCD() and COUPPCD() functions
XmlScanner was not restoring libxml_disable_entity_loader since
destruct was not being called until script shutdown. This is because
the shutdown handler required an XmlScanner instance.
Also fix an unrelated bug where the UTF-8 encoding test was
case sensitive.
* Fix failure when parsing xlsx with drawing having double (redefined) attributes
* Fix failure when parsing xlsx with drawing having double (redefined) attributes
* Fix#853 when loading and saving XLSX file with empty drawing cause corrupted output file. Store empty drawing as unparsed entity and save it as is when saving the file.
* Fix code style
* Prevented reading of blank cells.
The "readEmptyCells" attribute is ignored when reading spreadsheets, resulting in memory bloat.
* Included a test file for Unit Testing
A file that contains 100 referenced cells, one of which contains data.
* New test file for reading in empty cells
* Added test for reading in a blank cell
* Updated CHANGELOG
* Changed "s to 's
Change required for code style compliance
* Further Code Style Changes
Removed spaces after variable, before array indices.
* Further Code Style Changes
* Further Code Style Changes
Removed additional spaces.
* Updated reader and tests.
* Extract character set, so we can convert to UTF-8 if required
* Set column width and row height when defined on tr/td
* Parse align and valign on td
* Specify number format of cell via html attribute
* Formatting of b, strong, i and em tags
* Inserting image in cell when using img tag in html
* Add applying inline styles: border, fonts, alignment, dimensions
* Add tests for applying inline styles
In case we generate Spreadsheet from html file and the code
in file have text color in css "color:#FF00FF" it will showing
as black color because it will render like rgb content with } "FF00FF}"
So, we fix it by adding missing bracket "{".
Closes#831