Ignore escaped enclosures within an enclosure when inferring csv separator (#906)

Mark Baker 2019-02-25 23:20:50 +01:00 committed by GitHub
parent 334afde9cd
commit 9b004b1e6a
4 changed files with 30 additions and 2 deletions

@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org).
 - Added support for inline styles in Html reader (borders, alignment, width, height)
 - QuotedText cells no longer treated as formulae if the content begins with a `=`
 - Clean handling for DDE in formulae
+- Fix handling for escaped enclosures and new lines in CSV Separator Inference
 ## [1.6.0] - 2019-01-02

@ -254,8 +254,8 @@ class Csv extends BaseReader
 $line = $line . $newLine;
 // Drop everything that is enclosed to avoid counting false positives in enclosures
-$enclosure = preg_quote($this->enclosure, '/');
-// Add 's' to the replace rule in order for '.' to also match newline.
+$enclosure = '(?<!' . preg_quote($this->escapeCharacter, '/') . ')'
+    . preg_quote($this->enclosure, '/');
 $line = preg_replace('/(' . $enclosure . '.*' . $enclosure . ')/Us', '', $line);
 // See if we have any enclosures left in the line
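
To illustrate what the negative lookbehind in this hunk changes, here is a minimal standalone sketch. It is not the reader's actual code path: `stripEnclosed()`, `$enclosureChar`, `$escapeChar` and the sample line are hypothetical stand-ins for `$this->enclosure`, `$this->escapeCharacter` and real input. It only compares the old and new enclosure patterns on one line containing escaped quotes and a comma inside the enclosure.

```php
<?php
// Hypothetical stand-ins for $this->enclosure and $this->escapeCharacter.
$enclosureChar = '"';
$escapeChar = '\\';

// Hypothetical helper mirroring the preg_replace() call in the hunk:
// 'U' makes '.*' ungreedy, 's' lets '.' also match newlines.
function stripEnclosed(string $line, string $enclosurePattern): string
{
    return preg_replace('/(' . $enclosurePattern . '.*' . $enclosurePattern . ')/Us', '', $line);
}

// Sample line (an assumption, modelled on the new test fixture): the enclosed
// field contains escaped quotes and a comma that must not be counted.
$line = 'Test,"This is a \"test, csv\" file",http://google.com';

// Old pattern: any enclosure character opens or closes an enclosure.
$oldPattern = preg_quote($enclosureChar, '/');

// New pattern: an enclosure character only counts when it is not
// immediately preceded by the escape character.
$newPattern = '(?<!' . preg_quote($escapeChar, '/') . ')' . preg_quote($enclosureChar, '/');

$oldResult = stripEnclosed($line, $oldPattern);
$newResult = stripEnclosed($line, $newPattern);

// With the old pattern part of the enclosed text survives, so three commas
// are counted instead of the two real delimiters.
echo substr_count($oldResult, ','), PHP_EOL; // prints 3

// With the new pattern the whole enclosed field is dropped.
echo $newResult, PHP_EOL; // prints Test,,http://google.com
```

With the old pattern the escaped quotes split the field into two short "enclosures", leaving the inner comma behind and skewing the per-line delimiter count that separator inference relies on.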

@ -49,6 +49,12 @@ class CsvTest extends TestCase
     'A3',
     'Test',
 ],
+[
+    __DIR__ . '/../../data/Reader/CSV/line_break_in_enclosure_with_escaped_quotes.csv',
+    ',',
+    'A3',
+    'Test',
+],
 [
     __DIR__ . '/../../data/Reader/HTML/csv_with_angle_bracket.csv',
     ',',

@ -0,0 +1,21 @@
Name,Copy,URL
Test,"This is a \"test csv file\"
with both \"line breaks\"
and \"escaped
quotes\" that breaks
the delimiters",http://google.com
Test,"This is a \"test csv file\"
with both \"line breaks\"
and \"escaped
quotes\" that breaks
the delimiters",http://google.com
Test,"This is a \"test csv file\"
with both \"line breaks\"
and \"escaped
quotes\" that breaks
the delimiters",http://google.com
Test,"This is a \"test csv file\"
with both \"line breaks\"
and \"escaped
quotes\" that breaks
the delimiters",http://google.com